licence: license
---

# About:

**This GRPO-trained model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) on the [DigitalLearningGmbH/MATH-lighteval](https://huggingface.co/datasets/DigitalLearningGmbH/MATH-lighteval) dataset.**

GRPO (Group Relative Policy Optimization) is applied after a distilled R1 model is created, to further refine its reasoning capabilities. Unlike the initial distillation step, which transfers capabilities from a larger model, GRPO uses reinforcement learning to optimize the policy model by maximizing a reward signal. This fine-tuning step is distinct from distillation and aims to boost performance on chain-of-thought and reasoning tasks.
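
To make this concrete, here is a minimal sketch of the group-relative advantage computation at the heart of GRPO. It is an illustration of the idea only, not the training code used for this model, and the reward values are invented: each prompt gets a group of sampled answers, each answer gets a scalar reward (for example, 1 for a correct final answer on a MATH problem, 0 otherwise), and rewards are normalized within the group.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize rewards within one prompt's group of sampled completions."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Invented rewards for four sampled solutions to one math prompt (1 = correct).
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # ≈ [1, -1, -1, 1]
```

Answers that beat the group average receive a positive advantage, so the policy is nudged toward them without needing a separate learned value model.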

*Special thanks to Dongwei for fine-tuning this version of DeepSeek-R1-Distill-Qwen-7B. More information about it can be found here:*
[https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math](https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math)

I simply converted it to MLX format with 4-bit quantization for better performance on Apple Silicon Macs (M1, M2, M3, and M4 chips).
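
For reference, a conversion like this can be reproduced with mlx-lm's `convert` API. This is a sketch assuming default settings; the exact invocation used for this repo is not recorded here, and the output directory name is hypothetical:

```python
from mlx_lm import convert

# Convert the upstream GRPO checkpoint to MLX format with 4-bit quantization.
convert(
    "Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math",
    mlx_path="DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx",  # hypothetical output dir
    quantize=True,
    q_bits=4,
)
```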

# Notes:
- Seems to brush over the "thinking" process and immediately start answering, leading to extremely quick but correct answers.

# Alejandroolmedo/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx

The model [Alejandroolmedo/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx](https://huggingface.co/Alejandroolmedo/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx) was converted to MLX format from [Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math](https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math) using mlx-lm version **0.20.5**.

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Download the 4-bit MLX weights and tokenizer from the Hugging Face Hub.
model, tokenizer = load("Alejandroolmedo/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx")

prompt = "hello"

# Apply the model's chat template when the tokenizer defines one.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
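
mlx-lm can also stream tokens as they are generated, which is handy for watching a reasoning model work through a problem. A small usage sketch, assuming mlx-lm 0.20.x, where `stream_generate` yields response objects with a `.text` field; the math prompt is just an example:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("Alejandroolmedo/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx")

# Format an example question with the model's chat template.
messages = [{"role": "user", "content": "What is the sum of the first 50 positive integers?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print each chunk of text as soon as it is generated.
for response in stream_generate(model, tokenizer, prompt, max_tokens=1024):
    print(response.text, end="", flush=True)
print()
```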