Update README.md
license: license
---

# About:

**This GRPO-trained model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) on the [DigitalLearningGmbH/MATH-lighteval](https://huggingface.co/datasets/DigitalLearningGmbH/MATH-lighteval) dataset.**

GRPO is applied after the distilled R1 model is created, to further refine its reasoning capabilities. Unlike the initial distillation step, which transfers capabilities from a larger model, GRPO uses reinforcement learning to optimize the policy model by maximizing a reward signal. This fine-tuning step is distinct from distillation and aims to boost performance on chain-of-thought and reasoning tasks.

*Special thanks to Dongwei for fine-tuning this version of DeepSeek-R1-Distill-Qwen-7B. More information about it can be found here:*
[https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math](https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math)

I simply converted it to MLX format with 4-bit quantization for better performance on Apple Silicon Macs (M1, M2, M3, and M4 chips).
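As a rough illustration of why 4-bit weights matter on unified-memory Macs, here is a back-of-the-envelope sketch. The 7B parameter count is approximate, and real memory use is higher (KV cache, activations, and per-group quantization scales are ignored):

```python
# Approximate weight-storage footprint of a 7B-parameter model at
# different precisions. Illustrative only: ignores KV cache,
# activations, and quantization-group scale overhead.

PARAMS = 7_000_000_000  # approximate parameter count


def weight_gb(bits_per_param: int) -> float:
    """Gigabytes needed to store the raw weights at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9


fp16_gb = weight_gb(16)  # original half-precision weights
q4_gb = weight_gb(4)     # 4-bit quantized weights

print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
# fp16: 14.0 GB, 4-bit: 3.5 GB
```

The 4x reduction is what lets a 7B model fit comfortably alongside other workloads on a 16 GB machine.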
# Notes:

- Seems to brush over the "thinking" process and immediately start answering, leading to extremely quick but correct answers.

# Alejandroolmedo/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx

The model [Alejandroolmedo/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx](https://huggingface.co/Alejandroolmedo/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx) was converted to MLX format from [Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math](https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math) using mlx-lm version **0.20.5**.
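For reference, a conversion like this is typically done with mlx-lm's `convert` tool. The invocation below is a sketch of the usual command, not necessarily the exact one used here (the output path is a hypothetical example):

```shell
# Convert the Hugging Face checkpoint to MLX format.
# -q enables quantization (4-bit is the mlx-lm default).
python -m mlx_lm.convert \
    --hf-path Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math \
    -q \
    --mlx-path DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx
```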
## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Alejandroolmedo/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-4bit-mlx")

prompt = "hello"

# Generate a response from the quantized model.
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```