Update README.md
README.md
CHANGED
@@ -44,4 +44,12 @@ datasets:
 - lightblue/reasoning-multilingual-R1-Llama-70B-train
 base_model:
 - Qwen/Qwen2.5-1.5B-Instruct
----
+---
+
+It's a distilled model, like s1 and deepseek-r1-distill.
+
+It's a test model. I hope I can reproduce an RL model like RL-Zero.
+
+This model is a small step toward that goal.
+
+Thanks to everyone in the open community.