Update README.md
README.md
CHANGED
@@ -20,16 +20,16 @@ base_model:
 </div>
 <br>
 <div align="center" style="line-height: 1;">
-<a href="https://github.com/agentica-project/deepscaler"
+<a href="https://github.com/agentica-project/deepscaler" style="margin: 2px;">
 <img alt="Code" src="https://img.shields.io/badge/DeepScaleR-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
 </a>
 <a href="https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2" target="_blank" style="margin: 2px;">
 <img alt="Blog" src="https://img.shields.io/badge/Notion-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
 </a>
-<a href="https://x.com/Agentica_"
-<img alt="
+<a href="https://x.com/Agentica_/status/1889006266661617779" style="margin: 2px;">
+<img alt="X.ai" src="https://img.shields.io/badge/Agentica-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white" style="display: inline-block; vertical-align: middle;"/>
 </a>
-<a href="https://huggingface.co/agentica-org"
+<a href="https://huggingface.co/agentica-org" style="margin: 2px;">
 <img alt="Hugging Face" src="https://img.shields.io/badge/Agentica-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
 </a>
 </div>
@@ -68,7 +68,7 @@ We employ Deepseek's Group Relative Policy Optimization (GRPO), a simplified RL
 - Trained on 32 A100-80GB GPUs, BS= (Prompts) * (Samples/Prompt) = 128 * 16 = 2048
 - Significant improvements within <200 steps
 
-A more detailed description of the training recipe can be found in our [blog post](https://
+A more detailed description of the training recipe can be found in our [blog post](https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2).
 
 ## Evaluation
 We report Pass@1 accuracy averaged over 16 samples for each problem.
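
The evaluation line in the README says Pass@1 is averaged over 16 samples for each problem. A minimal sketch of one way to compute such a number follows, assuming each sample has already been graded as correct or incorrect. The function name `pass_at_1` and the toy data are hypothetical and are not taken from the DeepScaleR codebase.

```python
from statistics import mean


def pass_at_1(results: list[list[bool]]) -> float:
    """Average per-problem accuracy.

    `results` holds one inner list per problem; each inner list contains the
    graded samples for that problem (True means the sample was correct).
    """
    if not results:
        raise ValueError("no problems given")
    per_problem = []
    for samples in results:
        if not samples:
            raise ValueError("every problem needs at least one sample")
        # Fraction of this problem's samples that were correct.
        per_problem.append(sum(samples) / len(samples))
    # Unweighted mean across problems.
    return mean(per_problem)


# Toy data (made up): 2 problems, 16 samples each.
toy = [
    [True] * 12 + [False] * 4,   # 12 of 16 correct
    [True] * 4 + [False] * 12,   # 4 of 16 correct
]
score = pass_at_1(toy)
print(f"Pass@1 = {score:.3f}")
```

Averaging within each problem first and then across problems gives every problem equal weight, which matches a plain average whenever all problems have the same number of samples, as with 16 per problem here.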