michaelzhiluo committed
Commit 24a92ef (verified) · 1 Parent(s): 78da871

Update README.md

  1. README.md +5 -5
README.md CHANGED
@@ -20,16 +20,16 @@ base_model:
  </div>
  <br>
  <div align="center" style="line-height: 1;">
- <a href="https://github.com/agentica-project/deepscaler" target="_blank" style="margin: 2px;">
+ <a href="https://github.com/agentica-project/deepscaler" style="margin: 2px;">
  <img alt="Code" src="https://img.shields.io/badge/DeepScaleR-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2" target="_blank" style="margin: 2px;">
  <img alt="Blog" src="https://img.shields.io/badge/Notion-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
- <a href="https://x.com/Agentica_" target="_blank" style="margin: 2px;">
- <img alt="Blog" src="https://img.shields.io/badge/Agentica-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white" style="display: inline-block; vertical-align: middle;"/>
+ <a href="https://x.com/Agentica_/status/1889006266661617779" style="margin: 2px;">
+ <img alt="X.ai" src="https://img.shields.io/badge/Agentica-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
- <a href="https://huggingface.co/agentica-org" target="_blank" style="margin: 2px;">
+ <a href="https://huggingface.co/agentica-org" style="margin: 2px;">
  <img alt="Hugging Face" src="https://img.shields.io/badge/Agentica-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
  </a>
  </div>
@@ -68,7 +68,7 @@ We employ Deepseek's Group Relative Policy Optimization (GRPO), a simplified RL
  - Trained on 32 A100-80GB GPUs, BS= (Prompts) * (Samples/Prompt) = 128 * 16 = 2048
  - Significant improvements within <200 steps
 
- A more detailed description of the training recipe can be found in our [blog post](https://www.notion.so/DeepScaleR-Scaling-R1-Models-with-Reinforcement-Learning-1891e65ddc7f80ad8cc6dbe0069a66fa?pvs=4).
+ A more detailed description of the training recipe can be found in our [blog post](https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2).
 
  ## Evaluation
  We report Pass@1 accuracy averaged over 16 samples for each problem.
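
Reviewer note: the Evaluation context line above describes Pass@1 averaged over 16 samples per problem. As a minimal sketch of one way to read that metric (not the project's actual evaluation code), the snippet below takes the fraction of correct samples for each problem and then averages those fractions across problems. The function name `pass_at_1` and the toy `results` data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def pass_at_1(per_problem_results: Sequence[Sequence[bool]]) -> float:
    """Average, over problems, of the fraction of correct samples per problem.

    Assumption: each inner sequence holds one correctness flag per sampled
    answer (16 flags per problem in the README's setup).
    """
    if not per_problem_results:
        raise ValueError("need at least one problem")

    per_problem_accuracy = []
    for samples in per_problem_results:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # sum() counts True flags, so this is the per-problem success rate.
        per_problem_accuracy.append(sum(samples) / len(samples))

    # Unweighted mean across problems.
    return sum(per_problem_accuracy) / len(per_problem_accuracy)


# Hypothetical toy data: 2 problems, 16 samples each.
results = [
    [True] * 8 + [False] * 8,  # 8 of 16 samples correct
    [True] * 16,               # all 16 samples correct
]
score = pass_at_1(results)
print(score)  # 0.75
```

Averaging over several samples per problem reduces the variance that a single sampled answer would introduce, which is presumably why the README reports the metric this way.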