agentica-org/DeepScaleR-1.5B-Preview

Transformers · Safetensors · English

michaelzhiluo committed about 23 hours ago

Commit 24a92ef (verified) · 1 parent: 78da871

Update README.md

Files changed (1): README.md +5 -5

@@ -20,16 +20,16 @@ base_model:
 </div>
 <br>
 <div align="center" style="line-height: 1;">
-  <a href="https://github.com/agentica-project/deepscaler" target="_blank" style="margin: 2px;">
+  <a href="https://github.com/agentica-project/deepscaler" style="margin: 2px;">
     <img alt="Code" src="https://img.shields.io/badge/DeepScaleR-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
   </a>
   <a href="https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2" target="_blank" style="margin: 2px;">
     <img alt="Blog" src="https://img.shields.io/badge/Notion-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
   </a>
-  <a href="https://x.com/Agentica_" target="_blank" style="margin: 2px;">
-    <img alt="Blog" src="https://img.shields.io/badge/Agentica-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white" style="display: inline-block; vertical-align: middle;"/>
+  <a href="https://x.com/Agentica_/status/1889006266661617779" style="margin: 2px;">
+    <img alt="X.ai" src="https://img.shields.io/badge/Agentica-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white" style="display: inline-block; vertical-align: middle;"/>
   </a>
-  <a href="https://huggingface.co/agentica-org" target="_blank" style="margin: 2px;">
+  <a href="https://huggingface.co/agentica-org" style="margin: 2px;">
     <img alt="Hugging Face" src="https://img.shields.io/badge/Agentica-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
   </a>
 </div>
@@ -68,7 +68,7 @@ We employ Deepseek's Group Relative Policy Optimization (GRPO), a simplified RL
     - Trained on 32 A100-80GB GPUs, BS= (Prompts) * (Samples/Prompt) = 128 * 16 = 2048
     - Significant improvements within <200 steps
-A more detailed description of the training recipe can be found in our [blog post](https://www.notion.so/DeepScaleR-Scaling-R1-Models-with-Reinforcement-Learning-1891e65ddc7f80ad8cc6dbe0069a66fa?pvs=4).
+A more detailed description of the training recipe can be found in our [blog post](https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2).
 ## Evaluation
 We report Pass@1 accuracy averaged over 16 samples for each problem.
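The Evaluation context line in this diff reports Pass@1 accuracy averaged over 16 samples for each problem. Below is a minimal Python sketch of that averaging. It assumes each sample has already been graded as simply correct or incorrect. The function name `pass_at_1` and the list-of-lists input layout are hypothetical and are not taken from the DeepScaleR repository.

```python
from statistics import mean


def pass_at_1(results: list[list[bool]]) -> float:
    """Estimate Pass@1 from several graded samples per problem.

    Hypothetical layout: results[i][j] is True if sample j for problem i
    was graded correct. For each problem, the fraction of correct samples
    estimates the chance that a single sample is correct. Those
    per-problem estimates are then averaged over all problems.
    """
    if not results:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in results:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # sum() over bools counts the correct samples
        per_problem.append(sum(samples) / len(samples))
    return mean(per_problem)


# Two hypothetical problems with 16 samples each:
# problem A is solved in 12/16 samples, problem B in 4/16.
example = [[True] * 12 + [False] * 4, [True] * 4 + [False] * 12]
print(pass_at_1(example))  # 0.5
```

Averaging 16 graded samples per problem gives a lower-variance estimate of single-sample accuracy than scoring only one sample per problem.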