---
license: mit
library_name: transformers
datasets:
- AI-MO/NuminaMath-CoT
- KbsdJames/Omni-MATH
- RUC-AIBOX/STILL-3-Preview-RL-Data
- hendrycks/competition_math
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
---

<div align="center">
<span style="font-family: default; font-size: 1.5em;">DeepScaleR-1.5B-Preview</span>
<div>
🚀 Democratizing Reinforcement Learning for LLMs 🌟
</div>
</div>

<div align="center" style="line-height: 1;">
  <a href="https://github.com/agentica-project/deepscaler" target="_blank" style="margin: 2px;">
    <img alt="Code" src="https://img.shields.io/badge/DeepScaleR-000000?style=for-the-badge&logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://www.notion.so/DeepScaleR-Scaling-R1-Models-with-Reinforcement-Learning-1891e65ddc7f80ad8cc6dbe0069a66fa" target="_blank" style="margin: 2px;">
    <img alt="Blog" src="https://img.shields.io/badge/Notion-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/agentica-org" target="_blank" style="margin: 2px;">
    <img alt="Hugging Face" src="https://img.shields.io/badge/Agentica-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

## Model Details

### Model Description

DeepScaleR-1.5B-Preview is a language model fine-tuned with reinforcement learning from [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).

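A minimal inference sketch with `transformers`. The repo id `agentica-org/DeepScaleR-1.5B-Preview` is inferred from the org link and model name on this card, and the prompt template and sampling settings are common conventions for R1-style math models, not taken from this card — treat all of them as assumptions:

```python
def build_prompt(problem: str) -> str:
    """Ask for step-by-step reasoning and a \\boxed{} final answer
    (a common convention for R1-style math models; an assumption here)."""
    return (
        f"{problem}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def solve(problem: str, max_new_tokens: int = 4096) -> str:
    """Generate a solution with the model (requires `transformers` and `torch`)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import of heavy deps

    model_id = "agentica-org/DeepScaleR-1.5B-Preview"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": build_prompt(problem)}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Sampling settings often recommended for R1-distill models (an assumption):
    output = model.generate(
        inputs, max_new_tokens=max_new_tokens, do_sample=True,
        temperature=0.6, top_p=0.95,
    )
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```
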
## Training Details

### Training Data

### Training Procedure

## Evaluation

We report Pass@1 accuracy averaged over 16 samples for each problem.

| Model | AIME 2024 | MATH 500 | AMC 2023 | Minerva Math | OlympiadBench | Avg. |
|-------|-----------|----------|----------|--------------|---------------|------|
| Qwen2.5-7B-Instruct | 13.3 | 79.8 | 50.6 | 34.6 | 40.7 | 43.8 |
| rStar-Math-7B | 26.7 | 78.4 | 47.5 | - | 47.1 | - |
| Eurus-2-7B-PRIME | 26.7 | 79.2 | 57.8 | 38.6 | 42.1 | 48.9 |
| Qwen2.5-7B-SimpleRL | 26.7 | 82.4 | 62.5 | <strong>39.7</strong> | 43.3 | 50.9 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
| Still-1.5B | 32.5 | 84.4 | 66.7 | 29.0 | 45.4 | 51.6 |
| <strong>DeepScaleR-1.5B-Preview</strong> | <strong>43.1</strong> | <strong>87.8</strong> | <strong>73.6</strong> | 30.2 | - | - |
| O1-Preview | 40.0 | 81.4 | - | - | - | - |

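"Pass@1 averaged over 16 samples" means: for each problem, sample 16 solutions, take the fraction that are correct, then average that fraction across problems. A small sketch of the metric (the helper name and data layout are illustrative, not from the evaluation code):

```python
def pass_at_1(correct_flags: list[list[bool]]) -> float:
    """Mean Pass@1: per problem, the fraction of its sampled solutions
    that are correct, then averaged across all problems."""
    per_problem = [sum(flags) / len(flags) for flags in correct_flags]
    return sum(per_problem) / len(per_problem)


# Two problems, 4 samples each (16 in the actual evaluation):
# problem 1 has 3/4 correct, problem 2 has 1/4 -> (0.75 + 0.25) / 2 = 0.5
flags = [[True, True, True, False], [True, False, False, False]]
print(pass_at_1(flags))  # 0.5
```
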
## Acknowledgement

- Our training experiments are powered by our heavily modified fork of [Verl](https://github.com/agentica-project/verl), an open-source RLHF library.
- Our model is trained on top of [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
- Our work is done as part of [Berkeley Sky Computing Lab](https://skycomputing.berkeley.edu/) and [Berkeley AI Research](https://bair.berkeley.edu/).

## Citation

```bibtex
@misc{deepscaler2025,
  title={DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL},
  author={Michael Luo and Sijun Tan and Justin Wong and Xiaoxiang Shi and William Tang and Manan Roongta and Colin Cai and Jeffrey Luo and Tianjun Zhang and Erran Li and Raluca Ada Popa and Ion Stoica},
  year={2025},
  howpublished={\url{}},
  note={Notion Blog}
}
```