VanWang/DeepSeek-R1-Distill-Qwen-7B-ThinkPO

VanWang committed on Feb 21

Commit 74c97a3 · verified · 1 Parent(s): d93aedb

Update README.md

Files changed (1)

README.md +39 -3

@@ -1,3 +1,39 @@
----
-license: mit
----

+---
+license: mit
+---
+> ThinkPO is a simple yet effective post-SFT method that enhances long CoT reasoning without requiring new long CoT responses.
+>
+# Introduction
+- The tables below report accuracy and average response length for open-source reasoning LLMs before (SFT) and after ThinkPO.
+## Accuracy
+| Model | Dataset   | SFT | Ours (+ThinkPO) | Improv. (%) |
+|:--------:|:--------:|:--------:|:--------:|:--------:|
+|DeepSeek-R1-Distill-Qwen-7B (Deepseek)  |MATH500   | 87.4          | 91.2           | 4.3%        |
+|| AIME      | 56.7      | 43.3           | -23.6%     |
+|| GPQA      | 47.0          | 49.5           | 5.3%        |
+|| GSM8K     | 87.2          | 87.6           | 0.5%        |
+|| Olympiad  | 58.6          | 58.6           | 0.0%        |
+|Bespoke-Stratos-7B (Bespoke)| MATH500   | 84.0         | 82.8           | -1.4%       |
+|| AIME      | 20.0         | 23.3           | 16.5%       |
+|| GPQA      | 37.9         | 43.4           | 14.5%       |
+|| GSM8K     | 92.9         | 93.3           | 0.4%        |
+|| Olympiad  | 44.1         | 48.5           | 10.0%       |
+## Average Response Length
+| Model | Dataset   | SFT | Ours (+ThinkPO) | Improv. (%) |
+|:--------:|:--------:|:--------:|:--------:|:--------:|
+|DeepSeek-R1-Distill-Qwen-7B (Deepseek) | MATH500   | 2577          | 3021           | 17.2%       |
+|| AIME      | 11419         | 12875          | 12.8%       |
+|| GPQA      | 4895          | 5604           | 14.5%       |
+|| GSM8K     | 619           | 668            | 7.9%        |
+|| Olympiad  | 7196          | 7383           | 2.6%        |
+|Bespoke-Stratos-7B (Bespoke)| MATH500   | 5696         | 6404           | 12.4%       |
+|| AIME      | 19858        | 20079          | 1.1%        |
+|| GPQA      | 5968         | 7301           | 22.3%       |
+|| GSM8K     | 1404         | 1755           | 25.0%       |
+|| Olympiad  | 11140        | 12204          | 9.6%        |
+---
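
The `Improv. (%)` column in both tables appears to be the relative change from the SFT value to the ThinkPO value. The README does not state this; it is inferred from the numbers. A minimal sketch that reproduces a few rows under that assumption, where `relative_change` is a hypothetical helper name and not part of the ThinkPO code:

```python
def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`, relative to `before`.

    Assumed formula: (after - before) / before * 100.
    """
    return (after - before) / before * 100.0


# (model, dataset, SFT, +ThinkPO) values copied from the tables above
rows = [
    ("Deepseek", "MATH500 accuracy", 87.4, 91.2),
    ("Deepseek", "AIME accuracy", 56.7, 43.3),
    ("Bespoke", "GPQA accuracy", 37.9, 43.4),
    ("Bespoke", "GSM8K response length", 1404, 1755),
]

for model, metric, sft, thinkpo in rows:
    # Format with an explicit sign and one decimal place, as in the tables
    print(f"{model} {metric}: {relative_change(sft, thinkpo):+.1f}%")
```

Because the change is relative, small absolute moves on a low baseline look large: Bespoke's AIME gain of 3.3 points is reported as 16.5%, while the 3.8-point gain on Deepseek's MATH500 is 4.3%.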