VanWang committed · Commit 74c97a3 · verified · 1 Parent(s): d93aedb

Update README.md

Files changed (1): README.md (+39 −3)
README.md CHANGED
---
license: mit
---

> ThinkPO: a simple yet effective post-SFT method that enhances long CoT reasoning without requiring new long CoT responses.
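Since ThinkPO is applied after SFT as a preference-optimization step, its objective can be sketched with the standard DPO loss. This is a minimal illustration, assuming ThinkPO forms pairs with an existing long-CoT response as "chosen" and a shorter response as "rejected"; the pairing scheme and the `beta` value are assumptions for illustration, not the authors' exact recipe.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical sequence log-probs: the policy already prefers the long-CoT
# response relative to the reference model, so the loss is small.
confident = dpo_loss(-120.0, -80.0, -130.0, -75.0)  # margin = 0.1 * (10 - (-5)) = 1.5
uncertain = dpo_loss(-130.0, -75.0, -130.0, -75.0)  # margin = 0, loss = log 2
print(confident < uncertain)  # True
```

Minimizing this loss pushes the policy to assign relatively more probability to the long-CoT response than the reference model does, which is consistent with the longer responses reported below.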
7
+ # Introduction
8
+ - Here, we show the results of open-source reasoning LLMs before and after ThinkPO.
9
+ ## Accuracy
10
+
11
+ | Models | Dataset | SFT | Ours (+ThinkPO) | Improv. (%) |
12
+ |:--------:|:--------:|:--------:|:--------:|:--------:|
13
+ |DeepSeek-R1-Distill-Qwen-7B (Deepseek) |MATH500 | 87.4 | 91.2 | 4.3% |
14
+ || AIME | 56.7 | 43.3 | -23.6% |
15
+ || GPQA | 47.0 | 49.5 | 5.3% |
16
+ || GSM8K | 87.2 | 87.6 | 0.5% |
17
+ || Olympiad | 58.6 | 58.6 | 0.0% |
18
+ |Bespoke-Stratos-7B (Bespoke)| MATH500 | 84.0 | 82.8 | -1.4% |
19
+ || AIME | 20.0 | 23.3 | 16.5% |
20
+ || GPQA | 37.9 | 43.4 | 14.5% |
21
+ || GSM8K | 92.9 | 93.3 | 0.4% |
22
+ || Olympiad | 44.1 | 48.5 | 10.0% |
23
+
24
+ ## Average Response Length
25
+
26
+ | Model | Dataset | SFT | Ours (+ThinkPO) | Improv. (%) |
27
+ |:--------:|:--------:|:--------:|:--------:|:--------:|
28
+ |DeepSeek-R1-Distill-Qwen-7B (Deepseek) | MATH500 | 2577 | 3021 | 17.2% |
29
+ || AIME | 11419 | 12875 | 12.8% |
30
+ || GPQA | 4895 | 5604 | 14.5% |
31
+ || GSM8K | 619 | 668 | 7.9% |
32
+ || Olympiad | 7196 | 7383 | 2.6% |
33
+ |Bespoke-Stratos-7B (Bespoke)| MATH500 | 5696 | 6404 | 12.4% |
34
+ || AIME | 19858 | 20079 | 1.1% |
35
+ || GPQA | 5968 | 7301 | 22.3% |
36
+ || GSM8K | 1404 | 1755 | 25.0% |
37
+ || Olympiad | 11140 | 12204 | 9.6% |
38
+
39
+ ---
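The Improv. (%) columns in both tables are consistent with simple relative change over the SFT baseline, which can be checked against any row:

```python
def relative_improvement(sft: float, thinkpo: float) -> float:
    """Percent change from the SFT baseline to the ThinkPO result."""
    return (thinkpo - sft) / sft * 100.0

# Accuracy of DeepSeek-R1-Distill-Qwen-7B on MATH500 (values from the table above)
print(round(relative_improvement(87.4, 91.2), 1))   # 4.3
# Response length of Bespoke-Stratos-7B on GSM8K
print(round(relative_improvement(1404, 1755), 1))   # 25.0
```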