nm-research committed on
Commit b87f998 · verified · 1 parent: 1b056b2

Add reasoning evals

Files changed (1)
  1. README.md +25 -0
README.md CHANGED
@@ -149,6 +149,31 @@ lm_eval \
  </thead>
  <tbody>
  <tr>
+ <td rowspan="4"><b>Reasoning</b></td>
+ <td>AIME 2024 (pass@1)</td>
+ <td>30.05</td>
+ <td>29.83</td>
+ <td>99.27%</td>
+ </tr>
+ <tr>
+ <td>MATH-500 (pass@1)</td>
+ <td>84.66</td>
+ <td>84.74</td>
+ <td>100.09%</td>
+ </tr>
+ <tr>
+ <td>GPQA Diamond (pass@1)</td>
+ <td>35.37</td>
+ <td>35.93</td>
+ <td>101.58%</td>
+ </tr>
+ <tr>
+ <td><b>Average Score</b></td>
+ <td><b>50.03</b></td>
+ <td><b>50.17</b></td>
+ <td><b>100.28%</b></td>
+ </tr>
+ <tr>
  <td rowspan="7"><b>OpenLLM V1</b></td>
  <td>ARC-Challenge (Acc-Norm, 25-shot)</td>
  <td>37.20</td>
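The percentage column in the new Reasoning rows appears to be the second score divided by the first (for AIME 2024, 29.83 / 30.05 ≈ 99.27%). The Average Score row's 100.28% matches the ratio of the two column averages, not the mean of the per-task percentages. The Python sketch that follows recomputes the column under that reading. The column meanings (first column as the baseline, second as the evaluated model) are an assumption, because the table's <thead> lies outside this hunk. The helper names `recovery` and `average` are hypothetical.

```python
# Sketch: recompute the recovery column of the Reasoning rows added in this commit.
# Assumption: first score = baseline, second score = evaluated model,
# recovery = second / first * 100 (the <thead> is not part of this hunk).

rows = {
    "AIME 2024 (pass@1)": (30.05, 29.83),
    "MATH-500 (pass@1)": (84.66, 84.74),
    "GPQA Diamond (pass@1)": (35.37, 35.93),
}


def recovery(baseline: float, candidate: float) -> float:
    """Hypothetical helper: candidate score as a percentage of the baseline score."""
    return 100.0 * candidate / baseline


def average(values) -> float:
    """Hypothetical helper: arithmetic mean of an iterable of floats."""
    values = list(values)
    return sum(values) / len(values)


# Average Score row: mean of each column, then the ratio of those means.
base_avg = average(b for b, _ in rows.values())
cand_avg = average(c for _, c in rows.values())

for name, (b, c) in rows.items():
    print(f"{name}: {recovery(b, c):.2f}%")
print(f"Average Score: {base_avg:.2f} / {cand_avg:.2f} -> {recovery(base_avg, cand_avg):.2f}%")
```

Under these assumptions the sketch reproduces every value in the added rows. The choice of aggregation matters slightly: averaging the three per-task percentages instead would give roughly 100.3% rather than 100.28%.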