alexmarques committed (verified)
Commit fc8f281 · 1 Parent(s): 23e945d

Update README.md

Files changed (1): README.md (+68 -2)
README.md CHANGED
@@ -200,7 +200,7 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge and
  </td>
  </tr>
  <tr>
- <td>TruthfulQA (0-shot)
+ <td>TruthfulQA (0-shot, mc2)
  </td>
  <td>59.83
  </td>
@@ -219,4 +219,70 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge and
  <td><strong>98.8%</strong>
  </td>
  </tr>
- </table>
+ </table>
+
+ ### Reproduction
+
+ The results were obtained using the following commands:
+
+ #### MMLU
+ ```
+ lm_eval \
+ --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=4 \
+ --tasks mmlu \
+ --num_fewshot 5 \
+ --batch_size auto
+ ```
+
+ #### ARC-Challenge
+ ```
+ lm_eval \
+ --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=4 \
+ --tasks arc_challenge_llama_3.1_instruct \
+ --apply_chat_template \
+ --num_fewshot 0 \
+ --batch_size auto
+ ```
+
+ #### GSM-8K
+ ```
+ lm_eval \
+ --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=4 \
+ --tasks gsm8k_cot_llama_3.1_instruct \
+ --apply_chat_template \
+ --num_fewshot 8 \
+ --batch_size auto
+ ```
+
+ #### Hellaswag
+ ```
+ lm_eval \
+ --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=4 \
+ --tasks hellaswag \
+ --num_fewshot 10 \
+ --batch_size auto
+ ```
+
+ #### Winogrande
+ ```
+ lm_eval \
+ --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=4 \
+ --tasks winogrande \
+ --num_fewshot 5 \
+ --batch_size auto
+ ```
+
+ #### TruthfulQA
+ ```
+ lm_eval \
+ --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=4 \
+ --tasks truthfulqa \
+ --num_fewshot 0 \
+ --batch_size auto
+ ```
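
All six evaluation commands added in the diff above share the same `--model_args`. As a convenience (not part of the commit itself), a minimal bash sketch with a hypothetical `run_eval` helper could factor out the shared settings; it assumes the same lm-evaluation-harness fork referenced in the README (the `*_llama_3.1_instruct` tasks are not upstream task names) and a host with 4 GPUs for `tensor_parallel_size=4`.

```
#!/usr/bin/env bash
# Sketch only: re-runs the six benchmarks from the README with the shared
# vLLM model arguments factored out. Task names, few-shot counts, and the
# chat-template flags are copied verbatim from the commands above.
set -e

MODEL_ARGS='pretrained=neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8,dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=4'

# usage: run_eval <task> <num_fewshot> [extra lm_eval flags...]
run_eval() {
  local task="$1" fewshot="$2"
  shift 2
  lm_eval \
    --model vllm \
    --model_args "$MODEL_ARGS" \
    --tasks "$task" \
    --num_fewshot "$fewshot" \
    --batch_size auto \
    "$@"
}

run_eval mmlu 5
run_eval arc_challenge_llama_3.1_instruct 0 --apply_chat_template
run_eval gsm8k_cot_llama_3.1_instruct 8 --apply_chat_template
run_eval hellaswag 10
run_eval winogrande 5
run_eval truthfulqa 0
```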