Lin-K76 committed
Commit 9f6bfab · verified · 1 Parent(s): 3fe49e3

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED

````diff
@@ -25,7 +25,7 @@ language:
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
-It achieves an average score of 72.46 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 73.11.
+It achieves an average score of 73.81 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 74.17.
 
 ### Model Optimizations
 
@@ -130,7 +130,7 @@ oneshot(
 ## Evaluation
 
 The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command.
-A modified version of ARC-C was used for evaluations, in line with Llama 3.1's prompting.
+A modified version of ARC-C and GSM8k-cot was used for evaluations, in line with Llama 3.1's prompting. It can be accessed on the [Neural Magic fork of the lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct).
 ```
 lm_eval \
   --model vllm \
@@ -174,13 +174,13 @@ lm_eval \
 </td>
 </tr>
 <tr>
-<td>GSM-8K (5-shot, strict-match)
+<td>GSM-8K-cot (8-shot, strict-match)
 </td>
-<td>75.66
+<td>82.03
 </td>
-<td>74.22
+<td>82.34
 </td>
-<td>98.10%
+<td>100.3%
 </td>
 </tr>
 <tr>
@@ -216,11 +216,11 @@ lm_eval \
 <tr>
 <td><strong>Average</strong>
 </td>
-<td><strong>73.11</strong>
+<td><strong>74.17</strong>
 </td>
-<td><strong>72.46</strong>
+<td><strong>73.81</strong>
 </td>
-<td><strong>99.10%</strong>
+<td><strong>99.48%</strong>
 </td>
 </tr>
 </table>
````
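The evaluation command itself is cut off by the hunk boundary above. For orientation, a minimal sketch of a full lm-evaluation-harness invocation with the vLLM backend is shown below; the checkpoint placeholder, `max_model_len`, and the `openllm` task group are illustrative assumptions, not flags taken from this README.

```
# Minimal sketch, not the exact command from the README.
# MODEL is a placeholder: set it to the quantized checkpoint's Hugging Face ID.
MODEL="neuralmagic/<quantized-llama-3.1-8b-instruct-repo>"
lm_eval \
  --model vllm \
  --model_args pretrained=$MODEL,dtype=auto,add_bos_token=True,max_model_len=4096 \
  --tasks openllm \
  --batch_size auto
```

Here `--tasks openllm` selects the harness's OpenLLM v1 task group; per the note in the hunk above, the modified ARC-C and GSM8k-cot variants come from the linked Neural Magic fork rather than the upstream harness.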
 
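One way to read the Recovery column, assuming it is the quantized score expressed as a percentage of the unquantized score (the table header sits outside these hunks), is to check it against the updated GSM-8K-cot row:

$$
\text{Recovery} = \frac{\text{quantized score}}{\text{unquantized score}} \times 100\% = \frac{82.34}{82.03} \times 100\% \approx 100.3\%
$$

so the quantized model slightly outperforms the baseline on that task, while the new 99.48% average corresponds to roughly half a percent of relative accuracy given up overall.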