Lin-K76 committed
Commit 5b64234 · verified · 1 Parent(s): 0cca697

Update README.md

Files changed (1): README.md +7 -6
README.md CHANGED
@@ -25,7 +25,7 @@ language:
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct).
-It achieves an average score of 78.69 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 78.67.
+It achieves an average score of 82.78 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 82.74.
 
 ### Model Optimizations
 
@@ -131,6 +131,7 @@ oneshot(
 ## Evaluation
 
 The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command:
+A modified version of ARC-C was used for evaluations, in line with Llama 3.1's prompting.
 ```
 lm_eval \
   --model vllm \
@@ -166,11 +167,11 @@ lm_eval \
 <tr>
 <td>ARC Challenge (25-shot)
 </td>
-<td>70.65
+<td>95.05
 </td>
-<td>70.31
+<td>94.88
 </td>
-<td>99.52%
+<td>99.82%
 </td>
 </tr>
 <tr>
@@ -216,9 +217,9 @@ lm_eval \
 <tr>
 <td><strong>Average</strong>
 </td>
-<td><strong>78.67</strong>
+<td><strong>82.74</strong>
 </td>
-<td><strong>78.69</strong>
+<td><strong>82.78</strong>
 </td>
 <td><strong>100.0%</strong>
 </td>
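The lm_eval invocation in the README is truncated at the hunk boundary above. As a rough sketch only, assuming the `openllm` task group (the leaderboard v1 tasks) and placeholder values for the model path and tensor-parallel size, a vLLM-backed lm-evaluation-harness run of this kind generally takes the following shape; none of these values are taken from this commit:

```
# Sketch of an OpenLLM v1 evaluation via the vLLM backend of
# lm-evaluation-harness. "<model-id>", max_model_len=4096, and
# tensor_parallel_size=4 are illustrative placeholders, and the
# "openllm" task group is an assumption, not confirmed by this commit.
lm_eval \
  --model vllm \
  --model_args pretrained="<model-id>",dtype=auto,max_model_len=4096,tensor_parallel_size=4 \
  --tasks openllm \
  --batch_size auto
```

With `--batch_size auto`, the harness picks the largest batch that fits in memory, which is typically the sensible choice for multi-GPU vLLM runs.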