Update README.md
README.md

````diff
@@ -33,7 +33,7 @@ base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
-It achieves scores within 1
+It achieves scores within 1% of the scores of the unquantized model for MMLU, ARC-Challenge, GSM-8k, Hellaswag, Winogrande and TruthfulQA.
 
 ### Model Optimizations
 
@@ -135,6 +135,8 @@ The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande an
 Evaluation was conducted using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
 This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-8B-Instruct-evals).
 
+**Note:** Results have been updated after Meta modified the chat template.
+
 ### Accuracy
 
 #### Open LLM Leaderboard evaluation scores
@@ -239,7 +241,7 @@ The results were obtained using the following commands:
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,max_model_len=3850,max_gen_toks=10,tensor_parallel_size=1 \
   --tasks mmlu_llama_3.1_instruct \
   --fewshot_as_multiturn \
   --apply_chat_template \
@@ -251,7 +253,7 @@ lm_eval \
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,max_model_len=4064,max_gen_toks=1024,tensor_parallel_size=1 \
   --tasks mmlu_cot_0shot_llama_3.1_instruct \
   --apply_chat_template \
   --num_fewshot 0 \
@@ -262,7 +264,7 @@ lm_eval \
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,max_model_len=3940,max_gen_toks=100,tensor_parallel_size=1 \
   --tasks arc_challenge_llama_3.1_instruct \
   --apply_chat_template \
   --num_fewshot 0 \
@@ -273,7 +275,7 @@ lm_eval \
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,max_model_len=4096,max_gen_toks=1024,tensor_parallel_size=1 \
   --tasks gsm8k_cot_llama_3.1_instruct \
   --fewshot_as_multiturn \
   --apply_chat_template \
````
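The four updated `lm_eval` commands differ only in the `max_model_len` and `max_gen_toks` values pinned per task. As a minimal sketch of how the `--model_args` string is assembled, the helper below is hypothetical (not part of lm-evaluation-harness); the model name and the per-task values are taken from the commands in this diff:

```python
# Sketch: assemble the vLLM --model_args value for each evaluation task.
# The task -> (max_model_len, max_gen_toks) mapping mirrors the updated
# commands in this commit; the helper itself is illustrative only.

MODEL = "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8"

# Per-task generation settings from the updated commands.
TASK_SETTINGS = {
    "mmlu_llama_3.1_instruct": (3850, 10),
    "mmlu_cot_0shot_llama_3.1_instruct": (4064, 1024),
    "arc_challenge_llama_3.1_instruct": (3940, 100),
    "gsm8k_cot_llama_3.1_instruct": (4096, 1024),
}

def model_args_for(task: str, tensor_parallel_size: int = 1) -> str:
    """Build the comma-separated --model_args value for one task."""
    max_model_len, max_gen_toks = TASK_SETTINGS[task]
    return (
        f'pretrained="{MODEL}",dtype=auto,'
        f"max_model_len={max_model_len},max_gen_toks={max_gen_toks},"
        f"tensor_parallel_size={tensor_parallel_size}"
    )

for task in TASK_SETTINGS:
    print(task, "->", model_args_for(task))
```

Keeping the settings in one mapping makes it easy to see that only the context and generation budgets vary between tasks, while the model, dtype, and parallelism stay fixed.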