Update README.md
README.md
@@ -2,8 +2,33 @@
 tags:
 - fp8
 ---
+
+
+Meta-Llama-3-8B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.4.2.
+
 Produced using https://github.com/neuralmagic/AutoFP8/blob/b0c1f789c51659bb023c06521ecbd04cea4a26f6/quantize.py
 
 ```bash
 python quantize.py --model-id meta-llama/Meta-Llama-3-8B-Instruct --save-dir Meta-Llama-3-8B-Instruct-FP8
+```
+
+Accuracy on MMLU:
+```
+vllm (pretrained=meta-llama/Meta-Llama-3-8B-Instruct,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
+|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
+|------------------|-------|------|-----:|------|-----:|---|-----:|
+|mmlu              |N/A    |none  |     0|acc   |0.6569|±  |0.0038|
+| - humanities     |N/A    |none  |     5|acc   |0.6049|±  |0.0068|
+| - other          |N/A    |none  |     5|acc   |0.7203|±  |0.0078|
+| - social_sciences|N/A    |none  |     5|acc   |0.7663|±  |0.0075|
+| - stem           |N/A    |none  |     5|acc   |0.5652|±  |0.0085|
+
+vllm (pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8,quantization=fp8,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
+|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
+|------------------|-------|------|-----:|------|-----:|---|-----:|
+|mmlu              |N/A    |none  |     0|acc   |0.6567|±  |0.0038|
+| - humanities     |N/A    |none  |     5|acc   |0.6072|±  |0.0068|
+| - other          |N/A    |none  |     5|acc   |0.7206|±  |0.0078|
+| - social_sciences|N/A    |none  |     5|acc   |0.7618|±  |0.0075|
+| - stem           |N/A    |none  |     5|acc   |0.5649|±  |0.0085|
 ```
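
To make the two MMLU tables easier to compare, the following is a minimal sketch that computes the per-group accuracy change from the baseline to the FP8 checkpoint and checks it against a combined standard error. The accuracy and stderr values are copied from the tables in the README. The `compare` helper and the choice to combine the two standard errors in quadrature are illustrative assumptions, not part of the original evaluation. Both runs score the same questions, so the errors are not truly independent and the check should be read as a rough guide only.

```python
from math import sqrt

# (accuracy, stderr) pairs copied from the two lm-eval tables in the README.
BASELINE = {
    "mmlu": (0.6569, 0.0038),
    "humanities": (0.6049, 0.0068),
    "other": (0.7203, 0.0078),
    "social_sciences": (0.7663, 0.0075),
    "stem": (0.5652, 0.0085),
}
FP8 = {
    "mmlu": (0.6567, 0.0038),
    "humanities": (0.6072, 0.0068),
    "other": (0.7206, 0.0078),
    "social_sciences": (0.7618, 0.0075),
    "stem": (0.5649, 0.0085),
}


def compare(baseline, quantized):
    """Hypothetical helper: return {group: (delta, combined_stderr, within)}.

    delta is quantized accuracy minus baseline accuracy. combined_stderr
    adds the two standard errors in quadrature, which assumes independent
    errors (an assumption made here for illustration).
    """
    rows = {}
    for group, (base_acc, base_err) in baseline.items():
        q_acc, q_err = quantized[group]
        delta = q_acc - base_acc
        combined = sqrt(base_err ** 2 + q_err ** 2)
        rows[group] = (delta, combined, abs(delta) <= combined)
    return rows


if __name__ == "__main__":
    for group, (delta, combined, within) in compare(BASELINE, FP8).items():
        print(f"{group:16s} delta={delta:+.4f} "
              f"combined_stderr={combined:.4f} within={within}")
```

Under these assumptions every group's change is smaller than its combined standard error; the largest shift is in `social_sciences` (0.7663 to 0.7618).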