Update README.md
README.md CHANGED
@@ -125,7 +125,43 @@ tokenizer.push_to_hub(save_to)
```

# Model Quality

We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.

| Benchmark                  | Qwen3-8B | Qwen3-8B-int4wo |
|----------------------------|----------|-----------------|
| **General**                |          |                 |
| mmlu                       | 73.04    | 70.4            |
| mmlu_pro                   | 53.81    | 52.79           |
| bbh                        | 79.33    | 74.92           |
| **Multilingual**           |          |                 |
| mgsm_en_cot_en             | 39.6     | 33.2            |
| m_mmlu (avg)               | 57.17    | 54.06           |
| **Math**                   |          |                 |
| gpqa_main_zeroshot         | 35.71    | 32.14           |
| gsm8k                      | 87.79    | 86.28           |
| leaderboard_math_hard (v3) | 53.7     | 46.83           |
| **Overall**                | 60.02    | 56.33           |

<details>
<summary> Reproduce Model Quality Results </summary>

Install lm-eval from source:
https://github.com/EleutherAI/lm-evaluation-harness#install

## baseline
```Shell
lm_eval --model hf --model_args pretrained=Qwen/Qwen3-8B --tasks mmlu --device cuda:0 --batch_size 8
```

## int4 weight-only quantization (int4wo)
```Shell
# the quantized checkpoint pushed above (the `save_to` repo), e.g.:
export MODEL=pytorch/Qwen3-8B-int4wo
lm_eval --model hf --model_args pretrained=$MODEL --tasks mmlu --device cuda:0 --batch_size 8
```
</details>
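
The commands above drive lm-eval through its CLI. The same runs can also be scripted with lm-eval's `simple_evaluate` Python API, which makes it easy to loop over all eight benchmarks and average them (the **Overall** row is the unweighted mean of the eight benchmark scores). The sketch below is illustrative rather than part of this card: the checkpoint name is the baseline, and the task list follows the table rows and may need to be adjusted to the exact task/group names in your lm-eval version.

```Py
# Sketch: scripting the evaluation via lm-eval's Python API instead of the CLI.
# Assumes lm-eval is installed from source as described above.
import lm_eval

# Baseline model; swap in the int4wo checkpoint to reproduce the second column.
MODEL = "Qwen/Qwen3-8B"

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={MODEL}",
    tasks=["mmlu", "mmlu_pro", "bbh", "gsm8k", "gpqa_main_zeroshot"],
    batch_size=8,
    device="cuda:0",
)

# Print the per-task metrics reported by lm-eval.
for task, metrics in results["results"].items():
    print(task, metrics)
```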

# Memory Usage

@@ -135,7 +171,7 @@ TODO
| Peak Memory | 65.72 GB | 34.54 GB (-47.44%) |

<details>
<summary> Reproduce Peak Memory Usage Results </summary>

Code
```Py
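# Sketch (an assumption, not necessarily the card's exact script): measuring
# peak GPU memory for the comparison above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Baseline model; swap in the int4wo checkpoint to reproduce the second column.
model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda:0"
)

# Reset the peak-memory counter, run a generation, then report the peak.
torch.cuda.reset_peak_memory_stats()
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
model.generate(**inputs, max_new_tokens=128)
print(f"Peak Memory: {torch.cuda.max_memory_reserved() / 1e9:.2f} GB")
```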