Update README.md
lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-int4wo-hqq --tasks hellaswag --device cuda:0 --batch_size 8
```

| Benchmark                        | Phi-4-mini-ins | Phi-4-mini-ins-int4wo-hqq |
|----------------------------------|----------------|---------------------------|
| **Popular aggregated benchmark** |                |                           |
| mmlu (0-shot)                    | 66.73          | 63.56                     |
| mmlu_pro (5-shot)                | 46.43          | 36.74                     |
| **Reasoning**                    |                |                           |
| arc_challenge (0-shot)           | 56.91          | 54.86                     |
| gpqa_main_zeroshot               | 30.13          | 30.58                     |
| HellaSwag                        | 54.57          | 53.54                     |
| openbookqa                       | 33.00          | 34.40                     |
| piqa (0-shot)                    | 77.64          | 76.33                     |
| social_iqa                       | 49.59          | 47.90                     |
| truthfulqa_mc2 (0-shot)          | 48.39          | 46.44                     |
| winogrande (0-shot)              | 71.11          | 71.51                     |
| **Multilingual**                 |                |                           |
| mgsm_en_cot_en                   | 60.8           | 59.6                      |
| **Math**                         |                |                           |
| gsm8k (5-shot)                   | 81.88          | 74.37                     |
| mathqa (0-shot)                  | 42.31          | 42.75                     |
| **Overall**                      | **55.35**      | **53.28**                 |

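A quick sanity check on the table, assuming (our reading, not stated in the source) that the **Overall** row is the unweighted mean of the thirteen individual task scores, rounded to two decimals:

```python
# Per-task scores copied from the table above, in row order.
phi4_mini = [66.73, 46.43, 56.91, 30.13, 54.57, 33.00, 77.64,
             49.59, 48.39, 71.11, 60.8, 81.88, 42.31]
phi4_mini_int4 = [63.56, 36.74, 54.86, 30.58, 53.54, 34.40, 76.33,
                  47.90, 46.44, 71.51, 59.6, 74.37, 42.75]

def overall(scores):
    # Unweighted mean, rounded to match the table's "Overall" row.
    return round(sum(scores) / len(scores), 2)

print(overall(phi4_mini))       # 55.35
print(overall(phi4_mini_int4))  # 53.28
```

Both means reproduce the bolded **Overall** numbers, which supports reading that row as a simple average across all listed tasks.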
# Peak Memory Usage