Update README.md
README.md CHANGED

@@ -124,19 +124,19 @@ lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-int4wo-hq
 | mmlu (0-shot) | | 63.56 |
 | mmlu_pro (5-shot) | | 36.74 |
 | **Reasoning** | | |
-| arc_challenge (0-shot) |
-| gpqa_main_zeroshot |
+| arc_challenge (0-shot) | 56.91 | 54.86 |
+| gpqa_main_zeroshot | 30.13 | 30.58 |
 | HellaSwag | 54.57 | 53.54 |
-| openbookqa |
-| piqa (0-shot) |
-| social_iqa |
-| truthfulqa_mc2 (0-shot) |
-| winogrande (0-shot) |
+| openbookqa | 33.00 | 34.40 |
+| piqa (0-shot) | 77.64 | 76.33 |
+| social_iqa | 49.59 | 47.90 |
+| truthfulqa_mc2 (0-shot) | 48.39 | 46.44 |
+| winogrande (0-shot) | 71.11 | 71.51 |
 | **Multilingual** | | |
-| mgsm_en_cot_en |
+| mgsm_en_cot_en | 60.8 | 59.6 |
 | **Math** | | |
-| gsm8k (5-shot) |
-| mathqa (0-shot) |
+| gsm8k (5-shot) | 81.88 | 74.37 |
+| mathqa (0-shot) | 42.31 | 42.75 |
 | **Overall** | **TODO** | **TODO** |


@@ -164,7 +164,7 @@ Note the result of latency (benchmark_latency) is in seconds, and serving (bench
 Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.


-| Benchmark (Memory
+| Benchmark (Memory, TODO) | | |
 |----------------------------------|----------------|--------------------------|
 | | Phi-4 mini-Ins | phi4-mini-int4wo-hqq |
 | latency (batch_size=1) | 2.46s | 2.2s (12% speedup) |
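The per-task scores added in the first hunk come from `lm_eval` invocations like the one shown in the hunk header. A minimal sketch of such a run, assuming EleutherAI's lm-evaluation-harness CLI: `--model hf` and `--model_args pretrained=...` appear in the diff itself, while the `--tasks` and `--num_fewshot` flags (and the full `-hqq` checkpoint suffix) are assumptions inferred from the table's row labels. The command is echoed as a dry run, since actually evaluating would download the checkpoint:

```shell
# Sketch only: reproduce a row such as "mmlu (0-shot)" from the table.
# --model/--model_args come from the diff; --tasks/--num_fewshot are assumed.
MODEL=pytorch/Phi-4-mini-instruct-int4wo-hqq
CMD="lm_eval --model hf --model_args pretrained=$MODEL --tasks mmlu --num_fewshot 0"
echo "$CMD"   # dry run: print the command instead of evaluating
```

For a 5-shot row such as `gsm8k (5-shot)`, the same sketch applies with `--tasks gsm8k --num_fewshot 5`.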