jerryzh168 committed · Commit 059d669 · verified · 1 Parent(s): 2ab02aa

Update README.md

Files changed (1): README.md (+11 -11)
README.md CHANGED
@@ -124,19 +124,19 @@ lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-int4wo-hq
  | mmlu (0-shot) | | 63.56 |
  | mmlu_pro (5-shot) | | 36.74 |
  | **Reasoning** | | |
- | arc_challenge (0-shot) | | 54.86 |
- | gpqa_main_zeroshot | | 30.58 |
+ | arc_challenge (0-shot) | 56.91 | 54.86 |
+ | gpqa_main_zeroshot | 30.13 | 30.58 |
  | HellaSwag | 54.57 | 53.54 |
- | openbookqa | | 34.40 |
- | piqa (0-shot) | | 76.33 |
- | social_iqa | | 47.90 |
- | truthfulqa_mc2 (0-shot) | | 46.44 |
- | winogrande (0-shot) | | 71.51 |
+ | openbookqa | 33.00 | 34.40 |
+ | piqa (0-shot) | 77.64 | 76.33 |
+ | social_iqa | 49.59 | 47.90 |
+ | truthfulqa_mc2 (0-shot) | 48.39 | 46.44 |
+ | winogrande (0-shot) | 71.11 | 71.51 |
  | **Multilingual** | | |
- | mgsm_en_cot_en | | 59.6 |
+ | mgsm_en_cot_en | 60.8 | 59.6 |
  | **Math** | | |
- | gsm8k (5-shot) | | 74.37 |
- | mathqa (0-shot) | | 42.75 |
+ | gsm8k (5-shot) | 81.88 | 74.37 |
+ | mathqa (0-shot) | 42.31 | 42.75 |
  | **Overall** | **TODO** | **TODO** |


@@ -164,7 +164,7 @@ Note the result of latency (benchmark_latency) is in seconds, and serving (bench
  Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.


- | Benchmark (Memory) | | |
+ | Benchmark (Memory, TODO) | | |
  |----------------------------------|----------------|--------------------------|
  | | Phi-4 mini-Ins | phi4-mini-int4wo-hqq |
  | latency (batch_size=1) | 2.46s | 2.2s (12% speedup) |
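
For context, the first hunk header above references the README's lm-evaluation-harness command for the quantized checkpoint. A minimal sketch of such a run for one of the listed tasks is shown below; the repository id, the task name, the device, and the batch size are illustrative assumptions and should be checked against the README itself rather than taken from this commit.

```bash
# Hedged sketch: evaluating the int4 weight-only (HQQ) checkpoint with lm-evaluation-harness.
# The repo id, task, device, and batch size are assumptions for illustration only.
lm_eval --model hf \
  --model_args pretrained=pytorch/Phi-4-mini-instruct-int4wo-hqq \
  --tasks hellaswag \
  --device cuda:0 \
  --batch_size 8
```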