jerryzh168 committed · Commit 059d669 · verified · 1 Parent(s): 2ab02aa

Update README.md

Files changed (1): README.md (+11 -11)
README.md CHANGED
@@ -124,19 +124,19 @@ lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-int4wo-hq
  | mmlu (0-shot) | | 63.56 |
  | mmlu_pro (5-shot) | | 36.74 |
  | **Reasoning** | | |
- | arc_challenge (0-shot) | | 54.86 |
- | gpqa_main_zeroshot | | 30.58 |
+ | arc_challenge (0-shot) | 56.91 | 54.86 |
+ | gpqa_main_zeroshot | 30.13 | 30.58 |
  | HellaSwag | 54.57 | 53.54 |
- | openbookqa | | 34.40 |
- | piqa (0-shot) | | 76.33 |
- | social_iqa | | 47.90 |
- | truthfulqa_mc2 (0-shot) | | 46.44 |
- | winogrande (0-shot) | | 71.51 |
+ | openbookqa | 33.00 | 34.40 |
+ | piqa (0-shot) | 77.64 | 76.33 |
+ | social_iqa | 49.59 | 47.90 |
+ | truthfulqa_mc2 (0-shot) | 48.39 | 46.44 |
+ | winogrande (0-shot) | 71.11 | 71.51 |
  | **Multilingual** | | |
- | mgsm_en_cot_en | | 59.6 |
+ | mgsm_en_cot_en | 60.8 | 59.6 |
  | **Math** | | |
- | gsm8k (5-shot) | | 74.37 |
- | mathqa (0-shot) | | 42.75 |
+ | gsm8k (5-shot) | 81.88 | 74.37 |
+ | mathqa (0-shot) | 42.31 | 42.75 |
  | **Overall** | **TODO** | **TODO** |


@@ -164,7 +164,7 @@ Note the result of latency (benchmark_latency) is in seconds, and serving (bench
  Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.


- | Benchmark (Memory) | | |
+ | Benchmark (Memory, TODO) | | |
  |----------------------------------|----------------|--------------------------|
  | | Phi-4 mini-Ins | phi4-mini-int4wo-hqq |
  | latency (batch_size=1) | 2.46s | 2.2s (12% speedup) |
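
For context, the first hunk header above references the README's lm-evaluation-harness command for the quantized checkpoint. A minimal sketch of such a run for one of the listed tasks is shown below; the repository id, the task name, the device, and the batch size are illustrative assumptions and should be checked against the README itself rather than taken from this commit.

```bash
# Hedged sketch: evaluating the int4 weight-only (HQQ) checkpoint with lm-evaluation-harness.
# The repo id, task, device, and batch size are assumptions for illustration only.
lm_eval --model hf \
  --model_args pretrained=pytorch/Phi-4-mini-instruct-int4wo-hqq \
  --tasks hellaswag \
  --device cuda:0 \
  --batch_size 8
```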