vllm (pretrained=/root/autodl-tmp/phi-4-abliterated,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.928|±  |0.0164|
|     |       |strict-match    |     5|exact_match|↑  |0.928|±  |0.0164|

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

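|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.922|±  |0.012 |
|     |       |strict-match    |     5|exact_match|↑  |0.922|±  |0.012 |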

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-8625,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.936|±  |0.0155|
|     |       |strict-match    |     5|exact_match|↑  |0.936|±  |0.0155|

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-8625,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.928|±  |0.0116|
|     |       |strict-match    |     5|exact_match|↑  |0.928|±  |0.0116|

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-869,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

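|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.92 |±  |0.0172|
|     |       |strict-match    |     5|exact_match|↑  |0.92 |±  |0.0172|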

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-869,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.918|±  |0.0123|
|     |       |strict-match    |     5|exact_match|↑  |0.916|±  |0.0124|

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.928|±  |0.0164|
|     |       |strict-match    |     5|exact_match|↑  |0.928|±  |0.0164|

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

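|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.93 |±  |0.0114|
|     |       |strict-match    |     5|exact_match|↑  |0.93 |±  |0.0114|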

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

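|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.7649|±  |0.0137|
| - humanities     |      2|none  |      |acc   |↑  |0.8103|±  |0.0256|
| - other          |      2|none  |      |acc   |↑  |0.7487|±  |0.0287|
| - social sciences|      2|none  |      |acc   |↑  |0.8167|±  |0.0280|
| - stem           |      2|none  |      |acc   |↑  |0.7123|±  |0.0260|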

vllm (pretrained=/root/autodl-tmp/huihui-ai-phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.944|±  |0.0146|
|     |       |strict-match    |     5|exact_match|↑  |0.944|±  |0.0146|

vllm (pretrained=/root/autodl-tmp/huihui-ai-phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.924|±  |0.0119|
|     |       |strict-match    |     5|exact_match|↑  |0.924|±  |0.0119|

vllm (pretrained=/root/autodl-tmp/huihui-ai-phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

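|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.7673|±  |0.0136|
| - humanities     |      2|none  |      |acc   |↑  |0.8154|±  |0.0256|
| - other          |      2|none  |      |acc   |↑  |0.7744|±  |0.0285|
| - social sciences|      2|none  |      |acc   |↑  |0.8278|±  |0.0273|
| - stem           |      2|none  |      |acc   |↑  |0.6912|±  |0.0263|
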
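
The Stderr values in the gsm8k results above appear consistent with the sample standard error of a mean of 0/1 scores, sqrt(p(1-p)/(n-1)), with n taken from `limit`. The sketch below works under that assumption; it is not the harness's own code, and the helper names `binomial_stderr` and `z_score` are hypothetical. It also compares two of the reported scores while treating the runs as independent, which is a simplification, since every model was evaluated on the same questions.

```python
import math


def binomial_stderr(p, n):
    """Sample standard error of the mean of n 0/1 scores with mean p.

    Assumption: the (n - 1) denominator is inferred from the reported
    numbers, not taken from the harness source.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


def z_score(p_a, se_a, p_b, se_b):
    """Difference of two scores in units of their combined standard error.

    Assumption: the two runs are treated as independent samples.
    """
    return (p_a - p_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Reproduce the reported stderr for gsm8k at limit 250 and value 0.928.
print(round(binomial_stderr(0.928, 250), 4))  # 0.0164

# W8A8 (0.924 ± 0.0119) against phi-4 (0.93 ± 0.0114), both at limit 500.
z = z_score(0.924, 0.0119, 0.93, 0.0114)
print(abs(z) < 1.0)  # True
```

Under these assumptions, the 0.006 gap between the W8A8 model and phi-4 at limit 500 is well under one combined standard error, so these gsm8k runs do not distinguish the two models.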