noneUsername/phi-4-abliterated-W8A8

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.932	±	0.016
		strict-match	5	exact_match	↑	0.932	±	0.016

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.922	±	0.012
		strict-match	5	exact_match	↑	0.922	±	0.012

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-85,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.92	±	0.0172
		strict-match	5	exact_match	↑	0.92	±	0.0172

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-85,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.918	±	0.0123
		strict-match	5	exact_match	↑	0.918	±	0.0123

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-8625,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.932	±	0.016
		strict-match	5	exact_match	↑	0.932	±	0.016

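vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-8625,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr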
gsm8k	3	flexible-extract	5	exact_match	↑	0.934	±	0.0111
		strict-match	5	exact_match	↑	0.934	±	0.0111

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-875,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.924	±	0.0168
		strict-match	5	exact_match	↑	0.924	±	0.0168

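vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-875,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr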
gsm8k	3	flexible-extract	5	exact_match	↑	0.916	±	0.0124
		strict-match	5	exact_match	↑	0.916	±	0.0124

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.928	±	0.0164
		strict-match	5	exact_match	↑	0.928	±	0.0164

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.93	±	0.0114
		strict-match	5	exact_match	↑	0.93	±	0.0114

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7649	±	0.0137
- humanities	2	none	acc	↑	0.8103	±	0.0256
- other	2	none	acc	↑	0.7487	±	0.0287
- social sciences	2	none	acc	↑	0.8167	±	0.0280
- stem	2	none	acc	↑	0.7123	±	0.0260

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.928	±	0.0164
		strict-match	5	exact_match	↑	0.928	±	0.0164

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.916	±	0.0124
		strict-match	5	exact_match	↑	0.916	±	0.0124
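The gsm8k Stderr values can be checked by hand. A minimal sketch, assuming the harness reports the sample standard error of the mean (sample standard deviation with an n - 1 denominator, divided by sqrt(n)); for a 0/1 metric such as exact_match that reduces to sqrt(p(1 - p) / (n - 1)), where p is the score and n is the limit. The helper name `binary_mean_stderr` is hypothetical, and this does not cover the MMLU group rows, which are aggregated across subtasks.

```python
import math


def binary_mean_stderr(p: float, n: int) -> float:
    """Standard error of the mean for a metric whose per-sample scores are 0 or 1.

    Assumption: stderr = sample_std / sqrt(n), with sample_std using n - 1.
    For 0/1 scores with mean p this simplifies to sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# (exact_match, limit) pairs taken from the gsm8k tables in this card.
runs = [
    (0.932, 250),
    (0.922, 500),
    (0.92, 250),
    (0.918, 500),
    (0.924, 250),
    (0.928, 250),
    (0.93, 500),
    (0.916, 500),
]

for p, n in runs:
    # Four decimals, matching the precision of the Stderr column.
    print(f"p={p:<6} n={n:<4} stderr={binary_mean_stderr(p, n):.4f}")
```

Each printed value rounds to the Stderr in the matching table (for example 0.0160 for 0.932 at n=250 and 0.0123 for 0.918 at n=500). Because the error shrinks roughly with 1/sqrt(n), the 500-sample runs are the better basis for comparing checkpoints.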

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7696	±	0.0136
- humanities	2	none	acc	↑	0.8000	±	0.0261
- other	2	none	acc	↑	0.7692	±	0.0280
- social sciences	2	none	acc	↑	0.8389	±	0.0265
- stem	2	none	acc	↑	0.7053	±	0.0265
