noneUsername/huihui-ai-phi-4-abliterated-W8A8

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.928	±	0.0164
		strict-match	5	exact_match	↑	0.928	±	0.0164

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.922	±	0.012
		strict-match	5	exact_match	↑	0.922	±	0.012
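The Stderr column can be reproduced from Value and the sample count (limit). This is a minimal sketch. It assumes the harness reports the standard error of the mean of the per-question 0/1 exact_match scores using the sample (n - 1) variance. The GSM8K rows in this card agree with that assumption to the printed precision. The helper name binary_mean_stderr is hypothetical.

```python
import math


def binary_mean_stderr(accuracy: float, n: int) -> float:
    """Standard error of a mean of 0/1 scores, using the sample (n - 1) variance.

    Assumption: this mirrors how the Stderr column was computed; it is not
    taken from the harness source.
    """
    # With k = accuracy * n correct answers, the sample variance of the 0/1
    # scores is n * p * (1 - p) / (n - 1); dividing by n gives stderr squared.
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# (exact_match, limit) pairs copied from the GSM8K tables in this card
rows = [(0.928, 250), (0.922, 500), (0.936, 250), (0.93, 500), (0.924, 500)]
for acc, n in rows:
    print(f"acc={acc:.3f} n={n} stderr={binary_mean_stderr(acc, n):.4f}")
```

Because the stderr shrinks roughly as 1/sqrt(n), the limit 500 runs are about 1.4 times tighter than the limit 250 runs. The limit 500 rows are therefore the better basis for comparing models.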

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-8625,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.936	±	0.0155
		strict-match	5	exact_match	↑	0.936	±	0.0155

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.928	±	0.0116
		strict-match	5	exact_match	↑	0.928	±	0.0116

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-869,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.92	±	0.0172
		strict-match	5	exact_match	↑	0.92	±	0.0172

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.918	±	0.0123
		strict-match	5	exact_match	↑	0.916	±	0.0124

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.928	±	0.0164
		strict-match	5	exact_match	↑	0.928	±	0.0164

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.93	±	0.0114
		strict-match	5	exact_match	↑	0.93	±	0.0114

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7649	±	0.0137
- humanities	2	none	acc	↑	0.8103	±	0.0256
- other	2	none	acc	↑	0.7487	±	0.0287
- social sciences	2	none	acc	↑	0.8167	±	0.0280
- stem	2	none	acc	↑	0.7123	±	0.0260

vllm (pretrained=/root/autodl-tmp/huihui-ai-phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.944	±	0.0146
		strict-match	5	exact_match	↑	0.944	±	0.0146

vllm (pretrained=/root/autodl-tmp/huihui-ai-phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.924	±	0.0119
		strict-match	5	exact_match	↑	0.924	±	0.0119
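A rough way to judge whether the W8A8 quantization changed GSM8K accuracy is to compare the limit 500 rows against their stated standard errors. The following is a sketch only. It treats the two runs as independent, even though they share the same questions. This makes the combined error larger than a paired test would give, so the check is conservative. The helper name diff_z is hypothetical.

```python
import math


def diff_z(acc_a: float, se_a: float, acc_b: float, se_b: float) -> tuple[float, float]:
    """Return (a - b, z) treating the two standard errors as independent.

    Assumption: independence is not true here (same question set), so this
    overestimates the error of the difference and understates |z|.
    """
    diff = acc_a - acc_b
    combined_se = math.sqrt(se_a**2 + se_b**2)
    return diff, diff / combined_se


# limit 500, strict-match values copied from the tables in this card
w8a8 = (0.924, 0.0119)
comparisons = {
    "phi-4-abliterated (bf16)": (0.922, 0.012),
    "phi-4 (bf16)": (0.93, 0.0114),
}
for name, (acc, se) in comparisons.items():
    d, z = diff_z(*w8a8, acc, se)
    print(f"W8A8 vs {name}: diff={d:+.3f}, z={z:+.2f}")
```

Both differences are well under one combined standard error. On these samples, the W8A8 model's GSM8K score cannot be distinguished from either bf16 baseline.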

vllm (pretrained=/root/autodl-tmp/huihui-ai-phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7673	±	0.0136
- humanities	2	none	acc	↑	0.8154	±	0.0256
- other	2	none	acc	↑	0.7744	±	0.0285
- social sciences	2	none	acc	↑	0.8278	±	0.0273
- stem	2	none	acc	↑	0.6912	±	0.0263
