noneUsername/Wayfarer-12B-W8A8

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.620	±	0.0308
		strict-match	5	exact_match	↑	0.616	±	0.0308

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.604	±	0.0219
		strict-match	5	exact_match	↑	0.606	±	0.0219
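The Stderr column shrinks as `limit` grows, which is what sampling error on a 0/1 metric predicts. The sketch below is a minimal check, assuming each question is scored 0 or 1 and that the standard error is taken from the sample variance with an `n - 1` denominator. That formula is an assumption rather than something stated in the log, and `binomial_stderr` is a hypothetical helper name. Under that assumption it reproduces the reported gsm8k values for limits of 250 and 500.

```python
import math

def binomial_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores with mean p.

    Assumes the sample variance (n - 1 denominator); this is an
    assumption about how the Stderr column was computed.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))

# gsm8k flexible-extract for Wayfarer-12B, values copied from the tables
print(round(binomial_stderr(0.620, 250), 4))  # limit 250, reported 0.0308
print(round(binomial_stderr(0.604, 500), 4))  # limit 500, reported 0.0219
```

Doubling the number of questions cuts the standard error by roughly a factor of the square root of two, so the limit 500 runs are the more reliable ones to compare.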

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=1024,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 6.0, num_fewshot: None, batch_size: 1

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.6784	±	0.0237
- humanities	2	none	acc	↑	0.6923	±	0.0451
- other	2	none	acc	↑	0.6923	±	0.0538
- social sciences	2	none	acc	↑	0.7917	±	0.0444
- stem	2	none	acc	↑	0.5877	±	0.0442

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.660	±	0.0300
		strict-match	5	exact_match	↑	0.652	±	0.0302

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.632	±	0.0216
		strict-match	5	exact_match	↑	0.628	±	0.0216

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=800,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.6566	±	0.0220
- humanities	2	none	acc	↑	0.6593	±	0.0444
- other	2	none	acc	↑	0.6703	±	0.0491
- social sciences	2	none	acc	↑	0.7738	±	0.0445
- stem	2	none	acc	↑	0.5714	±	0.0391

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.660	±	0.0300
		strict-match	5	exact_match	↑	0.664	±	0.0299

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.624	±	0.0217
		strict-match	5	exact_match	↑	0.630	±	0.0216

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=800,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.6717	±	0.0217
- humanities	2	none	acc	↑	0.6703	±	0.0426
- other	2	none	acc	↑	0.6703	±	0.0479
- social sciences	2	none	acc	↑	0.7857	±	0.0418
- stem	2	none	acc	↑	0.6015	±	0.0400
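To judge whether the gaps between the three checkpoints exceed sampling noise, one option is to divide each score difference by the combined standard error. The sketch below does this for the gsm8k flexible-extract scores at limit 500, using values copied from the tables. It treats the runs as independent, which is a simplifying assumption: the runs share the same questions, so a paired test would be more appropriate if per-question results are available. `z_score` is a hypothetical helper name.

```python
import math

def z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """Difference b - a in units of the combined standard error.

    Assumes the two runs are independent (a simplification here).
    """
    return (b - a) / math.sqrt(se_a ** 2 + se_b ** 2)

# (value, stderr) for gsm8k flexible-extract at limit 500
base = (0.604, 0.0219)  # Wayfarer-12B
v86 = (0.632, 0.0216)   # Wayfarer-12B-86
v87 = (0.624, 0.0217)   # Wayfarer-12B-87

for name, run in (("Wayfarer-12B-86", v86), ("Wayfarer-12B-87", v87)):
    print(name, round(z_score(*base, *run), 2))
```

Both ratios come out below 1, so under this rough independence assumption the gsm8k differences between the checkpoints sit within about one combined standard error.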
