vllm (pretrained=/root/autodl-tmp/Orca-2-13b,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: 1

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|------------------|-------:|---------------|------:|---------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.520 | ± 0.0317 |
| | | strict-match | 5 | exact_match ↑ | 0.504 | ± 0.0317 |

The results above are the baseline from the original (unquantized) model. The W8A8 model's scores deviate from this baseline by only about 0.01.
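That claim can be checked directly against the numbers in this card. A minimal sketch, comparing the baseline GSM8K run above with the W8A8 run at the same settings (dtype=bfloat16, limit=250) further down:

```python
# GSM8K exact_match at limit=250 (values copied from the tables in this card)
baseline = {"flexible-extract": 0.520, "strict-match": 0.504}  # original Orca-2-13b
w8a8 = {"flexible-extract": 0.508, "strict-match": 0.496}      # Orca-2-13b-W8A8

for f in baseline:
    delta = abs(baseline[f] - w8a8[f])
    print(f"{f}: |delta| = {delta:.3f}")
# Deviations are 0.012 and 0.008 -- roughly 0.01, well inside the ±0.0317 stderr.
```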

vllm (pretrained=/root/autodl-tmp/Orca-2-13b,add_bos_token=true,max_model_len=2048,dtype=float32), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|------------------|-------:|---------------|------:|---------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.496 | ± 0.0317 |
| | | strict-match | 5 | exact_match ↑ | 0.488 | ± 0.0317 |

vllm (pretrained=/root/autodl-tmp/Orca-2-13b,add_bos_token=true,max_model_len=2048,dtype=float16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|------------------|-------:|---------------|------:|---------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.508 | ± 0.0317 |
| | | strict-match | 5 | exact_match ↑ | 0.496 | ± 0.0317 |

vllm (pretrained=/root/autodl-tmp/Orca-2-13b,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|------------------|-------:|---------------|------:|---------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.496 | ± 0.0317 |
| | | strict-match | 5 | exact_match ↑ | 0.488 | ± 0.0317 |

vllm (pretrained=/root/autodl-tmp/Orca-2-13b,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|------------------|-------:|---------------|------:|---------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.504 | ± 0.0224 |
| | | strict-match | 5 | exact_match ↑ | 0.480 | ± 0.0224 |
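The Stderr columns shrink from ±0.0317 to ±0.0224 as the sample limit grows from 250 to 500, consistent with the standard error of a proportion. A minimal sketch, assuming the harness uses the sample (n−1) denominator:

```python
import math

def binomial_stderr(p: float, n: int) -> float:
    """Standard error of a proportion p over n samples (n-1 denominator assumed)."""
    return math.sqrt(p * (1 - p) / (n - 1))

# limit=250 run: exact_match 0.520 -> reported stderr 0.0317
print(round(binomial_stderr(0.520, 250), 4))  # 0.0317
# limit=500 run: exact_match 0.504 -> reported stderr 0.0224
print(round(binomial_stderr(0.504, 500), 4))  # 0.0224
```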

vllm (pretrained=/root/autodl-tmp/Orca-2-13b,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|-------------------|--------:|--------|-------:|-------|-------:|---------:|
| mmlu | 2 | none | | acc ↑ | 0.5766 | ± 0.0158 |
| - humanities | 2 | none | | acc ↑ | 0.6359 | ± 0.0311 |
| - other | 2 | none | | acc ↑ | 0.6513 | ± 0.0344 |
| - social sciences | 2 | none | | acc ↑ | 0.6167 | ± 0.0350 |
| - stem | 2 | none | | acc ↑ | 0.4596 | ± 0.0274 |

vllm (pretrained=/root/autodl-tmp/Orca-2-13b-W8A8,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|------------------|-------:|---------------|------:|---------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.508 | ± 0.0317 |
| | | strict-match | 5 | exact_match ↑ | 0.496 | ± 0.0317 |

vllm (pretrained=/root/autodl-tmp/Orca-2-13b-W8A8,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|------------------|-------:|---------------|------:|---------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.500 | ± 0.0224 |
| | | strict-match | 5 | exact_match ↑ | 0.486 | ± 0.0224 |

vllm (pretrained=/root/autodl-tmp/Orca-2-13b-W8A8,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|-------------------|--------:|--------|-------:|-------|-------:|---------:|
| mmlu | 2 | none | | acc ↑ | 0.5708 | ± 0.0160 |
| - humanities | 2 | none | | acc ↑ | 0.6256 | ± 0.0319 |
| - other | 2 | none | | acc ↑ | 0.6513 | ± 0.0346 |
| - social sciences | 2 | none | | acc ↑ | 0.6222 | ± 0.0353 |
| - stem | 2 | none | | acc ↑ | 0.4456 | ± 0.0277 |
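The header lines above record the exact evaluation settings, so runs like these can be reproduced with lm-evaluation-harness's vLLM backend. A sketch of the W8A8 GSM8K run (assuming `lm_eval` and vLLM are installed, a GPU is available, and the model sits at the path shown in the headers):

```shell
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/Orca-2-13b-W8A8,add_bos_token=true,max_model_len=2048,dtype=bfloat16 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --limit 250 \
  --batch_size auto
```

Swap in the original model path, `dtype`, `--limit`, or `--tasks mmlu` to reproduce the other runs recorded above.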
Model tree for noneUsername/Orca-2-13b-W8A8-Dynamic-Per-Token