vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.620 | ± 0.0308 |
| gsm8k | 3 | strict-match | 5 | exact_match ↑ | 0.616 | ± 0.0308 |
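
The Stderr column can be sanity-checked from Value and the `limit` setting. The following is a minimal sketch, assuming the harness reports the sample standard deviation of per-question 0/1 scores divided by sqrt(n). That assumption is consistent with the gsm8k figures reported here, and `binary_stderr` is a hypothetical helper, not part of lm-eval.

```python
import math

def binary_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))

# Values taken from the gsm8k results above (limit 250).
for acc in (0.620, 0.616):
    print(f"{acc:.3f} ± {binary_stderr(acc, 250):.4f}")
```

Doubling the limit from 250 to 500 shrinks the standard error by roughly a factor of sqrt(2), which matches the drop from about 0.031 to about 0.022 seen in the 500-question runs.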

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.604 | ± 0.0219 |
| gsm8k | 3 | strict-match | 5 | exact_match ↑ | 0.606 | ± 0.0219 |

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=1024,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 6.0, num_fewshot: None, batch_size: 1

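| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---:|
| mmlu | 2 | none | | acc ↑ | 0.6784 | ± 0.0237 |
| - humanities | 2 | none | | acc ↑ | 0.6923 | ± 0.0451 |
| - other | 2 | none | | acc ↑ | 0.6923 | ± 0.0538 |
| - social sciences | 2 | none | | acc ↑ | 0.7917 | ± 0.0444 |
| - stem | 2 | none | | acc ↑ | 0.5877 | ± 0.0442 |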
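
With such a small `limit`, each MMLU group score rests on only a few dozen questions. The sketch below recovers the implied question counts, assuming the limit applies per subject and that the groups contain 13, 13, 12 and 19 subjects respectively. Those subject counts are an assumption about the harness's grouping, not something stated in the output above, and `implied_counts` is a hypothetical helper.

```python
# Number of MMLU subjects per group (assumed; not shown in the harness output).
SUBJECTS_PER_GROUP = {"humanities": 13, "other": 13, "social sciences": 12, "stem": 19}

def implied_counts(limit: int, accuracies: dict[str, float]) -> dict[str, tuple[int, int]]:
    """Return (correct, total) per group, assuming `limit` questions per subject."""
    counts = {}
    for group, acc in accuracies.items():
        total = SUBJECTS_PER_GROUP[group] * limit
        counts[group] = (round(acc * total), total)
    return counts

# Group accuracies reported for Wayfarer-12B with limit 6.
reported = {"humanities": 0.6923, "other": 0.6923, "social sciences": 0.7917, "stem": 0.5877}
counts = implied_counts(6, reported)
correct = sum(c for c, _ in counts.values())
total = sum(t for _, t in counts.values())
print(counts)
print(f"overall: {correct}/{total} = {correct / total:.4f}")
```

Under these assumptions the group counts add up to the reported overall accuracy, which suggests the headline MMLU number is a plain average over a few hundred questions. Runs made with a different limit cover a different set of questions, so small gaps between their MMLU scores should be read with caution.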

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.660 | ± 0.0300 |
| gsm8k | 3 | strict-match | 5 | exact_match ↑ | 0.652 | ± 0.0302 |

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.632 | ± 0.0216 |
| gsm8k | 3 | strict-match | 5 | exact_match ↑ | 0.628 | ± 0.0216 |

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=800,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

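| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---:|
| mmlu | 2 | none | | acc ↑ | 0.6566 | ± 0.0220 |
| - humanities | 2 | none | | acc ↑ | 0.6593 | ± 0.0444 |
| - other | 2 | none | | acc ↑ | 0.6703 | ± 0.0491 |
| - social sciences | 2 | none | | acc ↑ | 0.7738 | ± 0.0445 |
| - stem | 2 | none | | acc ↑ | 0.5714 | ± 0.0391 |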

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.660 | ± 0.0300 |
| gsm8k | 3 | strict-match | 5 | exact_match ↑ | 0.664 | ± 0.0299 |

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.624 | ± 0.0217 |
| gsm8k | 3 | strict-match | 5 | exact_match ↑ | 0.630 | ± 0.0216 |
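
To judge whether the gsm8k gaps between Wayfarer-12B and the -86 and -87 variants are larger than sampling noise, the reported values and standard errors can be combined into a rough z statistic. This is only a sketch: it treats the runs as independent samples, which is an assumption, since the runs probably share the same questions and a paired test would be more appropriate. `z_score` is a hypothetical helper.

```python
import math

def z_score(value_a: float, stderr_a: float, value_b: float, stderr_b: float) -> float:
    """Unpaired z statistic for the difference between two independent estimates."""
    return (value_b - value_a) / math.sqrt(stderr_a**2 + stderr_b**2)

# gsm8k, flexible-extract, limit 500, as reported above.
base = (0.604, 0.0219)  # Wayfarer-12B
runs = {
    "Wayfarer-12B-86": (0.632, 0.0216),
    "Wayfarer-12B-87": (0.624, 0.0217),
}
for name, (value, stderr) in runs.items():
    print(f"{name}: z = {z_score(*base, value, stderr):.2f}")
```

Both statistics come out below 1, well short of the usual 1.96 threshold, so on 500 questions the three checkpoints are not clearly distinguishable on gsm8k under this unpaired approximation.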

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=800,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

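| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---:|
| mmlu | 2 | none | | acc ↑ | 0.6717 | ± 0.0217 |
| - humanities | 2 | none | | acc ↑ | 0.6703 | ± 0.0426 |
| - other | 2 | none | | acc ↑ | 0.6703 | ± 0.0479 |
| - social sciences | 2 | none | | acc ↑ | 0.7857 | ± 0.0418 |
| - stem | 2 | none | | acc ↑ | 0.6015 | ± 0.0400 |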

Safetensors · Model size: 12.2B params · Tensor type: BF16, I8