vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|--:|---|--:|---|---|--:|---|--:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.620 | ± | 0.0308 |
| | | strict-match | 5 | exact_match | ↑ | 0.616 | ± | 0.0308 |

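
The line above each table records the lm-evaluation-harness settings for that run. Below is a minimal sketch of how the first GSM8K run could be repeated from Python. It assumes a recent lm-eval 0.4.x release installed with vLLM support and two GPUs. `parse_model_args` and `run_gsm8k` are hypothetical helper names. The model path is the one recorded in the header. Scores may differ slightly across lm-eval and vLLM versions.

```python
# Sketch: map the recorded header settings onto lm_eval.simple_evaluate().
# Assumes lm-eval (with vLLM) is installed; helper names are hypothetical.

HEADER_MODEL_ARGS = (
    "pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,"
    "max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16"
)


def parse_model_args(arg_string: str) -> dict[str, str]:
    """Split a 'key=value,key=value' model_args string into a dict of strings."""
    pairs = (item.split("=", 1) for item in arg_string.split(",") if item)
    return {key: value for key, value in pairs}


def run_gsm8k(limit: int = 250) -> dict:
    """Re-run the 5-shot GSM8K evaluation with the header's settings (not called here)."""
    import lm_eval  # imported lazily so this file loads without lm-eval installed

    results = lm_eval.simple_evaluate(
        model="vllm",
        model_args=HEADER_MODEL_ARGS,  # same comma-separated string as the header
        tasks=["gsm8k"],
        num_fewshot=5,
        limit=limit,  # first 250 (or 500) GSM8K test questions
        batch_size="auto",
    )
    # Metrics are keyed as "<metric>,<filter>", e.g. "exact_match,flexible-extract".
    return results["results"]["gsm8k"]
```

On a machine with the model available, `run_gsm8k(250)["exact_match,flexible-extract"]` should land near the 0.620 reported above.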
vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|--:|---|--:|---|---|--:|---|--:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.604 | ± | 0.0219 |
| | | strict-match | 5 | exact_match | ↑ | 0.606 | ± | 0.0219 |

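
The Stderr column for a 0/1 metric such as exact_match can be recomputed from the score and the number of scored questions (the `limit`). The sketch below assumes the harness reports the sample standard deviation (with an n - 1 denominator) divided by sqrt(n). Under that assumption it reproduces the values in the two Wayfarer-12B tables. It also shows why going from limit 250 to limit 500 shrinks the error by roughly a factor of sqrt(2).

```python
import math


def binary_mean_stderr(score: float, n: int) -> float:
    """Standard error of the mean of n 0/1 outcomes whose mean is `score`.

    The sample variance of Bernoulli outcomes with the n - 1 correction is
    score * (1 - score) * n / (n - 1); dividing by n and taking the square
    root gives sqrt(score * (1 - score) / (n - 1)).
    """
    return math.sqrt(score * (1.0 - score) / (n - 1))


# (score, limit) pairs for Wayfarer-12B from the two GSM8K tables.
for score, n in [(0.620, 250), (0.616, 250), (0.604, 500), (0.606, 500)]:
    print(f"{score:.3f} on {n} questions -> stderr {binary_mean_stderr(score, n):.4f}")
```

This prints 0.0308, 0.0308, 0.0219 and 0.0219, matching the Stderr column.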
vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=1024,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 6.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|--:|---|---|---|---|--:|---|--:|
| mmlu | 2 | none | | acc | ↑ | 0.6784 | ± | 0.0237 |
| - humanities | 2 | none | | acc | ↑ | 0.6923 | ± | 0.0451 |
| - other | 2 | none | | acc | ↑ | 0.6923 | ± | 0.0538 |
| - social sciences | 2 | none | | acc | ↑ | 0.7917 | ± | 0.0444 |
| - stem | 2 | none | | acc | ↑ | 0.5877 | ± | 0.0442 |

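
The mmlu row is consistent with a size-weighted mean of the four sub-group rows, not a plain average. The sketch below assumes that `limit` caps each of the 57 MMLU subjects separately. Those subjects split into 13 humanities, 13 other, 12 social sciences and 19 STEM, so limit 6 gives 78, 78, 72 and 114 questions. The same function also reproduces the mmlu rows of the two later runs that use limit 7.

```python
# Number of MMLU subjects in each sub-group (57 in total).
SUBJECTS_PER_GROUP = {"humanities": 13, "other": 13, "social sciences": 12, "stem": 19}


def weighted_group_acc(sub_acc: dict[str, float], limit: int) -> float:
    """Size-weighted mean of sub-group accuracies, assuming `limit` questions per subject."""
    sizes = {group: SUBJECTS_PER_GROUP[group] * limit for group in sub_acc}
    total = sum(sizes.values())
    return sum(sub_acc[group] * sizes[group] for group in sub_acc) / total


wayfarer_12b = {"humanities": 0.6923, "other": 0.6923, "social sciences": 0.7917, "stem": 0.5877}
print(f"{weighted_group_acc(wayfarer_12b, limit=6):.4f}")  # size-weighted mean
print(f"{sum(wayfarer_12b.values()) / 4:.4f}")  # unweighted mean, for contrast
```

The first line prints 0.6784, matching the table. The unweighted mean prints 0.6910, which does not.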
vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|--:|---|--:|---|---|--:|---|--:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.660 | ± | 0.0300 |
| | | strict-match | 5 | exact_match | ↑ | 0.652 | ± | 0.0302 |

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|--:|---|--:|---|---|--:|---|--:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.632 | ± | 0.0216 |
| | | strict-match | 5 | exact_match | ↑ | 0.628 | ± | 0.0216 |

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=800,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|--:|---|---|---|---|--:|---|--:|
| mmlu | 2 | none | | acc | ↑ | 0.6566 | ± | 0.0220 |
| - humanities | 2 | none | | acc | ↑ | 0.6593 | ± | 0.0444 |
| - other | 2 | none | | acc | ↑ | 0.6703 | ± | 0.0491 |
| - social sciences | 2 | none | | acc | ↑ | 0.7738 | ± | 0.0445 |
| - stem | 2 | none | | acc | ↑ | 0.5714 | ± | 0.0391 |

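
The group-level Stderr can likewise be rebuilt from the sub-group rows by pooling their variances. The sketch below works as follows:

- each sub-group stderr is turned back into a sample variance (stderr² × size);
- the variances are pooled with (size - 1) weights;
- the pooled variance is divided by the total question count.

This appears to match how lm-evaluation-harness combines stderrs for size-weighted groups, and it reproduces the 0.0220 above. It assumes the per-subject `limit` of 7 gives 91, 91, 84 and 133 questions per sub-group. Note that this pooling ignores variance *between* sub-groups.

```python
import math

# Number of MMLU subjects in each sub-group (57 in total).
SUBJECTS_PER_GROUP = {"humanities": 13, "other": 13, "social sciences": 12, "stem": 19}


def pooled_stderr(stderrs: list[float], sizes: list[int]) -> float:
    """Pool per-group standard errors into one standard error for the combined group."""
    pooled_var = sum((n - 1) * se**2 * n for se, n in zip(stderrs, sizes)) / (
        sum(sizes) - len(sizes)
    )
    return math.sqrt(pooled_var / sum(sizes))


limit = 7  # Wayfarer-12B-86 MMLU run
sizes = [SUBJECTS_PER_GROUP[g] * limit for g in ("humanities", "other", "social sciences", "stem")]
print(f"{pooled_stderr([0.0444, 0.0491, 0.0445, 0.0391], sizes):.4f}")  # 0.0220
```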
vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|--:|---|--:|---|---|--:|---|--:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.660 | ± | 0.0300 |
| | | strict-match | 5 | exact_match | ↑ | 0.664 | ± | 0.0299 |

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|--:|---|--:|---|---|--:|---|--:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.624 | ± | 0.0217 |
| | | strict-match | 5 | exact_match | ↑ | 0.630 | ± | 0.0216 |

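
One rough way to read the GSM8K gaps at limit 500 is a z-score on the difference in flexible-extract scores. The sketch below treats the base and variant runs as independent. They are not independent, since all runs score the same 500 questions, so this overstates the uncertainty. A paired comparison on per-question outputs (for example from lm-eval's `--log_samples` option) would be tighter. `diff_z` is a hypothetical helper name.

```python
import math


def diff_z(score_a: float, se_a: float, score_b: float, se_b: float) -> float:
    """z-score for score_b - score_a, treating the two runs as independent."""
    return (score_b - score_a) / math.hypot(se_a, se_b)


base = (0.604, 0.0219)  # Wayfarer-12B, flexible-extract, limit 500
variants = {"Wayfarer-12B-86": (0.632, 0.0216), "Wayfarer-12B-87": (0.624, 0.0217)}
for name, variant in variants.items():
    z = diff_z(*base, *variant)
    print(f"{name}: +{variant[0] - base[0]:.3f}, z = {z:.2f}")
```

Both gaps come out with z below 1.96 (about 0.91 and 0.65). Under this rough test, they are within sampling noise at 500 questions.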
vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=800,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|--:|---|---|---|---|--:|---|--:|
| mmlu | 2 | none | | acc | ↑ | 0.6717 | ± | 0.0217 |
| - humanities | 2 | none | | acc | ↑ | 0.6703 | ± | 0.0426 |
| - other | 2 | none | | acc | ↑ | 0.6703 | ± | 0.0479 |
| - social sciences | 2 | none | | acc | ↑ | 0.7857 | ± | 0.0418 |
| - stem | 2 | none | | acc | ↑ | 0.6015 | ± | 0.0400 |

Model tree for noneUsername/Wayfarer-12B-W8A8: base model mistralai/Mistral-Nemo-Base-2407