---
base_model:
- LatitudeGames/Wayfarer-12B
---

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.620|±  |0.0308|
|     |       |strict-match    |     5|exact_match|↑  |0.616|±  |0.0308|

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.604|±  |0.0219|
|     |       |strict-match    |     5|exact_match|↑  |0.606|±  |0.0219|

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B,add_bos_token=true,max_model_len=1024,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 6.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6784|±  |0.0237|
| - humanities     |      2|none  |      |acc   |↑  |0.6923|±  |0.0451|
| - other          |      2|none  |      |acc   |↑  |0.6923|±  |0.0538|
| - social sciences|      2|none  |      |acc   |↑  |0.7917|±  |0.0444|
| - stem           |      2|none  |      |acc   |↑  |0.5877|±  |0.0442|

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.660|±  |0.0300|
|     |       |strict-match    |     5|exact_match|↑  |0.652|±  |0.0302|

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.632|±  |0.0216|
|     |       |strict-match    |     5|exact_match|↑  |0.628|±  |0.0216|

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-86,add_bos_token=true,max_model_len=800,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6566|±  |0.0220|
| - humanities     |      2|none  |      |acc   |↑  |0.6593|±  |0.0444|
| - other          |      2|none  |      |acc   |↑  |0.6703|±  |0.0491|
| - social sciences|      2|none  |      |acc   |↑  |0.7738|±  |0.0445|
| - stem           |      2|none  |      |acc   |↑  |0.5714|±  |0.0391|

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.660|±  |0.0300|
|     |       |strict-match    |     5|exact_match|↑  |0.664|±  |0.0299|

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.624|±  |0.0217|
|     |       |strict-match    |     5|exact_match|↑  |0.630|±  |0.0216|

vllm (pretrained=/root/autodl-tmp/Wayfarer-12B-87,add_bos_token=true,max_model_len=800,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6717|±  |0.0217|
| - humanities     |      2|none  |      |acc   |↑  |0.6703|±  |0.0426|
| - other          |      2|none  |      |acc   |↑  |0.6703|±  |0.0479|
| - social sciences|      2|none  |      |acc   |↑  |0.7857|±  |0.0418|
| - stem           |      2|none  |      |acc   |↑  |0.6015|±  |0.0400|
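

The reported `Stderr` values can be used to gauge whether the gaps between checkpoints are larger than sampling noise. The following is a minimal sketch, not part of the original evaluation: it takes the gsm8k flexible-extract scores at limit 500 from the tables above and computes a z statistic for each checkpoint against the base model. The helper `z_score` and the `runs` dictionary are hypothetical names introduced for illustration, and the calculation assumes the two errors are independent, which is only an approximation because every run scores the same questions.

```python
import math


def z_score(value_a, stderr_a, value_b, stderr_b):
    """z statistic for (b - a), assuming the two standard errors are independent."""
    combined = math.sqrt(stderr_a ** 2 + stderr_b ** 2)
    return (value_b - value_a) / combined


# gsm8k, flexible-extract, limit 500: (value, stderr) copied from the tables above.
runs = {
    "Wayfarer-12B": (0.604, 0.0219),
    "Wayfarer-12B-86": (0.632, 0.0216),
    "Wayfarer-12B-87": (0.624, 0.0217),
}

base_value, base_stderr = runs["Wayfarer-12B"]
z_scores = {}
for name, (value, stderr) in runs.items():
    if name == "Wayfarer-12B":
        continue  # skip comparing the base model with itself
    z_scores[name] = z_score(base_value, base_stderr, value, stderr)
    print(f"{name}: delta={value - base_value:+.3f}, z={z_scores[name]:.2f}")
```

Under these assumptions, both gaps come out smaller than one combined standard error, well short of the |z| of about 1.96 usually taken as a 95% threshold. A paired comparison on per-question results would be the more precise check, but the tables alone do not provide that data.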