---
base_model:
- huihui-ai/Qwen2.5-14B-Instruct-1M-abliterated
---

vllm (pretrained=/root/autodl-tmp/Qwen2.5-14B-Instruct-1M-abliterated,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.868|±  |0.0215|
|     |       |strict-match    |     5|exact_match|↑  |0.872|±  |0.0212|

vllm (pretrained=/root/autodl-tmp/Qwen2.5-14B-Instruct-1M-abliterated,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.872|±  |0.0150|
|     |       |strict-match    |     5|exact_match|↑  |0.870|±  |0.0151|

vllm (pretrained=/root/autodl-tmp/Qwen2.5-14B-Instruct-1M-abliterated,add_bos_token=true,max_model_len=700,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.7769|±  |0.0202|
| - humanities     |      2|none  |      |acc   |↑  |0.7692|±  |0.0440|
| - other          |      2|none  |      |acc   |↑  |0.7582|±  |0.0406|
| - social sciences|      2|none  |      |acc   |↑  |0.8452|±  |0.0376|
| - stem           |      2|none  |      |acc   |↑  |0.7519|±  |0.0376|

vllm (pretrained=/root/autodl-tmp/Qwen2.5-14B-Instruct-1M-abliterated-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.864|±  |0.0217|
|     |       |strict-match    |     5|exact_match|↑  |0.864|±  |0.0217|

vllm (pretrained=/root/autodl-tmp/Qwen2.5-14B-Instruct-1M-abliterated-87,add_bos_token=true,max_model_len=2048,tensor_parallel_size=2,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.882|±  |0.0144|
|     |       |strict-match    |     5|exact_match|↑  |0.874|±  |0.0149|

vllm (pretrained=/root/autodl-tmp/Qwen2.5-14B-Instruct-1M-abliterated-87,add_bos_token=true,max_model_len=700,tensor_parallel_size=2,dtype=bfloat16,enforce_eager=True), gen_kwargs: (None), limit: 7.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.7769|±  |0.0201|
| - humanities     |      2|none  |      |acc   |↑  |0.7692|±  |0.0440|
| - other          |      2|none  |      |acc   |↑  |0.7692|±  |0.0391|
| - social sciences|      2|none  |      |acc   |↑  |0.8333|±  |0.0370|
| - stem           |      2|none  |      |acc   |↑  |0.7519|±  |0.0381|
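The gsm8k Stderr values above can be reproduced from just the score and the sample limit. This is a sketch that assumes the harness reports the standard error of the mean of per-sample 0/1 exact-match scores, computed with the (n - 1) sample variance. That is our understanding of how lm-evaluation-harness handles mean metrics; it is not taken from this card. Under that assumption the formula reduces to sqrt(p(1 - p)/(n - 1)). The helper name `binary_mean_stderr` is hypothetical. The MMLU group rows are aggregated across subtasks and are not covered here.

```python
import math


def binary_mean_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores whose mean is `accuracy`.

    Hypothetical helper. For 0/1 scores the sum of squared deviations is
    n * p * (1 - p), so the sample variance (divisor n - 1) is
    n * p * (1 - p) / (n - 1), and the standard error is sqrt(variance / n).
    """
    if n < 2:
        raise ValueError("need at least two samples")
    sample_variance = n * accuracy * (1.0 - accuracy) / (n - 1)
    return math.sqrt(sample_variance / n)


# (exact_match, limit, reported Stderr) triples copied from the gsm8k tables.
reported = [
    (0.868, 250, 0.0215),
    (0.872, 250, 0.0212),
    (0.872, 500, 0.0150),
    (0.870, 500, 0.0151),
    (0.864, 250, 0.0217),
    (0.882, 500, 0.0144),
    (0.874, 500, 0.0149),
]

for acc, n, stderr in reported:
    computed = binary_mean_stderr(acc, n)
    print(f"acc={acc:.3f} n={n} reported={stderr:.4f} computed={computed:.4f}")
```

Each computed value rounds to the reported Stderr. This also explains why the 500-sample runs show roughly 1/sqrt(2) of the 250-sample error bars.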