It is worth noting that, compared with the prince-canuma version, this version is smaller after quantization and its accuracy is about one percentage point higher.

In my ERP testing, this model did perform better.
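Since the checkpoint is published as W8A8 (dynamic, per-token) quantized weights, it can be served directly with vLLM, which was also the backend used for the evaluations below. The following is a minimal loading sketch, not part of the original card: the sampling settings are illustrative, and `tensor_parallel_size`/`max_model_len` simply mirror the evaluation configuration and may need adjusting for your hardware.

```python
from vllm import LLM, SamplingParams

# Sketch: load this W8A8 dynamic per-token quantized model with vLLM.
# vLLM detects the quantization scheme from the checkpoint config.
llm = LLM(
    model="noneUsername/TouchNight-Ministral-8B-Instruct-2410-HF-W8A8-Dynamic-Per-Token",
    tensor_parallel_size=2,   # matches the evaluation setup below
    max_model_len=2048,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain what W8A8 dynamic per-token quantization means."], params)
print(outputs[0].outputs[0].text)
```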

vllm (pretrained=/root/autodl-tmp/Ministral-8B-Instruct-2410-HF,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=float16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|-----|------:|------|-----:|------|---|----:|---|-----:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.820|±|0.0243|
| | |strict-match|5|exact_match|↑|0.816|±|0.0246|
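Results like these can be reproduced with the lm-evaluation-harness Python API, passing the same `model_args` string shown in each run header. The sketch below is an assumption about the invocation, not a verbatim copy of the command used for this card; only the arguments that appear in the headers (pretrained path, `add_bos_token`, `tensor_parallel_size`, `max_model_len`, `dtype`, 5-shot, limit 250, auto batch size) are taken from the original.

```python
import lm_eval

# Sketch of the GSM8K evaluation reported above (lm-evaluation-harness >= 0.4,
# vLLM backend). Swap the pretrained path and dtype to reproduce the other runs.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/root/autodl-tmp/Ministral-8B-Instruct-2410-HF,"
        "add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=float16"
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
    limit=250,
    batch_size="auto",
)
print(results["results"]["gsm8k"])
```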

vllm (pretrained=/root/autodl-tmp/Ministral-8B-Instruct-2410-HF,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|-----|------:|------|-----:|------|---|----:|---|-----:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.804|±|0.0252|
| | |strict-match|5|exact_match|↑|0.804|±|0.0252|

vllm (pretrained=/root/autodl-tmp/Ministral-8B-Instruct-2410-HF,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=float32), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|-----|------:|------|-----:|------|---|----:|---|-----:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.820|±|0.0243|
| | |strict-match|5|exact_match|↑|0.816|±|0.0246|

vllm (pretrained=/root/autodl-tmp/output,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=float16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|-----|------:|------|-----:|------|---|----:|---|-----:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.816|±|0.0246|
| | |strict-match|5|exact_match|↑|0.812|±|0.0248|

vllm (pretrained=/root/autodl-tmp/output,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|-----|------:|------|-----:|------|---|----:|---|-----:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.796|±|0.0255|
| | |strict-match|5|exact_match|↑|0.792|±|0.0257|