tags:
  - fp8
  - vllm

Mixtral-8x22B-Instruct-v0.1-FP8

Model Overview

This is Mixtral-8x22B-Instruct-v0.1 with both weights and activations quantized to FP8 using per-tensor quantization. It is ready for inference with vLLM >= 0.5.0.
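
To illustrate what per-tensor FP8 quantization means, here is a minimal pure-Python sketch. It is not the procedure used to produce this checkpoint. It assumes the E4M3 variant of FP8 (largest finite value 448) and a scale taken from the tensor's maximum absolute value; the card states neither detail. The function names `round_to_e4m3`, `quantize_per_tensor`, and `dequantize` are hypothetical.

```python
import math

# Assumption: FP8 here means E4M3 (4 exponent bits, 3 mantissa bits).
FP8_E4M3_MAX = 448.0   # largest finite E4M3 magnitude
MIN_NORMAL_EXP = -6    # exponent of the smallest normal E4M3 value
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    _, e = math.frexp(mag)
    # Clamp the exponent so that tiny values fall on the subnormal grid.
    exp = max(e - 1, MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_per_tensor(values):
    """Quantize a whole tensor (flat list) with one shared scale."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map FP8 values back to the original range."""
    return [q * scale for q in quantized]


weights = [0.02, -0.5, 0.13, 0.896]
q, scale = quantize_per_tensor(weights)
restored = dequantize(q, scale)
print(scale, q, restored)
```

A single scale per tensor keeps the metadata small, at the cost of precision for values far below the tensor's maximum. In this sketch the relative rounding error of any value in the normal range is at most 2^-4.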

Benchmark	Mixtral-8x22B-Instruct-v0.1	Mixtral-8x22B-Instruct-v0.1-FP8 (this model)
arc-c 25-shot	71.58	72.09
hellaswag 10-shot	86.94	86.83
mmlu 5-shot	83.97	84.06
truthfulqa 0-shot	66.98	66.95
winogrande 5-shot	82.79	83.18
gsm8k 5-shot	87.56	88.93
Average Accuracy	79.97	80.34
Recovery	100%	100.46%
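
The last two rows can be reproduced from the per-benchmark scores. The short sketch below assumes that Average Accuracy is the unweighted mean of the six benchmarks and that Recovery is the FP8 average divided by the baseline average; both assumptions match the reported figures. The variable names are illustrative.

```python
# Scores copied from the table: unquantized baseline column and FP8 column.
baseline = {
    "arc-c": 71.58,
    "hellaswag": 86.94,
    "mmlu": 83.97,
    "truthfulqa": 66.98,
    "winogrande": 82.79,
    "gsm8k": 87.56,
}
fp8 = {
    "arc-c": 72.09,
    "hellaswag": 86.83,
    "mmlu": 84.06,
    "truthfulqa": 66.95,
    "winogrande": 83.18,
    "gsm8k": 88.93,
}


def average(scores):
    """Unweighted mean over all benchmarks."""
    return sum(scores.values()) / len(scores)


baseline_avg = average(baseline)
fp8_avg = average(fp8)
# Recovery: FP8 average as a percentage of the baseline average.
recovery = 100.0 * fp8_avg / baseline_avg

print(f"baseline average: {baseline_avg:.2f}")
print(f"FP8 average:      {fp8_avg:.2f}")
print(f"recovery:         {recovery:.2f}%")
```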