RedHatAI
/

Mixtral-8x22B-Instruct-v0.1-AutoFP8

Text Generation

text-generation-inference

Model card Files Files and versions Community

Mixtral-8x22B-Instruct-v0.1-AutoFP8 / README.md

abhinavnmagic's picture

Create README.md

19ea320 verified 10 months ago

|

1.56 kB

	---
	tags:
	- fp8
	- vllm
	---

	# Mixtral-8x22B-Instruct-v0.1-FP8

	## Model Overview
	Mixtral-8x22B-Instruct-v0.1 quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.

	## Usage and Creation
	Produced using [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py).

	## Evaluation

	### Open LLM Leaderboard evaluation scores
	\| \| Meta-Llama-3-70B-Instruct \| Meta-Llama-3-70B-Instruct-FP8<br>(this model) \|
	\| :------------------: \| :----------------------: \| :------------------------------------------------: \|
	\| arc-c<br>25-shot \| 71.58 \| 72.09 \|
	\| hellaswag<br>10-shot \| 86.94 \| 86.83 \|
	\| mmlu<br>5-shot \| 83.97 \| 84.06 \|
	\| truthfulqa<br>0-shot \| 66.98 \| 66.95 \|
	\| winogrande<br>5-shot \| 82.79 \| 83.18 \|
	\| gsm8k<br>5-shot \| 87.56 \| 88.93 \|
	\| Average<br>Accuracy \| 79.97 \| 80.34 \|
	\| Recovery \| 100% \| 100.46% \|