---
tags:
  - fp8
  - vllm
---

# Mixtral-8x22B-Instruct-v0.1-FP8

## Model Overview

Mixtral-8x22B-Instruct-v0.1 with weights and activations quantized to FP8 using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
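For example, a minimal inference sketch with vLLM (the Hub repo id and GPU count below are assumptions; substitute your own):

```python
from vllm import LLM, SamplingParams

# Assumed repo id; point this at the actual Hub repo for this checkpoint.
model_id = "neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8"

# vLLM >= 0.5.0 detects the FP8 checkpoint format from the model config.
# An 8x22B model requires multiple GPUs; adjust tensor_parallel_size to your setup.
llm = LLM(model=model_id, tensor_parallel_size=8)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```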

## Usage and Creation

Produced using [AutoFP8](https://github.com/neuralmagic/AutoFP8) with calibration samples from UltraChat.
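A sketch of that quantization flow, following AutoFP8's published example; the calibration sample count, dataset split, and output path are assumptions:

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "mistralai/Mixtral-8x22B-Instruct-v0.1"
quantized_model_dir = "Mixtral-8x22B-Instruct-v0.1-FP8"  # illustrative output path

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
tokenizer.pad_token = tokenizer.eos_token

# Calibration samples from UltraChat; 512 samples is an assumed count.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft").select(range(512))
examples = [tokenizer.apply_chat_template(row["messages"], tokenize=False) for row in ds]
examples = tokenizer(examples, padding=True, truncation=True, return_tensors="pt").to("cuda")

# Static scheme: per-tensor scales for both weights and activations, per the card.
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config=quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```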

## Evaluation

Open LLM Leaderboard evaluation scores:

| Benchmark | Mixtral-8x22B-Instruct-v0.1 | Mixtral-8x22B-Instruct-v0.1-FP8<br>(this model) |
| --------- | --------------------------- | ----------------------------------------------- |
| arc-c (25-shot) | 71.58 | 72.09 |
| hellaswag (10-shot) | 86.94 | 86.83 |
| mmlu (5-shot) | 83.97 | 84.06 |
| truthfulqa (0-shot) | 66.98 | 66.95 |
| winogrande (5-shot) | 82.79 | 83.18 |
| gsm8k (5-shot) | 87.56 | 88.93 |
| **Average Accuracy** | **79.97** | **80.34** |
| **Recovery** | **100%** | **100.46%** |

Recovery is the FP8 model's average accuracy relative to the unquantized baseline (80.34 / 79.97 ≈ 100.46%).
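These scores follow the Open LLM Leaderboard task setup. A hedged sketch of reproducing a single row with lm-evaluation-harness (the repo id, vLLM backend, and parallelism below are assumptions):

```python
import lm_eval

# Assumed repo id; point this at the actual Hub repo for this checkpoint.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8,tensor_parallel_size=8",
    tasks=["gsm8k"],   # one of the leaderboard tasks from the table above
    num_fewshot=5,     # matches the 5-shot setting reported for gsm8k
    batch_size="auto",
)
print(results["results"]["gsm8k"])
```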