---
tags:
- vllm
- sparsity
pipeline_tag: text-generation
license: llama3.1
base_model: neuralmagic/Sparse-Llama-3.1-8B-2of4
datasets:
- openai/gsm8k
language:
- en
metrics:
- accuracy
---

# Sparse-Llama-3.1-8B-gsm8k-2of4

## Model Overview
- **Model Architecture:** Llama-3.1-8B
  - **Input:** Text
  - **Output:** Text
- **Model Optimizations:**
  - **Sparsity:** 2:4
- **Release Date:** 11/21/2024
- **Version:** 1.0
- **License(s):** [llama3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
- **Model Developers:** Neural Magic

This is an AI model specialized in grade-school math, obtained by fine-tuning the 2:4 sparse [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4) on the [GSM8k](https://huggingface.co/datasets/openai/gsm8k) dataset.
It achieves 66.9% 0-shot accuracy on the test set of GSM8k, compared to 66.3% for the fine-tuned dense model [Llama-3.1-8B-gsm8k](https://huggingface.co/neuralmagic/Llama-3.1-8B-gsm8k), demonstrating over **100% accuracy recovery**.
In contrast, the pretrained [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) achieves 50.7% 5-shot accuracy and the sparse foundational [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4) model achieves 56.3% 5-shot accuracy.

### Model Optimizations

This model inherits the optimizations from its parent, [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4).
Namely, all linear operators within transformer blocks were pruned to the 2:4 sparsity pattern: in each group of four weights, two are retained while two are pruned.

## Deployment with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend. vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
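As a minimal sketch of offline inference with vLLM, the snippet below loads the model and generates answers for a batch of questions. Two things here are assumptions rather than facts from this card: the repository ID `neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4` (inferred from the model name and developer) and the `Question: ... Answer:` prompt layout produced by the hypothetical helper `build_prompt`. Adjust both to match how the model was actually published and fine-tuned.

```python
from typing import List

# Assumption: the model is published under this Hugging Face repository ID.
MODEL_ID = "neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4"


def build_prompt(question: str) -> str:
    """Format a question in a simple Question/Answer layout.

    Hypothetical helper: this layout is an assumption, not a format
    confirmed by the model card.
    """
    return f"Question: {question.strip()}\nAnswer:"


def generate_answers(questions: List[str], max_tokens: int = 256) -> List[str]:
    """Run offline batch generation with vLLM, one completion per question."""
    # Imported lazily so build_prompt can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID)
    # Greedy decoding (temperature 0) reduces run-to-run variation.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([build_prompt(q) for q in questions], params)
    return [out.outputs[0].text for out in outputs]
```

On a machine with a supported GPU and vLLM installed, calling `generate_answers(["A farmer has 12 eggs and sells 5. How many are left?"])` should return a list with one generated answer string. For OpenAI-compatible serving, the vLLM documentation describes the `vllm serve` command.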
## Evaluation

This model was evaluated using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).

### Accuracy

#### GSM8k Benchmark

| Metric | Llama-3.1-8B (5-shot) | Sparse-Llama-3.1-8B-2of4 (5-shot) | Llama-3.1-8B-gsm8k (0-shot) | Sparse-Llama-3.1-8B-gsm8k-2of4 (0-shot) |
| --- | --- | --- | --- | --- |
| Accuracy | 50.7% | 56.3% | 66.3% | 66.9% |
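
The accuracy recovery claimed for this model can be checked from the two 0-shot columns of the table. The sketch below assumes recovery is defined as the sparse model's accuracy divided by the dense model's accuracy, expressed as a percentage. The card does not spell out the formula, and `accuracy_recovery` is a hypothetical helper name.

```python
def accuracy_recovery(sparse_acc: float, dense_acc: float) -> float:
    """Return sparse accuracy as a percentage of dense accuracy.

    Assumed definition: recovery = 100 * sparse / dense.
    """
    if dense_acc <= 0:
        raise ValueError("dense_acc must be positive")
    return 100.0 * sparse_acc / dense_acc


# GSM8k 0-shot accuracies from the table above, in percent.
dense_finetuned = 66.3   # Llama-3.1-8B-gsm8k
sparse_finetuned = 66.9  # Sparse-Llama-3.1-8B-gsm8k-2of4

recovery = accuracy_recovery(sparse_finetuned, dense_finetuned)
print(f"Accuracy recovery: {recovery:.1f}%")  # prints: Accuracy recovery: 100.9%
```

Under this definition, any value above 100% means the sparse fine-tuned model scored higher than its dense counterpart on the benchmark.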