# TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds
This repo contains model files for TinyLlama-1.1B-intermediate-step-1431k-3T optimized for DeepSparse, a CPU inference runtime for sparse models.
This model was pruned and quantized with SparseGPT, using SparseML.
## Inference
Install DeepSparse LLM for fast inference on CPUs:

```bash
pip install deepsparse-nightly[llm]
```
Run in a Python pipeline:

```python
from deepsparse import TextGeneration

prompt = "James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week?"
# Wrap the question in the Question/Answer template the model was fine-tuned with
formatted_prompt = f"Question:{prompt}\nAnswer:"

# Compile the sparse-quantized model from the Hugging Face Hub for CPU inference
model = TextGeneration(model_path="hf:nm-testing/TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds")

print(model(formatted_prompt, max_new_tokens=200).generations[0].text)
"""
He runs 3*60=<<3*60=180>>180 meters a week
So he runs 180/3=<<180/3=60>>60 times a week
#### 60
"""
```
To obtain the final model, the following process was followed:

- Fine-tune the base model on the GSM8K dataset for 2 epochs using Hugging Face's SFTTrainer (a training sketch follows this list)
- Prune the model to 50% sparsity with SparseGPT using SparseML (see the recipe sketch below)
- Fine-tune the sparse model on the GSM8K dataset to recover accuracy
- Perform one-shot quantization of the resulting model
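The exact training script for this checkpoint is not published; the following is a minimal sketch of the first step, assuming TRL's `SFTTrainer` and the `gsm8k` dataset from the Hugging Face Hub. Apart from the 2-epoch count noted above, all hyperparameters are illustrative assumptions.

```python
# Sketch of step 1 (assumed implementation, not the published training script):
# fine-tune the base TinyLlama checkpoint on GSM8K with TRL's SFTTrainer.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

train_ds = load_dataset("gsm8k", "main", split="train")

def format_batch(examples):
    # Same "Question:...\nAnswer:" template used by the inference example above
    return [
        f"Question:{q}\nAnswer:{a}"
        for q, a in zip(examples["question"], examples["answer"])
    ]

trainer = SFTTrainer(
    model="TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
    train_dataset=train_ds,
    formatting_func=format_batch,
    max_seq_length=512,  # assumption
    args=TrainingArguments(
        output_dir="tinyllama-gsm8k-sft",
        num_train_epochs=2,             # the 2 epochs noted above
        per_device_train_batch_size=8,  # assumption
        learning_rate=2e-5,             # assumption
    ),
)
trainer.train()
```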
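The pruning and quantization recipe is likewise not included here. A sketch in SparseML's one-shot (OBCQ) recipe format follows; only the 50% sparsity target comes from the list above, and every other value is an assumption based on common SparseGPT settings:

```yaml
# Illustrative SparseML recipe sketch (not the published recipe for this model)
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5    # prune to 50% sparsity (step 2)
      block_size: 128  # assumption
      quantize: true   # enable one-shot quantization (step 4)
```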