mwitiderrick committed
Commit 9d2b501 · verified · 1 Parent(s): 13a0491

Update README.md

Files changed (1):
  1. README.md +32 -1

README.md CHANGED
@@ -1,3 +1,25 @@
+ ---
+ base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+ inference: false
+ model_type: llama
+ prompt_template: |
+   Question
+   {prompt}\n
+   Answer:
+ quantized_by: mwitiderrick
+ tags:
+ - deepsparse
+ ---
+ ## TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds
+ This repo contains model files for [TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.
+
+ This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+ ## Inference
+ Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
+ ```bash
+ pip install deepsparse-nightly[llm]
+ ```
+ Run in a [Python pipeline](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md):
  ```python
  from deepsparse import TextGeneration

@@ -11,4 +33,13 @@ He runs 3*60=<<3*60=180>>180 meters a week
  So he runs 180/3=<<180/3=60>>60 times a week
  #### 60
  """
- ```
+ ```
+ To obtain the final model, the following process was followed:
+ - Fine-tune the base model on GSM8K for 2 epochs using the Hugging Face SFTTrainer
+ - Sparsify the model to 50% using SparseML
+ - Fine-tune the sparse model on the GSM8K dataset
+ - Perform one-shot quantization of the resulting model
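For readers who want a feel for the first step in the list above, the sketch below shows what supervised fine-tuning of the base model on GSM8K with TRL's SFTTrainer could look like. It is a minimal illustration only: the prompt formatting, hyperparameters, and trainer arguments are assumptions rather than the recipe used for this commit, and the keyword arguments assume a TRL version that accepts `dataset_text_field` and `max_seq_length` directly.

```python
# Hypothetical sketch of step 1 (SFT on GSM8K); hyperparameters and the
# prompt formatting are illustrative assumptions, not the actual recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# GSM8K has "question" and "answer" columns; render them with the same
# Question/Answer layout as the prompt_template in the card frontmatter.
dataset = load_dataset("gsm8k", "main", split="train")
dataset = dataset.map(
    lambda ex: {"text": f"Question\n{ex['question']}\n\nAnswer:\n{ex['answer']}"}
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",          # column produced by the map() above
    max_seq_length=1024,                # assumed value
    args=TrainingArguments(
        output_dir="tinyllama-gsm8k-sft",
        num_train_epochs=2,             # 2 epochs, as stated in the README
        per_device_train_batch_size=8,  # assumed value
        learning_rate=2e-5,             # assumed value
    ),
)
trainer.train()
```

The later steps (50% sparsification with SparseML, further fine-tuning of the sparse model, and one-shot SparseGPT quantization) use SparseML recipes and are not sketched here.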
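The Python example in the hunk above is truncated by the diff context, so only its first and last lines are visible. As a rough, unofficial sketch of how the exported model might be queried with the DeepSparse TextGeneration pipeline using the prompt template from the frontmatter: the Hugging Face stub, the sample question, and the generation settings below are assumptions chosen to match the visible output, not lines taken from this commit.

```python
# Hedged sketch of the inference call; the repo stub, prompt, and
# max_new_tokens are assumptions for illustration.
from deepsparse import TextGeneration

# Assumed Hugging Face stub for the DeepSparse export of this model.
MODEL_STUB = "hf:mwitiderrick/TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds"

question = (
    "James decides to run 3 sprints 3 times a week. "
    "He runs 60 meters each sprint. How many total meters does he run a week?"
)
# Apply the Question / {prompt} / Answer: template declared in the frontmatter.
formatted_prompt = f"Question\n{question}\n\nAnswer:"

model = TextGeneration(model_path=MODEL_STUB)
output = model(formatted_prompt, max_new_tokens=200)
print(output.generations[0].text)
```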