mwitiderrick committed
Commit 9d2b501 · verified · 1 Parent(s): 13a0491

Update README.md

Files changed (1):
  1. README.md +32 -1

README.md CHANGED
@@ -1,3 +1,25 @@
+ ---
+ base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+ inference: false
+ model_type: llama
+ prompt_template: |
+   Question
+   {prompt}\n
+   Answer:
+ quantized_by: mwitiderrick
+ tags:
+ - deepsparse
+ ---
+ ## TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds
+ This repo contains model files for [TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.
+
+ This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+ ## Inference
+ Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
+ ```bash
+ pip install deepsparse-nightly[llm]
+ ```
+ Run in a [Python pipeline](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md):
  ```python
  from deepsparse import TextGeneration

@@ -11,4 +33,13 @@ He runs 3*60=<<3*60=180>>180 meters a week
  So he runs 180/3=<<180/3=60>>60 times a week
  #### 60
  """
- ```
+ ```
+ To obtain the final model, the following process was followed:
+ - Fine-tune the base model on GSM8K for 2 epochs using the Hugging Face SFTTrainer
+ - Sparsify the model to 50% using SparseML
+ - Fine-tune the sparse model on the GSM8K dataset
+ - Perform one-shot quantization of the resulting model
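For readers who want a feel for the first step in the list above, the sketch below shows what supervised fine-tuning of the base model on GSM8K with TRL's SFTTrainer could look like. It is a minimal illustration only: the prompt formatting, hyperparameters, and trainer arguments are assumptions rather than the recipe used for this commit, and the keyword arguments assume a TRL version that accepts `dataset_text_field` and `max_seq_length` directly.

```python
# Hypothetical sketch of step 1 (SFT on GSM8K); hyperparameters and the
# prompt formatting are illustrative assumptions, not the actual recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# GSM8K has "question" and "answer" columns; render them with the same
# Question/Answer layout as the prompt_template in the card frontmatter.
dataset = load_dataset("gsm8k", "main", split="train")
dataset = dataset.map(
    lambda ex: {"text": f"Question\n{ex['question']}\n\nAnswer:\n{ex['answer']}"}
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",          # column produced by the map() above
    max_seq_length=1024,                # assumed value
    args=TrainingArguments(
        output_dir="tinyllama-gsm8k-sft",
        num_train_epochs=2,             # 2 epochs, as stated in the README
        per_device_train_batch_size=8,  # assumed value
        learning_rate=2e-5,             # assumed value
    ),
)
trainer.train()
```

The later steps (50% sparsification with SparseML, further fine-tuning of the sparse model, and one-shot SparseGPT quantization) use SparseML recipes and are not sketched here.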
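The Python example in the hunk above is truncated by the diff context, so only its first and last lines are visible. As a rough, unofficial sketch of how the exported model might be queried with the DeepSparse TextGeneration pipeline using the prompt template from the frontmatter: the Hugging Face stub, the sample question, and the generation settings below are assumptions chosen to match the visible output, not lines taken from this commit.

```python
# Hedged sketch of the inference call; the repo stub, prompt, and
# max_new_tokens are assumptions for illustration.
from deepsparse import TextGeneration

# Assumed Hugging Face stub for the DeepSparse export of this model.
MODEL_STUB = "hf:mwitiderrick/TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds"

question = (
    "James decides to run 3 sprints 3 times a week. "
    "He runs 60 meters each sprint. How many total meters does he run a week?"
)
# Apply the Question / {prompt} / Answer: template declared in the frontmatter.
formatted_prompt = f"Question\n{question}\n\nAnswer:"

model = TextGeneration(model_path=MODEL_STUB)
output = model(formatted_prompt, max_new_tokens=200)
print(output.generations[0].text)
```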