---
tags:
- deepsparse
---
## Usage

```python
from deepsparse import TextGeneration

prompt = "How to get in a good university?"
formatted_prompt = f"<s>[INST]{prompt}[/INST]"

model = TextGeneration(model_path="hf:neuralmagic/Yi-6B-Llama-50-quant")
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)

"""
Getting into a good university is a complex process that involves factors such as academic performance, financial aid, and personal qualifications. Here are some steps you can follow to get in a good university:

1. Academic performance:

- Look for a university that has a strong academic program, including a well-rounded curriculum that covers a wide range of subjects.
- Check if the university offers a clear curriculum that includes a clear sequence of courses.
- Check if the university offers a clear pathway to graduation, including clear dates and deadlines.

2. Financial aid:

- Look for a university that offers financial aid, such as scholarships, grants, or loans.
- Check if the university offers financial aid that fits your budget.
- Consider the university's financial aid package, including the cost of tuition, room and board, and other expenses.
"""
```
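The `<s>[INST] ... [/INST]` wrapper above follows the Llama-chat prompt template this model expects. A small helper (shown here for illustration; not part of the deepsparse API) keeps the formatting consistent across prompts:

```python
def format_llama_prompt(prompt: str) -> str:
    """Wrap a raw prompt in the Llama-style [INST] template
    expected by this model (illustrative helper)."""
    return f"<s>[INST]{prompt}[/INST]"

formatted = format_llama_prompt("How to get in a good university?")
# -> "<s>[INST]How to get in a good university?[/INST]"
```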

## One-shot and Export

```bash
# Clone SparseML and install it with transformers support
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]" "torch<2"

# One-shot sparsification + quantization with OBCQ on the open_platypus dataset
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py chargoddard/Yi-6B-Llama open_platypus --recipe sparseml/src/sparseml/transformers/sparsification/obcq/example_llama.yaml --precision float16 --save True --device cuda

# Export the sparsified model to ONNX
cd sparseml
git checkout update/onnx_export/duplicate
python src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path /root/obcq_deployment

# Inject KV-cache support into the exported ONNX graph
cp deployment/model.onnx deployment/model-orig.onnx
python onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
```
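The recipe below enumerates all 32 decoder layers by hand in its `ignore` and `targets` lists. Those repetitive entries can be generated with a short script (a sketch, assuming the standard Llama module naming used in the recipe):

```python
# Generate the per-layer entries used in the recipe below
# (model.layers.0 .. model.layers.31 for a 32-layer Llama-style model).
num_layers = 32

ignore_down_proj = [f"model.layers.{i}.mlp.down_proj" for i in range(num_layers)]
targets = [f"model.layers.{i}" for i in range(num_layers)]

print(len(ignore_down_proj), targets[0], targets[-1])
# prints: 32 model.layers.0 model.layers.31
```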

`recipe.yaml`
```yaml
test_stage:
  obcq_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.5
      mappings: [
        [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
        [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"]
      ]
    QuantizationModifier:
      ignore:
        - LlamaRotaryEmbedding
        - LlamaRMSNorm
        - SiLUActivation
        - model.layers.0.mlp.down_proj
        - model.layers.1.mlp.down_proj
        - model.layers.2.mlp.down_proj
        - model.layers.3.mlp.down_proj
        - model.layers.4.mlp.down_proj
        - model.layers.5.mlp.down_proj
        - model.layers.6.mlp.down_proj
        - model.layers.7.mlp.down_proj
        - model.layers.8.mlp.down_proj
        - model.layers.9.mlp.down_proj
        - model.layers.10.mlp.down_proj
        - model.layers.11.mlp.down_proj
        - model.layers.12.mlp.down_proj
        - model.layers.13.mlp.down_proj
        - model.layers.14.mlp.down_proj
        - model.layers.15.mlp.down_proj
        - model.layers.16.mlp.down_proj
        - model.layers.17.mlp.down_proj
        - model.layers.18.mlp.down_proj
        - model.layers.19.mlp.down_proj
        - model.layers.20.mlp.down_proj
        - model.layers.21.mlp.down_proj
        - model.layers.22.mlp.down_proj
        - model.layers.23.mlp.down_proj
        - model.layers.24.mlp.down_proj
        - model.layers.25.mlp.down_proj
        - model.layers.26.mlp.down_proj
        - model.layers.27.mlp.down_proj
        - model.layers.28.mlp.down_proj
        - model.layers.29.mlp.down_proj
        - model.layers.30.mlp.down_proj
        - model.layers.31.mlp.down_proj
      post_oneshot_calibration: True
      scheme_overrides:
        Embedding:
          input_activations: null
          weights:
            num_bits: 8
            symmetric: False
    SparseGPTModifier:
      sparsity: 0.5
      block_size: 128
      sequential_update: False
      quantize: True
      percdamp: 0.01
      mask_structure: "0:0"
      targets: [
        "model.layers.0",
        "model.layers.1",
        "model.layers.2",
        "model.layers.3",
        "model.layers.4",
        "model.layers.5",
        "model.layers.6",
        "model.layers.7",
        "model.layers.8",
        "model.layers.9",
        "model.layers.10",
        "model.layers.11",
        "model.layers.12",
        "model.layers.13",
        "model.layers.14",
        "model.layers.15",
        "model.layers.16",
        "model.layers.17",
        "model.layers.18",
        "model.layers.19",
        "model.layers.20",
        "model.layers.21",
        "model.layers.22",
        "model.layers.23",
        "model.layers.24",
        "model.layers.25",
        "model.layers.26",
        "model.layers.27",
        "model.layers.28",
        "model.layers.29",
        "model.layers.30",
        "model.layers.31",
      ]
```
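The `SmoothQuantModifier` above migrates quantization difficulty from activations to weights: with `smoothing_strength` α = 0.5, each channel gets a scale s_j = max|X_j|^α / max|W_j|^(1−α), activations are divided by s_j and weights multiplied by s_j, so the layer output is unchanged while activation outliers are flattened. A minimal pure-Python sketch of that identity (illustrative toy numbers, not the sparseml implementation):

```python
# SmoothQuant per-channel scaling: s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).
# Dividing activations by s_j while multiplying weights by s_j preserves x @ w,
# but shrinks activation outliers so activations quantize more accurately.

def smooth_scales(act_maxes, weight_maxes, alpha=0.5):
    """Compute per-channel smoothing scales (illustrative sketch)."""
    return [a ** alpha / w ** (1 - alpha) for a, w in zip(act_maxes, weight_maxes)]

# Toy example: one activation row and one weight column.
x = [8.0, 0.5, 2.0]   # activation values (one per input channel; 8.0 is an outlier)
w = [0.2, 1.5, 0.7]   # corresponding weight values
s = smooth_scales([abs(v) for v in x], [abs(v) for v in w], alpha=0.5)

x_smooth = [xi / si for xi, si in zip(x, s)]
w_smooth = [wi * si for wi, si in zip(w, s)]

orig = sum(xi * wi for xi, wi in zip(x, w))
smoothed = sum(xi * wi for xi, wi in zip(x_smooth, w_smooth))
assert abs(orig - smoothed) < 1e-9   # the dot product is mathematically unchanged
```

After smoothing, the largest activation magnitude drops (here from 8.0 to about 1.26), which is exactly the effect `smoothing_strength: 0.5` aims for before 8-bit quantization.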