---
tags:
- deepsparse
---
|
## Usage |
|
|
|
```python
from deepsparse import TextGeneration

# Wrap the raw prompt in the [INST] chat format the model expects
prompt = "How to get in a good university?"
formatted_prompt = f"<s>[INST]{prompt}[/INST]"

# Downloads the sparse-quantized model from the Hugging Face Hub and compiles it
model = TextGeneration(model_path="hf:neuralmagic/Yi-6B-Llama-50-quant")
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)

"""
Getting into a good university is a complex process that involves factors such as academic performance, financial aid, and personal qualifications. Here are some steps you can follow to get in a good university:

1. Academic performance:
- Look for a university that has a strong academic program, including a well-rounded curriculum that covers a wide range of subjects.
- Check if the university offers a clear curriculum that includes a clear sequence of courses.
- Check if the university offers a clear pathway to graduation, including clear dates and deadlines.

2. Financial aid:
- Look for a university that offers financial aid, such as scholarships, grants, or loans.
- Check if the university offers financial aid that fits your budget.
- Consider the university's financial aid package, including the cost of tuition, room and board, and other expenses.
"""
```
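The snippet above needs DeepSparse's LLM text-generation support. At the time of writing this shipped in the `[llm]` extra of the nightly build (the exact package and extra may have changed since):

```
pip install -U "deepsparse-nightly[llm]"
```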
|
|
|
## One-shot and Export |
|
|
|
```
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]" "torch<2"

# Apply the one-shot OBCQ recipe to the base model, calibrating on the open_platypus dataset
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py chargoddard/Yi-6B-Llama open_platypus --recipe sparseml/src/sparseml/transformers/sparsification/obcq/example_llama.yaml --precision float16 --save True --device cuda

# Export the sparsified model to ONNX from the export branch
cd sparseml
git checkout update/onnx_export/duplicate
python src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path /root/obcq_deployment

# Inject KV-cache inputs/outputs into the exported ONNX graph
cp deployment/model.onnx deployment/model-orig.onnx
python onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
```
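After these steps, `deployment/model.onnx` is the KV-cache-enabled model DeepSparse can serve. A quick local sanity check mirrors the Usage snippet above; the `deployment` path is an assumption based on the working directory of the export step:

```python
from deepsparse import TextGeneration

# Point the pipeline at the local export instead of the Hub stub
model = TextGeneration(model_path="deployment")
print(model("<s>[INST]Hello![/INST]", max_new_tokens=32).generations[0].text)
```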
|
|
|
`recipe.yaml` |
|
```
test_stage:
  obcq_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.5
      mappings: [
        [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
        [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"]
      ]
    QuantizationModifier:
      ignore:
      - LlamaRotaryEmbedding
      - LlamaRMSNorm
      - SiLUActivation
      - model.layers.0.mlp.down_proj
      - model.layers.1.mlp.down_proj
      - model.layers.2.mlp.down_proj
      - model.layers.3.mlp.down_proj
      - model.layers.4.mlp.down_proj
      - model.layers.5.mlp.down_proj
      - model.layers.6.mlp.down_proj
      - model.layers.7.mlp.down_proj
      - model.layers.8.mlp.down_proj
      - model.layers.9.mlp.down_proj
      - model.layers.10.mlp.down_proj
      - model.layers.11.mlp.down_proj
      - model.layers.12.mlp.down_proj
      - model.layers.13.mlp.down_proj
      - model.layers.14.mlp.down_proj
      - model.layers.15.mlp.down_proj
      - model.layers.16.mlp.down_proj
      - model.layers.17.mlp.down_proj
      - model.layers.18.mlp.down_proj
      - model.layers.19.mlp.down_proj
      - model.layers.20.mlp.down_proj
      - model.layers.21.mlp.down_proj
      - model.layers.22.mlp.down_proj
      - model.layers.23.mlp.down_proj
      - model.layers.24.mlp.down_proj
      - model.layers.25.mlp.down_proj
      - model.layers.26.mlp.down_proj
      - model.layers.27.mlp.down_proj
      - model.layers.28.mlp.down_proj
      - model.layers.29.mlp.down_proj
      - model.layers.30.mlp.down_proj
      - model.layers.31.mlp.down_proj
      post_oneshot_calibration: True
      scheme_overrides:
        Embedding:
          input_activations: null
          weights:
            num_bits: 8
            symmetric: False
    SparseGPTModifier:
      sparsity: 0.5
      block_size: 128
      sequential_update: False
      quantize: True
      percdamp: 0.01
      mask_structure: "0:0"
      targets: [
        "model.layers.0",
        "model.layers.1",
        "model.layers.2",
        "model.layers.3",
        "model.layers.4",
        "model.layers.5",
        "model.layers.6",
        "model.layers.7",
        "model.layers.8",
        "model.layers.9",
        "model.layers.10",
        "model.layers.11",
        "model.layers.12",
        "model.layers.13",
        "model.layers.14",
        "model.layers.15",
        "model.layers.16",
        "model.layers.17",
        "model.layers.18",
        "model.layers.19",
        "model.layers.20",
        "model.layers.21",
        "model.layers.22",
        "model.layers.23",
        "model.layers.24",
        "model.layers.25",
        "model.layers.26",
        "model.layers.27",
        "model.layers.28",
        "model.layers.29",
        "model.layers.30",
        "model.layers.31"
      ]
```
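In this recipe, SmoothQuant rebalances activation outliers before quantization, the `ignore` list keeps the numerically sensitive `down_proj` layers (plus norm, rotary-embedding, and activation modules) at full precision, and SparseGPT applies 50% unstructured sparsity with quantization across all 32 decoder layers. To run the one-shot step with this recipe instead of the bundled example, pass its path to `--recipe`; `recipe.yaml` below assumes the file was saved in the current directory:

```
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py chargoddard/Yi-6B-Llama open_platypus --recipe recipe.yaml --precision float16 --save True --device cuda
```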