---
license: mit
language:
- en
---

# zephyr-7b-beta-int4-ov

* Model creator: [Hugging Face H4](https://huggingface.co/HuggingFaceH4)
* Original model: [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

## Description

This is the [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) (Intermediate Representation) format, with weights compressed to int4 by [NNCF](https://github.com/openvinotoolkit/nncf).

## Quantization Parameters

Weight compression was performed using `nncf.compress_weights` with the following parameters:

* mode: **INT4_SYM**
* group_size: **128**
* ratio: **0.8**
* awq: **True**
* sensitivity_metric: **max_activation_variance**

For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).
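The conversion itself is not reproduced in this card, but the parameters above roughly correspond to a `nncf.compress_weights` call like the following sketch. The IR path and the calibration dataset are illustrative assumptions (AWQ and the activation-variance metric require calibration data):

```
import nncf
from nncf import CompressWeightsMode, SensitivityMetric
import openvino as ov

# Read the full-precision OpenVINO IR exported from the original model
# (the path below is an assumption, not part of this repository)
core = ov.Core()
model = core.read_model("zephyr-7b-beta/openvino_model.xml")

# `calibration_dataset` is a placeholder: an nncf.Dataset wrapping
# tokenized sample prompts, needed because awq=True
calibration_dataset = nncf.Dataset(samples)

compressed = nncf.compress_weights(
    model,
    mode=CompressWeightsMode.INT4_SYM,
    group_size=128,
    ratio=0.8,  # ~80% of weights in int4, the rest kept in int8
    awq=True,
    sensitivity_metric=SensitivityMetric.MAX_ACTIVATION_VARIANCE,
    dataset=calibration_dataset,
)
ov.save_model(compressed, "zephyr-7b-beta-int4-ov/openvino_model.xml")
```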
## Compatibility

The provided OpenVINO™ IR model is compatible with:

* OpenVINO version 2024.1.0 and higher
* Optimum Intel 1.16.0 and higher

## Running Model Inference

1. Install packages required for using [Optimum Intel](https://huggingface.co/docs/optimum/intel/index) integration with the OpenVINO backend:

```
pip install optimum[openvino]
```

2. Run model inference:

```
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_id = "OpenVINO/zephyr-7b-beta-int4-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```

For more examples and possible optimizations, refer to the [OpenVINO Large Language Model Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).
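Zephyr is a chat-tuned model, so prompts usually work better when wrapped in the tokenizer's chat template rather than passed as raw text. A minimal sketch building on the snippet above (the system message is an illustrative choice):

```
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_id = "OpenVINO/zephyr-7b-beta-int4-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is OpenVINO?"},
]
# apply_chat_template formats the conversation the way the model
# was fine-tuned to see it, and appends the assistant turn marker
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```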
## Limitations

Check the original model card for [limitations](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta#intended-uses--limitations).

## Legal information

The original model is distributed under the MIT license. More details can be found in the [original model card](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).