openvino-ci committed on
Commit 375b0f2 · verified · 1 Parent(s): 7fe5175

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,101 +1,66 @@
  ---
  license: mit
- language:
- - en
  ---
-
  # dolly-v2-3b-int4-ov
-
- * Model creator: [Databricks](https://huggingface.co/databricks)
  * Original model: [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b)

  ## Description
-
- This is [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) (Intermediate Representation) format with weights compressed to int8 by [NNCF](https://github.com/openvinotoolkit/nncf).

  ## Quantization Parameters

  Weight compression was performed using `nncf.compress_weights` with the following parameters:

- * mode: **INT4_ASYM**
- * group_size: **32**
- * ratio: **0.5**
- * sensitivity_metric: **weight_quantization_error**

  For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).

  ## Compatibility

  The provided OpenVINO™ IR model is compatible with:

- * OpenVINO version 2024.2.0 and higher
- * Optimum Intel 1.17.0 and higher

- ## Running Model Inference with [Optimum Intel](https://huggingface.co/docs/optimum/intel/index)

  1. Install packages required for using [Optimum Intel](https://huggingface.co/docs/optimum/intel/index) integration with the OpenVINO backend:

- ```
- pip install optimum[openvino]
- ```
-
- 2. Run model inference:
-
- ```
- from transformers import AutoTokenizer
- from optimum.intel.openvino import OVModelForCausalLM
-
- model_id = "OpenVINO/dolly-v2-3b-int4-ov"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = OVModelForCausalLM.from_pretrained(model_id)
-
- inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
-
- outputs = model.generate(**inputs, max_length=200)
- text = tokenizer.batch_decode(outputs)[0]
- print(text)
- ```
-
- For more examples and possible optimizations, refer to the [OpenVINO Large Language Model Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).
-
- ## Running Model Inference with [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai)
-
- 1. Install packages required for using OpenVINO GenAI.
- ```
- pip install openvino-genai huggingface_hub
- ```
-
- 2. Download model from HuggingFace Hub
-
- ```
- import huggingface_hub as hf_hub
-
- model_id = "OpenVINO/dolly-v2-3b-int4-ov"
- model_path = "dolly-v2-3b-int4-ov"
-
- hf_hub.snapshot_download(model_id, local_dir=model_path)
- ```
-
- 3. Run model inference:
-
- ```
- import openvino_genai as ov_genai
-
- device = "CPU"
- pipe = ov_genai.LLMPipeline(model_path, device)
- print(pipe.generate("What is OpenVINO?", max_length=200))
- ```
-
- More GenAI usage examples can be found in OpenVINO GenAI library [docs](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md) and [samples](https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#openvino-genai-samples)

  ## Limitations

- Check the original model card for [limitations](https://huggingface.co/databricks/dolly-v2-3b#known-limitations).

  ## Legal information

- The original model is distributed under [MIT](https://choosealicense.com/licenses/mit/) license. More details can be found in [original model card](https://huggingface.co/databricks/dolly-v2-3b).

  ## Disclaimer
  ---
  license: mit
+ license_link: https://choosealicense.com/licenses/mit/
  ---
  # dolly-v2-3b-int4-ov
+ * Model creator: [Databricks](https://huggingface.co/databricks)
  * Original model: [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b)

  ## Description
+ This is the [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) (Intermediate Representation) format, with weights compressed to INT4 by [NNCF](https://github.com/openvinotoolkit/nncf).

  ## Quantization Parameters

  Weight compression was performed using `nncf.compress_weights` with the following parameters:

+ * mode: **int4_asym**
+ * ratio: **1**
+ * group_size: **128**

  For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).
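As a rough, hedged sketch (not the exact export pipeline used for this repository), the listed parameters map onto an `nncf.compress_weights` call over an already-exported IR; the input path below is hypothetical:

```
import nncf
import openvino as ov

core = ov.Core()
# Hypothetical path to an uncompressed dolly-v2-3b IR exported beforehand
model = core.read_model("dolly-v2-3b-fp16/openvino_model.xml")

# Mirror the card's parameters: asymmetric INT4, all eligible layers, groups of 128
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    ratio=1.0,
    group_size=128,
)
ov.save_model(compressed, "dolly-v2-3b-int4-ov/openvino_model.xml")
```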

  ## Compatibility

  The provided OpenVINO™ IR model is compatible with:

+ * OpenVINO version 2024.4.0 and higher
+ * Optimum Intel 1.20.0 and higher
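A quick way to check an installed environment against these floors; this is a minimal sketch using only the standard library, and the strings are the PyPI distribution names:

```
from importlib.metadata import version

# Expect >= 2024.4.0 and >= 1.20.0 respectively, per the list above
print("openvino:", version("openvino"))
print("optimum-intel:", version("optimum-intel"))
```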

+ ## Running Model Inference

  1. Install packages required for using [Optimum Intel](https://huggingface.co/docs/optimum/intel/index) integration with the OpenVINO backend:

  ```
+ pip install optimum[openvino]
  ```

+ 2. Run model inference:

  ```
+ from transformers import AutoTokenizer
+ from optimum.intel.openvino import OVModelForCausalLM
+
+ model_id = "OpenVINO/dolly-v2-3b-int4-ov"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = OVModelForCausalLM.from_pretrained(model_id)
+
+ inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
+
+ outputs = model.generate(**inputs, max_length=200)
+ text = tokenizer.batch_decode(outputs)[0]
+ print(text)
  ```

+ For more examples and possible optimizations, refer to the [OpenVINO Large Language Model Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).
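Dolly v2 is instruction-tuned, so results are usually better when the query is wrapped in the instruction layout from the original Databricks pipeline (the `### Response:` marker also appears in `tokenizer_config.json` below). A hedged sketch, assuming that layout; verify it against the original model card:

```
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

# Prompt layout as used by Databricks' instruct pipeline (assumed here)
PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

model_id = "OpenVINO/dolly-v2-3b-int4-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer(PROMPT.format(instruction="What is OpenVINO?"), return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```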

  ## Limitations

+ Check the [original model card](https://huggingface.co/databricks/dolly-v2-3b) for limitations.

  ## Legal information

+ The original model is distributed under the [MIT](https://choosealicense.com/licenses/mit/) license. More details can be found in the [original model card](https://huggingface.co/databricks/dolly-v2-3b).

  ## Disclaimer
config.json CHANGED
@@ -25,13 +25,15 @@
  "model_type": "gpt_neox",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
+ "partial_rotary_factor": 0.25,
  "rope_scaling": null,
+ "rope_theta": 10000,
  "rotary_emb_base": 10000,
  "rotary_pct": 0.25,
  "tie_word_embeddings": false,
- "torch_dtype": "float32",
- "transformers_version": "4.40.1",
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.45.2",
  "use_cache": true,
  "use_parallel_residual": true,
  "vocab_size": 50280
- }
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 0,
+ "eos_token_id": 0,
+ "transformers_version": "4.45.2"
+ }
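This file holds the default generation settings that `transformers` picks up automatically; a minimal sketch of reading it back, using the standard `GenerationConfig` API:

```
from transformers import GenerationConfig

gen = GenerationConfig.from_pretrained("OpenVINO/dolly-v2-3b-int4-ov")
print(gen.bos_token_id, gen.eos_token_id)  # 0 0, per the file above
```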
openvino_config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "compression": null,
+ "dtype": "int4",
+ "input_info": null,
+ "optimum_version": "1.23.1",
+ "quantization_config": {
+ "all_layers": null,
+ "bits": 4,
+ "dataset": "wikitext2",
+ "gptq": null,
+ "group_size": 128,
+ "ignored_scope": null,
+ "num_samples": null,
+ "quant_method": "default",
+ "ratio": 1.0,
+ "scale_estimation": true,
+ "sensitivity_metric": null,
+ "sym": false,
+ "tokenizer": null,
+ "trust_remote_code": true,
+ "weight_format": "int4"
+ },
+ "save_onnx_model": false,
+ "transformers_version": "4.45.2"
+ }
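The `quantization_config` block corresponds to Optimum Intel's `OVWeightQuantizationConfig`. A hedged sketch of an export that would produce an equivalent setup; argument names are assumed from recent Optimum Intel releases, so verify against your installed version:

```
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Mirrors openvino_config.json: 4-bit asymmetric weights, groups of 128,
# full ratio, wikitext2 calibration data, scale estimation enabled.
qconfig = OVWeightQuantizationConfig(
    bits=4,
    sym=False,
    group_size=128,
    ratio=1.0,
    dataset="wikitext2",
    scale_estimation=True,
)
model = OVModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b", export=True, quantization_config=qconfig
)
model.save_pretrained("dolly-v2-3b-int4-ov")
```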
openvino_detokenizer.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e3d0218341805b3876fc9c8e95c98d75cc9ddd0fa34fac6df212e790b6f91a08
- size 558494
+ oid sha256:f1e43770f23d5b9dbfc8bf99bbea4fe501870adf36235dff20156f6c0a129a47
+ size 514078
openvino_detokenizer.xml CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a7675447cb4bf5c86e1f08f505dd3df3616bc5e96c5e67569836fed97d6cac47
- size 5981
+ oid sha256:c187930ab4452cae4de5c761fe620de06307bdfb6e865b03c85e485215c55dd8
+ size 4507
openvino_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2f679693d89d3e2ba5b804708ca101f35cfa2f3d1f4afdc2e1642f50e5235eb4
- size 2256897098
+ oid sha256:c0a424a5a7dcd62537a2011fb4ce3b9af19baa380db733022e1a922fe3a46143
+ size 1568639944
openvino_model.xml CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:871fc829ec134e248bb061c549ae5c021af9087857331ade218a2206169a6cea
- size 3995613
+ oid sha256:b1804685581226bf122af04f6c89c093ea0b3444853799f84c2f973d99e20d28
+ size 2551632
openvino_tokenizer.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bf4bb4307e428d32680d2973f34247c51210fad1f4c6408b2087dfdfb053e210
- size 1166376
+ oid sha256:c378d88077ae7c7e13ef61745e1ceef76412338e9a7398445c09632413e52abe
+ size 1227935
openvino_tokenizer.xml CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5c7cdfc7122104652c01e43777808b3c7c4670445cceaf2976554adc694d9064
- size 27473
+ oid sha256:e8aaa5bd8191657b5168bdd1e342c2edc60827bd718fa5f11388c0dc7ef7b6d9
+ size 22339
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -234,7 +234,7 @@
  "### Response:"
  ],
  "bos_token": "<|endoftext|>",
- "clean_up_tokenization_spaces": true,
+ "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|endoftext|>",