Upload folder using huggingface_hub

Files changed:

- README.md +24 -59
- config.json +5 -3
- generation_config.json +6 -0
- openvino_config.json +25 -0
- openvino_detokenizer.bin +2 -2
- openvino_detokenizer.xml +2 -2
- openvino_model.bin +2 -2
- openvino_model.xml +2 -2
- openvino_tokenizer.bin +2 -2
- openvino_tokenizer.xml +2 -2
- tokenizer.json +0 -0
- tokenizer_config.json +1 -1
README.md
CHANGED
````diff
@@ -1,101 +1,66 @@
 ---
 license: mit
-language:
-- en
+license_link: https://choosealicense.com/licenses/mit/
 ---
 
 # dolly-v2-3b-int4-ov
 
 * Model creator: [Databricks](https://huggingface.co/databricks)
 * Original model: [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b)
 
 ## Description
 
-This is [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) (Intermediate Representation) format with weights compressed to int8 by [NNCF](https://github.com/openvinotoolkit/nncf).
+This is the [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) (Intermediate Representation) format with weights compressed to INT4 by [NNCF](https://github.com/openvinotoolkit/nncf).
 
 ## Quantization Parameters
 
 Weight compression was performed using `nncf.compress_weights` with the following parameters:
 
-* mode: **…**
-* …
-* …
-* sensitivity_metric: **weight_quantization_error**
+* mode: **int4_asym**
+* ratio: **1**
+* group_size: **128**
 
 For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).
 
 ## Compatibility
 
 The provided OpenVINO™ IR model is compatible with:
 
-* OpenVINO version 2024.…
-* Optimum Intel 1.…
+* OpenVINO version 2024.4.0 and higher
+* Optimum Intel 1.20.0 and higher
 
 ## Running Model Inference
 
 1. Install packages required for using [Optimum Intel](https://huggingface.co/docs/optimum/intel/index) integration with the OpenVINO backend:
 
 ```
 pip install optimum[openvino]
 ```
 
 2. Run model inference:
 
 ```
 from transformers import AutoTokenizer
 from optimum.intel.openvino import OVModelForCausalLM
 
 model_id = "OpenVINO/dolly-v2-3b-int4-ov"
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = OVModelForCausalLM.from_pretrained(model_id)
 
 inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
 
 outputs = model.generate(**inputs, max_length=200)
 text = tokenizer.batch_decode(outputs)[0]
 print(text)
 ```
 
 For more examples and possible optimizations, refer to the [OpenVINO Large Language Model Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).
 
-## Running Model Inference with [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai)
-
-1. Install packages required for using OpenVINO GenAI.
-```
-pip install openvino-genai huggingface_hub
-```
-
-2. Download model from HuggingFace Hub
-
-```
-import huggingface_hub as hf_hub
-
-model_id = "OpenVINO/dolly-v2-3b-int4-ov"
-model_path = "dolly-v2-3b-int4-ov"
-…
-```
-
-…
-
-```
-…
-import openvino_genai as ov_genai
-…
-print(…)
-```
 
 ## Limitations
 
-Check the original model card for […
+Check the [original model card](https://huggingface.co/databricks/dolly-v2-3b) for limitations.
 
 ## Legal information
 
-The original model is distributed under […
+The original model is distributed under the [MIT](https://choosealicense.com/licenses/mit/) license. More details can be found in the [original model card](https://huggingface.co/databricks/dolly-v2-3b).
 
 ## Disclaimer
````
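The quantization parameters in the updated card map directly onto `nncf.compress_weights`. Below is a minimal sketch of an equivalent compression step, assuming an uncompressed IR already exists at the hypothetical path `dolly-v2-3b-fp16/`; it is not the exact pipeline behind this commit:

```python
import nncf
import openvino as ov

core = ov.Core()
# Hypothetical path to an uncompressed OpenVINO IR export of dolly-v2-3b.
model = core.read_model("dolly-v2-3b-fp16/openvino_model.xml")

# Mirror the parameters from the model card: int4_asym, ratio 1, group_size 128.
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    ratio=1.0,        # compress all eligible weight tensors to INT4
    group_size=128,   # one scale/zero-point pair per group of 128 weights
)
ov.save_model(compressed, "dolly-v2-3b-int4-ov/openvino_model.xml")
```

With `ratio=1.0` every eligible weight matrix gets the 4-bit treatment; by default NNCF still keeps a few sensitive tensors (such as embeddings) at higher precision unless `all_layers=True` is passed.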
config.json
CHANGED
```diff
@@ -25,13 +25,15 @@
   "model_type": "gpt_neox",
   "num_attention_heads": 32,
   "num_hidden_layers": 32,
+  "partial_rotary_factor": 0.25,
   "rope_scaling": null,
+  "rope_theta": 10000,
   "rotary_emb_base": 10000,
   "rotary_pct": 0.25,
   "tie_word_embeddings": false,
-  "torch_dtype": "…",
-  "transformers_version": "4.…",
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.45.2",
   "use_cache": true,
   "use_parallel_residual": true,
   "vocab_size": 50280
 }
```
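The added `partial_rotary_factor` and `rope_theta` keys duplicate the legacy `rotary_pct`/`rotary_emb_base` values under the standardized RoPE names used by recent `transformers` releases, so the same settings are readable under either spelling. A small illustrative check, assuming `transformers` is installed:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("OpenVINO/dolly-v2-3b-int4-ov")
print(cfg.rope_theta)             # 10000, same value as the legacy rotary_emb_base
print(cfg.partial_rotary_factor)  # 0.25, same value as the legacy rotary_pct
```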
generation_config.json
ADDED
```diff
@@ -0,0 +1,6 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 0,
+  "transformers_version": "4.45.2"
+}
```
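The new file records the generation defaults; Dolly's GPT-NeoX tokenizer uses token 0 (`<|endoftext|>`) as both BOS and EOS. The defaults can be inspected, or overridden per call, through the standard API (illustrative snippet):

```python
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("OpenVINO/dolly-v2-3b-int4-ov")
print(gen_cfg.bos_token_id, gen_cfg.eos_token_id)  # 0 0
```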
openvino_config.json
ADDED
```diff
@@ -0,0 +1,25 @@
+{
+  "compression": null,
+  "dtype": "int4",
+  "input_info": null,
+  "optimum_version": "1.23.1",
+  "quantization_config": {
+    "all_layers": null,
+    "bits": 4,
+    "dataset": "wikitext2",
+    "gptq": null,
+    "group_size": 128,
+    "ignored_scope": null,
+    "num_samples": null,
+    "quant_method": "default",
+    "ratio": 1.0,
+    "scale_estimation": true,
+    "sensitivity_metric": null,
+    "sym": false,
+    "tokenizer": null,
+    "trust_remote_code": true,
+    "weight_format": "int4"
+  },
+  "save_onnx_model": false,
+  "transformers_version": "4.45.2"
+}
```
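The recorded `quantization_config` corresponds to Optimum Intel's weight-quantization settings. A hedged sketch of how an equivalent export might be reproduced with `OVWeightQuantizationConfig`; this is an assumption about the workflow, not necessarily the exact command behind this commit:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Mirrors the recorded settings: 4-bit asymmetric weights, group size 128,
# full ratio, wikitext2 calibration data, and scale estimation enabled.
quant_config = OVWeightQuantizationConfig(
    bits=4,
    sym=False,
    group_size=128,
    ratio=1.0,
    dataset="wikitext2",
    scale_estimation=True,
)

model = OVModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    export=True,                       # convert from PyTorch to OpenVINO IR on load
    quantization_config=quant_config,
    trust_remote_code=True,            # matches the recorded config
)
model.save_pretrained("dolly-v2-3b-int4-ov")
```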
openvino_detokenizer.bin
CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:…
-size …
+oid sha256:f1e43770f23d5b9dbfc8bf99bbea4fe501870adf36235dff20156f6c0a129a47
+size 514078
```
openvino_detokenizer.xml
CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:…
-size …
+oid sha256:c187930ab4452cae4de5c761fe620de06307bdfb6e865b03c85e485215c55dd8
+size 4507
```
openvino_model.bin
CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:…
-size …
+oid sha256:c0a424a5a7dcd62537a2011fb4ce3b9af19baa380db733022e1a922fe3a46143
+size 1568639944
```
openvino_model.xml
CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:…
-size …
+oid sha256:b1804685581226bf122af04f6c89c093ea0b3444853799f84c2f973d99e20d28
+size 2551632
```
openvino_tokenizer.bin
CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:…
-size …
+oid sha256:c378d88077ae7c7e13ef61745e1ceef76412338e9a7398445c09632413e52abe
+size 1227935
```
openvino_tokenizer.xml
CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:…
-size …
+oid sha256:e8aaa5bd8191657b5168bdd1e342c2edc60827bd718fa5f11388c0dc7ef7b6d9
+size 22339
```
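The `.bin`/`.xml` entries above are Git LFS pointers: each stores only the SHA-256 (`oid`) and byte size of the real artifact, which Git LFS fetches at checkout. A small sketch for verifying a downloaded file against its pointer:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in 1 MiB chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# oid taken from the openvino_model.bin pointer above
expected = "c0a424a5a7dcd62537a2011fb4ce3b9af19baa380db733022e1a922fe3a46143"
assert sha256_of("openvino_model.bin") == expected
```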
tokenizer.json
CHANGED
The diff for this file is too large to render. See raw diff.
tokenizer_config.json
CHANGED
```diff
@@ -234,7 +234,7 @@
     "### Response:"
   ],
   "bos_token": "<|endoftext|>",
-  "clean_up_tokenization_spaces": …,
+  "clean_up_tokenization_spaces": false,
   "eos_token": "<|endoftext|>",
   "model_max_length": 1000000000000000019884624838656,
   "pad_token": "<|endoftext|>",
```
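Pinning `clean_up_tokenization_spaces` to `false` keeps decoded text exactly as the tokenizer produced it, rather than collapsing spaces around punctuation. The difference is easy to see (illustrative snippet, not part of the commit):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("OpenVINO/dolly-v2-3b-int4-ov")
ids = tokenizer("Hello , world !")["input_ids"]

print(tokenizer.decode(ids, clean_up_tokenization_spaces=False))  # Hello , world !
print(tokenizer.decode(ids, clean_up_tokenization_spaces=True))   # Hello, world!
```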