jojo1899 committed
Commit dcf0b7b · 1 Parent(s): d7dcec5

Improved quantization using OpenVINO 2024.5.0rc1
README.md CHANGED
@@ -7,21 +7,20 @@ tags:
 
 This is an INT4 quantized version of the `mistralai/Mistral-7B-Instruct-v0.2` model. The Python packages used in creating this model are as follows:
 ```
-openvino==2024.4.0
+openvino==2024.5.0rc1
 optimum==1.23.3
 optimum-intel==1.20.1
 nncf==2.13.0
 torch==2.5.1
-transformers==4.46.1
+transformers==4.46.2
 ```
 This quantized model is created using the following command:
 ```
-optimum-cli export openvino -m "mistralai/Mistral-7B-Instruct-v0.2" --task text-generation-with-past --weight-format int4 --group-size 128 --trust-remote-code ./Mistral-7B-Instruct-v0.2-ov-int4
+optimum-cli export openvino --model "mistralai/Mistral-7B-Instruct-v0.2" --weight-format int4 --group-size 128 --sym --ratio 1 --all-layers ./Mistral-7B-Instruct-v0.2-ov-int4
 ```
 For more details, run the following command from your Python environment: `optimum-cli export openvino --help`
 
 INFO:nncf:Statistics of the bitwidth distribution:
 | Num bits (N) | % all parameters (layers) | % ratio-defining parameters (layers) |
 |----------------|-----------------------------|----------------------------------------|
-| 8 | 4% (2 / 226) | 0% (0 / 224) |
-| 4 | 96% (224 / 226) | 100% (224 / 224) |
+| 4 | 100% (226 / 226) | 100% (226 / 226) |
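The `--weight-format int4 --group-size 128 --sym` options in the new command request symmetric, group-wise 4-bit weight quantization: each group of 128 consecutive weights shares one scale, and values map to signed integers in [-8, 7]. As a rough illustration only — this is a NumPy sketch under those assumptions, not NNCF's actual implementation, and `quantize_int4_sym` is a hypothetical helper:

```python
import numpy as np

def quantize_int4_sym(weights: np.ndarray, group_size: int = 128):
    """Sketch of symmetric group-wise INT4 quantization.

    Each group of `group_size` consecutive weights shares one scale;
    symmetric INT4 stores integers in the range [-8, 7].
    """
    w = weights.reshape(-1, group_size)
    # Per-group scale: map the largest absolute value in the group to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruct approximate float weights from integers and scales.
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
q, scale = quantize_int4_sym(w, group_size=128)
w_hat = dequantize(q, scale)
# Rounding error is bounded by half a quantization step per weight.
max_err = np.abs(w - w_hat).max()
```

With 226 of 226 layers at 4 bits (the table above), every weight group in the exported model is stored this way, which is also why the `.bin` payload shrinks relative to the earlier mixed 8/4-bit export.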
 
config.json CHANGED
@@ -22,7 +22,7 @@
   "sliding_window": null,
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.46.1",
+  "transformers_version": "4.46.2",
   "use_cache": true,
   "vocab_size": 32000
 }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 1,
   "eos_token_id": 2,
-  "transformers_version": "4.46.1"
+  "transformers_version": "4.46.2"
 }
openvino_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d736c7cc65ba0058436bee9e93673888a85c6edef7ec04d591887bb88970711f
-size 3889377328
+oid sha256:29ee4a89d73614bdcccfeca66322d6bb65bb48e924f73f2100ed7c095b6a9181
+size 3734946352
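The `openvino_model.bin` entry above is not the binary itself but a Git LFS pointer file: a three-line text stub (`version`, `oid`, `size`) that the LFS client resolves to the real 3.7 GB payload. A minimal sketch parsing that pointer format — `parse_lfs_pointer` is a hypothetical helper, not part of any LFS tooling:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, oid, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),  # strip the hash-algorithm prefix
        "size": int(fields["size"]),  # payload size in bytes
    }

# The new pointer committed for openvino_model.bin:
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:29ee4a89d73614bdcccfeca66322d6bb65bb48e924f73f2100ed7c095b6a9181
size 3734946352
"""
info = parse_lfs_pointer(pointer)
```

Note that the new `size` (3,734,946,352 bytes) is smaller than the old one (3,889,377,328 bytes), consistent with the move from a mixed 8/4-bit layout to all-INT4 weights.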
openvino_model.xml CHANGED
The diff for this file is too large to render.