jojo1899 committed
Commit dcf0b7b · 1 Parent(s): d7dcec5

Improved quantization using OpenVINO 2024.5.0rc1
README.md CHANGED
@@ -7,21 +7,20 @@ tags:
 
 This is an INT4 quantized version of the `mistralai/Mistral-7B-Instruct-v0.2` model. The Python packages used in creating this model are as follows:
 ```
-openvino==2024.4.0
+openvino==2024.5.0rc1
 optimum==1.23.3
 optimum-intel==1.20.1
 nncf==2.13.0
 torch==2.5.1
-transformers==4.46.1
+transformers==4.46.2
 ```
 This quantized model is created using the following command:
 ```
-optimum-cli export openvino -m "mistralai/Mistral-7B-Instruct-v0.2" --task text-generation-with-past --weight-format int4 --group-size 128 --trust-remote-code ./Mistral-7B-Instruct-v0.2-ov-int4
+optimum-cli export openvino --model "mistralai/Mistral-7B-Instruct-v0.2" --weight-format int4 --group-size 128 --sym --ratio 1 --all-layers ./Mistral-7B-Instruct-v0.2-ov-int4
 ```
 For more details, run the following command from your Python environment: `optimum-cli export openvino --help`
 
 INFO:nncf:Statistics of the bitwidth distribution:
 | Num bits (N) | % all parameters (layers) | % ratio-defining parameters (layers) |
 |----------------|-----------------------------|----------------------------------------|
-| 8 | 4% (2 / 226) | 0% (0 / 224) |
-| 4 | 96% (224 / 226) | 100% (224 / 224) |
+| 4 | 100% (226 / 226) | 100% (226 / 226) |
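The `--weight-format int4 --group-size 128 --sym` options in the new command request symmetric, group-wise 4-bit weight quantization: each group of 128 consecutive weights shares one scale, and values map to signed integers in [-8, 7]. As a rough illustration only — this is a NumPy sketch under those assumptions, not NNCF's actual implementation, and `quantize_int4_sym` is a hypothetical helper:

```python
import numpy as np

def quantize_int4_sym(weights: np.ndarray, group_size: int = 128):
    """Sketch of symmetric group-wise INT4 quantization.

    Each group of `group_size` consecutive weights shares one scale;
    symmetric INT4 stores integers in the range [-8, 7].
    """
    w = weights.reshape(-1, group_size)
    # Per-group scale: map the largest absolute value in the group to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruct approximate float weights from integers and scales.
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
q, scale = quantize_int4_sym(w, group_size=128)
w_hat = dequantize(q, scale)
# Rounding error is bounded by half a quantization step per weight.
max_err = np.abs(w - w_hat).max()
```

With 226 of 226 layers at 4 bits (the table above), every weight group in the exported model is stored this way, which is also why the `.bin` payload shrinks relative to the earlier mixed 8/4-bit export.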
 
config.json CHANGED
@@ -22,7 +22,7 @@
   "sliding_window": null,
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.46.1",
+  "transformers_version": "4.46.2",
   "use_cache": true,
   "vocab_size": 32000
 }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 1,
   "eos_token_id": 2,
-  "transformers_version": "4.46.1"
+  "transformers_version": "4.46.2"
 }
openvino_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d736c7cc65ba0058436bee9e93673888a85c6edef7ec04d591887bb88970711f
-size 3889377328
+oid sha256:29ee4a89d73614bdcccfeca66322d6bb65bb48e924f73f2100ed7c095b6a9181
+size 3734946352
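The `openvino_model.bin` entry above is not the binary itself but a Git LFS pointer file: a three-line text stub (`version`, `oid`, `size`) that the LFS client resolves to the real 3.7 GB payload. A minimal sketch parsing that pointer format — `parse_lfs_pointer` is a hypothetical helper, not part of any LFS tooling:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, oid, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),  # strip the hash-algorithm prefix
        "size": int(fields["size"]),  # payload size in bytes
    }

# The new pointer committed for openvino_model.bin:
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:29ee4a89d73614bdcccfeca66322d6bb65bb48e924f73f2100ed7c095b6a9181
size 3734946352
"""
info = parse_lfs_pointer(pointer)
```

Note that the new `size` (3,734,946,352 bytes) is smaller than the old one (3,889,377,328 bytes), consistent with the move from a mixed 8/4-bit layout to all-INT4 weights.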
openvino_model.xml CHANGED
The diff for this file is too large to render.