---
language:
- en
license: apache-2.0
tags:
- text-classification
- int8
- Intel® Neural Compressor
- neural-compressor
- PostTrainingStatic
datasets:
- nyu-mll/glue
metrics:
- accuracy
model-index:
- name: sst2
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: GLUE SST2
      type: glue
      args: sst2
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9254587155963303
---
# INT8 albert-base-v2-sst2

## Post-training static quantization

### PyTorch

This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original fp32 model comes from the fine-tuned model [Alireza1044/albert-base-v2-sst2](https://huggingface.co/Alireza1044/albert-base-v2-sst2).

The calibration dataloader is the train dataloader. Because the default calibration sampling size of 300 is not exactly divisible by the batch size of 8, the calibration loop rounds up to 38 full batches, so the actual sampling size is 304.
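
As a quick check of that arithmetic, and a minimal sketch of how the sampling size is expressed with the Neural Compressor 2.x `PostTrainingQuantConfig` (an illustration, not the exact recipe used for this card):

```python
import math

from neural_compressor import PostTrainingQuantConfig

# 300 samples / batch size 8 = 37.5, so calibration rounds up to 38 full
# batches: 38 * 8 = 304 samples are actually seen during calibration.
print(math.ceil(300 / 8) * 8)  # 304

# Post-training static quantization with the default sampling size of 300.
config = PostTrainingQuantConfig(approach="static", calibration_sampling_size=[300])
```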

The linear modules **albert.encoder.albert_layer_groups.0.albert_layers.0.ffn_output.module** and **albert.encoder.albert_layer_groups.0.albert_layers.0.ffn.module** fall back to fp32 to meet the 1% relative accuracy loss criterion, as in the sketch below.
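
Such a fallback is typically chosen by Neural Compressor's accuracy-aware tuning; a minimal sketch of how the same constraint and fallback could be stated explicitly with the 2.x config API (an illustration, not the recorded recipe):

```python
from neural_compressor.config import AccuracyCriterion, PostTrainingQuantConfig

# Tolerate at most a 1% relative accuracy drop versus the fp32 baseline.
accuracy_criterion = AccuracyCriterion(criterion="relative", tolerable_loss=0.01)

# Pin the two ffn linear modules to fp32; every other op stays int8.
fp32_ops = [
    "albert.encoder.albert_layer_groups.0.albert_layers.0.ffn_output.module",
    "albert.encoder.albert_layer_groups.0.albert_layers.0.ffn.module",
]
op_name_dict = {
    name: {"activation": {"dtype": ["fp32"]}, "weight": {"dtype": ["fp32"]}}
    for name in fp32_ops
}

config = PostTrainingQuantConfig(
    approach="static",
    accuracy_criterion=accuracy_criterion,
    op_name_dict=op_name_dict,
)
```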

#### Test result

|   |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-accuracy)** |0.9255|0.9232|
| **Model size (MB)**  |25|44.6|

#### Load with Intel® Neural Compressor:

```python
from optimum.intel import INCModelForSequenceClassification

# Load the quantized INT8 model directly from the Hugging Face Hub.
model_id = "Intel/albert-base-v2-sst2-int8-static"
int8_model = INCModelForSequenceClassification.from_pretrained(model_id)
```
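
Once loaded, the quantized model behaves like any transformers sequence-classification model; for example (the sample sentence is just illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("a gripping and beautifully acted film", return_tensors="pt")
logits = int8_model(**inputs).logits
print(int8_model.config.id2label[logits.argmax(-1).item()])
```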

### ONNX

This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original fp32 model comes from the fine-tuned model [Alireza1044/albert-base-v2-sst2](https://huggingface.co/Alireza1044/albert-base-v2-sst2).

The calibration dataloader is the eval dataloader, and the calibration sampling size is 100. A rough reproduction sketch follows.
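
The sketch below exports the fp32 checkpoint to ONNX with optimum and then runs static quantization with 100 calibration samples; `eval_dataloader` is a hypothetical placeholder for an INC-compatible dataloader built from the GLUE SST-2 validation split, and the output paths are arbitrary:

```python
from neural_compressor import PostTrainingQuantConfig, quantization
from optimum.onnxruntime import ORTModelForSequenceClassification

# Export the fine-tuned fp32 PyTorch checkpoint to ONNX.
ORTModelForSequenceClassification.from_pretrained(
    "Alireza1044/albert-base-v2-sst2", export=True
).save_pretrained("albert-base-v2-sst2-onnx")

# Static quantization with 100 calibration samples.
config = PostTrainingQuantConfig(approach="static", calibration_sampling_size=[100])
q_model = quantization.fit(
    model="albert-base-v2-sst2-onnx/model.onnx",
    conf=config,
    calib_dataloader=eval_dataloader,  # hypothetical placeholder, see above
)
q_model.save("albert-base-v2-sst2-int8-onnx")
```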

#### Test result

|   |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-accuracy)** |0.9140|0.9232|
| **Model size (MB)**  |50|45|


#### Load ONNX model:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the INT8 ONNX model directly from the Hugging Face Hub.
model = ORTModelForSequenceClassification.from_pretrained("Intel/albert-base-v2-sst2-int8-static")
```
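
The loaded ONNX model can also be dropped into a transformers pipeline, assuming the tokenizer files are hosted alongside the model (the sample sentence is just illustrative):

```python
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("Intel/albert-base-v2-sst2-int8-static")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("a gripping and beautifully acted film"))
```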