---
language:
- en
license: apache-2.0
tags:
- text-classification
- int8
- Intel® Neural Compressor
- neural-compressor
- PostTrainingStatic
datasets:
- nyu-mll/glue
metrics:
- accuracy
model_index:
- name: sst2
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: GLUE SST2
      type: glue
      args: sst2
    metric:
      name: Accuracy
      type: accuracy
      value: 0.9254587155963303
---
# INT8 albert-base-v2-sst2
## Post-training static quantization
### PyTorch
This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
The original fp32 model comes from the fine-tuned model [Alireza1044/albert-base-v2-sst2](https://huggingface.co/Alireza1044/albert-base-v2-sst2).
The calibration dataloader is the train dataloader. The default calibration sampling size of 300 is not exactly divisible by the batch size of 8, so the actual sampling size is 304.
The linear modules **albert.encoder.albert_layer_groups.0.albert_layers.0.ffn_output.module, albert.encoder.albert_layer_groups.0.albert_layers.0.ffn.module** fall back to fp32 to keep the relative accuracy loss within 1%.
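The 304 figure follows from rounding the requested sample count up to a whole number of batches: 300 / 8 = 37.5, so 38 batches of 8 are drawn. The snippet below is a minimal sketch of that arithmetic. It assumes calibration consumes complete batches. The helper name `effective_calibration_size` is hypothetical and is not part of Intel® Neural Compressor.

```python
import math


def effective_calibration_size(sampling_size: int, batch_size: int) -> int:
    """Round a requested calibration sample count up to whole batches.

    Hypothetical helper. It assumes calibration always consumes complete
    batches, so a partial final batch is counted as a full one.
    """
    num_batches = math.ceil(sampling_size / batch_size)
    return num_batches * batch_size


# Values stated in this model card: default sampling size 300, batch size 8.
print(effective_calibration_size(300, 8))  # 38 batches of 8 -> 304 samples
```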
#### Test result
| |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-accuracy)** |0.9255|0.9232|
| **Model size (MB)** |25|44.6|
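To see how the table relates to the 1% relative accuracy target, the sketch below computes the relative loss as (FP32 − INT8) / FP32 from the rounded table values. A negative result means the INT8 model scored higher than the FP32 baseline. The function names and the `tolerable_loss` parameter are illustrative assumptions, not Intel® Neural Compressor's API.

```python
def relative_accuracy_loss(fp32_acc: float, int8_acc: float) -> float:
    """Relative accuracy loss of the INT8 model versus the FP32 baseline.

    A negative value means the INT8 model is more accurate than FP32.
    """
    return (fp32_acc - int8_acc) / fp32_acc


def meets_criterion(fp32_acc: float, int8_acc: float, tolerable_loss: float = 0.01) -> bool:
    # Assumed rule: the relative loss must not exceed the tolerable loss (1% by default).
    return relative_accuracy_loss(fp32_acc, int8_acc) <= tolerable_loss


# PyTorch results from the table above (rounded to four decimals).
fp32_acc, int8_acc = 0.9232, 0.9255
loss = relative_accuracy_loss(fp32_acc, int8_acc)
print(f"relative loss: {loss:+.4%}")
print("within 1%:", meets_criterion(fp32_acc, int8_acc))

# Size reduction from the same table: 44.6 MB (FP32) vs 25 MB (INT8).
print(f"size ratio: {44.6 / 25:.2f}x")
```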
#### Load with Intel® Neural Compressor:
```python
from optimum.intel import INCModelForSequenceClassification

model_id = "Intel/albert-base-v2-sst2-int8-static"
# Load the INT8 PyTorch model quantized with Intel® Neural Compressor
int8_model = INCModelForSequenceClassification.from_pretrained(model_id)
```
### ONNX
This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
The original fp32 model comes from the fine-tuned model [Alireza1044/albert-base-v2-sst2](https://huggingface.co/Alireza1044/albert-base-v2-sst2).
The calibration dataloader is the eval dataloader. The calibration sampling size is 100.
#### Test result
| |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-accuracy)** |0.9140|0.9232|
| **Model size (MB)** |50|45|
#### Load ONNX model:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the INT8 ONNX model for inference with ONNX Runtime
model = ORTModelForSequenceClassification.from_pretrained('Intel/albert-base-v2-sst2-int8-static')
```