In this case, we prefer to support only inference in Transformers and let the third-party library maintained by the ML community handle the model quantization itself.
Build a new HfQuantizer class
Create a new quantization config class inside src/transformers/utils/quantization_config.py, and make sure to expose it in the Transformers main init by adding it to the _import_structure object of src/transformers/__init__.py.
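A minimal sketch of such a config class follows. It assumes a hypothetical method called your_method with two made-up settings, bits and group_size. The class subclasses QuantizationConfigMixin, the base class shared by the existing configs in quantization_config.py, and falls back to a tiny stand-in when Transformers is not installed so the snippet runs on its own. Validating arguments in post_init mirrors what several existing configs do, but the specific checks here are illustrative only.

```python
# Sketch of a config class for src/transformers/utils/quantization_config.py.
# YourMethodConfig, "your_method", bits and group_size are hypothetical placeholders.
try:
    from transformers.utils.quantization_config import QuantizationConfigMixin
except ImportError:
    # Minimal stand-in so the sketch runs without Transformers installed.
    class QuantizationConfigMixin:
        def to_dict(self):
            return dict(self.__dict__)


class YourMethodConfig(QuantizationConfigMixin):
    """User-facing settings for a hypothetical quantization method."""

    def __init__(self, bits: int = 4, group_size: int = 128, **kwargs):
        # Identifier tying this config to its quantizer (hypothetical value).
        self.quant_method = "your_method"
        self.bits = bits
        self.group_size = group_size
        # Extra keyword arguments (e.g. from a saved config) are ignored in this sketch.
        self.post_init()

    def post_init(self):
        """Validate arguments so a bad config fails at construction time."""
        if self.bits not in (2, 4, 8):
            raise ValueError(f"bits must be 2, 4 or 8, got {self.bits}")
        if self.group_size <= 0:
            raise ValueError(f"group_size must be positive, got {self.group_size}")
```

With this in place, YourMethodConfig(bits=8) builds a valid config, while YourMethodConfig(bits=3) raises a ValueError immediately. Exposing the class then amounts to appending "YourMethodConfig" to the list that, in recent versions, sits under the "utils.quantization_config" key of _import_structure, next to existing entries such as "BitsAndBytesConfig".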
Create a new file named quantizer_your_method.py inside src/transformers/quantizers/, and define in it a quantizer class that inherits from src/transformers/quantizers/base.py::HfQuantizer.
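The skeleton below is a rough sketch of what quantizer_your_method.py might contain for an inference-only method. To stay runnable without Transformers, it defines a simplified stand-in for HfQuantizer. The stand-in reproduces only a few of the real base class's hooks: validate_environment, _process_model_before_weight_loading, _process_model_after_weight_loading and is_trainable, plus the requires_calibration and required_packages class attributes. In the real file you would subclass HfQuantizer imported from .base instead, and the real base declares further hooks (for example around serialization) that are left out here. The names YourMethodHfQuantizer, your_method_lib and your_method_prepared are hypothetical.

```python
import importlib.util
from abc import ABC, abstractmethod


class HfQuantizerStandIn(ABC):
    """Simplified stand-in for transformers.quantizers.base.HfQuantizer (not the real class)."""

    requires_calibration = False  # True when weights cannot be quantized on the fly
    required_packages = None  # third-party packages the method needs

    def __init__(self, quantization_config, **kwargs):
        self.quantization_config = quantization_config
        self.pre_quantized = kwargs.pop("pre_quantized", True)
        # Methods that need calibration can only load already-quantized checkpoints.
        if not self.pre_quantized and self.requires_calibration:
            raise ValueError("This method requires a pre-quantized checkpoint.")

    def validate_environment(self, *args, **kwargs):
        """Check dependencies before loading; a no-op by default."""

    @abstractmethod
    def _process_model_before_weight_loading(self, model, **kwargs): ...

    @abstractmethod
    def _process_model_after_weight_loading(self, model, **kwargs): ...

    @property
    @abstractmethod
    def is_trainable(self) -> bool: ...


class YourMethodHfQuantizer(HfQuantizerStandIn):
    """Inference-only quantizer for the hypothetical "your_method" scheme."""

    # Quantization itself is done by the third-party library,
    # so only pre-quantized checkpoints can be loaded.
    requires_calibration = True
    required_packages = ["your_method_lib"]  # hypothetical package name

    def validate_environment(self, *args, **kwargs):
        # Fail early with a clear message if the library is missing.
        for package in self.required_packages:
            if importlib.util.find_spec(package) is None:
                raise ImportError(f"Using your_method requires `pip install {package}`.")

    def _process_model_before_weight_loading(self, model, **kwargs):
        # A real quantizer would swap modules such as nn.Linear for the library's
        # quantized layers here; this sketch only marks the model as prepared.
        model.your_method_prepared = True

    def _process_model_after_weight_loading(self, model, **kwargs):
        # Post-load fix-ups (e.g. packing weights) would go here.
        return model

    @property
    def is_trainable(self) -> bool:
        # Inference only: fine-tuning quantized weights is not supported.
        return False
```

Setting requires_calibration = True is one way to express the inference-only choice described above: the base class's constructor then rejects pre_quantized=False, so users can only load checkpoints that the third-party library has already quantized.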