These methods are called before the quantized model is created, to ensure that users have the right configuration. For reference, look at how other quantizers implement them.
Write the _process_model_before_weight_loading method. In Transformers, quantized models are first initialized on the "meta" device, and the weights are loaded afterwards.
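
To make the "meta" device step more concrete, here is a minimal sketch in plain PyTorch of the kind of work such a method might do: swapping ordinary linear layers for quantized placeholders while the model still has no real weights. The names QuantLinearSketch and replace_linear_layers are hypothetical helpers invented for this illustration, not part of Transformers, and the int8-plus-scale layout is only an assumption.

```python
import torch
from torch import nn


class QuantLinearSketch(nn.Module):
    """Hypothetical placeholder for a quantized linear layer (not a Transformers class)."""

    def __init__(self, in_features, out_features, device=None):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Assumed layout: an int8 weight buffer plus one float scale per output row.
        self.register_buffer(
            "weight",
            torch.empty(out_features, in_features, dtype=torch.int8, device=device),
        )
        self.register_buffer(
            "scale",
            torch.empty(out_features, dtype=torch.float32, device=device),
        )


def replace_linear_layers(module, skip=()):
    """Recursively swap nn.Linear children for QuantLinearSketch on the same device."""
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear) and name not in skip:
            replacement = QuantLinearSketch(
                child.in_features, child.out_features, device=child.weight.device
            )
            setattr(module, name, replacement)
        else:
            replace_linear_layers(child, skip)
    return module


# Toy model created on the "meta" device: shapes and dtypes exist, but no data is allocated.
model = nn.Sequential(
    nn.Linear(8, 16, device="meta"),
    nn.ReLU(),
    nn.Linear(16, 4, device="meta"),
)

# Replace the layers before any weights are loaded.
model = replace_linear_layers(model)
```

Because everything lives on the "meta" device, the replacement allocates no memory; the buffers only record the shapes and dtypes that the real weights are expected to fill once they are loaded.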