Errors with quantized model

#8 opened by tatyanavidrevich

I am using the following quantization method:

from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
)
model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview",
    quantization_config=bnb_config,
)

During generation, I get an error:
/usr/local/lib/python3.11/dist-packages/torch/nn/functional.py in multi_head_attention_forward(query, key, value, embed_dim_to_check, num_heads, in_proj_weight, in_proj_bias, bias_k, bias_v, add_zero_attn, dropout_p, out_proj_weight, out_proj_bias, training, key_padding_mask, need_weights, attn_mask, use_separate_proj_weight, q_proj_weight, k_proj_weight, v_proj_weight, static_k, static_v, average_attn_weights, is_causal)
6249 attn_output.transpose(0, 1).contiguous().view(tgt_len * bsz, embed_dim)
6250 )
-> 6251 attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
6252 attn_output = attn_output.view(tgt_len, bsz, attn_output.size(1))
6253

RuntimeError: self and mat2 must have the same dtype, but got Half and Byte

It works fine without quantization; however, quantization is useful during fine-tuning. Could you please suggest how to make it work?
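
Would something along these lines be the right direction? This is only a sketch: it assumes the dtype mismatch comes from the vision encoder's attention projections being converted to quantized (Byte) weights while activations stay in fp16, so it tries to keep that part of the model out of quantization and sets an explicit compute dtype. The module name "vision_tower" is an assumption and may need to be adjusted after checking model.named_modules().

import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    # Compute in fp16 so matmuls see a consistent dtype.
    bnb_4bit_compute_dtype=torch.float16,
    # Hypothetical module prefix -- verify against model.named_modules().
    llm_int8_skip_modules=["vision_tower"],
)

model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)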

Thank you
