The method is wrapped in an nn.Module (e.g., Linear8bitLt, Linear4bit), and the quantized linear layer should have the following definition:
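from torch import nn

class Linear4bit(nn.Module):
    def __init__(self):  # constructor arguments are omitted here
        super().__init__()

    def forward(self, x):
        # my_4bit_kernel stands in for the method's own 4-bit matmul kernel
        return my_4bit_kernel(x, self.weight, self.bias)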
This way, Transformers models can be easily quantized by replacing some instances of nn.Linear with a target class.
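
The replacement step can be sketched roughly as follows. This is a minimal illustration, not the Transformers implementation: `FakeQuantLinear`, `replace_linear`, and the `skip` parameter are hypothetical names introduced here, and the stand-in layer runs a plain dense matmul where a real method would call its quantized kernel.

```python
import torch
from torch import nn


class FakeQuantLinear(nn.Module):
    """Hypothetical stand-in for a quantized linear layer (no real quantization)."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.weight = nn.Parameter(
            torch.zeros(out_features, in_features), requires_grad=False
        )
        if bias:
            self.bias = nn.Parameter(torch.zeros(out_features), requires_grad=False)
        else:
            self.register_parameter("bias", None)

    def forward(self, x):
        # A real method would dispatch to its own quantized kernel here.
        return nn.functional.linear(x, self.weight, self.bias)


def replace_linear(model, target_cls, skip=()):
    """Recursively swap nn.Linear children for target_cls, skipping names in `skip`."""
    for name, child in list(model.named_children()):
        if isinstance(child, nn.Linear) and name not in skip:
            new_layer = target_cls(
                child.in_features, child.out_features, bias=child.bias is not None
            )
            # Copy the original parameters so outputs stay comparable in this sketch.
            new_layer.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                new_layer.bias.data.copy_(child.bias.data)
            setattr(model, name, new_layer)
        else:
            replace_linear(child, target_cls, skip)
    return model


# Usage: swap every nn.Linear in a small model and check the outputs still match.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(2, 8)
expected = model(x)
replace_linear(model, FakeQuantLinear)
```

Passing layer names through `skip` is one way to leave selected modules, such as an output head, in full precision; whether that is appropriate depends on the quantization method.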