The attention mask is a binary tensor with the same shape as the input IDs. It marks which positions hold real tokens and which hold padding, so that the model does not attend to the padded positions.
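
As a rough illustration, the sketch below pads a batch of token ID lists to a common length and builds the matching mask. It assumes the common convention that 1 marks a real token and 0 marks padding. The helper `pad_batch`, the `pad_id` value, and the token IDs are hypothetical and chosen only for the example.

```python
from typing import List, Tuple


def pad_batch(
    sequences: List[List[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad sequences to the longest length and build the attention mask.

    Assumed convention: 1 = real token (attend), 0 = padding (ignore).
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Pad the token IDs on the right with pad_id.
        input_ids.append(seq + [pad_id] * n_pad)
        # Mirror the padding in the mask: ones for tokens, zeros for pads.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Made-up token IDs for two sequences of different lengths.
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
input_ids, attention_mask = pad_batch(batch)

for ids, mask in zip(input_ids, attention_mask):
    print(ids, mask)
```

The shorter sequence receives two padding positions, and its mask carries zeros at exactly those positions. A model that accepts such a mask can use it to exclude the padded positions when computing attention.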