Attention mechanisms
Most transformer models use full attention, in the sense that every token attends to every other token, so the attention matrix is square (sequence length × sequence length). This is why both the time and memory cost of attention grow quadratically with sequence length.
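
As a minimal sketch (not any particular library's implementation), the code below shows full scaled dot-product attention; the helper name `full_attention` and the tensor shapes are illustrative assumptions. Note the square `seq_len × seq_len` score matrix, which is the source of the quadratic cost:

```python
import torch

def full_attention(q, k, v):
    # q, k, v: (batch, seq_len, d) tensors.
    d = q.size(-1)
    # Scores form a square (seq_len x seq_len) matrix:
    # every query token attends to every key token.
    scores = q @ k.transpose(-2, -1) / d**0.5
    weights = torch.softmax(scores, dim=-1)
    # Weighted sum of values, back to (batch, seq_len, d).
    return weights @ v

batch, seq_len, d = 2, 8, 16
q = torch.randn(batch, seq_len, d)
k = torch.randn(batch, seq_len, d)
v = torch.randn(batch, seq_len, d)
out = full_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 16])
```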