In softmax(QK^T), only the largest elements of the matrix QK^T (along the
softmax dimension) make a useful contribution. Because the softmax
exponentiates its inputs, the weights of the smaller elements are pushed
close to zero.
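
The effect can be checked numerically on a single row of QK^T. The following is a minimal sketch in plain Python, not code from any particular library: the helper names `softmax` and `top_k_mass` and the score values are made up for illustration. It computes what share of the softmax weight falls on the k largest scores in the row.

```python
import math


def softmax(row):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_mass(row, k):
    """Fraction of the softmax probability held by the k largest scores."""
    probs = softmax(row)
    return sum(sorted(probs, reverse=True)[:k])


# Hypothetical row of QK^T: one query scored against six keys.
scores = [8.0, 7.5, 0.5, 0.0, -1.0, -2.0]
mass = top_k_mass(scores, k=2)
print(f"top-2 share of attention weight: {mass:.4f}")
```

With these assumed scores, the two largest entries hold more than 99% of the attention weight, so dropping the other four keys would barely change the output for this query.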