To alleviate
that, axial positional encodings factorize that big matrix \(E\) into two smaller matrices \(E_1\) and \(E_2\), with
dimensions \(l_{1} \times d_{1}\) and \(l_{2} \times d_{2}\), such that \(l_{1} \times l_{2} = l\) and
\(d_{1} + d_{2} = d\). Because the lengths multiply while the parameter counts only add, the two factors together hold
\(l_{1} \times d_{1} + l_{2} \times d_{2}\) parameters, far fewer than the \(l \times d\) of the full matrix.
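
As a minimal sketch of the idea (a hypothetical helper, not the library's implementation): each position is indexed by a pair \((j_1, j_2)\) on an \(l_1 \times l_2\) grid, and its encoding is the concatenation of row \(j_1\) of \(E_1\) with row \(j_2\) of \(E_2\). The sizes `l1`, `l2`, `d1`, `d2` below are illustrative.

```python
import torch

def axial_positional_encoding(l1, l2, d1, d2):
    # Two small factors instead of one (l1 * l2) x (d1 + d2) matrix.
    E1 = torch.randn(l1, d1)  # l1 x d1
    E2 = torch.randn(l2, d2)  # l2 x d2
    # Broadcast each factor over the axis it does not index,
    # then concatenate along the feature dimension.
    part1 = E1[:, None, :].expand(l1, l2, d1)  # varies with the first axial index
    part2 = E2[None, :, :].expand(l1, l2, d2)  # varies with the second axial index
    E = torch.cat([part1, part2], dim=-1)      # l1 x l2 x (d1 + d2)
    return E.reshape(l1 * l2, d1 + d2)         # l x d positional matrix

# Example: length l = 16384 with hidden size d = 1024 stores only
# 128*512 + 128*512 = 131,072 parameters instead of 16384*1024 ≈ 16.8M.
pe = axial_positional_encoding(l1=128, l2=128, d1=512, d2=512)
print(pe.shape)  # torch.Size([16384, 1024])
```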