To alleviate | |
that, axial positional encodings consist of factorizing that big matrix E in two smaller matrices E1 and E2, with | |
dimensions \(l_{1} \times d_{1}\) and \(l_{2} \times d_{2}\), such that \(l_{1} \times l_{2} = l\) and | |
\(d_{1} + d_{2} = d\) (with the product for the lengths, this ends up being way smaller). |