Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
Other tricks
Axial positional encodings
Reformer uses axial positional encodings: in traditional transformer models, the positional encoding
E is a matrix of size \(l\) by \(d\), \(l\) being the sequence length and \(d\) the dimension of the
hidden state.