Also, by stacking attention layers that have a small window, the last layer will have a receptive field of more than just the tokens in the window, allowing the model to build a representation of the whole sentence.
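As a rough illustration of why this works, the sketch below (plain NumPy, not part of the Transformers API; `seq_len` and `window` are hypothetical sizes chosen for readability) composes the connectivity of a sliding-window attention mask with itself. Each additional layer lets information travel another `window // 2` positions, so the effective receptive field grows linearly with depth:

```python
# Minimal sketch, assuming a symmetric sliding window of `window` tokens
# per layer: token i attends to tokens j with |i - j| <= window // 2.
import numpy as np

seq_len, window = 16, 3  # hypothetical sizes for illustration

idx = np.arange(seq_len)
# Boolean connectivity of a single local-attention layer.
one_layer = np.abs(idx[:, None] - idx[None, :]) <= window // 2

reach = one_layer.copy()
for layer in range(1, 5):
    # Count how many positions token 0 can receive information from.
    print(f"after layer {layer}: token 0 sees {reach[0].sum()} tokens")
    # Stacking another local layer composes the connectivity relations,
    # extending the receptive field by another window // 2 positions.
    reach = (reach.astype(int) @ one_layer.astype(int)) > 0
```

Running this prints a token count that grows with each layer, which is exactly the receptive-field expansion described above: even though each layer only looks at a small window, a deep enough stack covers the whole sequence.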