The smaller the stride, the more context the model will have in making each prediction,
and the better the reported perplexity will typically be.
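
The effect of the stride can be illustrated without running a model. The following is a minimal sketch, assuming a fixed context window of `max_length` tokens and a sliding-window evaluation in which each token is scored exactly once, in the first window that reaches it. The helper names `sliding_windows` and `mean_context` are hypothetical and exist only for this illustration.

```python
def sliding_windows(seq_len, max_length, stride):
    """Yield (begin, end, n_scored) for each evaluation window.

    Assumes 1 <= stride <= max_length, so no token is skipped.
    """
    if not 1 <= stride <= max_length:
        raise ValueError("stride must be between 1 and max_length")
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        # Only tokens that no earlier window has scored count here;
        # the rest of the window serves purely as context.
        n_scored = end - prev_end
        yield begin, end, n_scored
        prev_end = end
        if end == seq_len:
            break


def mean_context(seq_len, max_length, stride):
    """Average number of preceding in-window tokens per scored token."""
    total_context = 0
    total_scored = 0
    for begin, end, n_scored in sliding_windows(seq_len, max_length, stride):
        for pos in range(end - n_scored, end):
            total_context += pos - begin
            total_scored += 1
    return total_context / total_scored


if __name__ == "__main__":
    # Toy sizes chosen for readability, not taken from any real model.
    for stride in (4, 2, 1):
        print(stride, mean_context(seq_len=10, max_length=4, stride=stride))
```

In this toy setting the average context per scored token grows as the stride shrinks, at the cost of more windows, and therefore more forward passes, over the same text.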