The smaller the stride, the more context the model will have in making each prediction,
and the better the reported perplexity will typically be.
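To make the effect of the stride concrete, here is a minimal sketch of the window bookkeeping only, with no model involved. The helper names `sliding_windows` and `mean_context` are hypothetical, and the sketch assumes each token is scored exactly once, in the first window whose scored region covers it, with `stride` no larger than `max_length`.

```python
def sliding_windows(seq_len, max_length, stride):
    """Return (begin, end, n_scored) for each evaluation window.

    Hypothetical helper: only the last n_scored tokens of a window are
    scored; the earlier tokens in the window serve as context.
    """
    if not 1 <= stride <= max_length:
        raise ValueError("stride must satisfy 1 <= stride <= max_length")
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        n_scored = end - prev_end  # tokens not covered by earlier windows
        windows.append((begin, end, n_scored))
        prev_end = end
        if end == seq_len:
            break
    return windows


def mean_context(seq_len, max_length, stride):
    """Average number of preceding in-window tokens per scored token."""
    total = 0
    for begin, end, n_scored in sliding_windows(seq_len, max_length, stride):
        first_scored = end - n_scored
        # A token at position p sees the p - begin tokens before it.
        total += sum(p - begin for p in range(first_scored, end))
    return total / seq_len
```

Under these assumptions, a 10-token sequence with `max_length=4` gives a mean context of 1.3 tokens at `stride=4` (disjoint windows) and 2.4 tokens at `stride=1`, at the cost of seven windows, and therefore seven forward passes, instead of three.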