Spaces:

Ahmadzei
/

RAG

Runtime error

RAG

File size: 398 Bytes

57bdca5

The smaller the stride, the more context the model will have in making each prediction,
and the better the reported perplexity will typically be.
When we run the above with stride = 1024, i.e. no overlap, the resulting PPL is 19.44, which is about the same
as the 19.93 reported in the GPT-2 paper. By using stride = 512 and thereby employing our striding window
strategy, this jumps down to 16.45.