The smaller the stride, the more context the model has when making each prediction, and the better (lower) the reported perplexity will typically be. When we run the above with stride = 1024, i.e. no overlap, the resulting PPL is 19.44, which is about the same as the 19.93 reported in the GPT-2 paper. By using stride = 512, and thereby employing our sliding-window strategy, this drops to 16.45.
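The window bookkeeping behind this trade-off can be sketched in plain Python. This is a minimal illustration, not the evaluation code the text refers to. Each window feeds tokens [begin, end) to the model but scores only its last trg_len tokens, so every token is scored exactly once. A smaller stride means the scored tokens sit later in their window and therefore see more preceding context.

The names `sliding_windows`, `sliding_window_ppl` and `toy_nll` are hypothetical. `toy_nll` is a made-up stand-in for a real model's per-token negative log-likelihood, so the numbers it produces do not reproduce the 19.44 / 16.45 figures above. Only the trend is meant to carry over.

```python
import math
from typing import Callable, List, Tuple


def sliding_windows(seq_len: int, max_length: int, stride: int) -> List[Tuple[int, int, int]]:
    """Plan the evaluation windows as (begin, end, trg_len) triples.

    Each window feeds tokens [begin, end) to the model, but only the last
    trg_len tokens are scored; the rest act purely as context. Consecutive
    scored spans tile [0, seq_len), so every token is scored exactly once.
    """
    if not 0 < stride <= max_length:
        raise ValueError("stride must satisfy 0 < stride <= max_length")
    windows: List[Tuple[int, int, int]] = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end  # only tokens not already scored by an earlier window
        windows.append((begin, end, trg_len))
        prev_end = end
        if end == seq_len:
            break
    return windows


def sliding_window_ppl(seq_len: int, max_length: int, stride: int,
                       token_nll: Callable[[int], float]) -> float:
    """Perplexity = exp(mean negative log-likelihood over all scored tokens)."""
    total_nll = 0.0
    n_scored = 0
    for begin, end, trg_len in sliding_windows(seq_len, max_length, stride):
        for pos in range(end - trg_len, end):
            context = pos - begin  # preceding tokens visible inside this window
            total_nll += token_nll(context)
            n_scored += 1
    return math.exp(total_nll / n_scored)


def toy_nll(context: int) -> float:
    # Hypothetical stand-in for a language model: the per-token NLL simply
    # shrinks as more context is available. This is not real GPT-2 behaviour.
    return 2.0 + 1.0 / (1 + context)


if __name__ == "__main__":
    for stride in (1024, 512, 256):
        ppl = sliding_window_ppl(4096, 1024, stride, toy_nll)
        print(f"stride={stride}: toy PPL={ppl:.3f}")
```

With this toy model, the printed perplexity falls as the stride shrinks from 1024 to 256, which mirrors the trend described above. The cost is compute. Roughly seq_len / stride windows are needed, and each one re-reads up to max_length - stride tokens that an earlier window already processed. In a real evaluation, `toy_nll` would be replaced by the model's per-token loss on each window.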