Ahmadzei's picture
update 1
57bdca5
raw
history blame contribute delete
480 Bytes
This involves repeatedly
sliding the context window so that the model has more context when making each prediction.
This is a closer approximation to the true decomposition of the sequence probability and will typically yield a more
favorable score. The downside is that it requires a separate forward pass for each token in the corpus. A good
practical compromise is to employ a strided sliding window, moving the context by larger strides rather than sliding by
1 token a time.