This is quick to compute since the perplexity of each segment can be computed in one forward pass, but serves as a poor
approximation of the fully-factorized perplexity and will typically yield a higher (worse) PPL because the model will
have less context at most of the prediction steps.
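
The effect of the reduced context can be illustrated with a small sketch. Everything in it is an assumption made for illustration: `toy_log_prob` is a hypothetical stand-in for a language model whose confidence in the observed token grows with the number of context tokens, and `disjoint_context` and `full_context` are hypothetical helper names, not part of any library. It only shows why resetting the context at every segment boundary tends to raise the measured PPL.

```python
import math


def toy_log_prob(context, token):
    # Hypothetical stand-in for a real model (an assumption, not a real API):
    # the probability assigned to the observed token grows with context length.
    p = 1.0 - 0.5 / (1 + len(context))
    return math.log(p)


def perplexity(tokens, context_fn, log_prob=toy_log_prob):
    # PPL = exp(mean negative log-likelihood over all tokens).
    nll = 0.0
    for i, tok in enumerate(tokens):
        nll -= log_prob(context_fn(tokens, i), tok)
    return math.exp(nll / len(tokens))


def disjoint_context(max_len):
    # Disjoint segments: token i only sees earlier tokens of its own segment,
    # so the context is reset to empty at every segment boundary.
    def ctx(tokens, i):
        start = (i // max_len) * max_len
        return tokens[start:i]
    return ctx


def full_context(tokens, i):
    # Fully factorized: token i is conditioned on every preceding token.
    return tokens[:i]


tokens = list(range(12))
ppl_disjoint = perplexity(tokens, disjoint_context(4))
ppl_full = perplexity(tokens, full_context)
print(f"disjoint segments: {ppl_disjoint:.3f}")
print(f"full context:      {ppl_full:.3f}")
```

With this toy scorer the disjoint-segment PPL comes out higher than the full-context PPL, because the first tokens of every segment are scored with little or no context. A real model's numbers will differ, but the direction of the gap is the one described above.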