For more intuition about perplexity and its relationship to Bits Per Character (BPC) and data compression, check out
this fantastic blog post on The Gradient.
Calculating PPL with fixed-length models
If we weren't limited by a model's context size, we would evaluate the model's perplexity by autoregressively
factorizing a sequence and conditioning on the entire preceding subsequence at each step, as shown below.
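
As a rough illustration of this unconstrained case, the sketch below computes perplexity from per-token conditional log-probabilities, where each entry is assumed to be the natural-log probability of a token given all tokens before it. The function name `perplexity` and the toy inputs are illustrative assumptions, not part of any library API.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token conditional log-probabilities (natural log).

    token_log_probs[i] is assumed to hold log p(x_i | x_0, ..., x_{i-1}),
    i.e. each token is conditioned on the entire preceding subsequence.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood over the sequence.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


# Toy input: a hypothetical model that assigns probability 1/4 to every token.
toy_log_probs = [math.log(0.25)] * 8
toy_ppl = perplexity(toy_log_probs)
```

With a uniform probability of 1/4 per token, the result comes out to 4, which matches the reading of perplexity as an effective number of equally likely choices per step.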

When working with approximate models, however, we typically have a constraint on the number of tokens the model can
process.
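
To make the constraint concrete, here is a minimal sketch of what it does to the conditioning context: when predicting the token at position `t`, only the most recent tokens that fit in the window are visible. The helper `truncated_context` is hypothetical, and it assumes the target token counts toward the `max_length` limit; real models and evaluation scripts may define the window differently.

```python
def truncated_context(tokens, t, max_length):
    """Return the tokens available as context when predicting tokens[t].

    Assumption: the window holds at most max_length tokens *including*
    the target, so at most max_length - 1 preceding tokens are kept.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    if not 0 <= t < len(tokens):
        raise IndexError("t must index a token in the sequence")
    # Drop everything that falls outside the window to the left.
    start = max(0, t - (max_length - 1))
    return tokens[start:t]


# Toy sequence of token ids 0..9 and a hypothetical window of 4 tokens.
tokens = list(range(10))
early_context = truncated_context(tokens, 2, 4)  # full history still fits
late_context = truncated_context(tokens, 7, 4)   # history is cut off
```

Early positions still see their full history, while later positions lose the oldest tokens, so their conditional probabilities are computed from less context than the full factorization would use.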