If we have a tokenized | |
sequence \(X = (x_0, x_1, \dots, x_t)\), then the perplexity of \(X\) is, | |
$$\text{PPL}(X) = \exp \left{ {-\frac{1}{t}\sum_i^t \log p_\theta (x_i|x_{<i}) } \right}$$ | |
where \(\log p_\theta (x_i|x_{<i})\) is the log-likelihood of the ith token conditioned on the preceding tokens \(x_{<i}\) according to our model. Intuitively, it can be thought of as an evaluation of the model's ability to predict uniformly among the set of specified tokens in a corpus. |