If we have a tokenized
sequence \(X = (x_0, x_1, \dots, x_t)\), then the perplexity of \(X\) is,
$$\text{PPL}(X) = \exp \left\{ -\frac{1}{t}\sum_{i}^{t} \log p_\theta (x_i \mid x_{<i}) \right\}$$
where \(\log p_\theta (x_i \mid x_{<i})\) is the log-likelihood of the \(i\)-th token conditioned on the preceding tokens \(x_{<i}\) according to our model.
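As a minimal sketch of this formula, the snippet below computes perplexity from a tensor of per-token log-likelihoods. The `log_probs` values are made-up placeholders; in practice they would come from a model's predicted distribution over each next token.

```python
import torch

# Hypothetical per-token log-likelihoods log p(x_i | x_{<i}) for a 5-token sequence.
log_probs = torch.tensor([-2.1, -0.9, -1.5, -0.3, -1.2])

# Perplexity is the exponential of the average negative log-likelihood.
ppl = torch.exp(-log_probs.mean())
print(ppl)  # tensor(3.3201)
```

A lower perplexity means the model assigned higher probability, on average, to each token in the sequence.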