Before diving in, we should note that the metric applies specifically to classical language models
(sometimes called autoregressive or causal language models) and is not well defined for masked
language models like BERT (see summary of the models).