We don't want the log-likelihood for the tokens we're just treating | |
as context to be included in our loss, so we can set these targets to -100 so that they are ignored. |
We don't want the log-likelihood for the tokens we're just treating | |
as context to be included in our loss, so we can set these targets to -100 so that they are ignored. |