File size: 438 Bytes
5fa1a76 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
the model only calculates loss over trg_len - 1 labels, because it internally shifts the labels # to the left by 1. neg_log_likelihood = outputs.loss nlls.append(neg_log_likelihood) prev_end_loc = end_loc if end_loc == seq_len: break ppl = torch.exp(torch.stack(nlls).mean()) Running this with the stride length equal to the max input length is equivalent to the suboptimal, non-sliding-window strategy we discussed above. |