Since this dataset is small and we're only doing one forward pass over it, we can load and encode the entire dataset in memory.

```python
from datasets import load_dataset

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
```

With 🤗 Transformers, we can simply pass the `input_ids` as the `labels` to our model, and the average negative log-likelihood for each token is returned as the loss.
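
As a minimal sketch of that step, assuming `model` is a causal language model loaded earlier in the guide (e.g. a GPT-2 checkpoint) and `encodings` comes from the snippet above:

```python
import torch

# Take a manageable slice of the encoded test set for illustration.
input_ids = encodings.input_ids[:, :512]

with torch.no_grad():
    # Passing input_ids as labels makes the model compute the loss itself;
    # for causal LMs the labels are shifted internally by one position, so
    # each token is predicted from the tokens that precede it.
    outputs = model(input_ids, labels=input_ids)

# outputs.loss is the average negative log-likelihood per predicted token;
# exponentiating it gives the perplexity over this slice.
ppl = torch.exp(outputs.loss)
```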