In this example, load the FacebookAI/xlm-clm-enfr-1024 checkpoint (Causal language modeling, English-French):

    import torch
    from transformers import XLMTokenizer, XLMWithLMHeadModel

    tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
    model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-clm-enfr-1024")

The lang2id attribute of the tokenizer displays this model's languages and their ids:

    print(tokenizer.lang2id)
    {'en': 0, 'fr': 1}

Next, create an example input:

    input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")])  # batch size of 1

Set the language id as "en" and use it to define the language embedding.
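One way to carry out that last step is sketched below: build a tensor with the same shape as input_ids in which every position holds the id for "en". The helper name build_langs and the placeholder token ids are illustrative assumptions, not part of the library. They are there only so the snippet runs without downloading the checkpoint. The commented lines at the end show how the result would be passed to the model through its langs argument.

```python
import torch


def build_langs(input_ids: torch.Tensor, language_id: int) -> torch.Tensor:
    """Return a tensor shaped like input_ids with language_id at every position."""
    # (hypothetical helper, introduced here for illustration)
    return torch.full_like(input_ids, language_id)


# Stand-in values so this sketch runs on its own (assumptions):
lang2id = {"en": 0, "fr": 1}              # same mapping as tokenizer.lang2id above
input_ids = torch.tensor([[5, 6, 7, 8]])  # placeholder token ids, batch size of 1

langs = build_langs(input_ids, lang2id["en"])
print(langs.shape)  # (batch_size, sequence_length), same as input_ids

# With the real tokenizer, model, and input_ids from the example above:
# langs = build_langs(input_ids, tokenizer.lang2id["en"])
# outputs = model(input_ids, langs=langs)
```

Keeping langs the same shape as input_ids means each token gets its own language id, so the same pattern would extend to a batch that mixes English and French sequences.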