---
library_name: transformers
datasets:
- statmt/cc100
base_model:
- FacebookAI/xlm-roberta-base
---

# nomic-xlm-2048: XLM-Roberta Base with RoPE

`nomic-xlm-2048` is a finetuned XLM-Roberta Base model whose learned absolute positional embeddings are swapped for rotary position embeddings (RoPE), trained for 10k steps on [CC100](https://huggingface.co/datasets/statmt/cc100).

`nomic-xlm-2048` performs competitively with other multilingual encoders on GLUE and XTREME-R.

GLUE results:

| Model | Params | Pos. | Seq. | Avg. | CoLA | SST-2 | MRPC | STS-B | QQP | MNLI | QNLI | RTE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XLM-R-Base | 279M | Abs. | 512 | 82.35 | 46.95 | 92.54 | 87.37 | 89.32 | 90.69 | 84.34 | 90.35 | 77.26 |
| nomic-xlm-2048 | 278M | RoPE | 2048 | 81.63 | 44.69 | 91.97 | 87.50 | 88.48 | 90.38 | 83.59 | 89.38 | 76.54 |
| mGTE-Base | 306M | RoPE | 8192 | 80.77 | 27.22 | 91.97 | 89.71 | 89.55 | 91.20 | 85.16 | 90.91 | 80.41 |

XTREME-R results:

| Model | Avg. | XNLI | XCOPA | UDPOS | WikiANN | XQuAD | MLQA | TyDiQA-GoldP | Mewsli-X | LAReQA | Tatoeba |
|---|---|---|---|---|---|---|---|---|---|---|---|
| XLM-R-Base | 62.31 | 74.49 | 51.80 | 74.33 | 60.99 | 72.96 | 61.45 | 54.31 | 42.45 | 63.49 | 66.79 |
| nomic-xlm-2048 | 62.70 | 73.57 | 61.71 | 74.92 | 60.96 | 71.13 | 59.61 | 43.46 | 45.27 | 67.49 | 70.82 |
| mGTE-Base | 64.63 | 73.58 | 63.62 | 73.52 | 60.72 | 74.71 | 63.88 | 49.68 | 44.58 | 71.90 | 70.07 |

# Usage

```python
from transformers import AutoModelForMaskedLM, AutoConfig, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained('nomic-ai/nomic-xlm-2048')  # `nomic-xlm-2048` uses the standard XLM-RoBERTa tokenizer
config = AutoConfig.from_pretrained('nomic-ai/nomic-xlm-2048', trust_remote_code=True)  # the config needs to be passed in
model = AutoModelForMaskedLM.from_pretrained('nomic-ai/nomic-xlm-2048', config=config, trust_remote_code=True)

# To use this model directly for masked language modeling
classifier = pipeline('fill-mask', model=model, tokenizer=tokenizer, device="cpu")

print(classifier("I <mask> to the store yesterday."))  # XLM-RoBERTa uses <mask>, not [MASK]
```

To finetune the model for a Sequence Classification task, you can use the following snippet:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

model_path = "nomic-ai/nomic-xlm-2048"
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)

# strict needs to be false here since we're initializing some new params
model = AutoModelForSequenceClassification.from_pretrained(model_path, config=config, trust_remote_code=True, strict=False)
```

# Join the Nomic Community

- Nomic: [https://nomic.ai](https://nomic.ai)
- Discord: [https://discord.gg/myY5YDR8z8](https://discord.gg/myY5YDR8z8)
- Twitter: [https://twitter.com/nomic_ai](https://twitter.com/nomic_ai)
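
Because the learned absolute positions are replaced with RoPE, the encoder can process inputs up to the 2048-token context it was trained with, rather than XLM-R's 512. The snippet below is a minimal sketch of long-input inference using the same masked-LM loading path shown above; the repeated sentence is only a placeholder for a real long document.

```python
import torch
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

model_path = "nomic-ai/nomic-xlm-2048"

tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_path, config=config, trust_remote_code=True)
model.eval()

# Placeholder long document; real inputs of up to 2048 tokens are handled the same way.
long_text = "Rotary position embeddings extend the usable context of the encoder. " * 300

inputs = tokenizer(
    long_text,
    truncation=True,
    max_length=2048,  # the context length the model was trained with
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# The logits cover the full (up to 2048-token) sequence, not just the first 512 positions.
print(outputs.logits.shape)  # [1, seq_len, vocab_size]
```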