<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# NLLB-MOE

## Overview

The NLLB model was presented in [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by Marta R. Costa-jussà, James Cross, Onur Çelebi,
Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula,
Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews,
Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers,
Safiyyah Saleem, Holger Schwenk, and Jeff Wang.

The abstract of the paper is the following:
*Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today.
However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the
200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this challenge by
first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed
at narrowing the performance gap between low and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of
Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training
improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using
a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system.*
Tips:

- `M2M100ForConditionalGeneration` is the base model for both NLLB and NLLB-MoE.
- NLLB-MoE is very similar to the NLLB model, but its feed-forward layer is based on the implementation of SwitchTransformers.
- The tokenizer is the same as for the NLLB models.

This model was contributed by [Arthur Zucker](https://huggingface.co/ArtZucker).
The original code can be found [here](https://github.com/facebookresearch/fairseq).
## Implementation differences with SwitchTransformers

The biggest difference is the way the tokens are routed. NLLB-MoE uses a `top-2-gate`: for each input token, only the two experts with the highest gating probabilities are selected to process it, and the remaining experts are ignored.
In SwitchTransformers, once the masks are computed for each expert, the hidden states are simply indexed with the routing mask and the selected tokens are fed to the corresponding expert. Here, however, the implementation differs noticeably because the fairseq repository used a different approach; a rough sketch of top-2 routing is shown below.
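
To make the routing idea concrete, here is a minimal, self-contained sketch of a top-2 gate followed by masked dispatch. It is an illustration only, not the actual `NllbMoeTop2Router` code: the expert count, hidden size, and the simple linear gate are made up for the example.

```python
import torch

num_experts, hidden_dim, num_tokens = 4, 8, 5
hidden_states = torch.randn(num_tokens, hidden_dim)
gate = torch.nn.Linear(hidden_dim, num_experts, bias=False)

# Routing probabilities of every token over every expert.
router_probs = torch.softmax(gate(hidden_states), dim=-1)

# Top-2 gating: each token keeps only its two most likely experts.
top2_weights, top2_experts = torch.topk(router_probs, k=2, dim=-1)

# Dispatch mask of shape (num_tokens, num_experts): which expert sees which token.
expert_mask = torch.nn.functional.one_hot(top2_experts, num_experts).sum(dim=1).bool()

# Each expert processes only the tokens routed to it; outputs are combined
# using the corresponding gate probabilities as weights.
experts = torch.nn.ModuleList([torch.nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts)])
output = torch.zeros_like(hidden_states)
for expert_idx, expert in enumerate(experts):
    token_mask = expert_mask[:, expert_idx]
    if token_mask.any():
        gate_weight = router_probs[token_mask][:, expert_idx].unsqueeze(-1)
        output[token_mask] += gate_weight * expert(hidden_states[token_mask])
```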
## Generating with NLLB-MoE

The available checkpoints require around 350GB of storage. Make sure to use `accelerate` if you do not have enough RAM on your machine.
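
For example, with `accelerate` installed, `from_pretrained` can shard the weights across the available devices with `device_map="auto"`. This is only a sketch; the dtype and the offload folder below are assumptions you may want to adapt to your hardware.

```python
import torch
from transformers import AutoModelForSeq2SeqLM

# device_map="auto" spreads the weights over the available GPUs and
# offloads whatever does not fit to CPU (or to disk via offload_folder).
model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-moe-54b",
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="offload",  # hypothetical local folder for disk offload
)
```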
While generating the target text, set `forced_bos_token_id` to the id of the target language. The following
example shows how to translate English to French using the *facebook/nllb-moe-54b* model.

Note that we're using the BCP-47 code for French, `fra_Latn`. See [here](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200)
for the list of all BCP-47 codes in the Flores 200 dataset.
```python
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-moe-54b")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-moe-54b")

>>> article = "Previously, Ring's CEO, Jamie Siminoff, remarked the company started when his doorbell wasn't audible from his shop in his garage."
>>> inputs = tokenizer(article, return_tensors="pt")

>>> translated_tokens = model.generate(
...     **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["fra_Latn"], max_length=50
... )
>>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
"Auparavant, le PDG de Ring, Jamie Siminoff, a fait remarquer que la société avait commencé lorsque sa sonnette n'était pas audible depuis son magasin dans son garage."
```
### Generating from any language other than English

English (`eng_Latn`) is set as the default language from which to translate. In order to specify that you'd like to translate from a different language,
you should specify the BCP-47 code in the `src_lang` keyword argument of the tokenizer initialization.

See the example below for a translation from Romanian to German:
```python
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-moe-54b", src_lang="ron_Latn")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-moe-54b")

>>> article = "Şeful ONU spune că nu există o soluţie militară în Siria"
>>> inputs = tokenizer(article, return_tensors="pt")

>>> translated_tokens = model.generate(
...     **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["deu_Latn"], max_length=30
... )
>>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
```
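
Alternatively, the same translation can be expressed through the high-level `pipeline` API, which accepts `src_lang` and `tgt_lang` directly. This is a sketch under the assumption that the checkpoint fits on your hardware; for the 54B model you will likely want to combine it with the `device_map="auto"` loading shown earlier.

```python
from transformers import pipeline

# src_lang / tgt_lang use the same BCP-47 codes as the tokenizer.
translator = pipeline(
    "translation",
    model="facebook/nllb-moe-54b",
    src_lang="ron_Latn",
    tgt_lang="deu_Latn",
    max_length=30,
)
result = translator("Şeful ONU spune că nu există o soluţie militară în Siria")
print(result[0]["translation_text"])
```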
## Documentation resources

- [Translation task guide](./tasks/translation)
- [Summarization task guide](./tasks/summarization)
## NllbMoeConfig

[[autodoc]] NllbMoeConfig

## NllbMoeTop2Router

[[autodoc]] NllbMoeTop2Router
    - route_tokens
    - forward

## NllbMoeSparseMLP

[[autodoc]] NllbMoeSparseMLP
    - forward

## NllbMoeModel

[[autodoc]] NllbMoeModel
    - forward

## NllbMoeForConditionalGeneration

[[autodoc]] NllbMoeForConditionalGeneration
    - forward