Update README.md
README.md
CHANGED
@@ -24,7 +24,9 @@ It achieves the following results on the evaluation set:
## Model description
mBART-50 is a multilingual sequence-to-sequence model. It was introduced to show that multilingual translation models can be created through multilingual fine-tuning: instead of fine-tuning on one translation direction, a pre-trained model is fine-tuned on many directions simultaneously. mBART-50 extends the original mBART model with 25 additional languages, supporting multilingual machine translation across 50 languages. The pre-training objective is explained below.
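
As a minimal usage sketch (assuming the `transformers` library and, for illustration, the publicly available `facebook/mbart-large-50-many-to-many-mmt` checkpoint rather than this fine-tuned model), translating with an mBART-50 model looks like this:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Illustrative checkpoint; substitute this repository's model id as needed.
checkpoint = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(checkpoint)
tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint)

# Set the source language, then force the target-language id token
# as the first generated token to select the translation direction.
tokenizer.src_lang = "en_XX"
encoded = tokenizer(
    "The head of the UN says there is no military solution in Syria.",
    return_tensors="pt",
)
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"],
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```
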
Multilingual Denoising Pretraining: the model incorporates N languages by concatenating data: D = {D_1, ..., D_N}, where each D_i is a collection of monolingual documents in language i. The source documents are noised using two schemes: first, randomly shuffling the order of the original sentences, and second, a novel in-filling scheme in which spans of text are replaced with a single mask token. The model is then tasked with reconstructing the original text. 35% of each instance's words are masked by randomly sampling span lengths from a Poisson distribution (λ = 3.5). The decoder input is the original text offset by one position, and a language id symbol LID is used as the initial token to predict the sentence.
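
The noising scheme is straightforward to sketch. The following is a hypothetical, simplified illustration in plain Python on whitespace-separated words; the actual implementation operates on subword token ids, and the names `noise_document` and `MASK_TOKEN` are placeholders introduced here:

```python
import random

import numpy as np

MASK_TOKEN = "<mask>"  # placeholder; the real model uses its own mask token id


def noise_document(sentences, mask_ratio=0.35, poisson_lambda=3.5, seed=0):
    """Sketch of the two noising schemes: (1) shuffle sentence order,
    (2) replace Poisson-length spans with a single mask token until
    roughly mask_ratio of the words are masked."""
    rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)

    # Scheme 1: randomly permute the original sentence order.
    shuffled = sentences[:]
    rng.shuffle(shuffled)

    tokens = " ".join(shuffled).split()
    num_to_mask = int(round(len(tokens) * mask_ratio))

    # Scheme 2: in-filling. Sample a span length from Poisson(λ = 3.5)
    # and collapse the whole span into one mask token.
    masked = 0
    while masked < num_to_mask and len(tokens) > 1:
        span = max(1, int(np_rng.poisson(poisson_lambda)))
        span = min(span, num_to_mask - masked, len(tokens) - 1)
        start = rng.randrange(0, len(tokens) - span + 1)
        tokens[start:start + span] = [MASK_TOKEN]
        masked += span

    # The model is trained to reconstruct the original text from this
    # noised input; the decoder sees the original text offset by one
    # position, starting from the language id (LID) token.
    return " ".join(tokens)
```
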
## Intended uses & limitations