Update README.md
README.md
CHANGED
@@ -24,7 +24,9 @@ It achieves the following results on the evaluation set:
## Model description
mBART-50 is a multilingual sequence-to-sequence model. It was introduced to show that multilingual translation models can be created through multilingual fine-tuning: instead of fine-tuning on one translation direction, a pre-trained model is fine-tuned on many directions simultaneously. mBART-50 extends the original mBART model with 25 additional languages, supporting multilingual machine translation across 50 languages. The pre-training objective is explained below.
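
As a minimal usage sketch (assuming the `transformers` library and, for illustration, the publicly available `facebook/mbart-large-50-many-to-many-mmt` checkpoint rather than this fine-tuned model), translating with an mBART-50 model looks like this:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Illustrative checkpoint; substitute this repository's model id as needed.
checkpoint = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(checkpoint)
tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint)

# Set the source language, then force the target-language id token
# as the first generated token to select the translation direction.
tokenizer.src_lang = "en_XX"
encoded = tokenizer(
    "The head of the UN says there is no military solution in Syria.",
    return_tensors="pt",
)
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"],
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```
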
Multilingual Denoising Pretraining: the model incorporates N languages by concatenating data: D = {D_1, ..., D_N}, where each D_i is a collection of monolingual documents in language i. The source documents are noised using two schemes: first, randomly shuffling the order of the original sentences, and second, a novel in-filling scheme in which spans of text are replaced with a single mask token. The model is then tasked with reconstructing the original text. 35% of each instance's words are masked by randomly sampling span lengths from a Poisson distribution (λ = 3.5). The decoder input is the original text offset by one position, and a language id symbol LID is used as the initial token to predict the sentence.
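
The noising scheme is straightforward to sketch. The following is a hypothetical, simplified illustration in plain Python on whitespace-separated words; the actual implementation operates on subword token ids, and the names `noise_document` and `MASK_TOKEN` are placeholders introduced here:

```python
import random

import numpy as np

MASK_TOKEN = "<mask>"  # placeholder; the real model uses its own mask token id


def noise_document(sentences, mask_ratio=0.35, poisson_lambda=3.5, seed=0):
    """Sketch of the two noising schemes: (1) shuffle sentence order,
    (2) replace Poisson-length spans with a single mask token until
    roughly mask_ratio of the words are masked."""
    rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)

    # Scheme 1: randomly permute the original sentence order.
    shuffled = sentences[:]
    rng.shuffle(shuffled)

    tokens = " ".join(shuffled).split()
    num_to_mask = int(round(len(tokens) * mask_ratio))

    # Scheme 2: in-filling. Sample a span length from Poisson(λ = 3.5)
    # and collapse the whole span into one mask token.
    masked = 0
    while masked < num_to_mask and len(tokens) > 1:
        span = max(1, int(np_rng.poisson(poisson_lambda)))
        span = min(span, num_to_mask - masked, len(tokens) - 1)
        start = rng.randrange(0, len(tokens) - span + 1)
        tokens[start:start + span] = [MASK_TOKEN]
        masked += span

    # The model is trained to reconstruct the original text from this
    # noised input; the decoder sees the original text offset by one
    # position, starting from the language id (LID) token.
    return " ".join(tokens)
```
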
## Intended uses & limitations