Update README.md (#1)
Opened by olatzEHU
README.md CHANGED
@@ -11,7 +11,7 @@ metrics:
 
 ## Model description
 
-This model was trained from scratch using [Marian NMT](https://marian-nmt.github.io/) on a combination of Spanish-Basque datasets totalling 104,417,271 sentence pairs. 12,091,549 sentence pairs were parallel data collected from the web while the remaining 92,325,722 sentence pairs were parallel synthetic data created backtranslating [Oscar](https://oscar-project.org/) Spanish monolingual dataset. The model was evaluated on the Flores, TaCon and NTREX evaluation datasets.
+This model was trained from scratch using [Marian NMT](https://marian-nmt.github.io/) on a combination of Spanish-Basque datasets totalling 104,417,271 sentence pairs. 12,091,549 sentence pairs were parallel data collected from the web while the remaining 92,325,722 sentence pairs were parallel synthetic data created backtranslating a random subset of [Oscar](https://oscar-project.org/) Spanish monolingual dataset. The model was evaluated on the Flores, TaCon and NTREX evaluation datasets.
 
 - **Developed by:** HiTZ Research Center & IXA Research group (University of the Basque Country UPV/EHU)
 - **Model type:** traslation
@@ -67,7 +67,7 @@ The Spanish-Basque data collected from the web was a combination of the followin
 | WikiMatrix | 154,281 |
 | **Total** | ** 12,091,549 ** |
 
-The 92,325,722 sentence pairs of synthetic parallel data were created by backtranslating the
+The 92,325,722 sentence pairs of synthetic parallel data were created by backtranslating the Oscar Spanish monolingual dataset using a previous version (without synthetic parallel data) of the [ES-EU translator from the HiTZ center](https://huggingface.co/HiTZ/mt-hitz-es-eu).
 
 
 ### Training Procedure
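
Review note: the sentence-pair counts quoted in the changed description can be cross-checked with a little arithmetic. The sketch below only uses the three figures that appear in the README text; the constant and function names are illustrative assumptions, not part of the repository.

```python
# Sentence-pair counts quoted in the README text shown in this diff.
WEB_PARALLEL_PAIRS = 12_091_549   # parallel data collected from the web
SYNTHETIC_PAIRS = 92_325_722      # backtranslated Oscar Spanish data
STATED_TOTAL = 104_417_271        # total quoted in the model description


def synthetic_share(web: int, synthetic: int) -> float:
    """Return the fraction of the combined corpus that is synthetic."""
    return synthetic / (web + synthetic)


# The two components should add up to the stated total.
computed_total = WEB_PARALLEL_PAIRS + SYNTHETIC_PAIRS
totals_match = computed_total == STATED_TOTAL

share = synthetic_share(WEB_PARALLEL_PAIRS, SYNTHETIC_PAIRS)

print(f"computed total: {computed_total:,}")
print(f"matches stated total: {totals_match}")
print(f"synthetic share: {share:.1%}")
```

The check confirms that the web-collected and synthetic counts sum to the stated total, and shows that the synthetic data makes up most of the training corpus.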