HiTZ/mt-hitz-eu-es

Text2Text Generation

Inference Endpoints


Update README.md #1

by olatzEHU - opened Jun 28, 2024

base: refs/heads/main ← from: refs/pr/1


Files changed (1)

README.md +2 -2


@@ -11,7 +11,7 @@ metrics:
 ## Model description
-This model was trained from scratch using [Marian NMT](https://marian-nmt.github.io/) on a combination of Spanish-Basque datasets totalling 104,417,271 sentence pairs. 12,091,549 sentence pairs were parallel data collected from the web while the remaining 92,325,722 sentence pairs were parallel synthetic data created backtranslating [Oscar](https://oscar-project.org/) Spanish monolingual dataset. The model was evaluated on the Flores, TaCon and NTREX evaluation datasets.
+This model was trained from scratch using [Marian NMT](https://marian-nmt.github.io/) on a combination of Spanish-Basque datasets totalling 104,417,271 sentence pairs. 12,091,549 sentence pairs were parallel data collected from the web while the remaining 92,325,722 sentence pairs were parallel synthetic data created backtranslating a random subset of [Oscar](https://oscar-project.org/) Spanish monolingual dataset. The model was evaluated on the Flores, TaCon and NTREX evaluation datasets.
 - **Developed by:** HiTZ Research Center & IXA Research group (University of the Basque Country UPV/EHU)
 - **Model type:** traslation
@@ -67,7 +67,7 @@ The Spanish-Basque data collected from the web was a combination of the followin
 | WikiMatrix	           | 154,281                   |
 | **Total**     	       | ** 12,091,549 **             |
-The 92,325,722 sentence pairs of synthetic parallel data were created by backtranslating the EusCrawl Basque monolingual dataset using a previous version (without synthetic parallel data) of the [ES-EU translator from the HiTZ center](https://huggingface.co/HiTZ/mt-hitz-es-eu).
+The 92,325,722 sentence pairs of synthetic parallel data were created by backtranslating the Oscar Spanish monolingual dataset using a previous version (without synthetic parallel data) of the [ES-EU translator from the HiTZ center](https://huggingface.co/HiTZ/mt-hitz-es-eu).
 ### Training Procedure
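The synthetic-data paragraphs changed in this PR describe standard backtranslation for the EU→ES direction. Monolingual Spanish is translated into Basque, and each machine-generated Basque sentence is paired with its genuine Spanish original, which stays on the target side. The sketch below illustrates that idea under stated assumptions and is not the HiTZ training code. `sample_subset`, `backtranslate` and the `Translator` callable are hypothetical names. A real translator would wrap the earlier ES-EU model, whose loading and decoding API is deliberately not shown. The sketch also checks that the web and synthetic counts from the model card add up to the stated total.

```python
import random
from typing import Callable, Iterable, List, Tuple

# Sentence-pair counts as reported in the model card.
WEB_PAIRS = 12_091_549        # parallel data collected from the web
SYNTHETIC_PAIRS = 92_325_722  # backtranslated Oscar Spanish
TOTAL_PAIRS = 104_417_271     # stated training-set total

# Sanity check: the two parts add up to the stated total.
assert WEB_PAIRS + SYNTHETIC_PAIRS == TOTAL_PAIRS

# Hypothetical type: a batch ES->EU translator (e.g. a wrapper around an
# earlier ES-EU model trained without synthetic data).
Translator = Callable[[List[str]], List[str]]


def sample_subset(sentences: List[str], k: int, seed: int = 0) -> List[str]:
    """Draw k sentences uniformly at random without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the subset is reproducible
    return rng.sample(sentences, k)


def _pair_batch(batch: List[str], translate_es_to_eu: Translator) -> List[Tuple[str, str]]:
    """Translate one batch and pair each Basque output with its Spanish source."""
    translated = translate_es_to_eu(batch)
    # zip() would silently drop sentences on a length mismatch, so fail loudly.
    if len(translated) != len(batch):
        raise ValueError("translator returned a different number of sentences")
    # (synthetic Basque source, genuine Spanish target)
    return list(zip(translated, batch))


def backtranslate(
    spanish: Iterable[str],
    translate_es_to_eu: Translator,
    batch_size: int = 32,
) -> List[Tuple[str, str]]:
    """Build synthetic (Basque, Spanish) training pairs for an EU->ES model."""
    pairs: List[Tuple[str, str]] = []
    batch: List[str] = []
    for sentence in spanish:
        batch.append(sentence)
        if len(batch) == batch_size:
            pairs.extend(_pair_batch(batch, translate_es_to_eu))
            batch = []
    if batch:  # translate the final partial batch
        pairs.extend(_pair_batch(batch, translate_es_to_eu))
    return pairs


if __name__ == "__main__":
    # Stand-in translator for illustration only; it just tags each sentence.
    def fake_es_to_eu(batch: List[str]) -> List[str]:
        return [f"[eu] {s}" for s in batch]

    mono_es = ["Hola.", "Buenos días.", "Gracias."]
    subset = sample_subset(mono_es, 2)
    print(backtranslate(subset, fake_es_to_eu, batch_size=1))
```

Batching mirrors how a translator is usually run over large monolingual corpora, and the explicit length check keeps a misbehaving translator from quietly shrinking the synthetic set.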