Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -11,7 +11,7 @@ metrics:
11
 
12
  ## Model description
13
 
14
- This model was trained from scratch using [Marian NMT](https://marian-nmt.github.io/) on a combination of Spanish-Basque datasets totalling 104,417,271 sentence pairs. 12,091,549 sentence pairs were parallel data collected from the web while the remaining 92,325,722 sentence pairs were parallel synthetic data created backtranslating [Oscar](https://oscar-project.org/) Spanish monolingual dataset. The model was evaluated on the Flores, TaCon and NTREX evaluation datasets.
15
 
16
  - **Developed by:** HiTZ Research Center & IXA Research group (University of the Basque Country UPV/EHU)
17
  - **Model type:** translation
@@ -67,7 +67,7 @@ The Spanish-Basque data collected from the web was a combination of the followin
67
  | WikiMatrix | 154,281 |
68
  | **Total** | **12,091,549** |
69
 
70
- The 92,325,722 sentence pairs of synthetic parallel data were created by backtranslating the EusCrawl Basque monolingual dataset using a previous version (without synthetic parallel data) of the [ES-EU translator from the HiTZ center](https://huggingface.co/HiTZ/mt-hitz-es-eu).
71
 
72
 
73
  ### Training Procedure
 
11
 
12
  ## Model description
13
 
14
+ This model was trained from scratch using [Marian NMT](https://marian-nmt.github.io/) on a combination of Spanish-Basque datasets totalling 104,417,271 sentence pairs. Of these, 12,091,549 sentence pairs were parallel data collected from the web, while the remaining 92,325,722 sentence pairs were synthetic parallel data created by backtranslating a random subset of the [Oscar](https://oscar-project.org/) Spanish monolingual dataset. The model was evaluated on the Flores, TaCon and NTREX evaluation datasets.
15
 
16
  - **Developed by:** HiTZ Research Center & IXA Research group (University of the Basque Country UPV/EHU)
17
  - **Model type:** translation
 
67
  | WikiMatrix | 154,281 |
68
  | **Total** | **12,091,549** |
69
 
70
+ The 92,325,722 sentence pairs of synthetic parallel data were created by backtranslating a random subset of the Oscar Spanish monolingual dataset using a previous version (without synthetic parallel data) of the [ES-EU translator from the HiTZ center](https://huggingface.co/HiTZ/mt-hitz-es-eu).
71
 
72
 
73
  ### Training Procedure