Commit 0a36376
Parent(s): ba4cd1f
Update app.py
app.py CHANGED
@@ -7,7 +7,7 @@ Nahuatl is the most widely spoken indigenous language in Mexico. However, traini
 
 ## Motivation
 
-One of the Sustainable Development Goals is ["Reduced Inequalities"](https://www.un.org/sustainabledevelopment/inequality/). We know for sure that language is one of the most powerful tools we have and a way to distribute knowledge and experience. But most of the progress that has been done among important topics like technology, education, human rights and law, news and so on, is biased due to lack of resources in different languages. We expect this approach to become an important platform for others in order to reduce inequality and get all Nahuatl speakers closer to what they need to thrive and why not, share with us their valuable knowledge, costumes and way of living.
+One of the United Nations Sustainable Development Goals is ["Reduced Inequalities"](https://www.un.org/sustainabledevelopment/inequality/). We know for sure that language is one of the most powerful tools we have and a way to distribute knowledge and experience. But most of the progress that has been done among important topics like technology, education, human rights and law, news and so on, is biased due to lack of resources in different languages. We expect this approach to become an important platform for others in order to reduce inequality and get all Nahuatl speakers closer to what they need to thrive and why not, share with us their valuable knowledge, costumes and way of living.
 
 
 ## Model description
@@ -63,7 +63,7 @@ In training stage 1 we first introduce Spanish to the model. The objective is to
 We use the pretrained Spanish-English model to learn Spanish-Nahuatl. Since the amount of Nahuatl pairs is limited, we also add to our dataset 20,000 samples from the English-Spanish Anki dataset. This two-task-training avoids overfitting end makes the model more robust.
 
 ### Training setup
-We train the models on the same datasets for 660k steps using batch size = 16 and
+We train the models on the same datasets for 660k steps using batch size = 16 and a learning rate of 2e-5.
 
 
 ## Evaluation results
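For context on the corrected line above (660k steps, batch size 16, learning rate 2e-5) and the two-task training described in the surrounding text, the sketch below shows how such a mixed Spanish-Nahuatl / English-Spanish fine-tuning run could be wired up. It is a minimal illustration assuming a Hugging Face `transformers`/`datasets` stack; the checkpoint name, data files, and column names are hypothetical, and this is not the authors' actual training script.

```python
# Illustrative sketch only -- not the authors' training code.
# Model checkpoint, data files, and column names are hypothetical.
from datasets import load_dataset, concatenate_datasets
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer)

# Two-task training: mix the limited Spanish-Nahuatl pairs with
# ~20,000 English-Spanish Anki pairs to reduce overfitting.
es_nah = load_dataset("json", data_files="es_nah_pairs.json")["train"]   # hypothetical file
en_es = load_dataset("json", data_files="en_es_anki.json")["train"]      # hypothetical file
train_data = concatenate_datasets([es_nah, en_es.select(range(20_000))]).shuffle(seed=42)

tokenizer = AutoTokenizer.from_pretrained("pretrained-es-en-model")      # hypothetical checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("pretrained-es-en-model")

def preprocess(batch):
    # "source"/"target" column names are assumptions about the data layout.
    inputs = tokenizer(batch["source"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_data = train_data.map(preprocess, batched=True)

# Hyperparameters quoted in the diff: 660k steps, batch size 16, learning rate 2e-5.
args = Seq2SeqTrainingArguments(
    output_dir="es-nah-checkpoints",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    max_steps=660_000,
)
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_data, tokenizer=tokenizer)
trainer.train()
```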
@@ -74,7 +74,7 @@ For a fair comparison, the models are evaluated on the same 505 validation Nahu
 | False | 1.34 | 6.17 | 26.96 |
 | True | 1.31 | 6.18 | 28.21 |
 
-The English-Spanish pretraining improves BLEU and Chrf, and leads to faster convergence.
+The English-Spanish pretraining improves BLEU and Chrf, and leads to faster convergence. You can reproduce the evaluation on the [eval.ipynb](https://github.com/milmor/spanish-nahuatl-translation/blob/main/eval.ipynb) notebook.
 
 # Team members
 - Emilio Alejandro Morales [(milmor)](https://huggingface.co/milmor)
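The added sentence above points readers to the eval.ipynb notebook for reproducing the table. As a rough illustration of how corpus-level BLEU and chrF scores like those in the table are typically computed, here is a minimal `sacrebleu` sketch; the file names are placeholders and the notebook's actual procedure may differ.

```python
# Minimal sketch of a BLEU / chrF evaluation with sacrebleu.
# File names are hypothetical; see eval.ipynb in the linked repo for the actual procedure.
import sacrebleu

with open("nahuatl_hypotheses.txt", encoding="utf-8") as f:   # model outputs, one per line
    hypotheses = [line.strip() for line in f]
with open("nahuatl_references.txt", encoding="utf-8") as f:   # reference translations, one per line
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])   # corpus-level BLEU
chrf = sacrebleu.corpus_chrf(hypotheses, [references])   # corpus-level chrF
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")
```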