Merge pull request #1 from Kalebu/main

README.md
Easy-Translate is a script for translating large text files on your machine using the [M2M100 models](https://arxiv.org/pdf/2010.11125.pdf) from Facebook/Meta AI. We also provide a [script](#evaluate-translations) for Easy-Evaluation of your translations 🥳

**M2M100** is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation, introduced in this [paper](https://arxiv.org/abs/2010.11125) and first released in [this](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) repository.
- [Supported languages](#supported-languages)
- [Supported models](#supported-models)
- [Requirements](#requirements)
- [Translating a file](#translate-a-file)
  - [Using single CPU/GPU](#using-a-single-cpu-gpu)
  - [Multi-GPU](#multi-gpu)
  - [Automatic Batch Size Finder](#automatic-batch-size-finder)
  - [Choose precision](#choose-precision)
- [Evaluate translations](#evaluate-translations)
> The model that can directly translate between the 9,900 directions of 100 languages.

Easy-Translate is built on top of 🤗HuggingFace's [Transformers](https://huggingface.co/docs/transformers/index) and [Accelerate](https://huggingface.co/docs/accelerate/index) libraries.
We currently support:

- CPU / GPU / multi-GPU / TPU acceleration
- BF16 / FP16 / FP32 precision
- Automatic batch size finder: forget CUDA OOM errors. Set an initial batch size; if it doesn't fit, we will automatically adjust it.
- Sharded Data Parallel to load huge models sharded across multiple GPUs (see: <https://huggingface.co/docs/accelerate/fsdp>)

> Test the 🔌 Online Demo here: <https://huggingface.co/spaces/Iker/Translate-100-languages>
## Supported languages

See the [Supported languages table](supported_languages.md) for a table of the supported languages and their IDs.

**List of supported languages:**
Afrikaans, Amharic, Arabic, Asturian, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Breton, Bosnian, Catalan, Cebuano, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Persian, Fulah, Finnish, French, Western Frisian, Irish, Gaelic, Galician, Gujarati, Hausa, Hebrew, Hindi, Croatian, Haitian, Hungarian, Armenian, Indonesian, Igbo, Iloko, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Central Khmer, Kannada, Korean, Luxembourgish, Ganda, Lingala, Lao, Lithuanian, Latvian, Malagasy, Macedonian, Malayalam, Mongolian, Marathi, Malay, Burmese, Nepali, Dutch, Norwegian, Northern Sotho, Occitan, Oriya, Panjabi, Polish, Pushto, Portuguese, Romanian, Russian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Albanian, Serbian, Swati, Sundanese, Swedish, Swahili, Tamil, Thai, Tagalog, Tswana, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Wolof, Xhosa, Yiddish, Yoruba, Chinese, Zulu
## Supported Models

- **Facebook/m2m100_418M**: <https://huggingface.co/facebook/m2m100_418M>
- **Facebook/m2m100_1.2B**: <https://huggingface.co/facebook/m2m100_1.2B>
- **Facebook/m2m100_12B**: <https://huggingface.co/facebook/m2m100-12B-avg-5-ckpt>
- Any other m2m100 model from HuggingFace's Hub: <https://huggingface.co/models?search=m2m100>

## Requirements
```
Pytorch >= 1.10.0
```
## Translate a file
Run `python translate.py -h` for more info.

#### Using a single CPU / GPU
```bash
accelerate launch translate.py \
  --sentences_path sample_text/en.txt \
  --model_name facebook/m2m100_1.2B
```
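The script's contract is simple: one source sentence per line in, one translation per line out, processed in batches. That contract can be sketched as below (a simplified illustration — `translate_file` and `translate_batch` are hypothetical names, not the repo's actual API):

```python
from pathlib import Path
from typing import Callable, List


def translate_file(
    sentences_path: str,
    output_path: str,
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> None:
    """Read one sentence per line, translate in fixed-size batches,
    and write one translation per line."""
    lines = Path(sentences_path).read_text(encoding="utf-8").splitlines()
    sentences = [line.strip() for line in lines if line.strip()]
    translations: List[str] = []
    for start in range(0, len(sentences), batch_size):
        # Each batch goes through the model in one forward pass.
        translations.extend(translate_batch(sentences[start : start + batch_size]))
    Path(output_path).write_text("\n".join(translations) + "\n", encoding="utf-8")
```

In the real script, `translate_batch` would wrap the M2M100 model's generation call; in this sketch it can be any list-in/list-out function.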
#### Multi-GPU

See the Accelerate documentation for more information (multi-node, TPU, sharded models, ...): <https://huggingface.co/docs/accelerate/index>

You can use the Accelerate CLI to configure the Accelerate environment (run `accelerate config` in your terminal) instead of using the `--multi_gpu` and `--num_processes` flags.
```bash
accelerate launch --multi_gpu --num_processes 2 --num_machines 1 translate.py \
  --model_name facebook/m2m100_1.2B
```
#### Automatic batch size finder

We will automatically find a batch size that fits in your GPU memory. The default initial batch size is 128 (you can set it with the `--starting_batch_size 128` flag). If we find an Out Of Memory error, we will automatically decrease the batch size until we find one that works.
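Conceptually, the finder is a retry loop: try the starting batch size and halve it on every out-of-memory error until a batch fits. A toy sketch of that idea (OOM is simulated with `MemoryError` here; this is not the actual implementation):

```python
def find_working_batch_size(run_batch, starting_batch_size: int = 128) -> int:
    """Halve the batch size on (simulated) OOM until `run_batch` succeeds."""
    batch_size = starting_batch_size
    while batch_size > 0:
        try:
            run_batch(batch_size)
            return batch_size  # this size fits in memory
        except MemoryError:   # in real code this would be a CUDA OOM error
            batch_size //= 2  # too big, retry with half
    raise RuntimeError("Ran out of memory even at batch size 1")


# Simulate a GPU that can only handle batches of at most 48 sentences:
def fake_forward(batch_size: int) -> None:
    if batch_size > 48:
        raise MemoryError


print(find_working_batch_size(fake_forward))  # 32  (128 -> 64 -> 32)
```

🤗Accelerate provides a comparable utility, `find_executable_batch_size`, that wraps a training or inference function in just such a loop.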
#### Choose precision

Use the `--precision` flag to choose the precision of the model. You can choose between `bf16`, `fp16`, and `32`.
```bash
accelerate launch translate.py \
  --precision fp16
```
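The three flag values correspond to torch dtypes. A minimal sketch of how a script might parse such a flag (illustrative only — the mapping and parser below are assumptions, not the repo's code):

```python
import argparse

# Assumed mapping from flag value to torch dtype name; "32" is full precision.
PRECISION_TO_DTYPE = {
    "bf16": "bfloat16",  # brain float 16: fp32-like range, reduced mantissa
    "fp16": "float16",   # half precision: fast on most GPUs, smaller range
    "32": "float32",     # full precision: safest, uses the most memory
}


def parse_precision(argv):
    """Parse a --precision flag and return the corresponding dtype name."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--precision", choices=PRECISION_TO_DTYPE, default="32")
    args = parser.parse_args(argv)
    return PRECISION_TO_DTYPE[args.precision]


print(parse_precision(["--precision", "bf16"]))  # bfloat16
```

Using `choices=` makes argparse reject any value outside the three supported precisions with a clear error message.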
## Evaluate translations

To run the evaluation script you need to install [bert_score](https://github.com/Tiiiger/bert_score): `pip install bert_score` and 🤗HuggingFace's [Datasets](https://huggingface.co/docs/datasets/index) library: `pip install datasets`.

The evaluation script will calculate the following metrics:
- [SacreBLEU](https://github.com/huggingface/datasets/tree/master/metrics/sacrebleu)
- [BLEU](https://github.com/huggingface/datasets/tree/master/metrics/bleu)
- [ROUGE](https://github.com/huggingface/datasets/tree/master/metrics/rouge)
- [METEOR](https://github.com/huggingface/datasets/tree/master/metrics/meteor)
- [TER](https://github.com/huggingface/datasets/tree/master/metrics/ter)
- [BertScore](https://github.com/huggingface/datasets/tree/master/metrics/bertscore)
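All of these metrics compare a hypothesis translation against a reference translation. As a toy illustration of the underlying idea (not a substitute for the real metrics above), a clipped unigram precision — the simplest building block of BLEU — can be computed like this:

```python
from collections import Counter


def unigram_precision(hypothesis: str, reference: str) -> float:
    """Fraction of hypothesis tokens that also appear in the reference,
    with per-token counts clipped to the reference counts."""
    hyp_counts = Counter(hypothesis.split())
    ref_counts = Counter(reference.split())
    matched = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched / total if total else 0.0


print(unigram_precision("the cat sat on the mat", "the cat sat on a mat"))  # 0.8333...
```

The real metrics add n-gram matching, brevity penalties, stemming, or contextual embeddings on top of this kind of token-overlap comparison.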
Run the following command to evaluate the translations:

```bash
accelerate launch eval.py \
```
If you want to save the results to a file, use the `--output_path` flag.

See [sample_text/en2es.m2m100_1.2B.json](sample_text/en2es.m2m100_1.2B.json) for a sample output.