Iker committed on
Commit 1863710 · unverified · 2 Parent(s): 6cc3f7d 2cb7997

Merge pull request #1 from Kalebu/main

Files changed (1):
  1. README.md +47 -29
README.md CHANGED
@@ -15,35 +15,49 @@

Easy-translate is a script for translating large text files on your machine using the [M2M100 models](https://arxiv.org/pdf/2010.11125.pdf) from Facebook/Meta AI. We also provide a [script](#evaluate-translations) for Easy-Evaluation of your translations 🥳

- M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation. It was introduced in this [paper](https://arxiv.org/abs/2010.11125) and first released in [this](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) repository. The model that can directly translate between the 9,900 directions of 100 languages.

- Easy-Translate is built on top of 🤗HuggingFace's [Transformers](https://huggingface.co/docs/transformers/index) and 🤗HuggingFace's [Accelerate](https://huggingface.co/docs/accelerate/index) library. We support:
- * CPU / multi-CPU / GPU / multi-GPU / TPU acceleration
- * BF16 / FP16 / FP32 precision.
- * Automatic batch size finder: Forget CUDA OOM errors. Set an initial batch size, if it doesn't fit, we will automatically adjust it.
- * Sharded Data Parallel to load huge models sharded on multiple GPUs (See: https://huggingface.co/docs/accelerate/fsdp).

- Test the 🔌 Online Demo here: https://huggingface.co/spaces/Iker/Translate-100-languages

## Supported languages
See the [Supported languages table](supported_languages.md) for a table of the supported languages and their ids.

- **List of supported languages:**
Afrikaans, Amharic, Arabic, Asturian, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Breton, Bosnian, Catalan, Cebuano, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Persian, Fulah, Finnish, French, WesternFrisian, Irish, Gaelic, Galician, Gujarati, Hausa, Hebrew, Hindi, Croatian, Haitian, Hungarian, Armenian, Indonesian, Igbo, Iloko, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, CentralKhmer, Kannada, Korean, Luxembourgish, Ganda, Lingala, Lao, Lithuanian, Latvian, Malagasy, Macedonian, Malayalam, Mongolian, Marathi, Malay, Burmese, Nepali, Dutch, Norwegian, NorthernSotho, Occitan, Oriya, Panjabi, Polish, Pushto, Portuguese, Romanian, Russian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Albanian, Serbian, Swati, Sundanese, Swedish, Swahili, Tamil, Thai, Tagalog, Tswana, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Wolof, Xhosa, Yiddish, Yoruba, Chinese, Zulu

## Supported Models

- * **Facebook/m2m100_418M**: https://huggingface.co/facebook/m2m100_418M

- * **Facebook/m2m100_1.2B**: https://huggingface.co/facebook/m2m100_1.2B

- * **Facebook/m2m100_12B**: https://huggingface.co/facebook/m2m100-12B-avg-5-ckpt
-
- * Any other m2m100 model from HuggingFace's Hub: https://huggingface.co/models?search=m2m100

- ## Requirements:

```
Pytorch >= 1.10.0
@@ -58,9 +72,10 @@ pip install --upgrade transformers

## Translate a file

- Run `python translate.py -h` for more info.

- #### Using a single CPU / GPU:
```bash
accelerate launch translate.py \
--sentences_path sample_text/en.txt \
@@ -70,8 +85,9 @@ accelerate launch translate.py \
--model_name facebook/m2m100_1.2B
```

- #### Multi-GPU:
- See Accelerate documentation for more information (multi-node, TPU, Sharded model...): https://huggingface.co/docs/accelerate/index
You can use the Accelerate CLI to configure the Accelerate environment (run `accelerate config` in your terminal) instead of using the `--multi_gpu` and `--num_processes` flags.

```bash
@@ -83,13 +99,13 @@ accelerate launch --multi_gpu --num_processes 2 --num_machines 1 translate.py \
--model_name facebook/m2m100_1.2B
```

- #### Automatic batch size finder:
- We will automatically find a batch size that fits in your GPU memory. The default initial batch size is 128 (You can set it with the `--starting_batch_size 128` flag). If we find an Out Of Memory error, we will automatically decrease the batch size until we find a working one.

- #### Choose precision:
- Use the `--precision` flag to choose the precision of the model. You can choose between: bf16, fp16 and 32.

```bash
accelerate launch translate.py \
@@ -106,12 +122,13 @@ accelerate launch translate.py \
To run the evaluation script you need to install [bert_score](https://github.com/Tiiiger/bert_score): `pip install bert_score` and 🤗HuggingFace's [Datasets](https://huggingface.co/docs/datasets/index) library: `pip install datasets`.

The evaluation script will calculate the following metrics:
- * [SacreBLEU](https://github.com/huggingface/datasets/tree/master/metrics/sacrebleu)
- * [BLEU](https://github.com/huggingface/datasets/tree/master/metrics/bleu)
- * [ROUGE](https://github.com/huggingface/datasets/tree/master/metrics/rouge)
- * [METEOR](https://github.com/huggingface/datasets/tree/master/metrics/meteor)
- * [TER](https://github.com/huggingface/datasets/tree/master/metrics/ter)
- * [BertScore](https://github.com/huggingface/datasets/tree/master/metrics/bertscore)

Run the following command to evaluate the translations:
@@ -123,4 +140,5 @@ accelerate launch eval.py \

If you want to save the results to a file, use the `--output_path` flag.

- See [sample_text/en2es.m2m100_1.2B.json](sample_text/en2es.m2m100_1.2B.json) for a sample output.

Easy-translate is a script for translating large text files on your machine using the [M2M100 models](https://arxiv.org/pdf/2010.11125.pdf) from Facebook/Meta AI. We also provide a [script](#evaluate-translations) for Easy-Evaluation of your translations 🥳

+ **M2M100** is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation, introduced in this [paper](https://arxiv.org/abs/2010.11125) and first released in [this](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) repository.

+ - [Supported languages](#supported-languages)
+ - [Supported models](#supported-models)
+ - [Requirements](#requirements)
+ - [Translating a file](#translate-a-file)
+ - [Using single CPU/GPU](#using-a-single-cpu--gpu)
+ - [Multi-GPU](#multi-gpu)
+ - [Automatic Batch Size Finder](#automatic-batch-size-finder)
+ - [Choose precision](#choose-precision)
+ - [Evaluate translations](#evaluate-translations)

+ >The model can directly translate between the 9,900 directions of 100 languages.

+ Easy-Translate is built on top of 🤗HuggingFace's [Transformers](https://huggingface.co/docs/transformers/index) and 🤗HuggingFace's [Accelerate](https://huggingface.co/docs/accelerate/index) library.
+
+ We currently support:
+
+ - CPU / GPU / multi-GPU / TPU acceleration
+ - BF16 / FP16 / FP32 precision.
+ - Automatic batch size finder: Forget CUDA OOM errors. Set an initial batch size; if it doesn't fit, we will automatically adjust it.
+ - Sharded Data Parallel to load huge models sharded on multiple GPUs (See: <https://huggingface.co/docs/accelerate/fsdp>).
+
+ >Test the 🔌 Online Demo here: <https://huggingface.co/spaces/Iker/Translate-100-languages>

## Supported languages
+
See the [Supported languages table](supported_languages.md) for a table of the supported languages and their ids.

+ **List of supported languages:**
Afrikaans, Amharic, Arabic, Asturian, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Breton, Bosnian, Catalan, Cebuano, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Persian, Fulah, Finnish, French, WesternFrisian, Irish, Gaelic, Galician, Gujarati, Hausa, Hebrew, Hindi, Croatian, Haitian, Hungarian, Armenian, Indonesian, Igbo, Iloko, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, CentralKhmer, Kannada, Korean, Luxembourgish, Ganda, Lingala, Lao, Lithuanian, Latvian, Malagasy, Macedonian, Malayalam, Mongolian, Marathi, Malay, Burmese, Nepali, Dutch, Norwegian, NorthernSotho, Occitan, Oriya, Panjabi, Polish, Pushto, Portuguese, Romanian, Russian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Albanian, Serbian, Swati, Sundanese, Swedish, Swahili, Tamil, Thai, Tagalog, Tswana, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Wolof, Xhosa, Yiddish, Yoruba, Chinese, Zulu

## Supported Models

+ - **Facebook/m2m100_418M**: <https://huggingface.co/facebook/m2m100_418M>

+ - **Facebook/m2m100_1.2B**: <https://huggingface.co/facebook/m2m100_1.2B>

+ - **Facebook/m2m100_12B**: <https://huggingface.co/facebook/m2m100-12B-avg-5-ckpt>

+ - Any other m2m100 model from HuggingFace's Hub: <https://huggingface.co/models?search=m2m100>

+ ## Requirements

```
Pytorch >= 1.10.0


## Translate a file

+ Run `python translate.py -h` for more info.
+
+ #### Using a single CPU / GPU

```bash
accelerate launch translate.py \
--sentences_path sample_text/en.txt \

--model_name facebook/m2m100_1.2B
```

+ #### Multi-GPU
+
+ See Accelerate documentation for more information (multi-node, TPU, Sharded model...): <https://huggingface.co/docs/accelerate/index>
You can use the Accelerate CLI to configure the Accelerate environment (run `accelerate config` in your terminal) instead of using the `--multi_gpu` and `--num_processes` flags.

```bash

--model_name facebook/m2m100_1.2B
```

+ #### Automatic batch size finder

+ We will automatically find a batch size that fits in your GPU memory. The default initial batch size is 128 (you can set it with the `--starting_batch_size 128` flag). If we find an Out Of Memory error, we will automatically decrease the batch size until we find a working one.
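The retry loop described above can be sketched in plain Python. This is an illustrative sketch only, not the actual code of `translate.py`: `fake_step` and the halving policy are assumptions made for the demo (the script simply "decreases" the batch size on OOM).

```python
def find_working_batch_size(step, starting_batch_size=128):
    """Shrink the batch size until `step` stops raising OOM-style errors."""
    batch_size = starting_batch_size
    while batch_size >= 1:
        try:
            step(batch_size)   # e.g. translate one batch of sentences
            return batch_size  # first size that fits in memory
        except RuntimeError:   # PyTorch surfaces CUDA OOM as RuntimeError
            batch_size //= 2   # halve and retry (illustrative policy)
    raise RuntimeError("No batch size fits in memory")

# Simulated device that only fits 40 sentences per batch
def fake_step(batch_size, limit=40):
    if batch_size > limit:
        raise RuntimeError("CUDA out of memory (simulated)")

print(find_working_batch_size(fake_step))  # 128 -> 64 -> 32, prints 32
```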

+ #### Choose precision

+ Use the `--precision` flag to choose the precision of the model. You can choose between bf16, fp16 and 32.
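For illustration, the three flag values map naturally onto PyTorch dtypes. The helper below is hypothetical (it is not part of `translate.py`); only the choices bf16 / fp16 / 32 come from the README.

```python
# Hypothetical helper (not from translate.py): map the README's --precision
# choices to the corresponding PyTorch dtype names.
PRECISION_TO_DTYPE = {"bf16": "bfloat16", "fp16": "float16", "32": "float32"}

def resolve_precision(choice: str) -> str:
    if choice not in PRECISION_TO_DTYPE:
        raise ValueError(f"--precision must be one of {sorted(PRECISION_TO_DTYPE)}")
    return PRECISION_TO_DTYPE[choice]

print(resolve_precision("bf16"))  # bfloat16
```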

```bash
accelerate launch translate.py \

To run the evaluation script you need to install [bert_score](https://github.com/Tiiiger/bert_score): `pip install bert_score` and 🤗HuggingFace's [Datasets](https://huggingface.co/docs/datasets/index) library: `pip install datasets`.

The evaluation script will calculate the following metrics:
+
+ - [SacreBLEU](https://github.com/huggingface/datasets/tree/master/metrics/sacrebleu)
+ - [BLEU](https://github.com/huggingface/datasets/tree/master/metrics/bleu)
+ - [ROUGE](https://github.com/huggingface/datasets/tree/master/metrics/rouge)
+ - [METEOR](https://github.com/huggingface/datasets/tree/master/metrics/meteor)
+ - [TER](https://github.com/huggingface/datasets/tree/master/metrics/ter)
+ - [BertScore](https://github.com/huggingface/datasets/tree/master/metrics/bertscore)

Run the following command to evaluate the translations:


If you want to save the results to a file, use the `--output_path` flag.

+ See [sample_text/en2es.m2m100_1.2B.json](sample_text/en2es.m2m100_1.2B.json) for a sample output.
+