<p align="center">
<br>
<img src="images/title.png" width="900"/>
<br>
<a href="https://twitter.com/intent/tweet?text=Wow:&url=https%3A%2F%2Fgithub.com%2Fikergarcia1996%2FEasy-Translate"><img alt="Twitter" src="https://img.shields.io/twitter/url?style=social&url=https%3A%2F%2Fgithub.com%2Fikergarcia1996%2FEasy-Translate"></a>
<a href="https://github.com/ikergarcia1996/Easy-Translate/blob/main/LICENSE.md"><img alt="License" src="https://img.shields.io/github/license/ikergarcia1996/Easy-Translate"></a>
<a href="https://huggingface.co/docs/transformers/index"><img alt="Transformers" src="https://img.shields.io/badge/-%F0%9F%A4%97Transformers%20-grey"></a>
<a href="https://huggingface.co/docs/accelerate/index/"><img alt="Accelerate" src="https://img.shields.io/badge/-%F0%9F%A4%97Accelerate%20-grey"></a>
<a href="https://ikergarcia1996.github.io/Iker-Garcia-Ferrero/"><img alt="Author" src="https://img.shields.io/badge/Author-Iker García Ferrero-ff69b4"></a>
<br>
<br>
</p>
Easy-Translate is a script for translating large text files on your machine using the [M2M100 models](https://arxiv.org/pdf/2010.11125.pdf) from Facebook/Meta AI.
M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation.
It was introduced in this [paper](https://arxiv.org/abs/2010.11125) and first released in [this](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) repository.
The model can directly translate between any of the 9,900 directions of its 100 languages.
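
The 9,900 figure counts ordered source/target pairs of distinct languages (English→Spanish and Spanish→English are separate directions, and translating a language into itself is excluded), so it equals 100 × 99. A quick check in Python:

```python
from itertools import permutations

NUM_LANGUAGES = 100

# Every ordered (source, target) pair of two different languages is one direction.
directions = list(permutations(range(NUM_LANGUAGES), 2))

# Same count from the closed form n * (n - 1).
assert len(directions) == NUM_LANGUAGES * (NUM_LANGUAGES - 1)
print(len(directions))  # → 9900
```
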
Easy-Translate is built on top of the 🤗HuggingFace
[Transformers](https://huggingface.co/docs/transformers/index) and
[Accelerate](https://huggingface.co/docs/accelerate/index) libraries. We support:
* CPU / GPU / multi-GPU / TPU acceleration
* BF16 / FP16 / FP32 precision.
* Automatic batch size finder: forget CUDA OOM errors. Set an initial batch size; if it doesn't fit, we will automatically reduce it.
* Fully Sharded Data Parallel (FSDP) to load huge models sharded across multiple GPUs (see: https://huggingface.co/docs/accelerate/fsdp).
Test the 🔌 Online Demo here: https://huggingface.co/spaces/Iker/Translate-100-languages
## Supported languages
See the [Supported languages table](supported_languages.md) for the supported languages and their ids.
**List of supported languages:**
Afrikaans, Amharic, Arabic, Asturian, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Breton, Bosnian, Catalan, Cebuano, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Persian, Fulah, Finnish, French, Western Frisian, Irish, Gaelic, Galician, Gujarati, Hausa, Hebrew, Hindi, Croatian, Haitian, Hungarian, Armenian, Indonesian, Igbo, Iloko, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Central Khmer, Kannada, Korean, Luxembourgish, Ganda, Lingala, Lao, Lithuanian, Latvian, Malagasy, Macedonian, Malayalam, Mongolian, Marathi, Malay, Burmese, Nepali, Dutch, Norwegian, Northern Sotho, Occitan, Oriya, Panjabi, Polish, Pushto, Portuguese, Romanian, Russian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Albanian, Serbian, Swati, Sundanese, Swedish, Swahili, Tamil, Thai, Tagalog, Tswana, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Wolof, Xhosa, Yiddish, Yoruba, Chinese, Zulu
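
The `--source_lang` and `--target_lang` flags take language ids rather than names. Below is a minimal sketch of validating a pair of ids before launching a long job. `SUPPORTED_LANGS` is a hypothetical six-language subset (the full list is in the supported languages table), and `language_token` assumes the `__xx__` form that the Hugging Face M2M100 tokenizer uses for its language tokens.

```python
# Hypothetical subset of the ids in supported_languages.md (the real table has 100).
SUPPORTED_LANGS = {
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ja": "Japanese",
    "zh": "Chinese",
}


def check_language_pair(source_lang: str, target_lang: str) -> None:
    """Raise ValueError if either language id is not in SUPPORTED_LANGS."""
    for lang in (source_lang, target_lang):
        if lang not in SUPPORTED_LANGS:
            raise ValueError(f"Unsupported language id: {lang!r}")


def language_token(lang: str) -> str:
    """Special token marking a language in M2M100's vocabulary, e.g. '__en__'."""
    return f"__{lang}__"


check_language_pair("en", "es")  # passes silently
print(language_token("es"))  # → __es__
```
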
## Supported Models
* **Facebook/m2m100_418M**: https://huggingface.co/facebook/m2m100_418M
* **Facebook/m2m100_1.2B**: https://huggingface.co/facebook/m2m100_1.2B
* **Facebook/m2m100_12B**: https://huggingface.co/facebook/m2m100-12B-avg-5-ckpt
* Any other m2m100 model from HuggingFace's Hub: https://huggingface.co/models?search=m2m100
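
The checkpoints differ greatly in size. As a rough guide for choosing a model (and a `--precision` value), the weights alone take about parameters × bytes per parameter. This sketch uses the nominal parameter counts from the model names (real checkpoints differ slightly) and ignores activations, generation buffers and framework overhead, so actual memory use will be higher.

```python
# Nominal parameter counts read off the model names (approximate).
MODEL_PARAMS = {
    "facebook/m2m100_418M": 418e6,
    "facebook/m2m100_1.2B": 1.2e9,
    "facebook/m2m100-12B-avg-5-ckpt": 12e9,
}

# Bytes per parameter for the --precision values listed in this README.
BYTES_PER_PARAM = {"32": 4, "fp16": 2, "bf16": 2}


def weights_gb(model_name: str, precision: str) -> float:
    """Rough size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return MODEL_PARAMS[model_name] * BYTES_PER_PARAM[precision] / 1e9


for name in MODEL_PARAMS:
    sizes = ", ".join(f"{p}: {weights_gb(name, p):.1f} GB" for p in ("32", "fp16"))
    print(f"{name} -> {sizes}")
```

By this estimate the 12B model needs roughly 24 GB for its weights even in fp16, which is where sharding the model across several GPUs becomes useful.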
## Requirements:
```
PyTorch >= 1.10.0
See: https://pytorch.org/get-started/locally/
Accelerate >= 0.7.1
pip install --upgrade accelerate
HuggingFace Transformers
pip install --upgrade transformers
```
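
To check an existing environment against the minimum versions listed above, a small standard-library sketch is enough. The version parser is a simplified assumption (it ignores local tags such as `+cu118` and pre-release suffixes); if the `packaging` library is installed, `packaging.version.Version` is the more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the requirements above (Transformers has no stated minimum).
MINIMUMS = {"torch": "1.10.0", "accelerate": "0.7.1"}


def parse_version(text: str) -> tuple:
    """Turn '1.10.0+cu113' into (1, 10, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Numeric comparison, so '1.9.1' < '1.10.0' (plain string comparison gets this wrong)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.7' and '0.7.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements() -> dict:
    """Report 'ok', 'missing' or 'too old (...)' for each package in MINIMUMS."""
    report = {}
    for package, minimum in MINIMUMS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[package] = "ok" if ok else f"too old ({installed})"
    return report


print(check_requirements())
```
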
## Translate a file
Run `python translate.py -h` for more info.
#### Using a single CPU / GPU:
```bash
accelerate launch translate.py \
--sentences_path sample_text/en.txt \
--output_path sample_text/en2es.translation.txt \
--source_lang en \
--target_lang es \
--model_name facebook/m2m100_1.2B
```
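
To translate the same file into several languages, you can run the command above once per target language. A minimal sketch that generates those commands from Python; the target list is an arbitrary example, and the `sample_text/<source>.txt` input name and `<source>2<target>.translation.txt` output name are assumptions modelled on the example above. Note that each run loads the model again.

```python
import shlex


def build_command(source_lang, target_lang, model_name="facebook/m2m100_1.2B"):
    """Argument list for one translation run, using the flags from the example above."""
    return [
        "accelerate", "launch", "translate.py",
        "--sentences_path", f"sample_text/{source_lang}.txt",
        "--output_path", f"sample_text/{source_lang}2{target_lang}.translation.txt",
        "--source_lang", source_lang,
        "--target_lang", target_lang,
        "--model_name", model_name,
    ]


# Only prints the commands; pass each list to subprocess.run(cmd, check=True) to execute it.
for target in ["es", "fr", "de"]:
    print(shlex.join(build_command("en", target)))
```
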
#### Multi-GPU:
See the Accelerate documentation for more information (multi-node, TPU, sharded models, ...): https://huggingface.co/docs/accelerate/index
Instead of passing the `--multi_gpu` and `--num_processes` flags, you can configure the Accelerate
environment once with the Accelerate CLI (run `accelerate config` in your terminal).
```bash
accelerate launch --multi_gpu --num_processes 2 --num_machines 1 translate.py \
--sentences_path sample_text/en.txt \
--output_path sample_text/en2es.translation.txt \
--source_lang en \
--target_lang es \
--model_name facebook/m2m100_1.2B
```
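
With `--num_processes 2`, each process translates part of the input, and the results have to be put back in the original order. The sketch below shows one simple scheme, strided sharding, in plain Python; it illustrates the idea only and is not necessarily how translate.py or Accelerate split the data.

```python
def shard(sentences, rank, world_size):
    """Strided split: process `rank` takes items rank, rank + world_size, ..."""
    return sentences[rank::world_size]


def gather(shards):
    """Interleave per-process results back into the original order."""
    world_size = len(shards)
    merged = [None] * sum(len(part) for part in shards)
    for rank, part in enumerate(shards):
        # The slice has exactly len(part) positions, so the assignment always fits.
        merged[rank::world_size] = part
    return merged


sentences = ["s0", "s1", "s2", "s3", "s4"]
shards = [shard(sentences, rank, 2) for rank in range(2)]
print(shards)  # → [['s0', 's2', 's4'], ['s1', 's3']]
assert gather(shards) == sentences
```
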
#### Automatic batch size finder:
We will automatically find a batch size that fits in your GPU memory.
The default initial batch size is 128 (set it with the `--starting_batch_size` flag, e.g. `--starting_batch_size 128`).
If we hit an out-of-memory error, we automatically decrease the batch size until we find one that works.
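
A minimal sketch of the idea, assuming the batch size is halved after each out-of-memory error. `OutOfMemory` and `fake_translate` are hypothetical stand-ins for a CUDA OOM error and the real translation step, so the example runs without a GPU. Accelerate also provides a decorator for this pattern, `accelerate.utils.find_executable_batch_size`; its exact reduction schedule may differ between versions.

```python
class OutOfMemory(Exception):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def find_batch_size(run, starting_batch_size=128):
    """Call run(batch_size), halving the batch size after every out-of-memory error."""
    batch_size = starting_batch_size
    while batch_size > 0:
        try:
            return batch_size, run(batch_size)
        except OutOfMemory:
            batch_size //= 2
    raise RuntimeError("No batch size fits in memory, not even batch_size=1")


def fake_translate(batch_size):
    """Simulated workload: anything above 32 sentences per batch 'runs out of memory'."""
    if batch_size > 32:
        raise OutOfMemory(f"batch_size={batch_size} does not fit")
    return f"translated with batch_size={batch_size}"


print(find_batch_size(fake_translate))  # → (32, 'translated with batch_size=32')
```

Starting from 128, the sketch tries 128 and 64, both of which fail, and settles on 32.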
#### Choose precision:
Use the `--precision` flag to choose the precision of the model. You can choose between `bf16`, `fp16` and `32`.
```bash
accelerate launch translate.py \
--sentences_path sample_text/en.txt \
--output_path sample_text/en2es.translation.txt \
--source_lang en \
--target_lang es \
--model_name facebook/m2m100_1.2B \
--precision fp16
```
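
The half-precision options trade off differently: fp16 has a much smaller range than fp32 (its largest finite value is 65504), while bf16 keeps fp32's exponent range but stores fewer mantissa bits. The sketch below emulates both formats with Python's `struct` module to show the difference on a few values. The bf16 helper truncates the float32 encoding instead of rounding, which is a simplification.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round-trip a value through IEEE half precision (struct format 'e')."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Values beyond the fp16 range are reported as infinity.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


for value in (1.0, 0.1, 100000.0):
    print(f"{value}: fp16={to_fp16(value)}, bf16={to_bf16(value)}")
```

fp16 keeps more digits of small values like 0.1 but overflows on 100000, whereas bf16 represents 100000 only approximately (99840) but stays finite.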
## Evaluate translations
Work in progress...