# Cross-lingual Retrieval for Iterative Self-Supervised Training

https://arxiv.org/pdf/2006.09526.pdf

## Introduction
CRISS is a multilingual sequence-to-sequence pretraining method in which mining and training are applied iteratively, improving cross-lingual alignment and translation ability at the same time.
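The alternation between mining and training can be pictured as a simple loop in which each round mines with the current model and then trains on what it mined. This is a minimal sketch, not the CRISS implementation: `iterative_mine_and_train`, `mine`, and `train` are hypothetical names standing in for the mining and training stages.

```python
from typing import Callable, List, Tuple, TypeVar

M = TypeVar("M")  # whatever represents the model (weights, checkpoint path, ...)
Pair = Tuple[str, str]  # a mined (source sentence, target sentence) pair


def iterative_mine_and_train(
    model: M,
    mine: Callable[[M], List[Pair]],
    train: Callable[[M, List[Pair]], M],
    num_iterations: int,
) -> M:
    """Hypothetical helper: alternate mining and training for a fixed number of rounds."""
    for _ in range(num_iterations):
        # Use the current model to retrieve pseudo-parallel sentence pairs.
        pseudo_parallel = mine(model)
        # Train on the mined pairs; the updated model mines in the next round.
        model = train(model, pseudo_parallel)
    return model
```

The point of the loop structure is that better alignment from one round of training feeds better mined data into the next round.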
## Requirements

* faiss: https://github.com/facebookresearch/faiss
* mosesdecoder: https://github.com/moses-smt/mosesdecoder
* flores: https://github.com/facebookresearch/flores
* LASER: https://github.com/facebookresearch/LASER
## Unsupervised Machine Translation

##### 1. Download and decompress CRISS checkpoints
```
cd examples/criss
wget https://dl.fbaipublicfiles.com/criss/criss_3rd_checkpoints.tar.gz
tar -xf criss_3rd_checkpoints.tar.gz
```

##### 2. Download and preprocess Flores test dataset
Make sure to run all scripts from the `examples/criss` directory.
```
bash download_and_preprocess_flores_test.sh
```

##### 3. Run Evaluation on Sinhala-English
```
bash unsupervised_mt/eval.sh
```

## Sentence Retrieval

##### 1. Download and preprocess Tatoeba dataset
```
bash download_and_preprocess_tatoeba.sh
```

##### 2. Run Sentence Retrieval on Tatoeba Kazakh-English
```
bash sentence_retrieval/sentence_retrieval_tatoeba.sh
```
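Tatoeba-style retrieval can be thought of as pairing every source sentence with its most similar target sentence and counting how often that neighbor is the gold translation. The sketch below assumes cosine similarity over precomputed sentence embeddings, with sentence `i` in one language aligned to sentence `i` in the other; the helper names are hypothetical, and `sentence_retrieval_tatoeba.sh` may embed and score sentences differently.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(src: List[Vector], tgt: List[Vector]) -> List[int]:
    """For each source embedding, return the index of the most similar target embedding."""
    return [max(range(len(tgt)), key=lambda j: cosine(s, tgt[j])) for s in src]


def retrieval_accuracy(src: List[Vector], tgt: List[Vector]) -> float:
    """Fraction of source sentences whose nearest target is the gold one (same index)."""
    predictions = retrieve(src, tgt)
    correct = sum(1 for i, j in enumerate(predictions) if i == j)
    return correct / len(src)
```

Brute-force search is fine for a test set of this size; for larger collections, an index such as the ones faiss provides would replace the inner `max` over all targets.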

## Mining

##### 1. Install faiss
Follow the instructions at https://github.com/facebookresearch/faiss/blob/master/INSTALL.md

##### 2. Mine pseudo-parallel data between Kazakh and English
```
bash mining/mine_example.sh
```
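Mining keeps only sentence pairs that score well as likely translations. One common criterion in LASER-style bitext mining is the ratio margin: the cosine similarity of a candidate pair divided by the average similarity of each side to its k nearest neighbors, which penalizes "hub" sentences that are close to everything. The sketch below assumes that criterion and uses brute-force search in place of faiss; `mine_pairs`, `k`, and `threshold` (including their defaults) are hypothetical and not necessarily what `mine_example.sh` uses.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_mean(query: Vector, candidates: List[Vector], k: int) -> float:
    """Average cosine similarity between `query` and its k nearest `candidates`."""
    sims = sorted((cosine(query, c) for c in candidates), reverse=True)[:k]
    return sum(sims) / len(sims)


def mine_pairs(
    src: List[Vector], tgt: List[Vector], k: int = 4, threshold: float = 1.06
) -> List[Tuple[int, int, float]]:
    """Hypothetical ratio-margin miner: best target per source sentence, kept if above threshold."""
    # Neighborhood density of each sentence, measured in the other language's space.
    src_density = [knn_mean(x, tgt, k) for x in src]
    tgt_density = [knn_mean(y, src, k) for y in tgt]
    pairs = []
    for i, x in enumerate(src):
        scores = []
        for j, y in enumerate(tgt):
            # margin(x, y) = cos(x, y) / ((density(x) + density(y)) / 2)
            denom = (src_density[i] + tgt_density[j]) / 2
            scores.append(cosine(x, y) / denom if denom else 0.0)
        best = max(range(len(tgt)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((i, best, scores[best]))
    return pairs
```

With two toy sentences per side, `src = [[1.0, 0.0], [0.0, 1.0]]` and `tgt = [[0.9, 0.1], [0.1, 0.9]]`, `k=2` mines `(0, 0)` and `(1, 1)` with a margin of 1.8 each. With `k=1`, a pair of mutual nearest neighbors has a margin of exactly 1.0, so nothing passes a threshold above 1; the choice of `k` and `threshold` therefore matters as much as the similarity itself.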

## Citation
```bibtex
@article{tran2020cross,
  title={Cross-lingual retrieval for iterative self-supervised training},
  author={Tran, Chau and Tang, Yuqing and Li, Xian and Gu, Jiatao},
  journal={arXiv preprint arXiv:2006.09526},
  year={2020}
}
```