# Scaling Neural Machine Translation (Ott et al., 2018)
This page includes instructions for reproducing results from the paper [Scaling Neural Machine Translation (Ott et al., 2018)](https://arxiv.org/abs/1806.00187).
## Pre-trained models
Model | Description | Dataset | Download
---|---|---|---
`transformer.wmt14.en-fr` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-fr.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-fr.joined-dict.newstest2014.tar.bz2)
`transformer.wmt16.en-de` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt16.en-de.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2)
## Training a new model on WMT'16 En-De
First download the [preprocessed WMT'16 En-De data provided by Google](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8).
Then:
##### 1. Extract the WMT'16 En-De data
```bash
TEXT=wmt16_en_de_bpe32k
mkdir -p $TEXT
tar -xzvf wmt16_en_de.tar.gz -C $TEXT
```
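Before moving on, it can help to confirm that the archive unpacked as expected. The sketch below assumes that `fairseq-preprocess` appends the language suffixes `.en` and `.de` to the prefixes used in the next step; `check_wmt16_files` is a hypothetical helper written for this page, not part of fairseq.

```bash
# Sketch: report every file the preprocessing step will look for but cannot find.
# check_wmt16_files is a hypothetical helper, not a fairseq command.
check_wmt16_files() {
  dir=$1
  missing=0
  for prefix in train.tok.clean.bpe.32000 newstest2013.tok.bpe.32000 newstest2014.tok.bpe.32000; do
    for lang in en de; do
      if [ ! -f "$dir/$prefix.$lang" ]; then
        echo "missing: $dir/$prefix.$lang"
        missing=$((missing + 1))
      fi
    done
  done
  # The return status is the number of missing files (0 means all present).
  return "$missing"
}

check_wmt16_files "${TEXT:-wmt16_en_de_bpe32k}" || echo "some expected files were not found"
```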
##### 2. Preprocess the dataset with a joined dictionary
```bash
fairseq-preprocess \
--source-lang en --target-lang de \
--trainpref $TEXT/train.tok.clean.bpe.32000 \
--validpref $TEXT/newstest2013.tok.bpe.32000 \
--testpref $TEXT/newstest2014.tok.bpe.32000 \
--destdir data-bin/wmt16_en_de_bpe32k \
--nwordssrc 32768 --nwordstgt 32768 \
--joined-dictionary \
--workers 20
```
##### 3. Train a model
```bash
fairseq-train \
data-bin/wmt16_en_de_bpe32k \
--arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
--optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
--lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
--dropout 0.3 --weight-decay 0.0 \
--criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--max-tokens 3584 \
--fp16
```
Note that the `--fp16` flag requires CUDA 9.1 or greater and a Volta GPU or newer.
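To get a feel for the learning-rate flags in the training command, here is a minimal sketch of the schedule they describe. It assumes that the `inverse_sqrt` scheduler increases the rate linearly from `--warmup-init-lr` to `--lr` over `--warmup-updates` updates and then decays it in proportion to the inverse square root of the update number. `lr_at` is a hypothetical helper for illustration, not a fairseq command.

```bash
# Sketch: learning rate at update t for --lr 0.0005 --warmup-updates 4000
# --warmup-init-lr 1e-07, under the assumed warmup-then-inverse-sqrt schedule.
lr_at() {
  awk -v t="$1" -v lr=0.0005 -v init=1e-07 -v w=4000 'BEGIN {
    if (t + 0 < w + 0)
      printf "%.6g\n", init + (lr - init) * t / w   # linear warmup
    else
      printf "%.6g\n", lr * sqrt(w / t)             # inverse square root decay
  }'
}

lr_at 4000    # 0.0005  (peak, reached at the end of warmup)
lr_at 16000   # 0.00025 (4x the updates, half the rate)
```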
***IMPORTANT:*** You will get better performance by training with big batches and
increasing the learning rate. If you want to train the above model with big batches
(assuming your machine has 8 GPUs):
- add `--update-freq 16` to simulate training on 8x16=128 GPUs
- increase the learning rate; 0.001 works well for big batches
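The arithmetic behind the big-batch setup can be sketched as follows. It assumes that `--max-tokens` is an upper bound on the tokens in each GPU's batch, so the result is a ceiling on tokens per parameter update rather than an exact count; the variable names are illustrative only.

```bash
NUM_GPUS=8        # GPUs on the machine, as assumed in the text
UPDATE_FREQ=16    # value passed to --update-freq
MAX_TOKENS=3584   # value passed to --max-tokens (assumed to be a per-GPU limit)

# Accumulating gradients over UPDATE_FREQ batches mimics that many times more GPUs.
SIMULATED_GPUS=$((NUM_GPUS * UPDATE_FREQ))
TOKENS_PER_UPDATE=$((SIMULATED_GPUS * MAX_TOKENS))

echo "simulated GPUs: $SIMULATED_GPUS"             # simulated GPUs: 128
echo "max tokens per update: $TOKENS_PER_UPDATE"   # max tokens per update: 458752
```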
##### 4. Evaluate
Now we can evaluate our trained model.
Note that the original [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
paper used a couple of tricks to achieve better BLEU scores. We use these same tricks in
the Scaling NMT paper, so it's important to apply them when reproducing our results.
First, use the [average_checkpoints.py](/scripts/average_checkpoints.py) script to
average the last few checkpoints. Averaging the last 5-10 checkpoints is usually
good, but you may need to adjust this depending on how long you've trained:
```bash
python scripts/average_checkpoints.py \
--inputs /path/to/checkpoints \
--num-epoch-checkpoints 5 \
--output checkpoint.avg5.pt
```
Next, generate translations using a beam width of 4 and length penalty of 0.6:
```bash
fairseq-generate \
data-bin/wmt16_en_de_bpe32k \
--path checkpoint.avg5.pt \
--beam 4 --lenpen 0.6 --remove-bpe
```
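If you save the output of `fairseq-generate` to a file, the hypotheses can be pulled out for inspection or scoring. The sketch below assumes the usual output layout in which each hypothesis line has the form `H-<id>`, a tab, the score, a tab, and the translated text; `extract_hypotheses` is a hypothetical helper, and `gen.out` is an assumed file name.

```bash
# Sketch: print hypotheses from saved fairseq-generate output, ordered by sentence id.
# extract_hypotheses is a hypothetical helper, not a fairseq command.
extract_hypotheses() {
  tab=$(printf '\t')
  grep '^H-' "$1" |          # keep hypothesis lines only
    sed 's/^H-//' |          # strip the prefix so the id is a plain number
    sort -t "$tab" -n -k1,1 | # restore the original sentence order
    cut -f3-                 # drop the id and score columns
}

# Usage (assuming the output was saved with: fairseq-generate ... > gen.out):
#   extract_hypotheses gen.out > gen.out.hyp
```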
## Citation
```bibtex
@inproceedings{ott2018scaling,
title = {Scaling Neural Machine Translation},
author = {Ott, Myle and Edunov, Sergey and Grangier, David and Auli, Michael},
booktitle = {Proceedings of the Third Conference on Machine Translation (WMT)},
year = 2018,
}
```