# **Baseline Simultaneous Translation**
---

These are instructions for training and evaluating a *wait-k* simultaneous LSTM model on the MuST-C English-German dataset.

[STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework](https://www.aclweb.org/anthology/P19-1289/)

## **Requirements**
Install fairseq (make sure to use the correct branch):
```
git clone --branch simulastsharedtask git@github.com:pytorch/fairseq.git
cd fairseq
pip install -e .
```
We assume that fairseq is cloned into a directory called `FAIRSEQ`.

Install SentencePiece. One easy way is to use anaconda:
```
conda install -c powerai sentencepiece
```

Download the MuST-C data for English-German, available at https://ict.fbk.eu/must-c/. We assume that the data is downloaded into a directory called `DATA_ROOT`.

## **Text-to-text Model**
---
### Data Preparation
Train a SentencePiece model:
```shell
for lang in en de; do
    python $FAIRSEQ/examples/simultaneous_translation/data/train_spm.py \
        --data-path $DATA_ROOT/data \
        --vocab-size 10000 \
        --max-frame 3000 \
        --model-type unigram \
        --lang $lang \
        --out-path .
done
```
Process the data with the SentencePiece model:
```shell
proc_dir=proc
mkdir -p $proc_dir
for split in train dev tst-COMMON tst-HE; do
    for lang in en de; do
        spm_encode \
            --model unigram-$lang-10000-3000/spm.model \
            < $DATA_ROOT/data/$split/txt/$split.$lang \
            > $proc_dir/$split.spm.$lang
    done
done
```
Binarize the data:
```shell
proc_dir=proc
fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref $proc_dir/train.spm \
    --validpref $proc_dir/dev.spm \
    --testpref $proc_dir/tst-COMMON.spm \
    --thresholdtgt 0 \
    --thresholdsrc 0 \
    --workers 20 \
    --destdir ./data-bin/mustc_en_de
```
### Training
```shell
mkdir -p checkpoints
CUDA_VISIBLE_DEVICES=1 python $FAIRSEQ/train.py data-bin/mustc_en_de \
    --save-dir checkpoints \
    --arch berard_simul_text_iwslt \
    --simul-type waitk \
    --waitk-lagging 2 \
    --optimizer adam \
    --max-epoch 100 \
    --lr 0.001 \
    --clip-norm 5.0 \
    --batch-size 128 \
    --log-format json \
    --log-interval 10 \
    --criterion cross_entropy_acc \
    --user-dir $FAIRSEQ/examples/simultaneous_translation
```

## **Speech-to-text Model**
---
### Data Preparation
First, segment the wav files:
```shell
python $FAIRSEQ/examples/simultaneous_translation/data/segment_wav.py \
    --datapath $DATA_ROOT
```
As with the text-to-text model, train a SentencePiece model, but only on German:
```shell
python $FAIRSEQ/examples/simultaneous_translation/data/train_spm.py \
    --data-path $DATA_ROOT/data \
    --vocab-size 10000 \
    --max-frame 3000 \
    --model-type unigram \
    --lang de \
    --out-path .
```
### Training
```shell
mkdir -p checkpoints
CUDA_VISIBLE_DEVICES=1 python $FAIRSEQ/train.py data-bin/mustc_en_de \
    --save-dir checkpoints \
    --arch berard_simul_text_iwslt \
    --waitk-lagging 2 \
    --waitk-stride 10 \
    --input-feat-per-channel 40 \
    --encoder-hidden-size 512 \
    --output-layer-dim 128 \
    --decoder-num-layers 3 \
    --task speech_translation \
    --optimizer adam \
    --max-epoch 100 \
    --lr 0.001 \
    --clip-norm 5.0 \
    --batch-size 128 \
    --log-format json \
    --log-interval 10 \
    --criterion cross_entropy_acc \
    --user-dir $FAIRSEQ/examples/simultaneous_translation
```
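For intuition on what `--waitk-lagging` (and, for speech, `--waitk-stride`) configure: a wait-k model first reads k source units (tokens for text, or fixed-size blocks of encoder frames for speech), then alternates between writing one target token and reading one more source unit. The following is a minimal sketch of that read/write schedule, not fairseq code; the function and action names are hypothetical and purely illustrative:

```python
def waitk_schedule(num_source_units, k, max_target_len):
    """Yield (action, count) pairs for a wait-k policy.

    Reads k source units up front, then alternates WRITE/READ.
    Once the source is exhausted, it keeps writing until done.
    """
    read, written = 0, 0
    while written < max_target_len:
        if read < min(k + written, num_source_units):
            read += 1
            yield ("READ", read)      # consume one more source unit
        else:
            written += 1
            yield ("WRITE", written)  # emit one target token

# Example: wait-2 over a 5-unit source
# prints READ, READ, WRITE, READ, WRITE, READ, WRITE, READ, WRITE, WRITE, WRITE
for action in waitk_schedule(num_source_units=5, k=2, max_target_len=6):
    print(action)
```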
## Evaluation
---
### Evaluation Server
For text translation models, the server is set up as follows, given an input file and a reference file:
```shell
python ./eval/server.py \
    --hostname localhost \
    --port 12321 \
    --src-file $DATA_ROOT/data/dev/txt/dev.en \
    --ref-file $DATA_ROOT/data/dev/txt/dev.de
```
For speech translation models, the input is the data directory:
```shell
python ./eval/server.py \
    --hostname localhost \
    --port 12321 \
    --ref-file $DATA_ROOT \
    --data-type speech
```

### Decode and Evaluate with Client
Once the server is set up, run the client to evaluate translation quality and latency:
```shell
# TEXT
python $FAIRSEQ/examples/simultaneous_translation/evaluate.py \
    data-bin/mustc_en_de \
    --user-dir $FAIRSEQ/examples/simultaneous_translation \
    --src-spm unigram-en-10000-3000/spm.model \
    --tgt-spm unigram-de-10000-3000/spm.model \
    -s en -t de \
    --path checkpoints/checkpoint_best.pt

# SPEECH
python $FAIRSEQ/examples/simultaneous_translation/evaluate.py \
    data-bin/mustc_en_de \
    --user-dir $FAIRSEQ/examples/simultaneous_translation \
    --data-type speech \
    --tgt-spm unigram-de-10000-3000/spm.model \
    -s en -t de \
    --path checkpoints/checkpoint_best.pt
```
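The client reports both translation quality (BLEU) and latency. The standard latency metric for wait-k systems is Average Lagging (AL), introduced in the STACL paper linked above. As a reference for interpreting the reported numbers, here is a minimal sketch of AL computed from per-token delays; this is an illustration of the formula, not the evaluation server's exact implementation:

```python
def average_lagging(delays, src_len):
    """Average Lagging (Ma et al., 2019).

    delays[t-1]: number of source units consumed before emitting
                 target token t.
    src_len:     total number of source units.
    """
    gamma = len(delays) / src_len  # target-to-source length ratio
    # tau: first target index whose delay covers the full source
    tau = next(
        (t for t, d in enumerate(delays, start=1) if d >= src_len),
        len(delays),
    )
    # average of g(t) - (t - 1) / gamma over t = 1..tau
    return sum(
        d - (t - 1) / gamma for t, d in enumerate(delays[:tau], start=1)
    ) / tau

# Example: ideal wait-2 delays on a 5-token source, 6-token target
print(average_lagging([2, 3, 4, 5, 5, 5], src_len=5))  # 2.25
```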