---
datasets:
- openslr/librispeech_asr
language:
- en
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
pipeline_tag: automatic-speech-recognition
---

Internal model alias name: `v6-relPosAttDef-noBias-aedLoss-bhv20-11gb-f32-bs15k-accgrad1-mgpu4-pavg100-wd1e_2-lrlin1e_5_295k-featBN-speedpertV2-spm10k-bpeSample001`

Last epoch (subepoch 500), greedy decoding (without LM) on Librispeech, WERs:
`{"dev-clean": 2.38, "dev-other": 5.67, "test-clean": 2.63, "test-other": 5.93}`

(Note: together with a good LM trained on the Librispeech LM text data, the WERs improve; see `output/ctc_recog_ext/ctc+lm/opt-beam128-fp128-lm_n32-d1024-labelprior/recog-1stpass-res.txt`:
`{"dev-clean": 2.04, "dev-other": 4.06, "test-clean": 2.08, "test-other": 4.36}`)

From https://github.com/rwth-i6/i6_experiments/blob/main/users/zeyer/experiments/exp2024_04_23_baselines/ctc.py.

Usage example: https://github.com/rwth-i6/i6_experiments/blob/main/users/zeyer/experiments/exp2024_04_23_baselines/standalone/model_2024_ctc_spm10k.py

Example:

```shell
pip install torch
pip install returnn
wget https://raw.githubusercontent.com/rwth-i6/i6_experiments/refs/heads/main/users/zeyer/experiments/exp2024_04_23_baselines/standalone/model_2024_ctc_spm10k.py
wget https://huggingface.co/rwth-i6/2024-zeyer-ctc-librispeech-spm10k/resolve/main/data/epoch.500.pt
wget https://huggingface.co/rwth-i6/2024-zeyer-ctc-librispeech-spm10k/resolve/main/deps/spm.vocab
python model_2024_ctc_spm10k.py example_audio.ogg
```

This Sisyphus config code snippet was used to set up the Sisyphus training job:
```python
# v6-relPosAttDef-noBias-aedLoss-bhv20-11gb-f32-bs15k-accgrad1-mgpu4-pavg100-wd1e_2-lrlin1e_5_295k-featBN-speedpertV2-spm10k-bpeSample001
# noBias. (Baseline: 5.77)
train_exp(  # 5.65 (!!!)
    "v6-relPosAttDef-noBias-aedLoss-bhv20-11gb-f32-bs15k-accgrad1-mgpu4-pavg100-wd1e_2"
    "-lrlin1e_5_295k-featBN-speedpertV2-spm10k-bpeSample001",
    config_11gb_v6_f32_accgrad1_mgpu4_pavg100_wd1e_4,
    model_config={
        "enc_conformer_layer": rf.build_dict(
            rf.encoder.conformer.ConformerEncoderLayer,
            ff=rf.build_dict(
                rf.encoder.conformer.ConformerPositionwiseFeedForward,
                activation=rf.build_dict(rf.relu_square),
                with_bias=False,
            ),
            num_heads=8,
        ),
        "feature_batch_norm": True,
    },
    config_updates={
        **_get_cfg_lrlin_oclr_by_bs_nep(15_000, 500),
        "optimizer.weight_decay": 1e-2,
        "__train_audio_preprocess": speed_pert_librosa_config,
        "speed_pert_discrete_values": [0.7, 0.8, 0.9, 1.0, 1.1],
        "aux_attention_decoder": rf.build_dict(TransformerDecoder, num_layers=6),  # purely used for training
    },
    vocab="spm10k",
    train_vocab_opts={"other_opts": {"class": "SamplingBytePairEncoding", "breadth_prob": 0.01}},
)
```
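The `speed_pert_discrete_values` entry in the config above selects a random speed factor per training utterance. The actual preprocessing is `speed_pert_librosa_config` from the i6_experiments repo (librosa-based); the following is only a hypothetical, purely illustrative sketch of what discrete speed perturbation does, using naive linear-interpolation resampling (the function name and implementation are my own, not from the repo):

```python
import random

def speed_perturb(samples, factors=(0.7, 0.8, 0.9, 1.0, 1.1), rng=random):
    """Illustrative sketch only: resample `samples` (list of floats) by a
    randomly chosen speed factor. A factor of 1.1 plays the audio 10% faster,
    i.e. produces fewer output samples; 0.7 plays it slower. Real pipelines
    use proper resampling (e.g. librosa), not linear interpolation.
    Returns (new_samples, chosen_factor)."""
    factor = rng.choice(factors)
    out_len = max(1, int(len(samples) / factor))
    out = []
    for i in range(out_len):
        pos = i * factor                    # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out, factor
```

With factor 1.0 the input is returned unchanged; with the other factors the sequence length changes accordingly, which is the data-augmentation effect the config relies on.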
I uploaded the `info` and `output` files from the Sisyphus RETURNN training job to `trainjob`, except for the model checkpoint, which I uploaded to `data`. From the train job's `info` file, I checked the dependencies, specifically the SPM vocab, and uploaded those to `deps`.
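For reference, the WER numbers reported above follow the standard definition: word-level Levenshtein distance (substitutions, insertions, deletions) divided by the number of reference words. A minimal self-contained sketch, not the evaluation code actually used for this model:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table for Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` is 1/6: one deleted word out of six reference words.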