---
datasets:
- openslr/librispeech_asr
language:
- en
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
widget:
- example_title: Librispeech sample 1
src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
pipeline_tag: automatic-speech-recognition
---
Internal model alias name:
`v6-relPosAttDef-noBias-aedLoss-bhv20-11gb-f32-bs15k-accgrad1-mgpu4-pavg100-wd1e_2-lrlin1e_5_295k-featBN-speedpertV2-spm10k-bpeSample001`
Greedy decoding (without LM) WERs on Librispeech for the last epoch (subepoch 500):
`{"dev-clean": 2.38, "dev-other": 5.67, "test-clean": 2.63, "test-other": 5.93}`
(Note: together with a good LM trained on the Librispeech LM text data, first-pass recognition
(`output/ctc_recog_ext/ctc+lm/opt-beam128-fp128-lm_n32-d1024-labelprior/recog-1stpass-res.txt`) reaches:
`{"dev-clean": 2.04, "dev-other": 4.06, "test-clean": 2.08, "test-other": 4.36}`.)
The experiment comes from https://github.com/rwth-i6/i6_experiments/blob/main/users/zeyer/experiments/exp2024_04_23_baselines/ctc.py.
Usage example:
https://github.com/rwth-i6/i6_experiments/blob/main/users/zeyer/experiments/exp2024_04_23_baselines/standalone/model_2024_ctc_spm10k.py
For example:
```shell
# Install dependencies (PyTorch and RETURNN).
pip install torch
pip install returnn
# Download the standalone inference script.
wget https://raw.githubusercontent.com/rwth-i6/i6_experiments/refs/heads/main/users/zeyer/experiments/exp2024_04_23_baselines/standalone/model_2024_ctc_spm10k.py
# Download the model checkpoint and the SentencePiece vocab.
wget https://huggingface.co/rwth-i6/2024-zeyer-ctc-librispeech-spm10k/resolve/main/data/epoch.500.pt
wget https://huggingface.co/rwth-i6/2024-zeyer-ctc-librispeech-spm10k/resolve/main/deps/spm.vocab
# Run recognition on an example audio file.
python model_2024_ctc_spm10k.py example_audio.ogg
```
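After downloading, a quick sanity check that the checkpoint file deserializes can be useful. This is only a sketch: treating the RETURNN PyTorch checkpoint as a plain dict is an assumption about its layout.
```python
import torch

# Hedged sanity check: the exact layout of a RETURNN PyTorch checkpoint is an
# assumption here; we only verify that the file loads and peek at its keys.
# If loading fails under newer torch defaults, try weights_only=False
# (only for files you trust, like this one).
ckpt = torch.load("epoch.500.pt", map_location="cpu")
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys())[:10])
else:
    print("checkpoint object of type:", type(ckpt))
```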
This Sisyphus config code snippet was used to set up the training job:
<details>
<summary>Sisyphus training config snippet</summary>

```python
# v6-relPosAttDef-noBias-aedLoss-bhv20-11gb-f32-bs15k-accgrad1-mgpu4-pavg100-wd1e_2-lrlin1e_5_295k-featBN-speedpertV2-spm10k-bpeSample001
# noBias. (Baseline: 5.77)
train_exp(  # 5.65 (!!!)
    "v6-relPosAttDef-noBias-aedLoss-bhv20-11gb-f32-bs15k-accgrad1-mgpu4-pavg100-wd1e_2"
    "-lrlin1e_5_295k-featBN-speedpertV2-spm10k-bpeSample001",
    config_11gb_v6_f32_accgrad1_mgpu4_pavg100_wd1e_4,
    model_config={
        "enc_conformer_layer": rf.build_dict(
            rf.encoder.conformer.ConformerEncoderLayer,
            ff=rf.build_dict(
                rf.encoder.conformer.ConformerPositionwiseFeedForward,
                activation=rf.build_dict(rf.relu_square),
                with_bias=False,
            ),
            num_heads=8,
        ),
        "feature_batch_norm": True,
    },
    config_updates={
        **_get_cfg_lrlin_oclr_by_bs_nep(15_000, 500),
        "optimizer.weight_decay": 1e-2,
        "__train_audio_preprocess": speed_pert_librosa_config,
        "speed_pert_discrete_values": [0.7, 0.8, 0.9, 1.0, 1.1],
        "aux_attention_decoder": rf.build_dict(TransformerDecoder, num_layers=6),  # purely used for training
    },
    vocab="spm10k",
    train_vocab_opts={"other_opts": {"class": "SamplingBytePairEncoding", "breadth_prob": 0.01}},
)
```
</details>
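For context on `speed_pert_librosa_config` and `speed_pert_discrete_values` in the config above: speed perturbation picks a random factor per training utterance and speeds the audio up or down accordingly. The following is a rough, hypothetical sketch of that idea with librosa; the helper name `speed_perturb` and the direction convention of the factor are assumptions, and the actual preprocessing in i6_experiments may differ in detail.
```python
import random

import librosa
import numpy as np


def speed_perturb(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Hypothetical speed perturbation; not the actual i6_experiments code."""
    factor = random.choice([0.7, 0.8, 0.9, 1.0, 1.1])
    if factor == 1.0:
        return samples
    # Interpreting the factor as a playback-speed change: resample so the
    # signal gets shorter (faster) or longer (slower), then keep treating it
    # as audio at the original sample rate (this also shifts the pitch).
    return librosa.resample(
        samples, orig_sr=sample_rate, target_sr=int(round(sample_rate / factor))
    )
```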
I uploaded the `info` and `output` files from the Sisyphus RETURNN training job to `trainjob`,
except for the model checkpoint, which I uploaded to `data`.
From the train job `info` file, I checked the dependencies; specifically the SPM vocab,
which I uploaded to `deps`.
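As an alternative to the wget calls above, the same files can be fetched programmatically with `huggingface_hub`; a small sketch (the file paths follow the repo layout described here):
```python
from huggingface_hub import hf_hub_download

repo_id = "rwth-i6/2024-zeyer-ctc-librispeech-spm10k"
# Download the model checkpoint and the SentencePiece vocab into the local HF cache.
ckpt_path = hf_hub_download(repo_id=repo_id, filename="data/epoch.500.pt")
vocab_path = hf_hub_download(repo_id=repo_id, filename="deps/spm.vocab")
print(ckpt_path)
print(vocab_path)
```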