Commit 7a6760a · 1 parent: ffa2a91
change for README.md

README.md CHANGED
@@ -1,7 +1,6 @@
-Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/
-
-
-The model was trained on full [Aidatatang_200zh](https://www.openslr.org/62) with the scripts in [icefall](https://github.com/k2-fsa/icefall), based on the latest version of k2.
 ## Training procedure
 The main repositories are listed below; we will update the training and decoding scripts as new versions are released.
 k2: https://github.com/k2-fsa/k2

@@ -15,25 +14,29 @@ cd icefall
 ```
 * Preparing data.
 ```
-cd egs/
 bash ./prepare.sh
 ```
 * Training
 ```
-export CUDA_VISIBLE_DEVICES="0,1"
 ./pruned_transducer_stateless2/train.py \
-  --world-size
-  --num-epochs
   --start-epoch 0 \
   --exp-dir pruned_transducer_stateless2/exp \
   --lang-dir data/lang_char \
-  --max-duration
 ```
 ## Evaluation results
-The decoding results (WER%) on
 The WERs are
-| |
-
-| greedy search |
-| modified beam search (beam size 4) |
-| fast beam search (set as default) |
+Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/349
+# Pre-trained Transducer-Stateless2 models for the WenetSpeech dataset with icefall.
+The model was trained on the L subset of WenetSpeech with the scripts in [icefall](https://github.com/k2-fsa/icefall), based on the latest version of k2.
 ## Training procedure
 The main repositories are listed below; we will update the training and decoding scripts as new versions are released.
 k2: https://github.com/k2-fsa/k2

 ```
 * Preparing data.
 ```
+cd egs/wenetspeech/ASR
 bash ./prepare.sh
 ```
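Among other things, the data-preparation stage produces the character-based lang dir (`data/lang_char`) consumed by the training command. As a toy illustration only (this is not icefall's actual implementation; the function name is made up), a character-level token table for Chinese transcripts can be built like this:

```python
# Toy sketch of a character-level token table, in the spirit of
# data/lang_char.  NOT icefall's prepare.sh -- names and details here
# are illustrative assumptions.

def build_char_token_map(transcripts):
    """Assign an integer ID to every distinct character, reserving
    0 for <blk> (the transducer blank) and 1 for <unk>."""
    tokens = {"<blk>": 0, "<unk>": 1}
    for text in transcripts:
        for ch in text:
            if ch not in tokens:
                tokens[ch] = len(tokens)
    return tokens

token_map = build_char_token_map(["好", "你好"])
# IDs are assigned in order of first appearance after the reserved symbols.
```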
 * Training
 ```
+export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 ./pruned_transducer_stateless2/train.py \
+  --world-size 8 \
+  --num-epochs 15 \
   --start-epoch 0 \
   --exp-dir pruned_transducer_stateless2/exp \
   --lang-dir data/lang_char \
+  --max-duration 180 \
+  --valid-interval 3000 \
+  --model-warm-step 3000 \
+  --save-every-n 8000 \
+  --training-subset L
 ```
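The `--max-duration 180` flag caps the total audio duration (in seconds) per batch rather than the number of utterances. A minimal sketch of the idea, assuming a simple greedy packing (icefall's real sampler is considerably more sophisticated and bucket-based; this function is invented for illustration):

```python
# Toy illustration of duration-capped batching, in the spirit of the
# --max-duration flag.  Invented for this sketch; not icefall's sampler.

def batch_by_duration(durations, max_duration=180.0):
    """Greedily pack utterance durations (seconds) into batches whose
    total duration stays at or under max_duration."""
    batches, current, total = [], [], 0.0
    for dur in durations:
        # Start a new batch if adding this utterance would overflow.
        if current and total + dur > max_duration:
            batches.append(current)
            current, total = [], 0.0
        current.append(dur)
        total += dur
    if current:
        batches.append(current)
    return batches

print(batch_by_duration([100.0, 50.0, 40.0, 120.0]))
# -> [[100.0, 50.0], [40.0, 120.0]]
```

Larger values increase GPU memory use but improve throughput, which is why the decoding commands in the table below use different `--max-duration` settings.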
 ## Evaluation results
+The decoding results (WER%) on WenetSpeech (dev, test-net, and test-meeting) are listed below; we obtained them by averaging the models from epochs 9 and 10.
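"Averaging the models" here means an element-wise mean of checkpoint parameters. A minimal sketch of that operation, with plain dicts of floats standing in for tensor state dicts (icefall ships its own checkpoint-averaging helper; this function is illustrative):

```python
# Sketch of checkpoint parameter averaging.  Plain dicts of floats stand
# in for real state_dicts of tensors; illustrative only.

def average_state_dicts(state_dicts):
    """Element-wise mean of several state dicts with identical keys."""
    n = len(state_dicts)
    keys = state_dicts[0].keys()
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}

avg = average_state_dicts([
    {"w": 1.0, "b": 0.0},  # e.g. the epoch-9 checkpoint
    {"w": 3.0, "b": 2.0},  # e.g. the epoch-10 checkpoint
])
print(avg)  # {'w': 2.0, 'b': 1.0}
```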
 The WERs are:
+| decoding method                    | dev  | test-net | test-meeting | comment                                  |
+|------------------------------------|------|----------|--------------|------------------------------------------|
+| greedy search                      | 7.80 | 8.75     | 13.49        | --epoch 10, --avg 2, --max-duration 100  |
+| modified beam search (beam size 4) | 7.76 | 8.71     | 13.41        | --epoch 10, --avg 2, --max-duration 100  |
+| fast beam search (set as default)  | 7.94 | 8.74     | 13.80        | --epoch 10, --avg 2, --max-duration 1500 |
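The WER figures above are edit-distance based: (substitutions + deletions + insertions) divided by the number of reference tokens (for Chinese, the tokens are typically characters). A minimal reference implementation of that metric:

```python
# Minimal word/character error rate via Levenshtein distance.
# Illustrative reference implementation, not icefall's scoring code.

def wer(ref, hyp):
    """Edit distance between token lists, normalized by len(ref)."""
    if not ref:
        return float(len(hyp))
    dp = list(range(len(hyp) + 1))  # dp[j] = dist(ref[:i], hyp[:j])
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[j] = min(dp[j] + 1,      # delete a reference token
                        dp[j - 1] + 1,  # insert a hypothesis token
                        prev + cost)    # substitute (or match)
            prev = cur
    return dp[-1] / len(ref)

assert wer("a b c".split(), "a x c".split()) == 1 / 3  # one substitution
assert wer(["你", "好"], ["你", "好"]) == 0.0           # exact match
```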