---
language:
- fr
thumbnail: null
pipeline_tag: automatic-speech-recognition
tags:
- CTC
- pytorch
- speechbrain
- hf-asr-leaderboard
license: apache-2.0
datasets:
- MEDIA
metrics:
- cer
model-index:
- name: asr-wav2vec2-ctc-MEDIA
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: MEDIA
      type: MEDIA_asr
      config: fr
      split: test
      args:
        language: fr
    metrics:
    - name: Test CER
      type: cer
      value: 4.78
---

<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>

# wav2vec 2.0 with CTC trained on MEDIA

This repository provides all the necessary tools to perform automatic speech
recognition from an end-to-end system pretrained on MEDIA (French) within
SpeechBrain. For a better experience, we encourage you to learn more about
[SpeechBrain](https://speechbrain.github.io).

The performance of the model is the following:

| Release | Test CER | GPUs |
|:-------------:|:--------------:|:--------:|
| 22-02-23 | 4.78 | 1xV100 32GB |
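
The reported metric, character error rate (CER), is the Levenshtein (edit) distance between reference and hypothesis transcripts divided by the reference length, in percent. As a quick illustration of the metric (a minimal sketch, not the scoring code used in the recipe):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate = 100 * edit distance / reference length."""
    # Dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hypothesis) + 1))
    for i, rc in enumerate(reference, start=1):
        curr = [i]
        for j, hc in enumerate(hypothesis, start=1):
            cost = 0 if rc == hc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return 100.0 * prev[-1] / len(reference)

print(round(cer("bonjour", "bonsoir"), 2))  # → 28.57
```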

## Pipeline description

This ASR system is composed of an acoustic model (wav2vec 2.0 + CTC). A pretrained wav2vec 2.0 model ([LeBenchmark/wav2vec2-FR-3K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-large)) is combined with three DNN layers and fine-tuned on MEDIA.
The resulting acoustic representation is passed to the CTC greedy decoder.

The system is trained with recordings sampled at 16 kHz (single channel).
The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file*, if needed.
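
CTC greedy decoding itself is simple: take the most likely token at each frame, collapse consecutive repeats, and remove the blank symbol. A minimal sketch of that logic (illustrative only; SpeechBrain ships its own decoder):

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse repeated frame-level predictions, then drop CTC blanks."""
    out, prev = [], None
    for token in frame_ids:
        if token != prev and token != blank_id:
            out.append(token)
        prev = token
    return out

# Frame-level argmax ids, with 0 as the blank token:
print(ctc_greedy_collapse([0, 3, 3, 0, 0, 5, 5, 5, 0, 3]))  # → [3, 5, 3]
```

Note that a blank between two identical tokens (the trailing `3` above) keeps them as separate output symbols.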

## Install SpeechBrain

First of all, please install transformers and SpeechBrain with the following command:

```bash
pip install speechbrain transformers
```

Please note that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).

### Transcribing your own audio files (in French)

```python
from speechbrain.pretrained import EncoderASR

asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-ctc-MEDIA", savedir="pretrained_models/asr-wav2vec2-ctc-MEDIA")
asr_model.transcribe_file("speechbrain/asr-wav2vec2-ctc-MEDIA/example-fr.wav")
```

### Inference on GPU

To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

### Training

The model was trained with SpeechBrain.
To train it from scratch, follow these steps:
1. Clone SpeechBrain:
```bash
git clone https://github.com/speechbrain/speechbrain/
```
2. Install it:
```bash
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
3. Download the MEDIA-related files:
- [Media ASR (ELRA-S0272)](https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/)
- [Media SLU (ELRA-E0024)](https://catalogue.elra.info/en-us/repository/browse/ELRA-E0024/)
- [channels.csv and concepts_full_relax.csv](https://drive.google.com/drive/u/1/folders/1z2zFZp3c0NYLFaUhhghhBakGcFdXVRyf)
4. Modify the placeholders in hparams/train_hf_wav2vec.yaml:
```bash
data_folder = !PLACEHOLDER
channels_path = !PLACEHOLDER
concepts_path = !PLACEHOLDER
```
5. Run training:
```bash
cd recipes/MEDIA/ASR/CTC/
python train_hf_wav2vec.py hparams/train_hf_wav2vec.yaml
```

You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1qJUKxsTKrYwzKz0LHzq67M4G06Mj-9fl?usp=sharing).

### Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

#### Referencing SpeechBrain

```bibtex
@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
}
```

#### About SpeechBrain

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.

Website: https://speechbrain.github.io/

GitHub: https://github.com/speechbrain/speechbrain