cecilemacaire committed on
Commit cbfa777 · verified · 1 Parent(s): 44b2307

Update README.md

Files changed (1)
  1. README.md +45 -62
README.md CHANGED
@@ -24,9 +24,9 @@ tags:
  *asr-wav2vec2-commonvoice-15-fr* is an Automatic Speech Recognition model fine-tuned on the CommonVoice 15.0 French set, with *LeBenchmark/wav2vec2-FR-7K-large* as the pretrained wav2vec2 model.
 
  The fine-tuned model achieves the following performance:
- | Release | Valid WER | Test WER | GPUs |
- |:-------------:|:--------------:|:--------------:| :--------:|
- | 2023-09-08 | 9.14 | 11.21 | 4xV100 32GB |
+ | Release | Valid WER | Test WER | GPUs | Epochs |
+ |:-------------:|:--------------:|:--------------:|:--------:|:--------:|
+ | 2023-09-08 | 9.14 | 11.21 | 4xV100 32GB | 30 |
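The Valid/Test WER columns in the new table report word error rates. A quick sketch of how such a score can be computed with the `jiwer` package; the sentence pairs below are invented examples, not data from the card, and the card does not publish its scoring script:

```python
# Hedged sketch: word error rate (WER), the metric reported in the table above.
# The reference/hypothesis pairs are made-up examples for illustration only.
from jiwer import wer

references = ["bonjour tout le monde", "ceci est un exemple"]
hypotheses = ["bonjour tous le monde", "ceci est un exemple"]

# WER = (substitutions + insertions + deletions) / reference word count
print(f"WER: {wer(references, hypotheses):.2%}")
```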
 
 
  ## Model Details
 
@@ -44,8 +44,35 @@ PROPICTO ANR-20-CE93-0005
  - **License:** Apache-2.0
  - **Finetuned from model:** LeBenchmark/wav2vec2-FR-7K-large
 
- ## How to Get Started with the Model
+ ## How to transcribe a file with the model
+
+ ### Install and import SpeechBrain
+
+ ```bash
+ pip install speechbrain
+ ```
+
+ ```python
+ from speechbrain.inference.ASR import EncoderASR
+ ```
+
+ ### Pipeline
+
+ ```python
+ def transcribe(audio, model):
+     return model.transcribe_file(audio).lower()
+
+
+ def save_transcript(transcript, audio, output_file):
+     with open(output_file, 'w', encoding='utf-8') as file:
+         file.write(f"{audio}\t{transcript}\n")
+
+
+ def main(model_wav2vec2, audio):
+     # model_wav2vec2: Hub repo id or local path of this ASR model
+     # audio: path to the audio file to transcribe
+     model = EncoderASR.from_hparams(model_wav2vec2, savedir="tmp/")
+     transcript = transcribe(audio, model)
+     save_transcript(transcript, audio, "out.txt")
+ ```
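A minimal sketch of invoking the pipeline added above. The repo id `your-namespace/asr-wav2vec2-commonvoice-15-fr` and the file `speech.wav` are placeholders for illustration, not values taken from the model card:

```python
# Hedged usage sketch for the pipeline above.
# The repo id and the audio filename are placeholders; substitute the actual
# Hub repo id (or a local path to the model) and your own audio file.
from speechbrain.inference.ASR import EncoderASR

model = EncoderASR.from_hparams(
    source="your-namespace/asr-wav2vec2-commonvoice-15-fr",  # placeholder repo id
    savedir="tmp/",
)
transcript = model.transcribe_file("speech.wav").lower()  # lower-cased, as in transcribe()
print(transcript)
```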
 
  ## Training Details
 
@@ -64,69 +91,25 @@ The `common_voice_prepare.py` script handles the preprocessing of the dataset.
 
  #### Training Hyperparameters
 
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
+ Refer to the hyperparams.yaml file for the full list of training hyperparameters.
+
+ #### Training time
+
+ With 4xV100 32GB GPUs, training took ~81 hours.
 
  #### Software
 
- [More Information Needed]
+ [SpeechBrain](https://speechbrain.github.io/):
+ ```bibtex
+ @misc{SB2021,
+   author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
+   title = {SpeechBrain},
+   year = {2021},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ }
+ ```
 
  ## Citation
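The hyperparams.yaml referenced in the training section above is the SpeechBrain HyperPyYAML file shipped alongside the model. A minimal sketch of fetching it from the Hub for inspection; the repo id below is a placeholder, not a value taken from the card:

```python
# Hedged sketch: fetch and inspect the hyperparams.yaml mentioned in the card.
# "your-namespace/asr-wav2vec2-commonvoice-15-fr" is a placeholder repo id;
# replace it with the actual Hub repo id, or read the file from a local clone.
from huggingface_hub import hf_hub_download

yaml_path = hf_hub_download(
    repo_id="your-namespace/asr-wav2vec2-commonvoice-15-fr",  # placeholder
    filename="hyperparams.yaml",
)
with open(yaml_path, encoding="utf-8") as fin:
    print(fin.read())  # training/decoding hyperparameters in HyperPyYAML format
```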