sasha-meister committed on
Commit
1477d40
·
1 Parent(s): 4e152c9

Update README.md

  1. README.md +35 -2
README.md CHANGED
@@ -158,7 +158,8 @@ The NeMo toolkit [3] was used for training the models for over several hundred e
158
 
159
  The vocabulary we use contains 33 characters:
160
  ```python
161
- [' ', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я']```
 
162
 
163
  Rare symbols with diacritics were replaced during preprocessing.
164
 
@@ -170,4 +171,36 @@ All the models in this collection are trained on a composite dataset (NeMo ASRSE
170
  - Mozilla Common Voice 10.0 (Russian) - train subset [28 hours]
171
  - Golos - crowd [1070 hours] and farfield [111 hours] subsets
172
  - Russian LibriSpeech (RuLS) [92 hours]
173
- - SOVA - RuAudiobooksDevices [260 hours] and RuDevices [75 hours] subsets
 
158
 
159
  The vocabulary we use contains 33 characters:
160
  ```python
161
+ [' ', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я']
162
+ ```
163
 
164
  Rare symbols with diacritics were replaced during preprocessing.
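
As an illustration of what restricting transcripts to this vocabulary could look like, here is a minimal sketch. The README does not say which replacement rules were used, so the `REPLACEMENTS` table (mapping `ё` to `е`) and the `normalize` helper are hypothetical, not the authors' preprocessing code.

```python
import unicodedata

# The 33-character vocabulary from the README: space plus 32 Cyrillic letters.
VOCAB = set(" абвгдежзийклмнопрстуфхцчшщъыьэюя")

# Hypothetical replacement table: the README only says that rare symbols
# with diacritics were replaced, not which rules were applied.
REPLACEMENTS = {"ё": "е"}


def normalize(text: str) -> str:
    """Lowercase the text, apply replacements, and drop out-of-vocabulary characters."""
    # NFC keeps letters such as 'й' as single code points.
    text = unicodedata.normalize("NFC", text.lower())
    chars = []
    for ch in text:
        ch = REPLACEMENTS.get(ch, ch)
        # Anything outside the vocabulary becomes a space (an assumption).
        chars.append(ch if ch in VOCAB else " ")
    # Collapse runs of spaces and trim the ends.
    return " ".join("".join(chars).split())


print(normalize("Ёлка, привет!"))  # prints "елка привет"
```

Mapping unknown characters to a space rather than deleting them keeps neighbouring words from merging when punctuation is removed.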
165
 
 
171
  - Mozilla Common Voice 10.0 (Russian) - train subset [28 hours]
172
  - Golos - crowd [1070 hours] and farfield [111 hours] subsets
173
  - Russian LibriSpeech (RuLS) [92 hours]
174
+ - SOVA - RuAudiobooksDevices [260 hours] and RuDevices [75 hours] subsets
175
+
176
+ ## Performance
177
+
178
+ The following table lists the models available in this collection. ASR performance is reported as Word Error Rate (WER%) with greedy decoding.
179
+
180
+ | Version | Tokenizer | Vocabulary Size | MCV 10.0 dev | MCV 10.0 test | GOLOS-crowd test | GOLOS-farfield test | RuLS test | Train Dataset |
181
+ |---------|-----------------------|-----------------|--------------|---------------|------------------|---------------------|-----------|---------------|
182
+ | 1.13.0 | SentencePiece Unigram | 1024 | 3.5 | 4.0 | 2.7 | 7.6 | 12.0 | NeMo ASRSET |
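
For readers unfamiliar with the metric, the sketch below shows one common way to compute WER for a single utterance: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The `wer` function is illustrative only and is not the NeMo implementation; corpus-level scores such as those in the table are usually obtained by summing edits and reference words over all utterances before dividing.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate in percent for one reference/hypothesis pair."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of three reference words.
print(round(wer("мама мыла раму", "мама мыло раму"), 1))  # prints 33.3
```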
183
+
184
+ ## Limitations
185
+
186
+ Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
187
+
188
+ ## Deployment with NVIDIA Riva
189
+
190
+ [NVIDIA Riva](https://developer.nvidia.com/riva) is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
191
+ Additionally, Riva provides:
192
+
193
+ * World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
194
+ * Best-in-class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization
195
+ * Streaming speech recognition, Kubernetes-compatible scaling, and enterprise-grade support
196
+
197
+ Although this model is not yet supported by Riva, the [list of supported models](https://huggingface.co/models?other=Riva) is available on Hugging Face.
198
+ Check out the [Riva live demo](https://developer.nvidia.com/riva#demos).
199
+
200
+ ## References
201
+
202
+ - [1] [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
203
+
204
+ - [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
205
+
206
+ - [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)