nvidia
/

stt_ru_conformer_transducer_large

@@ -158,7 +158,8 @@ The NeMo toolkit [3] was used for training the models for over several hundred e
 The vocabulary we use contains 33 characters:
 ```python
-[' ', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я']```
 Rare symbols with diacritics were replaced during preprocessing.
@@ -170,4 +171,36 @@ All the models in this collection are trained on a composite dataset (NeMo ASRSE
 - Mozilla Common Voice 10.0 (Russian) - train subset [28 hours]
 - Golos - crowd [1070 hours] and fairfield [111 hours] subsets
 - Russian LibriSpeech (RuLS) [92 hours]
-- SOVA - RuAudiobooksDevices [260 hours] and RuDevices [75 hours] subsets

 The vocabulary we use contains 33 characters:
 ```python
+[' ', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я']
+```
 Rare symbols with diacritics were replaced during preprocessing.
 - Mozilla Common Voice 10.0 (Russian) - train subset [28 hours]
 - Golos - crowd [1070 hours] and fairfield [111 hours] subsets
 - Russian LibriSpeech (RuLS) [92 hours]
+- SOVA - RuAudiobooksDevices [260 hours] and RuDevices [75 hours] subsets
+## Performance
+The list of the available models in this collection is shown in the following table. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
+| Version | Tokenizer             | Vocabulary Size | MCV 10.0 dev | MCV 10.0 test | GOLOS-crowd test | GOLOS-farfield test | RuLS test | Train Dataset |
+|---------|-----------------------|-----------------|--------------|---------------|------------------|---------------------|-----------|---------------|
+| 1.13.0  | SentencePiece Unigram | 1024            | 3.5          | 4.0           | 2.7              | 7.6                 | 12.0      | NeMo ASRSET   |
+## Limitations
+Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
+## Deployment with NVIDIA Riva
+[NVIDIA Riva](https://developer.nvidia.com/riva), is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
+Additionally, Riva provides:
+* World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
+* Best in class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization
+* Streaming speech recognition, Kubernetes compatible scaling, and enterprise-grade support
+Although this model isn’t supported yet by Riva, the [list of supported models is here](https://huggingface.co/models?other=Riva).
+Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
+## References
+- [1] [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
+- [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
+- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)