---
library_name: transformers
tags:
- vietnamese
- multi_lingual
- audio2text
- sp
- speech_to_text
license: mit
language:
- vi
metrics:
- wer
- bleu
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63d8d8879dfcfa941d4d7cd9/GsQKdaTyn2FFx_cZvVHk3.png" alt="Logo">
</p>

# EraX-WoW-Turbo: Whisper Large-v3 Turbo for Vietnamese and then some, Supercharged and Localized! 🚀

**(A promise fulfilled! MIT License - Absolutely, positively, totally free.)**

Get ready to experience speech recognition that's faster than a caffeinated cheetah and accurate enough to impress even your most skeptical tech-savvy friends.  EraX-WoW-Turbo is here, built upon the already impressive Whisper Large-v3 Turbo, but with a special sauce that makes it truly shine.  Think of it as Whisper Large-v3 after a rigorous training montage and a *lot* of espresso.

## What's the Big Deal?

*   **Blazing Fast:** We're talking *real-time* transcription.  Thanks to the clever optimizations in the Turbo architecture, this model chews through 30 seconds of audio in about 350ms.  Forget about waiting; your transcripts will appear practically *before* you finish speaking.  (The original Medium model?  Bless its heart, it can't keep up.)
*   **Multilingual Maestro:** EraX-WoW-Turbo isn't just fast; it's a true polyglot.  We've fine-tuned it on a diverse dataset covering 11 key languages:
    *   Vietnamese (with love from all 8 regions!  We didn't forget any accents 😉)
    *   Hindi
    *   Chinese
    *   English
    *   Russian
    *   German
    *   Ukrainian
    *   Japanese
    *   French
    *   Dutch
    *   Korean

    We believe this selection provides a strong foundation for a wide range of applications. (Our apologies to our Khmer-speaking and Thai-speaking friends; we'll get you in the next version!  Blame it on old age and forgetfulness. 👴👵)

*   **Accuracy You Can Trust:**  We're still finalizing the benchmark results (coming soon!), but preliminary tests show an impressive WER (Word Error Rate) around 12% across the major languages, including challenging Vietnamese dialects.  This thing understands you, even if you've got a *really* strong regional accent.
*   **Trained with Care:** The model was trained on a substantial dataset (300,000 samples, roughly 1,000 hours), covering real-world audio conditions. Noise? No problem!
*   **Open Source (MIT License):** Do whatever you want, no restrictions.
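
To make the two headline numbers above concrete, here is a minimal sketch of how a real-time factor and a word error rate can be computed. The function names `real_time_factor` and `word_error_rate` are ours for illustration, not part of any EraX tooling, and the whitespace tokenization is an assumption: it suits Vietnamese or English text, while Chinese and Japanese are usually scored per character instead.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# 30 seconds of audio in about 350 ms, as quoted above.
rtf = real_time_factor(0.35, 30.0)
print(f"real-time factor: {rtf:.4f}")

# One dropped word out of four reference words.
print(word_error_rate("xin chào các bạn", "xin chào bạn"))  # 0.25
```

At the quoted 350 ms per 30 seconds, the real-time factor works out to roughly 0.012, meaning the model runs about 85 times faster than the audio plays.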

## Turbocharging Performance (CTranslate2)

While EraX-WoW-Turbo is already lightning-fast, you can unlock *even more* speed by using it with the CTranslate2 library ([https://github.com/OpenNMT/CTranslate2](https://github.com/OpenNMT/CTranslate2)).  We're talking about a potential 2.5x speedup!  This makes it ideal for applications requiring the absolute lowest latency.
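
One possible way to try this is sketched below. It is not an official EraX recipe: it assumes the `ctranslate2` and `faster-whisper` packages are installed (faster-whisper is a separate library built on CTranslate2), that the repository ships `tokenizer.json` and `preprocessor_config.json`, and that a CUDA GPU is available. The output directory name and the helper functions are hypothetical.

```python
from pathlib import Path

MODEL_ID = "erax-ai/EraX-WoW-Turbo-V1.0"  # repo id taken from the citation URL
CT2_DIR = Path("erax-wow-turbo-ct2")       # hypothetical local output directory


def convert_to_ctranslate2(output_dir: Path = CT2_DIR,
                           quantization: str = "float16") -> Path:
    """One-time conversion of the Transformers checkpoint to CTranslate2."""
    # Imported lazily so this file loads even without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        MODEL_ID,
        # Assumption: these files exist in the model repository.
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    converter.convert(str(output_dir), quantization=quantization, force=True)
    return output_dir


def join_segments(segments) -> str:
    """Concatenate the text of transcription segments into one string."""
    return " ".join(segment.text.strip() for segment in segments)


def transcribe_fast(audio_path: str, model_dir: Path = CT2_DIR,
                    language: str = "vi") -> str:
    """Transcribe one audio file with the converted model."""
    # Imported lazily so this file loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(str(model_dir), device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path, language=language, beam_size=5)
    return join_segments(segments)
```

Call `convert_to_ctranslate2()` once, then `transcribe_fast("sample.wav")` for each file (`sample.wav` being a placeholder for your own audio). Whether you actually see the 2.5x figure depends on your hardware, batch size, and quantization choice.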

## Use Cases

*   **Real-time Transcription:**  Live captioning, meetings, interviews... anything where speed matters.
*   **Voice Assistants:**  Build responsive and accurate voice-controlled applications.
*   **Media Subtitling:**  Generate subtitles for videos and podcasts quickly and accurately.
*   **Accessibility Tools:**  Empower individuals with hearing impairments.
*   **Language Learning:**  Practice pronunciation and receive instant feedback.
*   **Combine it with our upcoming EraX translator (around 100 ms/sentence latency) for a complete multilingual communication powerhouse!  Think instant translation for international conferences or even a travel app.**

## Limitations (Honesty is the Best Policy!)

*   **Not for Babies (or Whispers):** This model is trained on adult speech.  It *might* struggle with the high-pitched cries of infants or very quiet, hushed whispers.  (We're working on it!) Please use it where it fits.

## Get Involved!

We're passionate about making speech recognition accessible to everyone.  We encourage you to:

*   **Try it out!**  Download the model and put it to the test.
*   **Provide feedback:**  Let us know what works, what doesn't, and what features you'd like to see. (Be gentle with the criticisms; we're sensitive! 😉)
*   **Contribute:**  If you're a developer, consider contributing to the project.
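
If you want a starting point for trying it out, the sketch below loads the model through the standard Transformers `pipeline` API. It assumes `torch` and `transformers` are installed and uses the repository id from the citation URL; the helper names `build_asr_pipeline` and `transcribe` are ours, and the Vietnamese default is just an example.

```python
MODEL_ID = "erax-ai/EraX-WoW-Turbo-V1.0"  # repo id taken from the citation URL


def build_asr_pipeline(model_id: str = MODEL_ID):
    """Create a speech-recognition pipeline, on GPU when one is available."""
    # Imported lazily so this file loads even without torch/transformers.
    import torch
    from transformers import pipeline

    use_cuda = torch.cuda.is_available()
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=torch.float16 if use_cuda else torch.float32,
        device="cuda:0" if use_cuda else "cpu",
    )


def transcribe(asr, audio_path: str, language: str = "vi") -> str:
    """Run the pipeline on one audio file and return the plain transcript."""
    result = asr(
        audio_path,
        # Timestamps let the pipeline handle audio longer than 30 seconds.
        return_timestamps=True,
        generate_kwargs={"language": language, "task": "transcribe"},
    )
    return result["text"].strip()
```

A typical session would be `asr = build_asr_pipeline()` followed by `print(transcribe(asr, "sample.wav"))`, where `sample.wav` stands in for your own recording. Passing the language explicitly skips Whisper's automatic language detection, which can help with short clips.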

The EraX Team is committed to continuously improving our models.  Stay tuned for future updates and even more exciting developments!

The EraX Team.

## License
- **MIT**, following Whisper's license.

## Citation 📝
<!-- title={EraX-WoW-Tuebo-V1.0: Lắng nghe để Yêu thương.},
  author={Nguyễn Anh Nguyên},
  organization={EraX},
  year={2025},
  url={https://huggingface.co/erax-ai/EraX-WoW-Turbo-V1.0}-->
  
If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
```
@article{title={EraX-WoW-Turbo-V1.0: Lắng nghe để Yêu thương.},
  author={Nguyễn Anh Nguyên - Phạm Huỳnh Nhật - Cty Bảo hiểm AAA (504h)},
  organization={EraX},
  year={2025},
  url={https://huggingface.co/erax-ai/EraX-WoW-Turbo-V1.0}
}
```