---
library_name: transformers
tags:
- vietnamese
- multi_lingual
- audio2text
- sp
- speech_to_text
license: mit
language:
- vi
metrics:
- wer
- bleu
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---
# EraX-WoW-Turbo: Whisper Large-v3 Turbo for Vietnamese and then some, Supercharged and Localized! 🚀
(A promise fulfilled! MIT License - Absolutely, positively, totally free.)
Get ready to experience speech recognition that's faster than a caffeinated cheetah and accurate enough to impress even your most skeptical tech-savvy friends. EraX-WoW-Turbo is here, built upon the already impressive Whisper Large-v3 Turbo, but with a special sauce that makes it truly shine. Think of it as Whisper Large-v3 after a rigorous training montage and a lot of espresso.
## What's the Big Deal?
**Blazing Fast:** We're talking real-time transcription. Thanks to the clever optimizations in the Turbo architecture, this model chews through 30 seconds of audio in about 350 ms. Forget about waiting; your transcripts will appear practically before you finish speaking. (The original Medium model? Bless its heart, it can't keep up.) A quick-start sketch follows at the end of this section.
**Multilingual Maestro:** EraX-WoW-Turbo isn't just fast; it's a linguistic polyglot. We've fine-tuned it on a diverse dataset covering 11 key languages:
- Vietnamese (with love from all 8 regions! We didn't forget any accents 😉)
- Hindi
- Chinese
- English
- Russian
- German
- Ukrainian
- Japanese
- French
- Dutch
- Korean
We believe this selection provides a strong foundation for a wide range of applications. (Our apologies to our Khmer- and Thai-speaking friends; we'll get you in the next version! Blame it on old age and forgetfulness. 👴👵)
**Accuracy You Can Trust:** We're still finalizing the benchmark results (coming soon!), but preliminary tests show an impressive WER (Word Error Rate) of around 12% across the major languages, including challenging Vietnamese dialects. This thing understands you, even if you've got a really strong regional accent.
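If you want to check that number on your own recordings, WER is easy to compute yourself. Here is a tiny sketch using the `jiwer` package; the reference/hypothesis strings are made up for illustration and are not from our test set:

```python
# Tiny sketch of how WER is computed, using the jiwer package.
# The reference/hypothesis strings below are made up for illustration.
from jiwer import wer

reference = "xin chào các bạn hôm nay trời đẹp"        # ground-truth transcript
hypothesis = "xin chào các bạn hôm nay trời rất đẹp"   # model output

# WER = (substitutions + deletions + insertions) / number of reference words
print(f"WER: {wer(reference, hypothesis):.2%}")
```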
**Trained with Care:** The model was trained on a substantial dataset (300,000 samples, roughly 1,000 hours) covering real-world audio conditions. Noise? No problem!
**Open Source (MIT License):** Do whatever you want with it (MIT only asks that you keep the license notice).
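Here's a minimal quick-start sketch, assuming the checkpoint loads through the standard transformers ASR pipeline; the repo id is taken from the citation at the bottom of this card, and the audio filename is just a placeholder:

```python
# Minimal quick-start sketch using the standard transformers ASR pipeline.
# The repo id comes from the citation below; the audio file is a placeholder.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="erax-ai/EraX-WoW-Turbo-V1.0",
    torch_dtype=torch.float16,
    device="cuda:0",  # use "cpu" if you have no GPU (it will be slower)
)

# chunk_length_s lets the pipeline handle clips longer than 30 seconds;
# generate_kwargs pins the language instead of relying on auto-detection.
result = asr(
    "my_vietnamese_clip.wav",
    chunk_length_s=30,
    generate_kwargs={"language": "vi", "task": "transcribe"},
)
print(result["text"])
```

Swap `language` for any of the other ten languages above, or drop it to let the model auto-detect.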
## Turbocharging Performance (CTranslate2)
While EraX-WoW-Turbo is already lightning-fast, you can unlock even more speed by using it with the CTranslate2 library (https://github.com/OpenNMT/CTranslate2). We're talking about a potential 2.5x speedup! This makes it ideal for applications requiring the absolute lowest latency.
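One way to do this (a sketch under the assumption that you convert the checkpoint yourself; the output directory name is illustrative) is to export the weights with CTranslate2's converter and run them through the faster-whisper wrapper, which is built on CTranslate2:

```python
# Sketch: run the model through CTranslate2 via the faster-whisper wrapper.
# First convert the Hugging Face checkpoint (one-off, from a shell):
#   ct2-transformers-converter --model erax-ai/EraX-WoW-Turbo-V1.0 \
#       --output_dir erax-wow-turbo-ct2 \
#       --copy_files tokenizer.json preprocessor_config.json \
#       --quantization float16
from faster_whisper import WhisperModel

# Load the converted model directory (the name above is illustrative).
model = WhisperModel("erax-wow-turbo-ct2", device="cuda", compute_type="float16")

segments, info = model.transcribe("my_vietnamese_clip.wav", language="vi", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```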
## Use Cases
- Real-time Transcription: Live captioning, meetings, interviews... anything where speed matters.
- Voice Assistants: Build responsive and accurate voice-controlled applications.
- Media Subtitling: Generate subtitles for videos and podcasts quickly and accurately.
- Accessibility Tools: Empower individuals with hearing impairments.
- Language Learning: Practice pronunciation and receive instant feedback.
- Combine it with our upcoming EraX translator (around 100ms/sentence latency) for a complete multilingual communication powerhouse! Think instant translation for international conferences or even a travel app.
## Limitations (Honesty is the Best Policy!)
- Not for Babies (or Whispers): This model is trained on adult speech, so it may struggle with the high-pitched cries of infants or very quiet, hushed whispers. (We're working on it!) Pick your use cases accordingly.
## Get Involved!
We're passionate about making speech recognition accessible to everyone. We encourage you to:
- Try it out! Download the model and put it to the test.
- Provide feedback: Let us know what works, what doesn't, and what features you'd like to see. (Be gentle with the criticisms; we're sensitive! 😉)
- Contribute: If you're a developer, consider contributing to the project.
The EraX Team is committed to continuously improving our models. Stay tuned for future updates and even more exciting developments!

*The EraX Team*
## License
- MIT, in line with Whisper's own MIT license.
## Citation 📝
If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
```bibtex
@article{EraXWoWTurbo2025,
  title={EraX-WoW-Turbo-V1.0: Lắng nghe để Yêu thương},
  author={Nguyễn Anh Nguyên and Phạm Huỳnh Nhật and Cty Bảo hiểm AAA (504h)},
  organization={EraX},
  year={2025},
  url={https://huggingface.co/erax-ai/EraX-WoW-Turbo-V1.0}
}
```