---
library_name: transformers
tags:
- vietnamese
- multi_lingual
- audio2text
- sp
- speech_to_text
license: mit
language:
- vi
metrics:
- wer
- bleu
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---
<p align="left">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63d8d8879dfcfa941d4d7cd9/GsQKdaTyn2FFx_cZvVHk3.png" alt="Logo">
</p>
# EraX-WoW-Turbo: Whisper Large-v3 Turbo for Vietnamese and then some, Supercharged and Localized! 🚀
**(A promise fulfilled! MIT License - Absolutely, positively, totally free.)**
Get ready to experience speech recognition that's faster than a caffeinated cheetah and accurate enough to impress even your most skeptical tech-savvy friends. EraX-WoW-Turbo is here, built upon the already impressive Whisper Large-v3 Turbo, but with a special sauce that makes it truly shine. Think of it as Whisper Large-v3 after a rigorous training montage and a *lot* of espresso.
## What's the Big Deal?
* **Blazing Fast:** We're talking *real-time* transcription. Thanks to the clever optimizations in the Turbo architecture, this model chews through 30 seconds of audio in about 350ms. Forget about waiting; your transcripts will appear practically *before* you finish speaking. (The original Medium model? Bless its heart, it can't keep up.)
* **Multilingual Maestro:** EraX-WoW-Turbo isn't just fast; it's a true polyglot. We've fine-tuned it on a diverse dataset covering 11 key languages:
    * Vietnamese (with love from all 8 regions! We didn't forget any accents 😉)
    * Hindi
    * Chinese
    * English
    * Russian
    * German
    * Ukrainian
    * Japanese
    * French
    * Dutch
    * Korean

  We believe this selection provides a strong foundation for a wide range of applications. (Our apologies to our Khmer-speaking and Thai-speaking friends; we'll get you in the next version! Blame it on old age and forgetfulness. 👴👵)
* **Accuracy You Can Trust:** We're still finalizing the benchmark results (coming soon!), but preliminary tests show an impressive WER (Word Error Rate) around 12% across the major languages, including challenging Vietnamese dialects. This thing understands you, even if you've got a *really* strong regional accent.
* **Trained with Care:** The model was trained on a substantial dataset (300,000 samples, roughly 1000 hours), covering real-world audio conditions. Noise? No problem!
* **Open Source (MIT License):** Do whatever you want with it; the only terms are those of the MIT license itself.
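
For readers who want to check the WER figure on their own recordings, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name and the sample sentences are illustrative assumptions, not part of the EraX evaluation pipeline, and real evaluations usually normalize text (case, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one word ("trời") is dropped from an 8-word reference.
reference = "xin chào các bạn hôm nay trời đẹp"
hypothesis = "xin chào các bạn hôm nay đẹp"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

In this made-up example, one deletion out of eight reference words gives a WER of 0.125, in the same neighborhood as the preliminary figure quoted above.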
## Turbocharging Performance (CTranslate2)
While EraX-WoW-Turbo is already lightning-fast, you can unlock *even more* speed by using it with the CTranslate2 library ([https://github.com/OpenNMT/CTranslate2](https://github.com/OpenNMT/CTranslate2)). We're talking about a potential 2.5x speedup! This makes it ideal for applications requiring the absolute lowest latency.
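
To put the quoted numbers in perspective, the sketch below works out the real-time factor (processing time divided by audio duration) for the figures stated in this card: about 350 ms per 30 seconds of audio, and a potential 2.5x speedup with CTranslate2. The hardware and batch size behind those figures are not stated, so treat the results as back-of-the-envelope arithmetic rather than a benchmark; the function and variable names are assumptions made for illustration.

```python
# Figures quoted in this card; hardware and batch size are not stated,
# so these are illustrative inputs rather than guaranteed results.
AUDIO_SECONDS = 30.0          # length of the audio clip
BASE_LATENCY_SECONDS = 0.350  # about 350 ms to transcribe that clip
CT2_SPEEDUP = 2.5             # "potential" CTranslate2 speedup


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time over audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


base_rtf = real_time_factor(BASE_LATENCY_SECONDS, AUDIO_SECONDS)
ct2_latency_seconds = BASE_LATENCY_SECONDS / CT2_SPEEDUP
ct2_rtf = real_time_factor(ct2_latency_seconds, AUDIO_SECONDS)

print(f"baseline: RTF {base_rtf:.4f}, about {1 / base_rtf:.1f}x real time")
print(f"with CTranslate2: {ct2_latency_seconds * 1000:.0f} ms per clip, RTF {ct2_rtf:.4f}")
```

Under these assumptions, a 30-second clip would take roughly 140 ms if the full 2.5x speedup is realized.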
## Use Cases
* **Real-time Transcription:** Live captioning, meetings, interviews... anything where speed matters.
* **Voice Assistants:** Build responsive and accurate voice-controlled applications.
* **Media Subtitling:** Generate subtitles for videos and podcasts quickly and accurately.
* **Accessibility Tools:** Empower individuals with hearing impairments.
* **Language Learning:** Practice pronunciation and receive instant feedback.
* **Combine it with our upcoming EraX translator (around 100ms/sentence latency) for a complete multilingual communication powerhouse! Think instant translation for international conferences or even a travel app.**
## Limitations (Honesty is the Best Policy!)
* **Not for Babies (or Whispers):** This model is trained on adult speech. It *might* struggle with the high-pitched cries of infants or very quiet, hushed whispers. (We're working on it!) Pick your use cases accordingly.
## Get Involved!
We're passionate about making speech recognition accessible to everyone. We encourage you to:
* **Try it out!** Download the model and put it to the test.
* **Provide feedback:** Let us know what works, what doesn't, and what features you'd like to see. (Be gentle with the criticisms; we're sensitive! 😉)
* **Contribute:** If you're a developer, consider contributing to the project.
The EraX Team is committed to continuously improving our models. Stay tuned for future updates and even more exciting developments!

The EraX Team
## License
* **MIT**, following Whisper's own license.
## Citation 📝
<!-- title={EraX-WoW-Turbo-V1.0: Lắng nghe để Yêu thương.},
author={Nguyễn Anh Nguyên},
organization={EraX},
year={2025},
url={https://huggingface.co/erax-ai/EraX-WoW-Turbo-V1.0}-->
If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
```
@article{EraX-WoW-Turbo-V1.0,
title={EraX-WoW-Turbo-V1.0: Lắng nghe để Yêu thương.},
author={Nguyễn Anh Nguyên - Phạm Huỳnh Nhật - Cty Bảo hiểm AAA (504h)},
organization={EraX},
year={2025},
url={https://huggingface.co/erax-ai/EraX-WoW-Turbo-V1.0}
}
```