Why do we need espeak-ng?
Sorry for the total noob question, but what is the point of using Kokoro for TTS if it needs espeak-ng as a fallback for things it can't handle?
Yes, I understand espeak-ng is optional, but apparently Kokoro will sometimes skip things if you don't have it installed, which seems bad. And yes, I get that this is exactly why the fallback exists.
But my question is more like: "Okay, then why am I using Kokoro instead of espeak-ng itself?"
Also, doesn't that remove much of the advantage of having such a lightweight model?
Link:
https://github.com/espeak-ng/espeak-ng?tab=readme-ov-file#documentation
espeak-ng is only used for G2P (grapheme-to-phoneme conversion). Kokoro does the actual TTS. Feel free to listen to the espeak-ng TTS voices if you like.
Good explanation. Thanks.
Also, espeak-ng is used as an English fallback because the G2P dictionary files here https://hf.co/datasets/hexgrad/misaki/tree/main only contain a couple hundred thousand words.
When you encounter an OOD word (one with no dictionary entry), espeak-ng is used for G2P on that word. The dictionaries should in theory be more accurate than espeak-ng, but if you don't have espeak-ng installed, an OOD word will be skipped because there is nothing to look up and nothing to fall back on.
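To make the lookup-then-fallback behavior concrete, here is a minimal toy sketch. It is not the actual misaki or Kokoro code: `LEXICON`, `g2p`, and `fake_fallback` are hypothetical names, the two dictionary entries are made up for illustration, and the fallback is a stand-in for whatever espeak-ng would return.

```python
from typing import Callable, List, Optional, Tuple

# Toy stand-in for the real G2P dictionary files (entries are illustrative only).
LEXICON = {
    "hello": "həˈloʊ",
    "world": "wɜːld",
}


def g2p(
    text: str,
    fallback: Optional[Callable[[str], str]] = None,
) -> Tuple[str, List[str]]:
    """Convert words to phonemes: dictionary first, then fallback, else skip."""
    phonemes: List[str] = []
    skipped: List[str] = []
    for word in text.lower().split():
        if word in LEXICON:
            # In-dictionary word: use the (presumably more accurate) entry.
            phonemes.append(LEXICON[word])
        elif fallback is not None:
            # OOD word: ask the fallback G2P (espeak-ng plays this role).
            phonemes.append(fallback(word))
        else:
            # OOD word and no fallback installed: the word is dropped.
            skipped.append(word)
    return " ".join(phonemes), skipped


def fake_fallback(word: str) -> str:
    """Hypothetical placeholder for an espeak-ng phonemization call."""
    return f"<{word}>"


without_fallback = g2p("hello kokoro world")
with_fallback = g2p("hello kokoro world", fallback=fake_fallback)
```

In this sketch, `without_fallback` silently loses "kokoro", while `with_fallback` keeps a phoneme string for every word. Either way, the phonemes are only the input to the TTS model. The audio itself still comes from Kokoro.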