Why do we need espeak-ng?

#126
by bnielson - opened

Sorry for the total noob question, but what is the point of using Kokoro for TTS if it needs espeak-ng as a fallback for things it can't handle?

Yes, I understand espeak-ng is optional, but apparently Kokoro will sometimes skip words if you don't have it installed, which seems bad. I get that this is exactly why the fallback exists.

But my question is more like, "Okay, then why am I using Kokoro instead of espeak-ng itself?"

Also, doesn't that remove much of the advantage of having such a lightweight model?

Link:
https://github.com/espeak-ng/espeak-ng?tab=readme-ov-file#documentation

espeak-ng is only used for G2P (grapheme-to-phoneme conversion), not for producing audio. Kokoro is used for TTS. Feel free to listen to the espeak-ng TTS voices if you like.

hexgrad changed discussion status to closed

Good explanation. Thanks.

Also, espeak-ng is used as an English fallback because the G2P dictionary files here https://hf.co/datasets/hexgrad/misaki/tree/main only cover a couple hundred thousand words.

When you encounter an OOD word (one with no dictionary entry), espeak-ng is used for G2P instead. The dictionaries should in theory be more accurate than espeak-ng, but if you don't have espeak-ng installed, an OOD word will be skipped because there is nothing to look it up in.
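
To make the lookup-then-fallback behavior concrete, here is a minimal sketch in Python. It is not the actual misaki or Kokoro code: `LEXICON`, `phonemize`, and `fake_fallback` are hypothetical names made up for illustration, the two-entry dictionary stands in for the real G2P files, and the fallback is a placeholder for what espeak-ng would do.

```python
from typing import Callable, List, Optional

# Hypothetical toy lexicon; the real dictionaries are far larger.
LEXICON = {
    "hello": "həlˈoʊ",
    "world": "wˈɜɹld",
}

def phonemize(text: str,
              fallback: Optional[Callable[[str], str]] = None) -> List[str]:
    """Return phonemes per word, using the dictionary first."""
    phonemes = []
    for word in text.lower().split():
        if word in LEXICON:
            # Dictionary hit: use the curated pronunciation.
            phonemes.append(LEXICON[word])
        elif fallback is not None:
            # No entry: ask the fallback G2P (espeak-ng's role here).
            phonemes.append(fallback(word))
        # No entry and no fallback: the word is silently skipped.
    return phonemes

def fake_fallback(word: str) -> str:
    # Placeholder standing in for a real fallback G2P engine.
    return f"<{word}>"

without_fallback = phonemize("hello kokoro world")
with_fallback = phonemize("hello kokoro world", fake_fallback)
```

Without a fallback, `without_fallback` contains only the two dictionary words and "kokoro" is dropped; with one, `with_fallback` has an entry for all three words. Either way the phonemes are only an intermediate step, and the speech itself still comes from the TTS model.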
