Comparison of v0.19 and v1.0: Advancements and Limitations

#121
by MaverickBuffoon - opened

The overall voice and pronunciation of v1.0 are more natural and smooth than those of v0.19, but in some cases v1.0 overcorrects pronunciations.
Version 0.19 strictly adheres to the pronunciation definitions in the Espeak-NG files, while version 1.0 makes adjustments to improve pronunciation.

[Audio samples: v0.19, v1.0]

"The persona reflects the social facade that a person presents to the outside world."
Although Espeak-NG explicitly codes 'present $verb', version 0.19 cannot recognize 'presents' as a verb unless it is preceded by a word flagged '$verbf', such as 'we', 'she', or 'it'. In contrast, version 1.0 detects it correctly.
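To make the flag behaviour concrete, here is a toy sketch in Python. It is not Espeak-NG's code or data: the `LEXICON` table, the `VERBF_WORDS` set, and the `pronounce` function are all made up for illustration, the IPA strings are ordinary dictionary pronunciations rather than Espeak-NG output, and suffix handling ('presents') is left out. It only assumes that `$verbf` marks a word after which a verb is expected.

```python
# Toy model only: not Espeak-NG's real code, data, or phoneme notation.
# Each entry maps a word to (default IPA, IPA used when the word is a verb).
LEXICON = {
    "present": ("pɹˈɛzənt", "pɹɪzˈɛnt"),
}

# Words assumed to carry a $verbf-style flag ("the next word is probably a verb").
VERBF_WORDS = {"we", "she", "it"}


def pronounce(words):
    """Return one string per word; the verb form is used only right after a $verbf word."""
    out = []
    expect_verb = False
    for word in words:
        key = word.lower()
        # Unknown words are passed through unchanged in this sketch.
        default, verb = LEXICON.get(key, (key, key))
        out.append(verb if expect_verb else default)
        expect_verb = key in VERBF_WORDS
    return out


print(pronounce(["we", "present"]))           # verb form is selected
print(pronounce(["a", "person", "present"]))  # falls back to the default form
```

With a purely flag-driven rule like this, "a person presents" never triggers the verb form, which matches the v0.19 behaviour described above.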

The following are examples where v1.0 adjusts the pronunciation too much and does not conform to the en_list and en_extra files of Espeak-NG:

[Audio samples: v0.19, v1.0]

"duties that match those levied on goods"
Version 0.19 uses l'EvId (IPA: lˈɛvɪd) as per Espeak-NG, but version 1.0 mispronounces the word.

[Audio samples: v0.19, v1.0]

"a person of unknown lineage"
Lineage (IPA: lˈɪnɪᵻdʒ) is coded in the en_list file as 'lineage lInI;I2dZ,' but version 1.0 does not follow the phoneme translation.

I hope there will be an additional voice option that strictly follows Espeak-NG while also having a tone similar to the af_heart voice when needed. I have some custom pronunciations in the en_extra file, like 'chopin S'oUpan $capital', but version 1.0 doesn’t adhere to those pronunciations (it adjusts them inappropriately), while version 0.19 does.
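For readers unfamiliar with the entry format quoted in this post, the sketch below splits such lines into word, phoneme string, and `$` flags. It is a simplified illustration in Python, not Espeak-NG's own parser: `parse_entry` is a hypothetical helper, and it ignores comments, multi-word entries, and conditional rules that the real dictionary files may contain.

```python
def parse_entry(line):
    """Split a simplified 'word phonemes $flag ...' line into its parts."""
    tokens = line.split()
    word = tokens[0]
    rest = tokens[1:]
    # Tokens starting with '$' are treated as flags, anything else as phonemes.
    flags = [t for t in rest if t.startswith("$")]
    phonemes = [t for t in rest if not t.startswith("$")]
    return {
        "word": word,
        "phonemes": phonemes[0] if phonemes else None,
        "flags": flags,
    }


# The three entries exactly as quoted in this post.
for line in ["lineage lInI;I2dZ", "chopin S'oUpan $capital", "present $verb"]:
    print(parse_entry(line))
```

A table built this way could be used to compare what the dictionary specifies against what each version actually says.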

Thanks, I patched levied and lineage in https://github.com/hexgrad/misaki/pull/49 and pushed it to 0.7.15. If you have more feedback, please let me know; these are not hard to fix @MaverickBuffoon

Thank you from the bottom of my heart

@hexgrad when will the encoder of Kokoro be released?

I’ve noticed that words in the JSON that contain a "-" get ignored entirely: even changing the IPA does not change the pronunciation. For instance, “one-on-one” ends up being pronounced as three separate words, which makes it sound choppy.
[Audio samples: v0.19, v1.0]

"There are 1,225 one-on-one relationships"
Even if I change "one-dayer": "wˌʌndˈAəɹ" to "one-dayer": "wˌʌndˈAəɹʌnʌnʌn", I still can't hear any change in pronunciation. However, "onedayer": "wˌʌndˈAəɹʌnʌnʌn" works just fine.
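Since the hyphen-free key does get picked up, one possible stopgap is to rewrite hyphenated words in the input text before synthesis. The Python sketch below is only that: the `HYPHEN_FREE` table and `dehyphenate` function are hypothetical, it is not part of Kokoro or misaki, and it assumes a matching hyphen-free entry (e.g. "oneonone") has been added to the JSON with the desired IPA.

```python
import re

# Hypothetical table: hyphenated spelling -> hyphen-free lexicon key.
HYPHEN_FREE = {
    "one-on-one": "oneonone",
    "one-dayer": "onedayer",
}


def dehyphenate(text, table=HYPHEN_FREE):
    """Replace known hyphenated words with their hyphen-free keys."""
    def swap(match):
        word = match.group(0)
        # Words not in the table are left exactly as written.
        return table.get(word.lower(), word)

    # Match runs of letters joined by single hyphens, e.g. "one-on-one".
    return re.sub(r"[A-Za-z]+(?:-[A-Za-z]+)+", swap, text)


print(dehyphenate("There are 1,225 one-on-one relationships"))
# There are 1,225 oneonone relationships
```

This only sidesteps the lookup problem; it does not explain why hyphenated keys are skipped in the first place.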
