Kokoro-82M / VOICES.md

Voices

For each voice, the given grades are intended to be estimates of the quality and quantity of its associated training data, both of which impact overall inference quality.

Subjectively, voices will sound better or worse to different people.

Target Quality

  • How high is the quality of the reference voice? This grade may be affected by audio quality, artifacts, compression, and sample rate.
  • How well do the text labels match the audio? Text/audio misalignment (e.g. from hallucinations) will lower this grade.

Training Duration

  • How much audio was seen during training? Shorter durations result in a lower overall grade.

American 🇺🇸

American G2P: misaki[en] with en-us espeak-ng fallback

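| Name | Traits | Target Quality | Training Duration | Overall Grade |
| --- | --- | --- | --- | --- |
| af_alloy | 🚺 | B | MM minutes | C |
| af_aoede | 🚺 | B | H hours | C+ |
| af_bella | 🚺🔥 | A | HH hours | A- |
| af_jessica | 🚺 | C | MM minutes | D |
| af_kore | 🚺 | B | H hours | C+ |
| af_nicole | 🚺🎧 | B | HH hours | B- |
| af_nova | 🚺 | B | MM minutes | C |
| af_river | 🚺 | C | MM minutes | D |
| af_sarah | 🚺 | B | H hours | C+ |
| af_sky | 🚺 | B | M minutes | C- |
| am_adam | 🚹 | D | H hours | F+ |
| am_echo | 🚹 | C | MM minutes | D |
| am_eric | 🚹 | C | MM minutes | D |
| am_fenrir | 🚹 | B | H hours | C+ |
| am_liam | 🚹 | C | MM minutes | D |
| am_michael | 🚹 | B | H hours | C+ |
| am_onyx | 🚹 | C | MM minutes | D |
| am_puck | 🚹 | B | H hours | C+ |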
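
To pick a voice by its Overall Grade programmatically, the letter grades need some ordering. The snippet below is a minimal sketch of one way to do that. The numeric scale (A = 4 down to F = 0, with ±0.3 for the modifiers) is an assumption made for illustration, not something defined by this document, and `grade_value` is a hypothetical helper name.

```python
# Assumed GPA-like scale; the source only gives letter grades, not numbers.
BASE = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}
MODIFIER = {"+": 0.3, "-": -0.3, "": 0.0}


def grade_value(grade: str) -> float:
    """Convert a letter grade such as 'C+' into a sortable number."""
    letter, modifier = grade[0], grade[1:]
    return BASE[letter] + MODIFIER[modifier]


# A few (name, overall grade) rows copied from the American table.
voices = [
    ("af_alloy", "C"),
    ("af_bella", "A-"),
    ("af_nicole", "B-"),
    ("am_adam", "F+"),
    ("am_fenrir", "C+"),
]

# Highest overall grade first.
ranked = sorted(voices, key=lambda row: grade_value(row[1]), reverse=True)
best_name, best_grade = ranked[0]
```

With these five rows, `af_bella` ranks first and `am_adam` last. Since the grades are estimates of training data quality and quantity, a ranking like this is only a starting point for listening tests.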

British 🇬🇧

British G2P: misaki[en] with en-gb espeak-ng fallback

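| Name | Traits | Target Quality | Training Duration | Overall Grade |
| --- | --- | --- | --- | --- |
| bf_alice | 🚺 | C | MM minutes | D |
| bf_emma | 🚺 | B | HH hours | B- |
| bf_isabella | 🚺 | B | MM minutes | C |
| bf_lily | 🚺 | C | MM minutes | D |
| bm_daniel | 🚹 | C | MM minutes | D |
| bm_fable | 🚹 | B | MM minutes | C |
| bm_george | 🚹 | B | MM minutes | C |
| bm_lewis | 🚹 | C | H hours | D+ |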

French 🇫🇷

French G2P: espeak-ng fr-fr

| Name | Traits | Target Quality | Training Duration | Overall Grade |
| --- | --- | --- | --- | --- |
| ff_siwis | 🚺 | B | <11 hours | B- |
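
The voice names in the tables above appear to follow a two-letter prefix pattern: the first letter tracks the language section (`a` American, `b` British, `f` French) and the second tracks the 🚺/🚹 trait (`f` or `m`). The sketch below decodes a name under that assumption. The mapping is inferred from the tables rather than taken from an official specification, and `describe_voice` and `VoiceInfo` are hypothetical names used only for illustration.

```python
from typing import NamedTuple

# Assumed mappings, inferred from the section headings and trait columns.
ACCENTS = {"a": "American English", "b": "British English", "f": "French"}
GENDERS = {"f": "female", "m": "male"}


class VoiceInfo(NamedTuple):
    accent: str
    gender: str
    speaker: str


def describe_voice(name: str) -> VoiceInfo:
    """Split a voice name like 'bm_george' into accent, gender, and speaker."""
    prefix, _, speaker = name.partition("_")
    if len(prefix) != 2 or not speaker:
        raise ValueError(f"unexpected voice name format: {name!r}")
    accent_code, gender_code = prefix
    if accent_code not in ACCENTS or gender_code not in GENDERS:
        raise ValueError(f"unknown prefix in voice name: {name!r}")
    return VoiceInfo(ACCENTS[accent_code], GENDERS[gender_code], speaker)


info = describe_voice("bm_george")
```

Names with a prefix outside these tables raise a `ValueError` rather than guessing, since only the three language sections listed here are covered.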