hexgrad committed · Commit aab4d9f · verified · 1 Parent(s): aa89b69

Upload README.md

  1. README.md +4 -0
README.md CHANGED
@@ -37,6 +37,10 @@ Voices are listed in [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/
 
 Support for non-English languages may be absent or thin due to weak G2P and/or lack of training data. Some languages are only represented by a small handful or even just one voice (French).
 
+ Most voices perform best in a "goldilocks range" of, say, 50-150 tokens out of ~500 possible. Voices may perform worse at the extremes:
+ - **Weakness** on short utterances (especially fewer than 10-20 tokens). The root cause could be a lack of short-utterance training data and/or the model architecture. One possible inference-time mitigation is to bundle shorter utterances together.
+ - **Rushing** on long utterances (over 400 tokens). To mitigate this, you can chunk the text into shorter utterances or adjust the `speed` parameter.
+
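
The two mitigations in the added lines (bundling short utterances, chunking long ones) can be sketched as a single greedy packing step. This is a minimal illustration, not part of the commit and not the model's own API: `split_sentences`, `bundle_utterances`, and `count_tokens_by_chars` are hypothetical helper names, and the character-count token estimate is an assumption standing in for whatever tokenizer the model actually uses.

```python
import re
from typing import Callable, List


def count_tokens_by_chars(text: str) -> int:
    """Hypothetical stand-in for a real token counter (assumption: 1 char ~ 1 token)."""
    return len(text)


def split_sentences(text: str) -> List[str]:
    """Split text after sentence-ending punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def bundle_utterances(
    utterances: List[str],
    count_tokens: Callable[[str], int] = count_tokens_by_chars,
    max_tokens: int = 150,
) -> List[str]:
    """Greedily join consecutive utterances while the bundle stays within max_tokens.

    A single utterance that already exceeds max_tokens is passed through
    unchanged as its own bundle; splitting it further is left to the caller.
    """
    bundles: List[str] = []
    current = ""
    for utt in utterances:
        utt = utt.strip()
        if not utt:
            continue
        candidate = f"{current} {utt}" if current else utt
        if current and count_tokens(candidate) > max_tokens:
            # Adding this utterance would overshoot: close the current bundle.
            bundles.append(current)
            current = utt
        else:
            current = candidate
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    text = "Hi. How are you? I am fine, thanks. Bye."
    for chunk in bundle_utterances(split_sentences(text), max_tokens=20):
        print(repr(chunk))
```

Each resulting chunk would then be synthesized separately. The default `max_tokens=150` mirrors the upper end of the range quoted above; in practice the limit and the token counter should be swapped for values that match the real tokenizer.
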
 ### Usage
 
 The following can be run in a single cell on [Google Colab](https://colab.research.google.com/).