# Voices
- 🇺🇸 [American English](#american-english): 11F 9M
- 🇬🇧 [British English](#british-english): 4F 4M
- 🇯🇵 [Japanese](#japanese): 4F 1M
- 🇨🇳 [Mandarin Chinese](#mandarin-chinese): 4F 4M
- 🇪🇸 [Spanish](#spanish): 1F 2M
- 🇫🇷 [French](#french): 1F
- 🇮🇳 [Hindi](#hindi): 2F 2M
- 🇮🇹 [Italian](#italian): 1F 1M
- 🇧🇷 [Brazilian Portuguese](#brazilian-portuguese): 1F 2M
For each voice, the given grades are intended to be estimates of the **quality and quantity** of its associated training data, both of which impact overall inference quality.
Subjectively, voices will sound better or worse to different people.
Support for non-English languages may be absent or thin due to weak G2P and/or lack of training data. Some languages are represented by only a handful of voices, or even just one (French).
Most voices perform best on a "goldilocks range" of 100-200 tokens out of ~500 possible. Voices may perform worse at the extremes:
- **Weakness** on short utterances, especially those under 10-20 tokens. The root cause could be a lack of short-utterance training data and/or the model architecture. One possible inference-time mitigation is to bundle shorter utterances together.
- **Rushing** on long utterances, especially over 400 tokens. You can chunk down to shorter utterances or adjust the `speed` parameter to mitigate this.
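The bundling and chunking mitigations above can be sketched as a greedy packer that merges consecutive sentences until a target token budget is reached. This is only an illustration: `pack_utterances` is a hypothetical helper, and `count_tokens` defaults to character length as a crude stand-in for the model's real token count, so substitute your own counter.

```python
from typing import Callable, Iterable, List


def pack_utterances(
    sentences: Iterable[str],
    count_tokens: Callable[[str], int] = len,  # assumption: chars as a proxy for tokens
    target_max: int = 200,                     # upper end of the "goldilocks range"
) -> List[str]:
    """Greedily merge consecutive sentences into chunks of at most target_max tokens.

    A single sentence that already exceeds target_max is passed through
    unchanged; splitting inside a sentence is out of scope for this sketch.
    """
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty input
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > target_max:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized as one utterance. Short sentences end up bundled with their neighbors, and long passages are split at sentence boundaries instead of being sent as a single 400+ token input.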
**Target Quality**
- How high quality is the reference voice? This grade may be impacted by audio quality, artifacts, compression, & sample rate.
- How well do the text labels match the audio? Text/audio misalignment (e.g. from hallucinations) will lower this grade.
**Training Duration**
- How much audio was seen during training? Smaller durations result in a lower overall grade.
- 10 hours <= **HH hours** < 100 hours
- 1 hour <= H hours < 10 hours
- 10 minutes <= MM minutes < 100 minutes
- 1 minute <= _M minutes_ 🤏 < 10 minutes
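As a worked illustration of these buckets, the sketch below maps a duration in minutes to the label used in the tables. `duration_label` is a hypothetical helper. Note that the ranges as written overlap between 60 and 100 minutes; this sketch assumes such durations fall under `H hours`.

```python
def duration_label(minutes: float) -> str:
    """Map a training duration in minutes to the bucket label used in the tables."""
    if minutes < 1 or minutes >= 100 * 60:
        raise ValueError("duration is outside the documented 1 minute to 100 hours range")
    if minutes >= 10 * 60:
        return "HH hours"    # 10 hours <= duration < 100 hours
    if minutes >= 60:
        return "H hours"     # 1 hour <= duration < 10 hours (assumed to win the overlap)
    if minutes >= 10:
        return "MM minutes"  # 10 minutes <= duration < 60 minutes in this sketch
    return "_M minutes_"     # 1 minute <= duration < 10 minutes
```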
### American English
- `lang_code='a'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `en-us` fallback
| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| **af\_heart** | 🚺❤️ | | | **A** | `0ab5709b` |
| af_alloy | 🚺 | B | MM minutes | C | `6d877149` |
| af_aoede | 🚺 | B | H hours | C+ | `c03bd1a4` |
| af_bella | 🚺🔥 | **A** | **HH hours** | **A-** | `8cb64e02` |
| af_jessica | 🚺 | C | MM minutes | D | `cdfdccb8` |
| af_kore | 🚺 | B | H hours | C+ | `8bfbc512` |
| af_nicole | 🚺🎧 | B | **HH hours** | B- | `c5561808` |
| af_nova | 🚺 | B | MM minutes | C | `e0233676` |
| af_river | 🚺 | C | MM minutes | D | `e149459b` |
| af_sarah | 🚺 | B | H hours | C+ | `49bd364e` |
| af_sky | 🚺 | B | _M minutes_ 🤏 | C- | `c799548a` |
| am_adam | 🚹 | D | H hours | F+ | `ced7e284` |
| am_echo | 🚹 | C | MM minutes | D | `8bcfdc85` |
| am_eric | 🚹 | C | MM minutes | D | `ada66f0e` |
| am_fenrir | 🚹 | B | H hours | C+ | `98e507ec` |
| am_liam | 🚹 | C | MM minutes | D | `c8255075` |
| am_michael | 🚹 | B | H hours | C+ | `9a443b79` |
| am_onyx | 🚹 | C | MM minutes | D | `e8452be1` |
| am_puck | 🚹 | B | H hours | C+ | `dd1d8973` |
| am_santa | 🚹 | C | _M minutes_ 🤏 | D- | `7f2f7582` |
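The SHA256 column can be used to check a downloaded voice file. The sketch below assumes the eight hex characters shown are the leading digits of the file's full SHA256 digest; the helper names and the example file path are hypothetical.

```python
import hashlib


def sha256_prefix(path: str, length: int = 8) -> str:
    """Return the first `length` hex digits of the file's SHA256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB blocks so large files are not loaded into memory at once.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()[:length]


def matches_table(path: str, expected_prefix: str) -> bool:
    """Compare a file's digest prefix with the value listed in the table."""
    return sha256_prefix(path, len(expected_prefix)) == expected_prefix.lower()
```

For example, `matches_table("voices/af_heart.pt", "0ab5709b")` would return `True` for an intact copy, assuming that is where the file was saved locally.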
### British English
- `lang_code='b'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `en-gb` fallback
| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| bf_alice | 🚺 | C | MM minutes | D | `d292651b` |
| bf_emma | 🚺 | B | **HH hours** | B- | `d0a423de` |
| bf_isabella | 🚺 | B | MM minutes | C | `cdd4c370` |
| bf_lily | 🚺 | C | MM minutes | D | `6e09c2e4` |
| bm_daniel | 🚹 | C | MM minutes | D | `fc3fce4e` |
| bm_fable | 🚹 | B | MM minutes | C | `d44935f3` |
| bm_george | 🚹 | B | MM minutes | C | `f1bc8122` |
| bm_lewis | 🚹 | C | H hours | D+ | `b5204750` |
### Japanese
- `lang_code='j'` in [`misaki[ja]`](https://github.com/hexgrad/misaki)
- Total Japanese training data: H hours
| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 | CC BY |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ | ----- |
| jf_alpha | 🚺 | B | H hours | C+ | `1bf4c9dc` | |
| jf_gongitsune | 🚺 | B | MM minutes | C | `1b171917` | [gongitsune](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__gongitsune.txt) |
| jf_nezumi | 🚺 | B | _M minutes_ 🤏 | C- | `d83f007a` | [nezuminoyomeiri](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__nezuminoyomeiri.txt) |
| jf_tebukuro | 🚺 | B | MM minutes | C | `0d691790` | [tebukurowokaini](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__tebukurowokaini.txt) |
| jm_kumo | 🚹 | B | _M minutes_ 🤏 | C- | `98340afd` | [kumonoito](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__kumonoito.txt) |
### Mandarin Chinese
- `lang_code='z'` in [`misaki[zh]`](https://github.com/hexgrad/misaki)
- Total Mandarin Chinese training data: H hours
| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| zf_xiaobei | 🚺 | C | MM minutes | D | `9b76be63` |
| zf_xiaoni | 🚺 | C | MM minutes | D | `95b49f16` |
| zf_xiaoxiao | 🚺 | C | MM minutes | D | `cfaf6f2d` |
| zf_xiaoyi | 🚺 | C | MM minutes | D | `b5235dba` |
| zm_yunjian | 🚹 | C | MM minutes | D | `76cbf8ba` |
| zm_yunxi | 🚹 | C | MM minutes | D | `dbe6e1ce` |
| zm_yunxia | 🚹 | C | MM minutes | D | `bb2b03b0` |
| zm_yunyang | 🚹 | C | MM minutes | D | `5238ac22` |
### Spanish
- `lang_code='e'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `es`
| Name | Traits | SHA256 |
| ---- | ------ | ------ |
| ef_dora | 🚺 | `d9d69b0f` |
| em_alex | 🚹 | `5eac53f7` |
| em_santa | 🚹 | `aa8620cb` |
### French
- `lang_code='f'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `fr-fr`
- Total French training data: <11 hours
| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 | CC BY |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ | ----- |
| ff_siwis | 🚺 | B | <11 hours | B- | `8073bf2d` | [SIWIS](https://datashare.ed.ac.uk/handle/10283/2353) |
### Hindi
- `lang_code='h'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `hi`
- Total Hindi training data: H hours
| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| hf_alpha | 🚺 | B | MM minutes | C | `06906fe0` |
| hf_beta | 🚺 | B | MM minutes | C | `63c0a1a6` |
| hm_omega | 🚹 | B | MM minutes | C | `b55f02a8` |
| hm_psi | 🚹 | B | MM minutes | C | `2f0f055c` |
### Italian
- `lang_code='i'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `it`
- Total Italian training data: H hours
| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| if_sara | 🚺 | B | MM minutes | C | `6c0b253b` |
| im_nicola | 🚹 | B | MM minutes | C | `234ed066` |
### Brazilian Portuguese
- `lang_code='p'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `pt-br`
| Name | Traits | SHA256 |
| ---- | ------ | ------ |
| pf_dora | 🚺 | `07e4ff98` |
| pm_alex | 🚹 | `cf0ba8c5` |
| pm_santa | 🚹 | `d4210316` |
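The voice names in the tables above follow a visible pattern: the first letter matches the section's `lang_code` and the second letter appears to mark a female (`f`) or male (`m`) voice. The sketch below decodes a name on that assumption; `describe_voice` is a hypothetical helper, and the gender reading is inferred from the F/M counts and the naming, not stated in the tables.

```python
# lang_code values as listed in each language section
LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Mandarin Chinese",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Brazilian Portuguese",
}

# Assumption: the second prefix letter encodes the voice's gender.
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice: str) -> dict:
    """Split a voice name such as 'af_heart' into lang_code, language, gender, and name."""
    prefix, _, name = voice.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice name format: {voice!r}")
    lang, gender = prefix
    if lang not in LANG_CODES or gender not in GENDERS:
        raise ValueError(f"unknown prefix in voice name: {voice!r}")
    return {
        "lang_code": lang,
        "language": LANG_CODES[lang],
        "gender": GENDERS[gender],
        "name": name,
    }
```

A helper like this makes it easy to pick the `lang_code` that matches a chosen voice, so that a voice is not paired with a mismatched G2P front end by accident.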