# Voices

- 🇺🇸 [American English](#american-english): 11F 9M
- 🇬🇧 [British English](#british-english): 4F 4M
- 🇯🇵 [Japanese](#japanese): 4F 1M
- 🇨🇳 [Mandarin Chinese](#mandarin-chinese): 4F 4M
- 🇪🇸 [Spanish](#spanish): 1F 2M
- 🇫🇷 [French](#french): 1F
- 🇮🇳 [Hindi](#hindi): 2F 2M
- 🇮🇹 [Italian](#italian): 1F 1M
- 🇧🇷 [Brazilian Portuguese](#brazilian-portuguese): 1F 2M
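Across the tables below, every voice name appears to follow the pattern `<lang_code><f|m>_<name>`: the first letter matches the section's `lang_code`, and the second is `f` or `m`, matching the 🚺/🚹 traits. This pattern is inferred from the tables, not a documented contract. A minimal Python sketch (the helper names are hypothetical) that recovers the `lang_code` from a voice name:

```python
# lang_code -> language, as listed in the per-language sections of this file.
LANGUAGES = {
    "a": "American English", "b": "British English", "j": "Japanese",
    "z": "Mandarin Chinese", "e": "Spanish", "f": "French",
    "h": "Hindi", "i": "Italian", "p": "Brazilian Portuguese",
}
GENDERS = {"f": "female", "m": "male"}


def parse_voice(voice: str) -> tuple[str, str, str]:
    """Split a voice name like 'af_heart' into (lang_code, gender, name).

    ASSUMPTION: the '<lang_code><f|m>_<name>' pattern is inferred from the
    voice tables, not taken from a documented API.
    """
    prefix, sep, name = voice.partition("_")
    if (not sep or not name or len(prefix) != 2
            or prefix[0] not in LANGUAGES or prefix[1] not in GENDERS):
        raise ValueError(f"unrecognized voice name: {voice!r}")
    return prefix[0], GENDERS[prefix[1]], name
```

For example, `parse_voice("ff_siwis")` yields `("f", "female", "siwis")`, so the French voice pairs with `lang_code='f'`.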

For each voice, the grades estimate the **quality and quantity** of its training data, both of which affect overall inference quality.

Subjectively, voices will sound better or worse to different people.

Support for non-English languages may be absent or thin due to weak G2P (grapheme-to-phoneme conversion) and/or a lack of training data. Some languages are represented by only a handful of voices, or even just one (French).

Most voices perform best on a "goldilocks range" of 100-200 tokens, out of a maximum of ~500. Voices may perform worse at the extremes:
- **Weakness** on short utterances, especially those under 10-20 tokens. The root cause may be a lack of short-utterance training data and/or the model architecture. One possible mitigation at inference time is to bundle shorter utterances together.
- **Rushing** on long utterances, especially those over 400 tokens. To mitigate this, chunk the text into shorter utterances or adjust the `speed` parameter.
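Both mitigations can be sketched as one greedy chunker: bundle short sentences together and split over-long ones at word boundaries, so each chunk stays at or below a target near the top of the 100-200 range. This is a minimal sketch, not the model's own text splitter. Assumption: `approx_tokens` counts non-space characters as a stand-in for tokens; real counts depend on the G2P output, so treat the limit as approximate. All helper names are hypothetical.

```python
import re


def approx_tokens(text: str) -> int:
    # ASSUMPTION: non-whitespace characters as a rough stand-in for tokens.
    # Real token counts come from the G2P output and will differ.
    return sum(1 for ch in text if not ch.isspace())


def split_long(sentence: str, max_tokens: int) -> list[str]:
    """Break an over-long sentence into pieces at word boundaries."""
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if current and approx_tokens(candidate) > max_tokens:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_tokens: int = 200) -> list[str]:
    """Bundle short sentences and split long ones so chunks stay <= max_tokens."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    units = []
    for sentence in sentences:
        if approx_tokens(sentence) > max_tokens:
            units.extend(split_long(sentence, max_tokens))  # avoid rushing
        else:
            units.append(sentence)
    chunks, current = [], ""
    for unit in units:
        candidate = f"{current} {unit}".strip()
        if current and approx_tokens(candidate) > max_tokens:
            chunks.append(current)  # the next unit would overshoot
            current = unit
        else:
            current = candidate  # bundle short units together (avoid weakness)
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately. A single word longer than `max_tokens` is left as its own oversized chunk, and sentence boundaries are detected only at `.`, `!` or `?` followed by whitespace.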

**Target Quality**
- How high-quality is the reference voice? This grade may be affected by audio quality, artifacts, compression, and sample rate.
- How well do the text labels match the audio? Text/audio misalignment (e.g. from hallucinations) will lower this grade.

**Training Duration**
- How much audio was seen during training? Smaller durations result in a lower overall grade.
- 10 hours <= **HH hours** < 100 hours
- 1 hour <= H hours < 10 hours
- 10 minutes <= MM minutes < 100 minutes
- 1 minute <= _M minutes_ 🤏 < 10 minutes
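The tables do not state how the Overall Grade is computed, but every graded row is consistent with one pattern: start from the Target Quality, drop one third of a letter grade for **HH hours**, and one more step for each tier below it (H: two, MM: three, _M_: four). For example, B with H hours gives C+. The Python sketch below encodes that observed pattern; it is an inference from the published grades, not the authors' method. Note that the MM range (up to 100 minutes) overlaps the H range (from 60 minutes); the sketch assumes the larger tier wins.

```python
# Letter grades from best to worst, in one-third-letter steps.
SCALE = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-",
         "D+", "D", "D-", "F+", "F"]

# Observed (not documented) step-down per Training Duration tier.
TIER_PENALTY = {"HH": 1, "H": 2, "MM": 3, "M": 4}


def duration_tier(seconds: float) -> str:
    """Map training audio duration to the tiers in the legend above."""
    hours, minutes = seconds / 3600, seconds / 60
    if 10 <= hours < 100:
        return "HH"
    # ASSUMPTION: 60-100 minutes fits both H and MM; the larger tier wins.
    if 1 <= hours < 10:
        return "H"
    if 10 <= minutes < 100:
        return "MM"
    if 1 <= minutes < 10:
        return "M"
    raise ValueError(f"duration outside the legend's ranges: {seconds} s")


def estimate_overall(target_quality: str, tier: str) -> str:
    """Estimate the Overall Grade by stepping down from Target Quality."""
    index = SCALE.index(target_quality) + TIER_PENALTY[tier]
    return SCALE[min(index, len(SCALE) - 1)]
```

For example, `estimate_overall("B", duration_tier(3 * 3600))` returns `"C+"`, matching rows such as af_aoede, and `estimate_overall("D", "H")` returns `"F+"`, matching am_adam.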

### American English

- `lang_code='a'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `en-us` fallback

| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| **af\_heart** | 🚺❤️ | | | **A** | `0ab5709b` |
| af_alloy | 🚺 | B | MM minutes | C | `6d877149` |
| af_aoede | 🚺 | B | H hours | C+ | `c03bd1a4` |
| af_bella | 🚺🔥 | **A** | **HH hours** | **A-** | `8cb64e02` |
| af_jessica | 🚺 | C | MM minutes | D | `cdfdccb8` |
| af_kore | 🚺 | B | H hours | C+ | `8bfbc512` |
| af_nicole | 🚺🎧 | B | **HH hours** | B- | `c5561808` |
| af_nova | 🚺 | B | MM minutes | C | `e0233676` |
| af_river | 🚺 | C | MM minutes | D | `e149459b` |
| af_sarah | 🚺 | B | H hours | C+ | `49bd364e` |
| af_sky | 🚺 | B | _M minutes_ 🤏 | C- | `c799548a` |
| am_adam | 🚹 | D | H hours | F+ | `ced7e284` |
| am_echo | 🚹 | C | MM minutes | D | `8bcfdc85` |
| am_eric | 🚹 | C | MM minutes | D | `ada66f0e` |
| am_fenrir | 🚹 | B | H hours | C+ | `98e507ec` |
| am_liam | 🚹 | C | MM minutes | D | `c8255075` |
| am_michael | 🚹 | B | H hours | C+ | `9a443b79` |
| am_onyx | 🚹 | C | MM minutes | D | `e8452be1` |
| am_puck | 🚹 | B | H hours | C+ | `dd1d8973` |
| am_santa | 🚹 | C | _M minutes_ 🤏 | D- | `7f2f7582` |

### British English

- `lang_code='b'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `en-gb` fallback

| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| bf_alice | 🚺 | C | MM minutes | D | `d292651b` |
| bf_emma | 🚺 | B | **HH hours** | B- | `d0a423de` |
| bf_isabella | 🚺 | B | MM minutes | C | `cdd4c370` |
| bf_lily | 🚺 | C | MM minutes | D | `6e09c2e4` |
| bm_daniel | 🚹 | C | MM minutes | D | `fc3fce4e` |
| bm_fable | 🚹 | B | MM minutes | C | `d44935f3` |
| bm_george | 🚹 | B | MM minutes | C | `f1bc8122` |
| bm_lewis | 🚹 | C | H hours | D+ | `b5204750` |

### Japanese

- `lang_code='j'` in [`misaki[ja]`](https://github.com/hexgrad/misaki)
- Total Japanese training data: H hours

| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 | CC BY |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ | ----- |
| jf_alpha | 🚺 | B | H hours | C+ | `1bf4c9dc` | |
| jf_gongitsune | 🚺 | B | MM minutes | C | `1b171917` | [gongitsune](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__gongitsune.txt) |
| jf_nezumi | 🚺 | B | _M minutes_ 🤏 | C- | `d83f007a` | [nezuminoyomeiri](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__nezuminoyomeiri.txt) |
| jf_tebukuro | 🚺 | B | MM minutes | C | `0d691790` | [tebukurowokaini](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__tebukurowokaini.txt) |
| jm_kumo | 🚹 | B | _M minutes_ 🤏 | C- | `98340afd` | [kumonoito](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__kumonoito.txt) |

### Mandarin Chinese

- `lang_code='z'` in [`misaki[zh]`](https://github.com/hexgrad/misaki)
- Total Mandarin Chinese training data: H hours

| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| zf_xiaobei | 🚺 | C | MM minutes | D | `9b76be63` |
| zf_xiaoni | 🚺 | C | MM minutes | D | `95b49f16` |
| zf_xiaoxiao | 🚺 | C | MM minutes | D | `cfaf6f2d` |
| zf_xiaoyi | 🚺 | C | MM minutes | D | `b5235dba` |
| zm_yunjian | 🚹 | C | MM minutes | D | `76cbf8ba` |
| zm_yunxi | 🚹 | C | MM minutes | D | `dbe6e1ce` |
| zm_yunxia | 🚹 | C | MM minutes | D | `bb2b03b0` |
| zm_yunyang | 🚹 | C | MM minutes | D | `5238ac22` |

### Spanish

- `lang_code='e'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `es`

| Name | Traits | SHA256 |
| ---- | ------ | ------ |
| ef_dora | 🚺 | `d9d69b0f` |
| em_alex | 🚹 | `5eac53f7` |
| em_santa | 🚹 | `aa8620cb` |

### French

- `lang_code='f'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `fr-fr`
- Total French training data: <11 hours

| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 | CC BY |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ | ----- |
| ff_siwis | 🚺 | B | <11 hours | B- | `8073bf2d` | [SIWIS](https://datashare.ed.ac.uk/handle/10283/2353) |

### Hindi

- `lang_code='h'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `hi`
- Total Hindi training data: H hours

| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| hf_alpha | 🚺 | B | MM minutes | C | `06906fe0` |
| hf_beta | 🚺 | B | MM minutes | C | `63c0a1a6` |
| hm_omega | 🚹 | B | MM minutes | C | `b55f02a8` |
| hm_psi | 🚹 | B | MM minutes | C | `2f0f055c` |

### Italian

- `lang_code='i'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `it`
- Total Italian training data: H hours

| Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
| ---- | ------ | -------------- | ----------------- | ------------- | ------ |
| if_sara | 🚺 | B | MM minutes | C | `6c0b253b` |
| im_nicola | 🚹 | B | MM minutes | C | `234ed066` |

### Brazilian Portuguese

- `lang_code='p'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
- espeak-ng `pt-br`

| Name | Traits | SHA256 |
| ---- | ------ | ------ |
| pf_dora | 🚺 | `07e4ff98` |
| pm_alex | 🚹 | `cf0ba8c5` |
| pm_santa | 🚹 | `d4210316` |
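Assuming the SHA256 column lists the first 8 hex digits of each voice file's SHA256 digest, a downloaded voice can be checked with Python's standard `hashlib`. This is a minimal sketch; the helper names are hypothetical. An 8-digit prefix catches a wrong or corrupted file, but it is far too short to act as a security guarantee.

```python
import hashlib
from pathlib import Path


def sha256_prefix(path, length: int = 8, chunk_size: int = 1 << 20) -> str:
    """Return the first `length` hex digits of a file's SHA256 digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read in chunks so large files need not fit in memory.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()[:length]


def verify_voice(path, expected_prefix: str) -> bool:
    """Compare a file's hash prefix with a value from the SHA256 column."""
    expected = expected_prefix.strip("`").lower()
    return sha256_prefix(path, len(expected)) == expected
```

For example, `verify_voice("voices/af_heart.pt", "0ab5709b")` should return `True` for an intact download, if that (hypothetical) path is where the voice file was saved.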