GlowTTS + HifiGAN Female Belarusian Voice #1

This is my first attempt at training a Belarusian voice using Coqui TTS and Mozilla's CommonVoice dataset. The model is based on the excellent recipe provided by bel-alex73. For this particular model, I adjusted the speaker search to surface individual speakers with over 30 hours of audio, then chose among them based on clarity and a relatively slow speaking cadence. The selection was manual: I modified bel-alex73's choose_speaker.ipynb notebook to show and process more than just the top-ranked speaker.
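
The speaker search described above can be approximated outside the notebook. The following is only a rough sketch, not bel-alex73's actual code: it assumes a CommonVoice release directory containing validated.tsv (with client_id and path columns) and clip_durations.tsv (with clip and duration[ms] columns), and the function names rank_speakers, load_tsv, and main are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path

MIN_HOURS = 30.0  # threshold used for this model card's speaker search


def rank_speakers(validated_rows, durations_ms, min_hours=MIN_HOURS):
    """Return [(client_id, hours)] for speakers with at least min_hours, longest first."""
    totals_ms = defaultdict(int)
    for row in validated_rows:
        # Clips without a known duration contribute nothing to the total.
        totals_ms[row["client_id"]] += durations_ms.get(row["path"], 0)
    ranked = [(cid, ms / 3_600_000) for cid, ms in totals_ms.items()]
    ranked = [(cid, hours) for cid, hours in ranked if hours >= min_hours]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def load_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def main(cv_dir, top_n=10):
    """Print the top_n candidate speakers (file and column names are assumptions)."""
    cv_dir = Path(cv_dir)
    validated = load_tsv(cv_dir / "validated.tsv")
    durations = {
        row["clip"]: int(row["duration[ms]"])
        for row in load_tsv(cv_dir / "clip_durations.tsv")
    }
    for client_id, hours in rank_speakers(validated, durations)[:top_n]:
        print(f"{client_id[:16]}...  {hours:.1f} h")
```

Calling main() with the path to the extracted Belarusian CommonVoice directory would list several candidates rather than only the top-ranked one. Judging clarity and speaking cadence still requires listening to sample clips from each candidate.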

This model is generated from the following client_id: 216de8fc1b7973a11926dd6694d2a97c3ceaf5a626ec4c8d2c85c8140a10ec5ed59bd6ee756c8c3451ee0cf784e4af445748cd69a2936102489b95f3409cd0d7

I am not a native speaker of Belarusian and I am doing this to assist in my language learning efforts. I am open to any and all feedback (esp. from native speakers) so feel free to post questions/comments.

Synthesizing text to speech

Input text must be converted to phonemes before this model can synthesize speech correctly. The conversion process is documented in bel-alex73's README.

tts --text "<phonemes>" --out_path output.wav \
    --config_path config.json \
    --model_path best_model.pth \
    --vocoder_config_path vocoder_config.json \
    --vocoder_path vocoder_best_model.pth
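
To synthesize several sentences in one go, the same command can be driven from a small script. This is a minimal sketch that only reuses the flags and file names shown above; the helper names build_tts_command and synthesize_all, the model_dir parameter, and the output_NNN.wav naming are my own assumptions, and the input strings are expected to be phonemized already.

```python
import subprocess
from pathlib import Path


def build_tts_command(phonemes, out_path, model_dir="."):
    """Build the argument list for the tts CLI call shown above."""
    model_dir = Path(model_dir)
    return [
        "tts",
        "--text", phonemes,
        "--out_path", str(out_path),
        "--config_path", str(model_dir / "config.json"),
        "--model_path", str(model_dir / "best_model.pth"),
        "--vocoder_config_path", str(model_dir / "vocoder_config.json"),
        "--vocoder_path", str(model_dir / "vocoder_best_model.pth"),
    ]


def synthesize_all(phoneme_lines, model_dir="."):
    """Run tts once per phonemized line, writing output_001.wav, output_002.wav, ..."""
    for index, phonemes in enumerate(phoneme_lines, start=1):
        out_path = f"output_{index:03d}.wav"
        # check=True raises CalledProcessError if tts exits with an error.
        subprocess.run(build_tts_command(phonemes, out_path, model_dir), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the phoneme string contains quotes or other special characters.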
