Unit to Speech Model (unit2speech)

The unit-to-speech model is a modified Tacotron2 model that learns to synthesize speech from discrete speech units. All models are trained on quantized LJSpeech.

| Upstream Units | Download Links | Model md5 |
|---|---|---|
| Log Mel Filterbank + KM50 | model - code_dict | 932b3b8527c0125f5f964b57762eba49 |
| Log Mel Filterbank + KM100 | model - code_dict | cde0b0d278a39011d0acbd5df27abdf4 |
| Log Mel Filterbank + KM200 | model - code_dict | dba0f1d4de64bc7976718834010b23e7 |
| Modified CPC + KM50 | model - code_dict | a585e8dd8890ea56164f17635dd8e613 |
| Modified CPC + KM100 | model - code_dict | 5c0ee2869b4f483d17f37f1a41a548e0 |
| Modified CPC + KM200 | model - code_dict | 2f0c9951cf37020d9464514bff48bc5d |
| HuBERT Base + KM50 | model - code_dict | 85ffce8baec5aa90035ab696fe676fce |
| HuBERT Base + KM100 | model - code_dict | df4a9c6ffd1bb00c91405432c234aba3 |
| HuBERT Base + KM200 | model - code_dict | ac72f2c0c563589819bec116c7f8d274 |
| wav2vec 2.0 Large + KM50 | model - code_dict | e3503d0ad822b2c24b89f68b857fedff |
| wav2vec 2.0 Large + KM100 | model - code_dict | eb3666e456ae4c96bf2a1eec825c13ed |
| wav2vec 2.0 Large + KM200 | model - code_dict | 777d343e963c4d64f04d78eef032f4e8 |
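
The md5 column can be used to check a downloaded model before running inference. A minimal sketch, assuming GNU coreutils `md5sum` is available (on macOS, `md5 -q FILE` prints the same digest); `verify_md5` is a hypothetical helper name, not part of fairseq:

```shell
# Hypothetical helper (not part of fairseq): compare a file's md5 digest
# with the value listed in the table. Assumes GNU coreutils `md5sum`.
verify_md5() {
  local file="$1" expected="$2" actual
  # md5sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(md5sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

For example, for the HuBERT Base + KM100 model: `verify_md5 <downloaded_model_file> df4a9c6ffd1bb00c91405432c234aba3`. A missing file also yields a mismatch, because `md5sum` then prints no digest.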

Run inference using a unit2speech model

  • Install librosa, unidecode and inflect: pip install librosa unidecode inflect
  • Download the WaveGlow checkpoint. WaveGlow is the vocoder.
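
After installing, it can help to confirm that the dependencies import cleanly in the interpreter you will use for inference. A minimal sketch, assuming that interpreter is `python3` on your PATH; `check_python_modules` is a hypothetical helper name:

```shell
# Hypothetical helper: try to import each named Python module and report
# which ones fail. Returns non-zero if any module is missing.
check_python_modules() {
  local status=0 mod
  for mod in "$@"; do
    if python3 -c "import $mod" 2>/dev/null; then
      echo "ok: $mod"
    else
      echo "missing: $mod" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: `check_python_modules librosa unidecode inflect`. Swap `python3` for the interpreter of your virtual environment if it differs.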

Sample command to run inference with a trained unit2speech model. Note that the quantized audio to be synthesized must use the same units the unit2speech model was trained with (the same upstream units as in the table above, e.g. HuBERT Base + KM100).

```shell
FAIRSEQ_ROOT=<path_to_your_fairseq_repo_root>
TTS_MODEL_PATH=<unit2speech_model_file_path>
QUANTIZED_UNIT_PATH=<quantized_audio_file_path>
OUT_DIR=<dir_to_dump_synthesized_audio_files>
WAVEGLOW_PATH=<path_where_you_have_downloaded_waveglow_checkpoint>
CODE_DICT_PATH=<unit2speech_code_dict_path>

PYTHONPATH="${FAIRSEQ_ROOT}:${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech" \
python "${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech/synthesize_audio_from_units.py" \
    --tts_model_path "$TTS_MODEL_PATH" \
    --quantized_unit_path "$QUANTIZED_UNIT_PATH" \
    --out_audio_dir "$OUT_DIR" \
    --waveglow_path "$WAVEGLOW_PATH" \
    --code_dict_path "$CODE_DICT_PATH" \
    --max_decoder_steps 2000
```
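
The requirement that the quantized audio use the same units as the model can be partially checked before synthesis, together with the input paths. A minimal sketch, assuming each line of the quantized-units file has the form `<utterance_id>|<space-separated unit IDs>` (adjust the parsing if your file differs); `require_paths`, `max_unit_id` and `check_units_fit` are hypothetical helpers, not part of fairseq. A KM<K> model's units are k-means cluster IDs 0 to K-1, so any larger ID proves a mismatch. The converse does not hold: units from a different upstream model (e.g. Modified CPC instead of HuBERT Base) with the same K pass this check.

```shell
# Hypothetical pre-flight helpers (not part of fairseq).

# Fail if any named variable is unset/empty or points to a non-existent path.
require_paths() {
  local missing=0 name value
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset: $name" >&2
      missing=1
    elif [ ! -e "$value" ]; then
      echo "not found: $name=$value" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Print the largest unit ID in a quantized-units file.
# ASSUMPTION: lines look like "<utterance_id>|<space-separated unit IDs>".
max_unit_id() {
  awk -F'|' '
    BEGIN { max = 0 }
    {
      # Units are the text after the last "|", split on whitespace.
      n = split($NF, units, " ")
      for (i = 1; i <= n; i++)
        if (units[i] + 0 > max) max = units[i] + 0
    }
    END { print max }
  ' "$1"
}

# Fail if the file contains unit IDs outside 0..K-1 for a KM<K> model.
check_units_fit() {
  local file="$1" k="$2" max
  max="$(max_unit_id "$file")"
  if [ "$max" -lt "$k" ]; then
    echo "OK: max unit id $max < $k"
  else
    echo "ERROR: max unit id $max >= $k; units do not match a KM$k model" >&2
    return 1
  fi
}
```

For example, before running the command with a HuBERT Base + KM100 model: `require_paths FAIRSEQ_ROOT TTS_MODEL_PATH QUANTIZED_UNIT_PATH WAVEGLOW_PATH CODE_DICT_PATH && check_units_fit "$QUANTIZED_UNIT_PATH" 100`. `OUT_DIR` is left out because it may not exist before the first run.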