BigVGAN-L

The 24 kHz model was pretrained with the BigVGAN repository on the LibriTTS dataset, using a full 100-band mel spectrogram as input (see config.json for the exact hyperparameter setup). Pretraining ran for 1300k steps with a batch size of 100 on 8 A100 40GB GPUs.
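Because config.json holds the exact hyperparameters, it can help to list them from the command line. The following is a minimal sketch, assuming `python3` is available. The function name `show_config` is hypothetical, and the sketch makes no assumptions about which keys the file contains.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every top-level hyperparameter in a JSON config,
# one "key = value" pair per line, sorted by key.
# The single argument is the path to config.json (defaults to ./config.json).
show_config() {
  local config="${1:-config.json}"
  python3 - "$config" <<'EOF'
import json
import sys

# Load the config file passed as the first argument.
with open(sys.argv[1]) as f:
    cfg = json.load(f)

# Print keys in sorted order so the output is stable and easy to diff.
for key in sorted(cfg):
    print(f"{key} = {cfg[key]}")
EOF
}
```

For example, `show_config path/to/config.json | grep num_mels` should surface the mel band count. This assumes the BigVGAN-style key name `num_mels`, which is not confirmed by this card.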

Inference

To run inference, use the example command below to generate audio with the model. It computes mel spectrograms from the wav files in --input_wavs_dir and saves the generated audio to --output_dir.

python NEMO_PATH/inference.py \
--checkpoint_file MODEL_PATH/BigVGAN-L/g_01300000.pt \
--input_wavs_dir AUDIO_PATH/input_wav \
--output_dir AUDIO_PATH/output_wav
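The command can be wrapped in a small script that checks its inputs before launching. The following is a hypothetical sketch. The function `run_bigvgan_inference` and the `DRY_RUN` switch are illustrative, and `NEMO_PATH`, `MODEL_PATH` and `AUDIO_PATH` are the same placeholders as in the command above, read here as environment variables. The wav count is only a sanity check. It does not describe which files inference.py actually reads.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the inference command above.
# NEMO_PATH, MODEL_PATH and AUDIO_PATH are the card's placeholders, read from
# the environment. With DRY_RUN=1 the command is printed instead of executed.
run_bigvgan_inference() {
  local ckpt="$MODEL_PATH/BigVGAN-L/g_01300000.pt"
  local in_dir="$AUDIO_PATH/input_wav"
  local out_dir="$AUDIO_PATH/output_wav"

  # Fail early if the checkpoint or the input directory is missing.
  if [ ! -f "$ckpt" ]; then
    echo "checkpoint not found: $ckpt" >&2
    return 1
  fi
  if [ ! -d "$in_dir" ]; then
    echo "input directory not found: $in_dir" >&2
    return 1
  fi

  # Sanity check: count .wav files directly inside the input directory.
  local n_wavs
  n_wavs=$(find "$in_dir" -maxdepth 1 -type f -name '*.wav' | wc -l)
  echo "found $n_wavs wav file(s) in $in_dir" >&2

  # Create the output directory up front (harmless if it already exists).
  mkdir -p "$out_dir"

  # Build the command as an array so paths with spaces stay intact.
  local cmd=(python "$NEMO_PATH/inference.py"
    --checkpoint_file "$ckpt"
    --input_wavs_dir "$in_dir"
    --output_dir "$out_dir")

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running `DRY_RUN=1 run_bigvgan_inference` first prints the exact command that would be executed. Dropping `DRY_RUN` runs it.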

Continual finetuning

The vocoder can be further finetuned with the NEMO_PATH/train.py script, since the checkpoints store the full optimizer state.
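Before resuming finetuning from a directory of saved checkpoints, it can help to confirm which generator checkpoint is the most recent. The following is a minimal sketch, assuming checkpoints follow the `g_<8-digit step>` naming seen in the inference example (e.g. `g_01300000.pt`). The function name `latest_generator_ckpt` is hypothetical. This card does not say whether train.py selects the latest checkpoint on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the generator checkpoint with the highest step
# number in a directory. Assumes g_<8-digit step> file names (optionally with
# a .pt suffix); zero padding makes lexical order match step order.
latest_generator_ckpt() {
  local ckpt_dir="$1"
  local latest
  latest=$(find "$ckpt_dir" -maxdepth 1 -type f -name 'g_*' | sort | tail -n 1)
  if [ -z "$latest" ]; then
    echo "no g_* checkpoint in $ckpt_dir" >&2
    return 1
  fi
  printf '%s\n' "$latest"
}
```

Relying on zero-padded names avoids parsing step numbers. If a directory mixes padded and unpadded names, the lexical sort would no longer match step order.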
