Neural Source-Filter BigVGAN
Just For Fun
Dataset preparation
Put the dataset into the data_raw directory according to the following file structure
data_raw
├───speaker0
│   ├───000001.wav
│   ├───...
│   └───000xxx.wav
└───speaker1
    ├───000001.wav
    ├───...
    └───000xxx.wav
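Before preprocessing, it can help to confirm that the directory really follows this layout. The sketch below counts the `.wav` files in each speaker folder. The function name `summarize_dataset` is hypothetical and not part of this repository; it only assumes one sub-directory per speaker, as shown above.

```python
import tempfile
from pathlib import Path


def summarize_dataset(root):
    """Return {speaker_name: number_of_wav_files} for a data_raw-style tree."""
    root = Path(root)
    summary = {}
    # Every direct sub-directory is treated as one speaker.
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[speaker_dir.name] = len(list(speaker_dir.glob("*.wav")))
    return summary


if __name__ == "__main__":
    # Build a tiny throwaway tree so the sketch runs without real data.
    with tempfile.TemporaryDirectory() as tmp:
        for speaker, count in (("speaker0", 2), ("speaker1", 1)):
            speaker_dir = Path(tmp) / speaker
            speaker_dir.mkdir()
            for i in range(1, count + 1):
                (speaker_dir / f"{i:06d}.wav").touch()
        print(summarize_dataset(tmp))
```

On a real dataset, calling `summarize_dataset("data_raw")` and looking for speakers with a count of 0 is a quick way to spot misplaced files.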
Install dependencies
1. Install the software dependencies
pip install -r requirements.txt
2. Download the release model and test it
python nsf_bigvgan_inference.py --config configs/nsf_bigvgan.yaml --model nsf_bigvgan_g.pth --wave test.wav
Data preprocessing
1. Resample to 32 kHz
python prepare/preprocess_a.py -w ./data_raw -o ./data_bigvgan/waves-32k
3. Extract pitch
python prepare/preprocess_f0.py -w data_bigvgan/waves-32k/ -p data_bigvgan/pitch
4. Extract mel: [100, length]
python prepare/preprocess_spec.py -w data_bigvgan/waves-32k/ -s data_bigvgan/mel
5. Generate the training index
python prepare/preprocess_train.py
data_bigvgan/
│
├── waves-32k
│   ├── speaker0
│   │   ├── 000001.wav
│   │   └── 000xxx.wav
│   └── speaker1
│       ├── 000001.wav
│       └── 000xxx.wav
├── pitch
│   ├── speaker0
│   │   ├── 000001.pit.npy
│   │   └── 000xxx.pit.npy
│   └── speaker1
│       ├── 000001.pit.npy
│       └── 000xxx.pit.npy
└── mel
    ├── speaker0
    │   ├── 000001.mel.pt
    │   └── 000xxx.mel.pt
    └── speaker1
        ├── 000001.mel.pt
        └── 000xxx.mel.pt
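Since training needs a pitch file and a mel file for every resampled wav, a quick consistency check after preprocessing may save a failed run. The sketch below relies only on the file naming shown in the tree above (`.pit.npy` and `.mel.pt`). The helper `find_missing_features` is hypothetical and not part of this repository.

```python
import tempfile
from pathlib import Path


def find_missing_features(root="data_bigvgan"):
    """List pitch/mel files that are missing for the resampled wavs."""
    root = Path(root)
    missing = []
    for wav in sorted((root / "waves-32k").glob("*/*.wav")):
        speaker, stem = wav.parent.name, wav.stem
        expected = (
            root / "pitch" / speaker / f"{stem}.pit.npy",
            root / "mel" / speaker / f"{stem}.mel.pt",
        )
        missing.extend(str(p) for p in expected if not p.exists())
    return missing


if __name__ == "__main__":
    # Throwaway tree: one wav with a pitch file but no mel file.
    with tempfile.TemporaryDirectory() as tmp:
        for sub in ("waves-32k", "pitch", "mel"):
            (Path(tmp) / sub / "speaker0").mkdir(parents=True)
        (Path(tmp) / "waves-32k" / "speaker0" / "000001.wav").touch()
        (Path(tmp) / "pitch" / "speaker0" / "000001.pit.npy").touch()
        for path in find_missing_features(tmp):
            print("missing:", path)
```

An empty result from `find_missing_features("data_bigvgan")` means every wav has both features on disk; it does not verify the contents of those files.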
Train
1. Start training
python nsf_bigvgan_trainer.py -c configs/nsf_bigvgan.yaml -n nsf_bigvgan
2. Resume training
python nsf_bigvgan_trainer.py -c configs/nsf_bigvgan.yaml -n nsf_bigvgan -p chkpt/nsf_bigvgan/***.pth
3. View the log
tensorboard --logdir logs/
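The start and resume commands differ only in the `-p` argument, so a small wrapper can pick the right one automatically. This is a minimal sketch under assumptions: it takes the `chkpt/nsf_bigvgan` directory and the `.pth` extension from the resume command above, and it chooses the newest file by modification time because the checkpoint naming scheme is not documented here. Both function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def latest_checkpoint(chkpt_dir="chkpt/nsf_bigvgan"):
    """Return the most recently modified .pth file, or None if there is none."""
    chkpt_dir = Path(chkpt_dir)
    if not chkpt_dir.is_dir():
        return None
    candidates = list(chkpt_dir.glob("*.pth"))
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def training_command(config="configs/nsf_bigvgan.yaml", name="nsf_bigvgan",
                     chkpt_dir="chkpt/nsf_bigvgan"):
    """Build the trainer command, adding -p only when a checkpoint exists."""
    cmd = ["python", "nsf_bigvgan_trainer.py", "-c", config, "-n", name]
    checkpoint = latest_checkpoint(chkpt_dir)
    if checkpoint is not None:
        cmd += ["-p", str(checkpoint)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is started by accident.
    print(" ".join(training_command()))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.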
Inference
1. Export the inference model
python nsf_bigvgan_export.py --config configs/nsf_bigvgan.yaml --checkpoint_path chkpt/nsf_bigvgan/***.pt
2. Extract mel
python spec/inference.py -w test.wav -m test.mel.pt
3. Extract F0
python pitch/inference.py -w test.wav -p test.csv
4. Infer
python nsf_bigvgan_inference.py --config configs/nsf_bigvgan.yaml --model nsf_bigvgan_g.pth --wave test.wav
or
python nsf_bigvgan_inference.py --config configs/nsf_bigvgan.yaml --model nsf_bigvgan_g.pth --mel test.mel.pt --pit test.csv
Augmentation of mel
Acoustic models tend to produce over-smoothed mel spectrograms, so when training the vocoder we blur the input mel with a Gaussian kernel:
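# Gaussian blur kernel (kernel_size=5, sigma=2, single channel)
model_b = get_gaussian_kernel(kernel_size=5, sigma=2, channels=1).to(device)
# blur the mel: add a channel dimension, apply the kernel, then drop the dimension again
mel_b = mel[:, None, :, :]
mel_b = model_b(mel_b)
mel_b = torch.squeeze(mel_b, 1)
# blend the blurred mel with the original, using a random weight in [0, 0.5) for the original
mel_r = torch.rand(1).to(device) * 0.5
mel_b = (1 - mel_r) * mel_b + mel_r * mel
# generator step on the augmented mel
optim_g.zero_grad()
fake_audio = model_g(mel_b, pit)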
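The definition of `get_gaussian_kernel` is not shown here. As an illustration only, the pure-Python sketch below assumes it builds a normalized 2D Gaussian with the given `kernel_size` and `sigma`, and prints the weights that assumption implies together with the blend step. The names `gaussian_kernel_2d` and `blend` are hypothetical; the repository's helper presumably wraps such weights in a convolution layer, which is not reproduced here.

```python
import math
import random


def gaussian_kernel_2d(kernel_size=5, sigma=2.0):
    """Return a kernel_size x kernel_size Gaussian kernel whose weights sum to 1."""
    center = (kernel_size - 1) / 2
    weights = [
        [math.exp(-((x - center) ** 2 + (y - center) ** 2) / (2 * sigma ** 2))
         for x in range(kernel_size)]
        for y in range(kernel_size)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def blend(blurred, original, ratio):
    """Mix a blurred value with its original: (1 - ratio) * blurred + ratio * original."""
    return (1 - ratio) * blurred + ratio * original


if __name__ == "__main__":
    kernel = gaussian_kernel_2d()
    print(f"center weight: {kernel[2][2]:.4f}, corner weight: {kernel[0][0]:.4f}")
    ratio = random.random() * 0.5  # same range as torch.rand(1) * 0.5
    print(f"ratio={ratio:.3f} -> blended value {blend(0.0, 1.0, ratio):.3f}")
```

Because the ratio stays below 0.5, the blurred mel always carries more than half of the weight in the mix, while the random ratio varies the amount of smoothing from step to step.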
Source of code and References
https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/01-nsf