Commit 1ce7776 by cckm (verified) · 1 Parent(s): 7fa53dd

Update README.md

Files changed (1)
  1. README.md +41 -3
README.md CHANGED
@@ -1,3 +1,41 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ license_link: https://huggingface.co/nvidia/BigVGAN/blob/main/LICENSE
+ tags:
+ - neural-vocoder
+ - audio-generation
+ library_name: PyTorch
+ pipeline_tag: audio-to-audio
+ ---
+
+ ## BigVGAN with a different mel spectrogram input
+ These BigVGAN checkpoints come from continued training of https://huggingface.co/nvidia/bigvgan_v2_24khz_100band_256x, with the input mel spectrogram computed by the following code from [vocos](https://github.com/gemelo-ai/vocos/blob/c859e3b7b534f3776a357983029d34170ddd6fc3/vocos/feature_extractors.py#L28C1-L49C24):
+
+ ```py
+ class MelSpectrogramFeatures(FeatureExtractor):
+     def __init__(self, sample_rate=24000, n_fft=1024, hop_length=256, n_mels=100, padding="center"):
+         super().__init__()
+         if padding not in ["center", "same"]:
+             raise ValueError("Padding must be 'center' or 'same'.")
+         self.padding = padding
+         self.mel_spec = torchaudio.transforms.MelSpectrogram(
+             sample_rate=sample_rate,
+             n_fft=n_fft,
+             hop_length=hop_length,
+             n_mels=n_mels,
+             center=padding == "center",
+             power=1,
+         )
+
+     def forward(self, audio, **kwargs):
+         if self.padding == "same":
+             pad = self.mel_spec.win_length - self.mel_spec.hop_length
+             audio = torch.nn.functional.pad(audio, (pad // 2, pad // 2), mode="reflect")
+         mel = self.mel_spec(audio)
+         features = safe_log(mel)
+         return features
+ ```
+
+ Training used `segment_size=65536` (unchanged) and `batch_size=24` (vs. 32 used by the NVIDIA team). The final evaluation PESQ is 4.340, compared with 4.362 for the NVIDIA checkpoint evaluated with its own mel spectrogram code.
+
+ <center><img src="https://huggingface.co/cckm/bigvgan_melspec/resolve/main/assets/bigvgan_pesq.png" width="800"></center>
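
The two padding modes in `MelSpectrogramFeatures` above produce slightly different numbers of mel frames for the same audio, which matters when pairing mel frames with waveform samples. The sketch below works through that arithmetic in plain Python. It assumes the usual STFT framing rule (`1 + (padded_length - n_fft) // hop_length`) and that `win_length` defaults to `n_fft`, as in torchaudio. The helper names `stft_frames`, `frames_center`, and `frames_same` are hypothetical and are not part of vocos or BigVGAN.

```python
def stft_frames(padded_length: int, n_fft: int, hop_length: int) -> int:
    """Number of STFT frames for an already-padded signal (no further centering)."""
    return 1 + (padded_length - n_fft) // hop_length


def frames_center(num_samples: int, n_fft: int = 1024, hop_length: int = 256) -> int:
    """Frame count for padding="center": n_fft // 2 samples are added on each side."""
    padded = num_samples + 2 * (n_fft // 2)
    return stft_frames(padded, n_fft, hop_length)


def frames_same(num_samples: int, n_fft: int = 1024, hop_length: int = 256,
                win_length=None) -> int:
    """Frame count for padding="same": (win_length - hop_length) // 2 samples per side."""
    # Assumption: win_length falls back to n_fft when it is not given.
    win_length = n_fft if win_length is None else win_length
    pad = win_length - hop_length
    padded = num_samples + 2 * (pad // 2)
    return stft_frames(padded, n_fft, hop_length)


if __name__ == "__main__":
    segment_size = 65536  # training segment length mentioned in the README
    print(frames_center(segment_size))  # 257
    print(frames_same(segment_size))    # 256
```

With the defaults shown in the snippet (`n_fft=1024`, `hop_length=256`), "same" padding yields exactly `segment_size / hop_length` frames, while "center" padding yields one extra frame.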