ONNX

A SoundStream decoder to reconstruct audio from a mel-spectrogram.

Overview

This model is a SoundStream decoder that inverts mel-spectrograms computed with the specific hyperparameters defined in the example below. It was trained on music data and used in Multi-instrument Music Synthesis with Spectrogram Diffusion (ISMIR 2022).

A typical use case is to simplify music generation by predicting mel-spectrograms (instead of a raw waveform) and then using this model to reconstruct the audio.

If you use it, please consider citing:

@article{zeghidour2021soundstream,
  title={Soundstream: An end-to-end neural audio codec},
  author={Zeghidour, Neil and Luebs, Alejandro and Omran, Ahmed and Skoglund, Jan and Tagliasacchi, Marco},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={30},
  pages={495--507},
  year={2021},
  publisher={IEEE}
}

Example Use

import numpy as np

from diffusers import OnnxRuntimeModel

# Mel-spectrogram hyperparameters the decoder expects.
SAMPLE_RATE = 16000
N_FFT = 1024
HOP_LENGTH = 320
WIN_LENGTH = 640
N_MEL_CHANNELS = 128
MEL_FMIN = 0.0
MEL_FMAX = int(SAMPLE_RATE // 2)
CLIP_VALUE_MIN = 1e-5
CLIP_VALUE_MAX = 1e8

# Mel-spectrogram computed with the hyperparameters above.
mel = ...

melgan = OnnxRuntimeModel.from_pretrained("kashif/soundstream_mel_decoder")

audio = melgan(input_features=mel.astype(np.float32))
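
Computing the mel-spectrogram itself is outside the scope of this model. The sketch below, continuing from the example above (it reuses the constants and the melgan model), shows one way to produce an input with librosa and write the decoded waveform to disk with soundfile. The exact scaling used during training (magnitude vs. power spectrogram, the log/clipping scheme), the input layout (batch, frames, mel channels), and the file names are assumptions here; check them against the spectrogram-prediction model you pair this decoder with.

import librosa
import numpy as np
import soundfile as sf

# Hypothetical 16 kHz mono music clip.
wav, _ = librosa.load("clip.wav", sr=SAMPLE_RATE)

# Mel filterbank features using the hyperparameters defined above.
# power=1.0 (magnitude spectrogram) is an assumption.
spec = librosa.feature.melspectrogram(
    y=wav,
    sr=SAMPLE_RATE,
    n_fft=N_FFT,
    hop_length=HOP_LENGTH,
    win_length=WIN_LENGTH,
    n_mels=N_MEL_CHANNELS,
    fmin=MEL_FMIN,
    fmax=MEL_FMAX,
    power=1.0,
)

# Assumed log compression, clipped to CLIP_VALUE_MIN / CLIP_VALUE_MAX.
log_mel = np.log(np.clip(spec, CLIP_VALUE_MIN, CLIP_VALUE_MAX))

# Batch of one, frames along the time axis: (1, frames, N_MEL_CHANNELS).
mel = log_mel.T[None, :, :]

audio = melgan(input_features=mel.astype(np.float32))

# The call returns the raw ONNX outputs; the reconstructed waveform is
# assumed to be the first one.
waveform = np.asarray(audio[0]).squeeze()
sf.write("reconstructed.wav", waveform, SAMPLE_RATE)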