This FastPitch[1] model was trained on the HUI-Audio-Corpus-German[2] clean dataset using the NeMo Toolkit[3]. We selected the five speakers with the most data and balanced the training data across them (around 20 hours per speaker).
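
The speaker selection and balancing described above can be illustrated with a small sketch. It is not the preprocessing script used for this model: the input format (pairs of speaker name and clip duration in seconds), the function name `select_balanced_speakers`, and the greedy capping strategy are all assumptions made for illustration.

```python
from collections import defaultdict


def select_balanced_speakers(utterances, num_speakers=5, hours_per_speaker=20.0):
    """Pick the speakers with the most audio and cap each at a common budget.

    utterances: iterable of (speaker, duration_seconds) pairs (hypothetical format).
    Returns a dict mapping each selected speaker to the list of kept durations.
    """
    by_speaker = defaultdict(list)
    for speaker, duration in utterances:
        by_speaker[speaker].append(duration)

    # Rank speakers by total recorded time and keep the largest ones.
    ranked = sorted(by_speaker, key=lambda s: sum(by_speaker[s]), reverse=True)
    selected = ranked[:num_speakers]

    budget = hours_per_speaker * 3600.0  # per-speaker budget in seconds
    balanced = {}
    for speaker in selected:
        kept, total = [], 0.0
        for duration in by_speaker[speaker]:
            if total + duration > budget:
                continue  # skip clips that would push this speaker over budget
            kept.append(duration)
            total += duration
        balanced[speaker] = kept
    return balanced
```

Calling `select_balanced_speakers(clips)` with the defaults keeps the five speakers with the most audio and at most 20 hours of clips for each.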

This is a retrained version of: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_de_fastpitch_multispeaker_5

How to Use:

Use with NeMo Toolkit version 1.14.0

  # Load spectrogram generator
  from nemo.collections.tts.models import FastPitchModel
  spec_generator = FastPitchModel.restore_from("path/to/model.nemo")
  
  # Load Vocoder
  from nemo.collections.tts.models import HifiGanModel
  model = HifiGanModel.from_pretrained(model_name="tts_de_hui_hifigan_ft_fastpitch_multispeaker_5")
  
  # Generate audio
  import torchaudio
  # Pass the German text to synthesize as the argument to parse()
  parsed = spec_generator.parse("")
  speaker_id = 0  # selects which of the trained speakers to use
  spectrogram = spec_generator.generate_spectrogram(tokens=parsed, speaker=speaker_id)
  audio = model.convert_spectrogram_to_audio(spec=spectrogram)
  
  # Save the audio to disk in a file called german_speech.wav (44.1 kHz sample rate)
  torchaudio.save('german_speech.wav', audio.detach().cpu(), 44100)

[1] FastPitch: Parallel Text-to-speech with Pitch Prediction: https://arxiv.org/abs/2006.06873
[2] HUI-Audio-Corpus-German Dataset: https://opendata.iisys.de/datasets.html
[3] NVIDIA NeMo Toolkit: https://github.com/NVIDIA/NeMo
