# Overflow TTS

Neural HMMs are a type of neural transducer recently proposed for
sequence-to-sequence modelling in text-to-speech. They combine the best features
of classic statistical speech synthesis and modern neural TTS, requiring less
data and fewer training updates, and are less prone to gibberish output caused
by neural attention failures. In this paper, we combine neural HMM TTS with
normalising flows for describing the highly non-Gaussian distribution of speech
acoustics. The result is a powerful, fully probabilistic model of durations and
acoustics that can be trained using exact maximum likelihood. Compared to
dominant flow-based acoustic models, our approach integrates autoregression for
improved modelling of long-range dependences such as utterance-level prosody.
Experiments show that a system based on our proposal gives more accurate
pronunciations and better subjective speech quality than comparable methods,
whilst retaining the original advantages of neural HMMs. Audio examples and code
are available at https://shivammehta25.github.io/OverFlow/.

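To give an intuition for what "exact maximum likelihood" means for an HMM-based model, the following is a minimal, illustrative sketch of the forward algorithm for a left-to-right, no-skip HMM in log space. It is not the implementation used by this library. The function names, the one-dimensional Gaussian emissions, and the fixed per-state transition probabilities are all assumptions made for the example; in a neural HMM these quantities would instead be predicted by a network conditioned on previous frames, and with normalising flows the frames would be latent variables with an added log-determinant term.

```python
import math


def log_gaussian(x, mean, std):
    """Log density of a one-dimensional Gaussian N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def logsumexp(a, b):
    """Numerically stable log(exp(a) + exp(b)), tolerating -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward_log_likelihood(frames, means, stds, p_advance):
    """Exact log-likelihood of `frames` under a left-to-right, no-skip HMM.

    Hypothetical setup for illustration: state n emits from
    N(means[n], stds[n]^2); after each emission it advances to state n + 1
    with probability p_advance[n] (assumed strictly between 0 and 1) or
    stays in state n otherwise. The sequence starts in state 0 and ends by
    advancing out of the last state.
    """
    n_states = len(means)
    # log_alpha[n] = log p(frames seen so far, current state = n)
    log_alpha = [-math.inf] * n_states
    log_alpha[0] = log_gaussian(frames[0], means[0], stds[0])

    for x in frames[1:]:
        new_log_alpha = []
        for n in range(n_states):
            # Either we were already in state n and stayed ...
            stay = log_alpha[n] + math.log(1.0 - p_advance[n])
            # ... or we were in state n - 1 and advanced.
            if n > 0:
                move = log_alpha[n - 1] + math.log(p_advance[n - 1])
            else:
                move = -math.inf
            emit = log_gaussian(x, means[n], stds[n])
            new_log_alpha.append(logsumexp(stay, move) + emit)
        log_alpha = new_log_alpha

    # Terminate by leaving the final state.
    return log_alpha[-1] + math.log(p_advance[-1])


# Toy example: three frames, two states.
toy_ll = forward_log_likelihood(
    frames=[0.1, 0.9, 1.2],
    means=[0.0, 1.0],
    stds=[1.0, 0.5],
    p_advance=[0.4, 0.7],
)
print(f"log-likelihood: {toy_ll:.4f}")
```

Because the recursion sums over every monotonic alignment between states and frames, the returned value is the exact marginal log-likelihood rather than a bound or an approximation, which is what makes direct maximum-likelihood training possible.
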
## Important resources & papers
- HMM: https://de.wikipedia.org/wiki/Hidden_Markov_Model
- OverflowTTS paper: https://arxiv.org/abs/2211.06892
- Neural HMM: https://arxiv.org/abs/2108.13320
- Audio Samples: https://shivammehta25.github.io/OverFlow/

## OverflowConfig
```{eval-rst}
.. autoclass:: TTS.tts.configs.overflow_config.OverflowConfig
    :members:
```

## Overflow Model
```{eval-rst}
.. autoclass:: TTS.tts.models.overflow.Overflow
    :members:
```