Create README.md (#1)
Opened by Respair
README.md ADDED
@@ -0,0 +1,17 @@
+---
+license: mit
+tags:
+- vocoder
+- audio
+- speech
+- tts
+---
+
+# Model Card for Model ID
+
+This vocoder is a combination of [HiFTNet](https://github.com/yl4579/HiFTNet) and [RingFormer](https://github.com/seongho608/RingFormer). It supports Ring Attention, Conformer, and Neural Source Filtering, among other features.
+This repository is experimental; expect some bugs and some hardcoded parameters.
+
+The default setting is 44.1 kHz with 128 mel bins. To switch to 24 kHz, copy the config from HiFTNet (make sure to also copy its pitch extractor, both the model and the checkpoint), change 128 to 80 at line 384 of models.py, and then uncomment "multiscale_subband_cfg" for the 24 kHz version.
+
+Huge thanks to [Johnathan Duering](https://github.com/duerig) for his help. I mostly implemented this based on his [StyleTTS2 fork](https://github.com/duerig/StyleTTS2/tree/main).
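The README describes two settings: 44.1 kHz with 128 mel bins (the default) and 24 kHz with 80 mel bins. As a rough illustration of keeping the two values in sync when switching, here is a minimal sketch. The names `MelPreset`, `PRESETS`, and `check_config`, and the config keys `sample_rate` and `n_mels`, are assumptions made for this example; they are not taken from the repository, whose actual config keys may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MelPreset:
    """Hypothetical container for the two values named in the README."""
    sample_rate: int  # in Hz
    n_mels: int       # number of mel bins


# The numbers come from the README text; the preset names are made up here.
PRESETS = {
    "44.1khz": MelPreset(sample_rate=44_100, n_mels=128),
    "24khz": MelPreset(sample_rate=24_000, n_mels=80),
}


def check_config(config: dict, preset_name: str) -> list[str]:
    """Return a list of mismatches between a config dict and a preset.

    The keys "sample_rate" and "n_mels" are assumed, not the repo's real keys.
    """
    preset = PRESETS[preset_name]
    problems = []
    for key in ("sample_rate", "n_mels"):
        expected = getattr(preset, key)
        actual = config.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected}, found {actual}")
    return problems
```

A check like this could catch the case where the sample rate was changed to 24 kHz but the mel-bin count was left at 128. It does not cover the other steps the README mentions (the pitch extractor, its checkpoint, or "multiscale_subband_cfg").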