Commit bb9f357 · 1 Parent(s): 39fb6a3 · Update README.md

README.md CHANGED

- ddsp-svc
---

These are a few test models I made using (and for use with) [DDSP-SVC](https://github.com/yxlllc/DDSP-SVC).

I am not experienced with this software or technology.

All examples are based on samples from an English speaker, though thanks to [DDSP](https://magenta.tensorflow.org/ddsp), they're generally fairly decent with use in a variety of other languages.

All models are sampled at 44.1 kHz.
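
Since the models expect 44.1 kHz input, here is a quick sketch (using Python's standard `wave` module; the file name is hypothetical) of checking a clip's sample rate before inference:

```python
import wave

def sample_rate(path: str) -> int:
    # Read the frame rate (samples per second) from a WAV header.
    with wave.open(path, "rb") as w:
        return w.getframerate()

# Write a short silent 16-bit mono clip at 44.1 kHz as a stand-in for
# your own input audio, then read its rate back.
with wave.open("clip.wav", "wb") as w:
    w.setnchannels(1)                  # mono
    w.setsampwidth(2)                  # 16-bit PCM
    w.setframerate(44100)              # the rate these models use
    w.writeframes(b"\x00\x00" * 4410)  # 0.1 s of silence

print(sample_rate("clip.wav"))  # 44100
```

Run the same check on your own clips and resample anything that doesn't report 44100.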

To use these, place the model file (model_XXXXXX.pt) and configuration file (config.yaml) in a directory.

**It's rather important to mention that each model file should be in a distinct directory with its accompanying config.yaml, or your results may be off, weird, or broken.**
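
As a concrete sketch of that layout (the `models/` root and the idea of naming directories after the models are my assumptions, not part of this card):

```python
from pathlib import Path

# One sub-directory per model, each meant to hold that model's weights
# (model_XXXXXX.pt) next to its own config.yaml -- never shared.
root = Path("models")
for name in ["PrimReaper", "Panam", "V-F", "Nora"]:
    (root / name).mkdir(parents=True, exist_ok=True)
    # then copy the matching model_XXXXXX.pt and config.yaml into root/name

print(sorted(p.name for p in root.iterdir()))
```

The point is simply that each config.yaml sits beside the one model it belongs to.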

Models:

- PrimReaper - (Stereo) Trained on YouTube content from popular YouTuber "The Prim Reaper"
- Panam - (Mono) Trained on audio extracted from dialogue of the Cyberpunk 2077 character "Panam"
- V-F - (Mono) Trained on extracted dialogue audio from the female "V" character in Cyberpunk 2077
- Nora - (Mono) Trained on Fallout 4 dialogue audio from the game character "Nora"

If using DDSP-SVC's **gui_diff.py**, keep in mind that pitch adjustment is probably required if your voice is deeper than the character's.
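
For intuition on how much a pitch adjustment changes the voice (this assumes the pitch setting is in semitones, which the card does not state):

```python
# Assumption (not stated on this card): the pitch value is in semitones.
# A shift of n semitones multiplies every frequency by 2 ** (n / 12),
# so +12 exactly doubles the pitch.
def pitch_ratio(semitones: float) -> float:
    return 2.0 ** (semitones / 12.0)

print(pitch_ratio(12))            # 2.0
print(round(pitch_ratio(10), 2))  # 1.78
print(round(pitch_ratio(15), 2))  # 2.38
```

Under that assumption, the 10 to 15 range used below raises the input voice by roughly 1.8x to 2.4x in frequency.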

Training is done following the suggestions and best practices of the DDSP-SVC project, and K-steps are between 100 and 200.

For realtime inference, my settings are generally as follows:

**Normal Settings**

- Speaker ID: Always "1"
- Response Threshold: -45 (this is mic-specific)
- Pitch: 10 to 15, depending on the model
- Sampling rate: Always 44100 for my models
- Mix Speaker: All models are single-speaker, so this is left unchecked

**Performance Settings**

- Segmentation Size: 0.45
- Cross fade duration: 0.07
- Historical blocks used: 8
- f0Extractor: rmvpe
- Phase vocoder: Depends on the model; I enable it if the output sounds robotic or stuttery, and disable it if it sounds "buttery"

**Diffusion Settings**

- K-steps: 200
- Speedup: 10
- Diffusion method: ddim or pndm, depending on model
- Encode silence: Depends on the model, but usually "on" for the best quality
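
One way to read those diffusion numbers (my assumption, not stated on the card: "Speedup" divides the K-steps, so the accelerated ddim/pndm samplers run correspondingly fewer denoising iterations):

```python
# Assumption: Speedup divides the diffusion K-steps, so the sampler runs
# k_steps // speedup denoising iterations per inference pass.
k_steps, speedup = 200, 10
print(k_steps // speedup)  # 20 iterations per pass
```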