danieloneill committed
Commit bb9f357 · 1 Parent(s): 39fb6a3

Update README.md

Files changed (1):
1. README.md +29 -11
README.md CHANGED
  - ddsp-svc
---

These are a few test models I made using (and for use with) [DDSP-SVC](https://github.com/yxlllc/DDSP-SVC).

I am not experienced with this software or technology.

All examples are based on samples from an English speaker, though thanks to [DDSP](https://magenta.tensorflow.org/ddsp), they generally hold up fairly well in a variety of other languages.

All models are sampled at 44.1 kHz.
To use these, place the model file (model_XXXXXX.pt) and the configuration file (config.yaml) in a directory.

**Important: each model file should sit in its own directory with its accompanying config.yaml, or your results may be off/weird/broken.**

Models:
- PrimReaper - (Stereo) Trained on YouTube content from the popular YouTuber "The Prim Reaper"
- Panam - (Mono) Trained on extracted dialogue audio from the Cyberpunk 2077 character "Panam"
- V-F - (Mono) Trained on extracted dialogue audio from the female "V" character in Cyberpunk 2077
- Nora - (Mono) Trained on dialogue audio from the Fallout 4 character "Nora"
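For instance, a layout like the following keeps each model paired with its own config. This is just an illustrative sketch; the directory names are arbitrary, and only the model/config pairing matters:

```python
from pathlib import Path

# One directory per model: each holds that model's .pt file and its own config.yaml.
for name in ("PrimReaper", "Panam"):
    Path("models", name).mkdir(parents=True, exist_ok=True)

# After copying the downloaded files in, the layout should look like:
#   models/PrimReaper/model_XXXXXX.pt
#   models/PrimReaper/config.yaml
#   models/Panam/model_XXXXXX.pt
#   models/Panam/config.yaml
print(sorted(p.name for p in Path("models").iterdir()))
```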

If using DDSP-SVC's **gui_diff.py**, keep in mind that pitch adjustment is probably required if your voice is deeper than the character's.

Training follows the suggestions and best practices of the DDSP-SVC project, with K-steps between 100 and 200.

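The pitch values used for realtime inference are semitone offsets. As a rough starting point (my own sketch, not something provided by DDSP-SVC), you can estimate a transpose value from the average fundamental frequencies of your voice and the target voice:

```python
import math

def semitone_offset(source_f0_hz: float, target_f0_hz: float) -> float:
    """Semitones needed to shift a source voice toward a target pitch.

    Positive values shift up (e.g. a deeper voice mapped to a higher character).
    """
    return 12.0 * math.log2(target_f0_hz / source_f0_hz)

# Example: a ~110 Hz voice targeting a ~220 Hz character voice:
print(semitone_offset(110.0, 220.0))  # 12.0 (one octave up)
```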
  For realtime inference, my settings are generally as follows:

**Normal Settings**
- Speaker ID: Always "1"
- Response Threshold: -45 (This is mic specific)
- Pitch: 10 - 15 depending on model
- Sampling rate: Always 44100 for my models
- Mix Speaker: All models are single-speaker, so this is **not** unchecked

**Performance Settings**
- Segmentation Size: 0.45
- Cross fade duration: 0.07
- Historical blocks used: 8
- f0Extractor: rmvpe
- Phase vocoder: Depends on the model; I enable it if output feels robotic/stuttery and disable it if it sounds "buttery"

**Diffusion Settings**
- K-steps: 200
- Speedup: 10
- Diffusion method: ddim or pndm, depending on model
- Encode silence: Depends on the model, but usually "on" for the best quality