Commit bb9f357 · 1 Parent(s): 39fb6a3 · Update README.md

README.md CHANGED

- ddsp-svc
---

These are a few test models I made using (and for use with) [DDSP-SVC](https://github.com/yxlllc/DDSP-SVC).

I am not experienced with this software or technology.

All examples are based on samples from an English speaker, though thanks to [DDSP](https://magenta.tensorflow.org/ddsp), they're generally fairly decent with use in a variety of other languages.

All models are sampled at 44.1 kHz.
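
Since the models expect 44.1 kHz input, here is a quick sketch (using Python's standard `wave` module; the file name is hypothetical) of checking a clip's sample rate before inference:

```python
import wave

def sample_rate(path: str) -> int:
    # Read the frame rate (samples per second) from a WAV header.
    with wave.open(path, "rb") as w:
        return w.getframerate()

# Write a short silent 16-bit mono clip at 44.1 kHz as a stand-in for
# your own input audio, then read its rate back.
with wave.open("clip.wav", "wb") as w:
    w.setnchannels(1)                  # mono
    w.setsampwidth(2)                  # 16-bit PCM
    w.setframerate(44100)              # the rate these models use
    w.writeframes(b"\x00\x00" * 4410)  # 0.1 s of silence

print(sample_rate("clip.wav"))  # 44100
```

Run the same check on your own clips and resample anything that doesn't report 44100.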

To use these, place the model file (model_XXXXXX.pt) and configuration file (config.yaml) in a directory.

**It's rather important to mention that each model file should be in a distinct directory with its accompanying config.yaml, or your results may be off, weird, or broken.**
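
As a concrete sketch of that layout (the `models/` root and the idea of naming directories after the models are my assumptions, not part of this card):

```python
from pathlib import Path

# One sub-directory per model, each meant to hold that model's weights
# (model_XXXXXX.pt) next to its own config.yaml -- never shared.
root = Path("models")
for name in ["PrimReaper", "Panam", "V-F", "Nora"]:
    (root / name).mkdir(parents=True, exist_ok=True)
    # then copy the matching model_XXXXXX.pt and config.yaml into root/name

print(sorted(p.name for p in root.iterdir()))
```

The point is simply that each config.yaml sits beside the one model it belongs to.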

Models:

- PrimReaper - (Stereo) Trained on YouTube content from popular YouTuber "The Prim Reaper"
- Panam - (Mono) Trained on audio extracted from dialogue of the Cyberpunk 2077 character "Panam"
- V-F - (Mono) Trained on extracted dialogue audio from the female "V" character in Cyberpunk 2077
- Nora - (Mono) Trained on Fallout 4 dialogue audio from the game character "Nora"

If using DDSP-SVC's **gui_diff.py**, keep in mind that pitch adjustment is probably required if your voice is deeper than the character's.
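
For intuition on how much a pitch adjustment changes the voice (this assumes the pitch setting is in semitones, which the card does not state):

```python
# Assumption (not stated on this card): the pitch value is in semitones.
# A shift of n semitones multiplies every frequency by 2 ** (n / 12),
# so +12 exactly doubles the pitch.
def pitch_ratio(semitones: float) -> float:
    return 2.0 ** (semitones / 12.0)

print(pitch_ratio(12))            # 2.0
print(round(pitch_ratio(10), 2))  # 1.78
print(round(pitch_ratio(15), 2))  # 2.38
```

Under that assumption, the 10 to 15 range used below raises the input voice by roughly 1.8x to 2.4x in frequency.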

Training is done following the suggestions and best practices of the DDSP-SVC project, and K-steps are between 100 and 200.

For realtime inference, my settings are generally as follows:

**Normal Settings**

- Speaker ID: Always "1"
- Response Threshold: -45 (this is mic-specific)
- Pitch: 10 to 15, depending on the model
- Sampling rate: Always 44100 for my models
- Mix Speaker: All models are single-speaker, so this is left unchecked

**Performance Settings**

- Segmentation Size: 0.45
- Cross fade duration: 0.07
- Historical blocks used: 8
- f0Extractor: rmvpe
- Phase vocoder: Depends on the model; I enable it if the output sounds robotic or stuttery, and disable it if it sounds "buttery"

**Diffusion Settings**

- K-steps: 200
- Speedup: 10
- Diffusion method: ddim or pndm, depending on model
- Encode silence: Depends on the model, but usually "on" for the best quality
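
One way to read those diffusion numbers (my assumption, not stated on the card: "Speedup" divides the K-steps, so the accelerated ddim/pndm samplers run correspondingly fewer denoising iterations):

```python
# Assumption: Speedup divides the diffusion K-steps, so the sampler runs
# k_steps // speedup denoising iterations per inference pass.
k_steps, speedup = 200, 10
print(k_steps // speedup)  # 20 iterations per pass
```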