Commit e0509fb
Parent(s): a9c4ef5
Delete .ipynb_checkpoints
.ipynb_checkpoints/demo_part1-checkpoint.ipynb
DELETED
@@ -1,236 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "b6ee1ede",
-   "metadata": {},
-   "source": [
-    "## Voice Style Control Demo"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b7f043ee",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "import torch\n",
-    "import se_extractor\n",
-    "from api import BaseSpeakerTTS, ToneColorConverter"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "15116b59",
-   "metadata": {},
-   "source": [
-    "### Initialization"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "aacad912",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "ckpt_base = 'checkpoints/base_speakers/EN'\n",
-    "ckpt_converter = 'checkpoints/converter'\n",
-    "device = 'cuda:0'\n",
-    "output_dir = 'outputs'\n",
-    "\n",
-    "base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)\n",
-    "base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')\n",
-    "\n",
-    "tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)\n",
-    "tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')\n",
-    "\n",
-    "os.makedirs(output_dir, exist_ok=True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7f67740c",
-   "metadata": {},
-   "source": [
-    "### Obtain Tone Color Embedding"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f8add279",
-   "metadata": {},
-   "source": [
-    "The `source_se` is the tone color embedding of the base speaker. \n",
-    "It is an average of multiple sentences generated by the base speaker. We directly provide the result here but\n",
-    "the readers feel free to extract `source_se` by themselves."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "63ff6273",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "source_se = torch.load(f'{ckpt_base}/en_default_se.pth').to(device)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4f71fcc3",
-   "metadata": {},
-   "source": [
-    "The `reference_speaker.mp3` below points to the short audio clip of the reference whose voice we want to clone. We provide an example here. If you use your own reference speakers, please **make sure each speaker has a unique filename.** The `se_extractor` will save the `targeted_se` using the filename of the audio and **will not automatically overwrite.**"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "55105eae",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "reference_speaker = 'resources/example_reference.mp3'\n",
-    "target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, target_dir='processed', vad=True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a40284aa",
-   "metadata": {},
-   "source": [
-    "### Inference"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "73dc1259",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "save_path = f'{output_dir}/output_en_default.wav'\n",
-    "\n",
-    "# Run the base speaker tts\n",
-    "text = \"This audio is generated by OpenVoice.\"\n",
-    "src_path = f'{output_dir}/tmp.wav'\n",
-    "base_speaker_tts.tts(text, src_path, speaker='default', language='English', speed=1.0)\n",
-    "\n",
-    "# Run the tone color converter\n",
-    "encode_message = \"@MyShell\"\n",
-    "tone_color_converter.convert(\n",
-    "    audio_src_path=src_path, \n",
-    "    src_se=source_se, \n",
-    "    tgt_se=target_se, \n",
-    "    output_path=save_path,\n",
-    "    message=encode_message)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6e3ea28a",
-   "metadata": {},
-   "source": [
-    "**Try with different styles and speed.** The style can be controlled by the `speaker` parameter in the `base_speaker_tts.tts` method. Available choices: friendly, cheerful, excited, sad, angry, terrified, shouting, whispering. Note that the tone color embedding need to be updated. The speed can be controlled by the `speed` parameter. Let's try whispering with speed 0.9."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fd022d38",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "source_se = torch.load(f'{ckpt_base}/en_style_se.pth').to(device)\n",
-    "save_path = f'{output_dir}/output_whispering.wav'\n",
-    "\n",
-    "# Run the base speaker tts\n",
-    "text = \"This audio is generated by OpenVoice with a half-performance model.\"\n",
-    "src_path = f'{output_dir}/tmp.wav'\n",
-    "base_speaker_tts.tts(text, src_path, speaker='whispering', language='English', speed=0.9)\n",
-    "\n",
-    "# Run the tone color converter\n",
-    "encode_message = \"@MyShell\"\n",
-    "tone_color_converter.convert(\n",
-    "    audio_src_path=src_path, \n",
-    "    src_se=source_se, \n",
-    "    tgt_se=target_se, \n",
-    "    output_path=save_path,\n",
-    "    message=encode_message)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5fcfc70b",
-   "metadata": {},
-   "source": [
-    "**Try with different languages.** OpenVoice can achieve multi-lingual voice cloning by simply replace the base speaker. We provide an example with a Chinese base speaker here and we encourage the readers to try `demo_part2.ipynb` for a detailed demo."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a71d1387",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "\n",
-    "ckpt_base = 'checkpoints/base_speakers/ZH'\n",
-    "base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)\n",
-    "base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')\n",
-    "\n",
-    "source_se = torch.load(f'{ckpt_base}/zh_default_se.pth').to(device)\n",
-    "save_path = f'{output_dir}/output_chinese.wav'\n",
-    "\n",
-    "# Run the base speaker tts\n",
-    "text = \"今天天气真好,我们一起出去吃饭吧。\"\n",
-    "src_path = f'{output_dir}/tmp.wav'\n",
-    "base_speaker_tts.tts(text, src_path, speaker='default', language='Chinese', speed=1.0)\n",
-    "\n",
-    "# Run the tone color converter\n",
-    "encode_message = \"@MyShell\"\n",
-    "tone_color_converter.convert(\n",
-    "    audio_src_path=src_path, \n",
-    "    src_se=source_se, \n",
-    "    tgt_se=target_se, \n",
-    "    output_path=save_path,\n",
-    "    message=encode_message)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8e513094",
-   "metadata": {},
-   "source": [
-    "**Tech for good.** For people who will deploy OpenVoice for public usage: We offer you the option to add watermark to avoid potential misuse. Please see the ToneColorConverter class. **MyShell reserves the ability to detect whether an audio is generated by OpenVoice**, no matter whether the watermark is added or not."
-   ]
-  }
- ],
- "metadata": {
-  "interpreter": {
-   "hash": "9d70c38e1c0b038dbdffdaa4f8bfa1f6767c43760905c87a9fbe7800d18c6c35"
-  },
-  "kernelspec": {
-   "display_name": "Python 3.9.18 ('openvoice')",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.9.18"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
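Note: the deleted notebook's "Obtain Tone Color Embedding" section says that `source_se` is an average over multiple sentences generated by the base speaker and invites readers to extract it themselves. The sketch below is not part of this commit; it is one plausible way to do that, reusing `base_speaker_tts`, `tone_color_converter`, and `output_dir` from the notebook's initialization cell. The sentence list and the output filename are illustrative assumptions.

import torch
import se_extractor

# Illustrative sentences; any short utterances synthesized by the base speaker would do.
sentences = [
    "This audio is generated by OpenVoice.",
    "The quick brown fox jumps over the lazy dog.",
]

embeddings = []
for i, text in enumerate(sentences):
    wav_path = f'{output_dir}/base_sentence_{i}.wav'
    # Synthesize the sentence with the base speaker defined in the notebook's init cell.
    base_speaker_tts.tts(text, wav_path, speaker='default', language='English', speed=1.0)
    # Extract the tone color embedding of the clip, the same call the notebook uses for the reference audio.
    se, _ = se_extractor.get_se(wav_path, tone_color_converter, target_dir='processed', vad=True)
    embeddings.append(se)

# Average the per-sentence embeddings; the save path is a placeholder, not the shipped en_default_se.pth.
source_se = torch.stack(embeddings).mean(dim=0)
torch.save(source_se.cpu(), f'{output_dir}/my_en_default_se.pth')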
.ipynb_checkpoints/requirements-checkpoint.txt
DELETED
@@ -1,14 +0,0 @@
-librosa==0.9.1
-faster-whisper==0.9.0
-pydub==0.25.1
-wavmark==0.0.2
-numpy==1.22.0
-eng_to_ipa==0.0.2
-inflect==7.0.0
-unidecode==1.3.7
-whisper-timestamped==1.14.2
-openai
-python-dotenv
-pypinyin
-jieba
-cn2an