feat: update tech report
index.html +15 -27
index.html
CHANGED
|
@@ -57,10 +57,9 @@
|
|
| 57 |
control
|
| 58 |
via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
|
| 59 |
voice
|
| 60 |
-
cloning (PVC) by fine-tuning timbre features with additional data.
|
| 61 |
-
<a
|
| 62 |
-
|
| 63 |
-
for more examples.
|
| 64 |
</p>
|
| 65 |
</div>
|
| 66 |
|
|
@@ -233,23 +232,21 @@
|
|
| 233 |
features based
|
| 234 |
on the text content, whereas OneShot adheres more strictly to the speaker characteristics (prosody, speech
|
| 235 |
rate,
|
| 236 |
-
emotions, etc.) demonstrated in the audio prompt
|
|
|
|
|
|
|
| 237 |
</p>
|
| 238 |
<div class="scroll-wrapper" style="margin-top: 2rem;">
|
| 239 |
<table style="width: 100%;">
|
| 240 |
<tbody>
|
| 241 |
<tr class="border-bottom-thin">
|
| 242 |
<th scope="col">Source Audio</th>
|
| 243 |
-
<th scope="col">Prompt</th>
|
| 244 |
<th scope="col">Text</th>
|
| 245 |
<th scope="col">Zero-Shot Version</th>
|
| 246 |
<th scope="col">One-Shot Version</th>
|
| 247 |
<th scope="col">ElevenLabs Multilingual_v2</th>
|
| 248 |
</tr>
|
| 249 |
<tr class="border-bottom-thin">
|
| 250 |
-
<th>
|
| 251 |
-
<audio class="audio-sm" src="assets/audios/Lyrical%20Cantonese_Source.WAV" controls></audio>
|
| 252 |
-
</th>
|
| 253 |
<td>
|
| 254 |
<audio class="audio-sm" src="assets/audios/Lyrical%20Cantonese_Prompt.WAV" controls></audio>
|
| 255 |
</td>
|
|
@@ -280,9 +277,6 @@
|
|
| 280 |
</td>
|
| 281 |
</tr>
|
| 282 |
<tr class="border-bottom-thin">
|
| 283 |
-
<th>
|
| 284 |
-
<audio class="audio-sm" src="assets/audios/Breaking%20Down%20Mandarin_Source.WAV" controls></audio>
|
| 285 |
-
</th>
|
| 286 |
<td>
|
| 287 |
<audio class="audio-sm" src="assets/audios/Breaking%20Down%20Mandarin_Prompt.WAV" controls></audio>
|
| 288 |
</td>
|
|
@@ -317,9 +311,6 @@
|
|
| 317 |
</td>
|
| 318 |
</tr>
|
| 319 |
<tr class="border-bottom-thin">
|
| 320 |
-
<th>
|
| 321 |
-
<audio class="audio-sm" src="assets/audios/Quirky%20Female%20English.MP3" controls></audio>
|
| 322 |
-
</th>
|
| 323 |
<td>
|
| 324 |
<audio class="audio-sm" src="assets/audios/Quirky%20Female%20English_Prompt.MP3" controls></audio>
|
| 325 |
</td>
|
|
@@ -346,9 +337,6 @@
|
|
| 346 |
</td>
|
| 347 |
</tr>
|
| 348 |
<tr>
|
| 349 |
-
<th>
|
| 350 |
-
<audio class="audio-sm" src="assets/audios/Neurotic%20Teenage%20English.MP3" controls></audio>
|
| 351 |
-
</th>
|
| 352 |
<td>
|
| 353 |
<audio class="audio-sm" src="assets/audios/Neurotic%20Teenage%20English_Prompt.MP3" controls></audio>
|
| 354 |
</td>
|
|
@@ -398,7 +386,7 @@
|
|
| 398 |
<th scope="col">Languages</th>
|
| 399 |
<th scope="col">Source Audio</th>
|
| 400 |
<th scope="col">Text</th>
|
| 401 |
-
<th scope="col">
|
| 402 |
<th scope="col">ElevenLabs<br>Multilingual_v2</th>
|
| 403 |
<th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
|
| 404 |
</tr>
|
|
@@ -519,19 +507,19 @@
|
|
| 519 |
<tbody>
|
| 520 |
<tr class="border-bottom-thin">
|
| 521 |
<th scope="col">Original Language</th>
|
| 522 |
-
<th scope="col">Mixed Language</th>
|
| 523 |
<th scope="col">Source Audio</th>
|
|
|
|
| 524 |
<th scope="col">Text</th>
|
| 525 |
-
<th scope="col">
|
| 526 |
<th scope="col">ElevenLabs<br>Multilingual_v2</th>
|
| 527 |
<th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
|
| 528 |
</tr>
|
| 529 |
<tr class="border-bottom-thin">
|
| 530 |
<td>English</td>
|
| 531 |
-
<td>English + Mandarin</td>
|
| 532 |
<td>
|
| 533 |
<audio class="audio-sm" src="assets/audios/Wong_Sourse.mp3" controls></audio>
|
| 534 |
</td>
|
|
|
|
| 535 |
<td>
|
| 536 |
Kiddo! Come come come, 学如逆水行舟,不进则退。<br>
|
| 537 |
I see you're using AI tools already - so smart!<br>
|
|
@@ -551,10 +539,10 @@
|
|
| 551 |
</tr>
|
| 552 |
<tr class="border-bottom-thin">
|
| 553 |
<td>Mandarin</td>
|
| 554 |
-
<td>Mandarin + Cantonese</td>
|
| 555 |
<td>
|
| 556 |
<audio class="audio-sm" src="assets/audios/ShiBanYu_Sourse.mp3" controls></audio>
|
| 557 |
</td>
|
|
|
|
| 558 |
<td>
|
| 559 |
老铁啊,多谢晒你送我呢本,广州话正音字典,咁好嘢喎!<br>
|
| 560 |
我呢个大老爷们儿学广州话真系好难㗎!成日都分唔清声调啊。<br>
|
|
@@ -572,10 +560,10 @@
|
|
| 572 |
</tr>
|
| 573 |
<tr class="border-bottom-thin">
|
| 574 |
<td>Mandarin</td>
|
| 575 |
-
<td>Mandarin + English</td>
|
| 576 |
<td>
|
| 577 |
<audio class="audio-sm" src="assets/audios/ShuanQ_Sourse.mp3" controls></audio>
|
| 578 |
</td>
|
|
|
|
| 579 |
<td>
|
| 580 |
The people said, 桂林's scenery is the first under heaven.<br>
|
| 581 |
Yet in my opinion, 阳朔 scenery is better than 桂林。<br>
|
|
@@ -593,10 +581,10 @@
|
|
| 593 |
</tr>
|
| 594 |
<tr class="border-bottom-thin">
|
| 595 |
<td>English</td>
|
| 596 |
-
<td>English + Spanish</td>
|
| 597 |
<td>
|
| 598 |
<audio class="audio-sm" src="assets/audios/CoCo_Sourse.mp3" controls></audio>
|
| 599 |
</td>
|
|
|
|
| 600 |
<td>
|
| 601 |
Mi abuelita always told me "el que persevera, alcanza".<br>
|
| 602 |
If you persevere, you'll achieve your dreams!<br>
|
|
@@ -614,10 +602,10 @@
|
|
| 614 |
</tr>
|
| 615 |
<tr class="border-bottom-thin">
|
| 616 |
<td>Japanese</td>
|
| 617 |
-
<td>Japanese + Korean</td>
|
| 618 |
<td>
|
| 619 |
<audio class="audio-sm" src="assets/audios/Powerful_Girl_Sourse.mp3" controls></audio>
|
| 620 |
</td>
|
|
|
|
| 621 |
<td>
|
| 622 |
最近の天気予報によりますと、今週末は桜の開花に最適<br>
|
| 623 |
な気温になる予定です。<br>
|
|
@@ -996,7 +984,7 @@
|
|
| 996 |
<tbody>
|
| 997 |
<tr class="border-bottom-thin">
|
| 998 |
<th scope="col">Text</th>
|
| 999 |
-
<th scope="col" style="text-align: center;">
|
| 1000 |
<th scope="col" style="text-align: center;">Microsoft<br>Azure TTS</th>
|
| 1001 |
<th scope="col" style="text-align: center;">AWS<br>Polly</th>
|
| 1002 |
</tr>
|
|
|
|
| 57 |
control
|
| 58 |
via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
|
| 59 |
voice
|
| 60 |
+
cloning (PVC) by fine-tuning timbre features with additional data. Visit
|
| 61 |
+
<a href="https://www.minimax.io/audio">MiniMax Audio</a> and
|
| 62 |
+
explore our powerful TTS features.
|
|
|
|
| 63 |
</p>
|
| 64 |
</div>
|
| 65 |
|
|
|
|
| 232 |
features based
|
| 233 |
on the text content, whereas OneShot adheres more strictly to the speaker characteristics (prosody, speech
|
| 234 |
rate,
|
| 235 |
+
emotions, etc.) demonstrated in the audio prompt (the additional input that OneShot takes compared to ZeroShot;
|
| 236 |
+
see
|
| 237 |
+
the technical report for details).
|
| 238 |
</p>
|
| 239 |
<div class="scroll-wrapper" style="margin-top: 2rem;">
|
| 240 |
<table style="width: 100%;">
|
| 241 |
<tbody>
|
| 242 |
<tr class="border-bottom-thin">
|
| 243 |
<th scope="col">Source Audio</th>
|
|
|
|
| 244 |
<th scope="col">Text</th>
|
| 245 |
<th scope="col">Zero-Shot Version</th>
|
| 246 |
<th scope="col">One-Shot Version</th>
|
| 247 |
<th scope="col">ElevenLabs Multilingual_v2</th>
|
| 248 |
</tr>
|
| 249 |
<tr class="border-bottom-thin">
|
|
|
|
|
|
|
|
|
|
| 250 |
<td>
|
| 251 |
<audio class="audio-sm" src="assets/audios/Lyrical%20Cantonese_Prompt.WAV" controls></audio>
|
| 252 |
</td>
|
|
|
|
| 277 |
</td>
|
| 278 |
</tr>
|
| 279 |
<tr class="border-bottom-thin">
|
|
|
|
|
|
|
|
|
|
| 280 |
<td>
|
| 281 |
<audio class="audio-sm" src="assets/audios/Breaking%20Down%20Mandarin_Prompt.WAV" controls></audio>
|
| 282 |
</td>
|
|
|
|
| 311 |
</td>
|
| 312 |
</tr>
|
| 313 |
<tr class="border-bottom-thin">
|
|
|
|
|
|
|
|
|
|
| 314 |
<td>
|
| 315 |
<audio class="audio-sm" src="assets/audios/Quirky%20Female%20English_Prompt.MP3" controls></audio>
|
| 316 |
</td>
|
|
|
|
| 337 |
</td>
|
| 338 |
</tr>
|
| 339 |
<tr>
|
|
|
|
|
|
|
|
|
|
| 340 |
<td>
|
| 341 |
<audio class="audio-sm" src="assets/audios/Neurotic%20Teenage%20English_Prompt.MP3" controls></audio>
|
| 342 |
</td>
|
|
|
|
| 386 |
<th scope="col">Languages</th>
|
| 387 |
<th scope="col">Source Audio</th>
|
| 388 |
<th scope="col">Text</th>
|
| 389 |
+
<th scope="col">MiniMax<br>Speech_02_HD</th>
|
| 390 |
<th scope="col">ElevenLabs<br>Multilingual_v2</th>
|
| 391 |
<th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
|
| 392 |
</tr>
|
|
|
|
| 507 |
<tbody>
|
| 508 |
<tr class="border-bottom-thin">
|
| 509 |
<th scope="col">Original Language</th>
|
|
|
|
| 510 |
<th scope="col">Source Audio</th>
|
| 511 |
+
<th scope="col">Mixed Language</th>
|
| 512 |
<th scope="col">Text</th>
|
| 513 |
+
<th scope="col">MiniMax<br>Speech_02_HD</th>
|
| 514 |
<th scope="col">ElevenLabs<br>Multilingual_v2</th>
|
| 515 |
<th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
|
| 516 |
</tr>
|
| 517 |
<tr class="border-bottom-thin">
|
| 518 |
<td>English</td>
|
|
|
|
| 519 |
<td>
|
| 520 |
<audio class="audio-sm" src="assets/audios/Wong_Sourse.mp3" controls></audio>
|
| 521 |
</td>
|
| 522 |
+
<td>English + Mandarin</td>
|
| 523 |
<td>
|
| 524 |
Kiddo! Come come come, 学如逆水行舟,不进则退。<br>
|
| 525 |
I see you're using AI tools already - so smart!<br>
|
|
|
|
| 539 |
</tr>
|
| 540 |
<tr class="border-bottom-thin">
|
| 541 |
<td>Mandarin</td>
|
|
|
|
| 542 |
<td>
|
| 543 |
<audio class="audio-sm" src="assets/audios/ShiBanYu_Sourse.mp3" controls></audio>
|
| 544 |
</td>
|
| 545 |
+
<td>Mandarin + Cantonese</td>
|
| 546 |
<td>
|
| 547 |
老铁啊,多谢晒你送我呢本,广州话正音字典,咁好嘢喎!<br>
|
| 548 |
我呢个大老爷们儿学广州话真系好难㗎!成日都分唔清声调啊。<br>
|
|
|
|
| 560 |
</tr>
|
| 561 |
<tr class="border-bottom-thin">
|
| 562 |
<td>Mandarin</td>
|
|
|
|
| 563 |
<td>
|
| 564 |
<audio class="audio-sm" src="assets/audios/ShuanQ_Sourse.mp3" controls></audio>
|
| 565 |
</td>
|
| 566 |
+
<td>Mandarin + English</td>
|
| 567 |
<td>
|
| 568 |
The people said, 桂林's scenery is the first under heaven.<br>
|
| 569 |
Yet in my opinion, 阳朔 scenery is better than 桂林。<br>
|
|
|
|
| 581 |
</tr>
|
| 582 |
<tr class="border-bottom-thin">
|
| 583 |
<td>English</td>
|
|
|
|
| 584 |
<td>
|
| 585 |
<audio class="audio-sm" src="assets/audios/CoCo_Sourse.mp3" controls></audio>
|
| 586 |
</td>
|
| 587 |
+
<td>English + Spanish</td>
|
| 588 |
<td>
|
| 589 |
Mi abuelita always told me "el que persevera, alcanza".<br>
|
| 590 |
If you persevere, you'll achieve your dreams!<br>
|
|
|
|
| 602 |
</tr>
|
| 603 |
<tr class="border-bottom-thin">
|
| 604 |
<td>Japanese</td>
|
|
|
|
| 605 |
<td>
|
| 606 |
<audio class="audio-sm" src="assets/audios/Powerful_Girl_Sourse.mp3" controls></audio>
|
| 607 |
</td>
|
| 608 |
+
<td>Japanese + Korean</td>
|
| 609 |
<td>
|
| 610 |
最近の天気予報によりますと、今週末は桜の開花に最適<br>
|
| 611 |
な気温になる予定です。<br>
|
|
|
|
| 984 |
<tbody>
|
| 985 |
<tr class="border-bottom-thin">
|
| 986 |
<th scope="col">Text</th>
|
| 987 |
+
<th scope="col" style="text-align: center;">MiniMax<br>Speech_02_HD</th>
|
| 988 |
<th scope="col" style="text-align: center;">Microsoft<br>Azure TTS</th>
|
| 989 |
<th scope="col" style="text-align: center;">AWS<br>Polly</th>
|
| 990 |
</tr>
|