Spaces:
Running
Running
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "b6ee1ede", | |
"metadata": {}, | |
"source": [ | |
"## Cross-Lingual Voice Clone Demo" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "b7f043ee", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import os\n", | |
"import torch\n", | |
"import se_extractor\n", | |
"from api import ToneColorConverter" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "15116b59", | |
"metadata": {}, | |
"source": [ | |
"### Initialization" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "aacad912", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"ckpt_converter = 'checkpoints/converter'\n", | |
"device = 'cuda:0'\n", | |
"output_dir = 'outputs'\n", | |
"\n", | |
"tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)\n", | |
"tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')\n", | |
"\n", | |
"os.makedirs(output_dir, exist_ok=True)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3db80fcf", | |
"metadata": {}, | |
"source": [ | |
"In this demo, we will use OpenAI TTS as the base speaker to produce multi-lingual speech audio. The users can flexibly change the base speaker according to their own needs. Please create a file named `.env` and place OpenAI key as `OPENAI_API_KEY=xxx`. We have also provided a Chinese base speaker model (see `demo_part1.ipynb`)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "3b245ca3", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from openai import OpenAI\n", | |
"from dotenv import load_dotenv\n", | |
"\n", | |
"# Please create a file named .env and place your\n", | |
"# OpenAI key as OPENAI_API_KEY=xxx\n", | |
"load_dotenv() \n", | |
"\n", | |
"client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\"))\n", | |
"\n", | |
"response = client.audio.speech.create(\n", | |
" model=\"tts-1\",\n", | |
" voice=\"nova\",\n", | |
" input=\"This audio will be used to extract the base speaker tone color embedding. \" + \\\n", | |
" \"Typically a very short audio should be sufficient, but increasing the audio \" + \\\n", | |
" \"length will also improve the output audio quality.\"\n", | |
")\n", | |
"\n", | |
"response.stream_to_file(f\"{output_dir}/openai_source_output.mp3\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "7f67740c", | |
"metadata": {}, | |
"source": [ | |
"### Obtain Tone Color Embedding" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "f8add279", | |
"metadata": {}, | |
"source": [ | |
"The `source_se` is the tone color embedding of the base speaker. \n", | |
"It is an average for multiple sentences with multiple emotions\n", | |
"of the base speaker. We directly provide the result here but\n", | |
"the readers feel free to extract `source_se` by themselves." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "63ff6273", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"base_speaker = f\"{output_dir}/openai_source_output.mp3\"\n", | |
"source_se, audio_name = se_extractor.get_se(base_speaker, tone_color_converter, vad=True)\n", | |
"\n", | |
"reference_speaker = 'resources/example_reference.mp3'\n", | |
"target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=True)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "a40284aa", | |
"metadata": {}, | |
"source": [ | |
"### Inference" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "73dc1259", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Run the base speaker tts\n", | |
"text = [\n", | |
" \"MyShell is a decentralized and comprehensive platform for discovering, creating, and staking AI-native apps.\",\n", | |
" \"MyShell es una plataforma descentralizada y completa para descubrir, crear y apostar por aplicaciones nativas de IA.\",\n", | |
" \"MyShell est une plateforme décentralisée et complète pour découvrir, créer et miser sur des applications natives d'IA.\",\n", | |
" \"MyShell ist eine dezentralisierte und umfassende Plattform zum Entdecken, Erstellen und Staken von KI-nativen Apps.\",\n", | |
" \"MyShell è una piattaforma decentralizzata e completa per scoprire, creare e scommettere su app native di intelligenza artificiale.\",\n", | |
" \"MyShellは、AIネイティブアプリの発見、作成、およびステーキングのための分散型かつ包括的なプラットフォームです。\",\n", | |
" \"MyShell — это децентрализованная и всеобъемлющая платформа для обнаружения, создания и стейкинга AI-ориентированных приложений.\",\n", | |
" \"MyShell هي منصة لامركزية وشاملة لاكتشاف وإنشاء ورهان تطبيقات الذكاء الاصطناعي الأصلية.\",\n", | |
" \"MyShell是一个去中心化且全面的平台,用于发现、创建和投资AI原生应用程序。\",\n", | |
" \"MyShell एक विकेंद्रीकृत और व्यापक मंच है, जो AI-मूल ऐप्स की खोज, सृजन और स्टेकिंग के लिए है।\",\n", | |
" \"MyShell é uma plataforma descentralizada e abrangente para descobrir, criar e apostar em aplicativos nativos de IA.\"\n", | |
"]\n", | |
"src_path = f'{output_dir}/tmp.wav'\n", | |
"\n", | |
"for i, t in enumerate(text):\n", | |
"\n", | |
" response = client.audio.speech.create(\n", | |
" model=\"tts-1\",\n", | |
" voice=\"alloy\",\n", | |
" input=t,\n", | |
" )\n", | |
"\n", | |
" response.stream_to_file(src_path)\n", | |
"\n", | |
" save_path = f'{output_dir}/output_crosslingual_{i}.wav'\n", | |
"\n", | |
" # Run the tone color converter\n", | |
" encode_message = \"@MyShell\"\n", | |
" tone_color_converter.convert(\n", | |
" audio_src_path=src_path, \n", | |
" src_se=source_se, \n", | |
" tgt_se=target_se, \n", | |
" output_path=save_path,\n", | |
" message=encode_message)" | |
] | |
} | |
], | |
"metadata": { | |
"interpreter": { | |
"hash": "9d70c38e1c0b038dbdffdaa4f8bfa1f6767c43760905c87a9fbe7800d18c6c35" | |
}, | |
"kernelspec": { | |
"display_name": "Python 3.9.18 ('openvoice')", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.9.18" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |