| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "id": "b6ee1ede", | |
| "metadata": {}, | |
| "source": [ | |
| "## Cross-Lingual Voice Clone Demo" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "id": "b7f043ee", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import os\n", | |
| "import torch\n", | |
| "import se_extractor\n", | |
| "from api import ToneColorConverter" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "15116b59", | |
| "metadata": {}, | |
| "source": [ | |
| "### Initialization" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "id": "aacad912", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "ckpt_converter = 'checkpoints/converter'\n", | |
| "device = 'cuda:0' if torch.cuda.is_available() else 'cpu'\n", | |
| "output_dir = 'outputs'\n", | |
| "\n", | |
| "tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)\n", | |
| "tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')\n", | |
| "\n", | |
| "os.makedirs(output_dir, exist_ok=True)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "3db80fcf", | |
| "metadata": {}, | |
| "source": [ | |
| "In this demo, we use OpenAI TTS as the base speaker to produce multilingual speech audio. You can swap in a different base speaker to suit your needs. Create a file named `.env` and add your OpenAI key to it as `OPENAI_API_KEY=xxx`. We also provide a Chinese base speaker model (see `demo_part1.ipynb`)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "id": "3b245ca3", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "from openai import OpenAI\n", | |
| "from dotenv import load_dotenv\n", | |
| "\n", | |
| "# Please create a file named .env and place your\n", | |
| "# OpenAI key as OPENAI_API_KEY=xxx\n", | |
| "load_dotenv() \n", | |
| "\n", | |
| "client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\"))\n", | |
| "\n", | |
| "response = client.audio.speech.create(\n", | |
| " model=\"tts-1\",\n", | |
| " voice=\"nova\",\n", | |
| " input=\"This audio will be used to extract the base speaker tone color embedding. \"\n", | |
| " \"Typically a very short audio should be sufficient, but increasing the audio \"\n", | |
| " \"length will also improve the output audio quality.\"\n", | |
| ")\n", | |
| "\n", | |
| "response.stream_to_file(f\"{output_dir}/openai_source_output.mp3\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "7f67740c", | |
| "metadata": {}, | |
| "source": [ | |
| "### Obtain Tone Color Embedding" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "f8add279", | |
| "metadata": {}, | |
| "source": [ | |
| "`source_se` is the tone color embedding of the base speaker.\n", | |
| "Here it is extracted from the OpenAI audio generated above, and\n", | |
| "`target_se` is extracted the same way from the reference speaker audio.\n", | |
| "Readers are free to extract `source_se` from their own base speaker audio instead." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "id": "63ff6273", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "base_speaker = f\"{output_dir}/openai_source_output.mp3\"\n", | |
| "source_se, audio_name = se_extractor.get_se(base_speaker, tone_color_converter, vad=True)\n", | |
| "\n", | |
| "reference_speaker = 'resources/example_reference.mp3'\n", | |
| "target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=True)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "a40284aa", | |
| "metadata": {}, | |
| "source": [ | |
| "### Inference" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "id": "73dc1259", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Run the base speaker tts\n", | |
| "text = [\n", | |
| " \"MyShell is a decentralized and comprehensive platform for discovering, creating, and staking AI-native apps.\",\n", | |
| " \"MyShell es una plataforma descentralizada y completa para descubrir, crear y apostar por aplicaciones nativas de IA.\",\n", | |
| " \"MyShell est une plateforme décentralisée et complète pour découvrir, créer et miser sur des applications natives d'IA.\",\n", | |
| " \"MyShell ist eine dezentralisierte und umfassende Plattform zum Entdecken, Erstellen und Staken von KI-nativen Apps.\",\n", | |
| " \"MyShell è una piattaforma decentralizzata e completa per scoprire, creare e scommettere su app native di intelligenza artificiale.\",\n", | |
| " \"MyShellは、AIネイティブアプリの発見、作成、およびステーキングのための分散型かつ包括的なプラットフォームです。\",\n", | |
| " \"MyShell — это децентрализованная и всеобъемлющая платформа для обнаружения, создания и стейкинга AI-ориентированных приложений.\",\n", | |
| " \"MyShell هي منصة لامركزية وشاملة لاكتشاف وإنشاء ورهان تطبيقات الذكاء الاصطناعي الأصلية.\",\n", | |
| " \"MyShell是一个去中心化且全面的平台,用于发现、创建和投资AI原生应用程序。\",\n", | |
| " \"MyShell एक विकेंद्रीकृत और व्यापक मंच है, जो AI-मूल ऐप्स की खोज, सृजन और स्टेकिंग के लिए है।\",\n", | |
| " \"MyShell é uma plataforma descentralizada e abrangente para descobrir, criar e apostar em aplicativos nativos de IA.\"\n", | |
| "]\n", | |
| "src_path = f'{output_dir}/tmp.wav'\n", | |
| "\n", | |
| "for i, t in enumerate(text):\n", | |
| "\n", | |
| " response = client.audio.speech.create(\n", | |
| " model=\"tts-1\",\n", | |
| " voice=\"alloy\",\n", | |
| " input=t,\n", | |
| " )\n", | |
| "\n", | |
| " response.stream_to_file(src_path)\n", | |
| "\n", | |
| " save_path = f'{output_dir}/output_crosslingual_{i}.wav'\n", | |
| "\n", | |
| " # Run the tone color converter\n", | |
| " encode_message = \"@MyShell\"\n", | |
| " tone_color_converter.convert(\n", | |
| " audio_src_path=src_path, \n", | |
| " src_se=source_se, \n", | |
| " tgt_se=target_se, \n", | |
| " output_path=save_path,\n", | |
| " message=encode_message)" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "interpreter": { | |
| "hash": "9d70c38e1c0b038dbdffdaa4f8bfa1f6767c43760905c87a9fbe7800d18c6c35" | |
| }, | |
| "kernelspec": { | |
| "display_name": "Python 3.9.18 ('openvoice')", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.9.18" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 5 | |
| } |