tts-api / README.md
Avinyaa
title: XTTS C3PO Voice Cloning API
emoji: 🤖
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false

XTTS C3PO Voice Cloning API

A FastAPI-based Text-to-Speech API using XTTS-v2 with the iconic C3PO voice from Star Wars.

Features

  • C3PO Voice: Pre-loaded with the iconic C3PO voice from Star Wars
  • Custom Voice Cloning: Upload your own reference audio for voice cloning
  • Multilingual Support: 17 languages with C3PO voice
  • No Upload Required: Use C3PO voice without any file uploads
  • RESTful API: Clean API with automatic documentation
  • Docker Support: Optimized for Hugging Face Spaces deployment
  • PyTorch 2.6 Compatible: Includes compatibility fixes

About the C3PO Model

This API uses the XTTS-v2 C3PO model from Borcherding/XTTS-v2_C3PO, which provides the iconic voice of C-3PO from Star Wars. The model supports:

  • High-quality C3PO voice synthesis
  • Multilingual C3PO speech (17 languages)
  • Custom voice cloning capabilities
  • Real-time speech generation

Quick Start

Using C3PO Voice (No Upload Required)

curl -X POST "http://localhost:7860/tts-c3po" \
  -F "text=Hello there! I am C-3PO, human-cyborg relations." \
  -F "language=en" \
  --output c3po_speech.wav

Using Custom Voice Cloning

curl -X POST "http://localhost:7860/tts" \
  -F "text=This will be spoken in your custom voice!" \
  -F "language=en" \
  -F "speaker_file=@your_reference_voice.wav" \
  --output custom_speech.wav

API Endpoints

C3PO Voice Only

  • POST /tts-c3po - Generate speech using C3PO voice (no file upload needed)
    • Parameters:
      • text (form): Text to convert to speech (max 500 characters)
      • language (form): Language code (default: "en")
      • no_lang_auto_detect (form): Disable automatic language detection

Voice Cloning with Fallback

  • POST /tts - Convert text to speech with optional custom voice
    • Parameters:
      • text (form): Text to convert to speech (max 500 characters)
      • language (form): Language code (default: "en")
      • voice_cleanup (form): Apply audio cleanup to reference voice
      • no_lang_auto_detect (form): Disable automatic language detection
      • speaker_file (file, optional): Reference speaker audio file (uses C3PO if not provided)
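
Both speech endpoints cap text at 500 characters, so longer passages have to be split on the client side. The following is a minimal sketch of one way to do that, preferring sentence boundaries. `split_text` is a hypothetical helper, not part of this API, and it assumes the limit counts characters rather than bytes.

```python
import re

MAX_CHARS = 500  # per-request limit stated in the parameter lists above


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be posted as its own request and the resulting WAV files joined afterwards.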

JSON API

  • POST /tts-json - Convert text to speech using JSON request body
    • Body: JSON object with text, language, voice_cleanup, no_lang_auto_detect
    • File: speaker_file (optional) - Reference speaker audio file

Information Endpoints

  • GET /health - Check API status, device info, and supported languages
  • GET /languages - Get list of supported languages
  • GET /docs - Interactive API documentation (Swagger UI)
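
Because the model may still be downloading or loading when the server starts, a client may want to poll /health before sending speech requests. This is a rough sketch using only the standard library; `fetch_health` and `wait_until_healthy` are hypothetical names, and the check assumes the "status": "healthy" field shown in the health check response example.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:7860"):
    """Return the parsed JSON body of GET /health."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
        return json.load(resp)


def wait_until_healthy(fetch=fetch_health, attempts=30, delay=2.0):
    """Poll until the API reports a healthy status, or give up."""
    for _ in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            pass
        time.sleep(delay)
    return False
```

Passing `fetch` as a parameter keeps the polling loop easy to exercise without a running server.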

Usage Examples

Python - C3PO Voice

import requests

# Generate C3PO speech
url = "http://localhost:7860/tts-c3po"
data = {
    "text": "Hello there! I am C-3PO, human-cyborg relations.",
    "language": "en"
}

response = requests.post(url, data=data)

if response.status_code == 200:
    with open("c3po_speech.wav", "wb") as f:
        f.write(response.content)
    print("C3PO speech generated!")

Python - Custom Voice with C3PO Fallback

import requests

url = "http://localhost:7860/tts"
data = {
    "text": "This will use C3PO voice if no speaker file is provided.",
    "language": "en"
}

# No speaker_file provided - will use C3PO voice
response = requests.post(url, data=data)

if response.status_code == 200:
    with open("speech_output.wav", "wb") as f:
        f.write(response.content)
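
Python - Custom Voice with Reference Audio (sketch)

The curl example in Quick Start uploads a reference file, and the same request can probably be made from Python as sketched here. `build_tts_form` and `synthesize` are hypothetical helpers, and the "audio/wav" content type is an assumption that fits a WAV reference file.

```python
from pathlib import Path


def build_tts_form(text, language="en", speaker_path=None):
    """Build the form fields and optional file part for POST /tts."""
    data = {"text": text, "language": language}
    files = None
    if speaker_path is not None:
        path = Path(speaker_path)
        # (filename, content, content type) is the tuple form requests accepts.
        files = {"speaker_file": (path.name, path.read_bytes(), "audio/wav")}
    return data, files


def synthesize(text, out_path, base_url="http://localhost:7860", **form_kwargs):
    """Send the request and save the returned audio to out_path."""
    import requests  # third-party; imported here so build_tts_form has no dependencies

    data, files = build_tts_form(text, **form_kwargs)
    response = requests.post(f"{base_url}/tts", data=data, files=files, timeout=300)
    response.raise_for_status()
    Path(out_path).write_bytes(response.content)
```

For example, `synthesize("Hello!", "custom_speech.wav", speaker_path="your_reference_voice.wav")` would mirror the curl command. Leaving out `speaker_path` sends no file, which according to the endpoint description falls back to the C3PO voice.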

Multilingual C3PO

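import requests

# C3PO speaking Spanish
data = {
    "text": "Hola, soy C-3PO. Domino más de seis millones de formas de comunicación.",
    "language": "es"
}
response = requests.post("http://localhost:7860/tts-c3po", data=data)

if response.status_code == 200:
    with open("c3po_spanish.wav", "wb") as f:
        f.write(response.content)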

Supported Languages

The C3PO model supports all XTTS-v2 languages:

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese (Brazilian)
  • pl - Polish
  • tr - Turkish
  • ru - Russian
  • nl - Dutch
  • cs - Czech
  • ar - Arabic
  • zh-cn - Mandarin Chinese
  • ja - Japanese
  • ko - Korean
  • hu - Hungarian
  • hi - Hindi

Setup

Hugging Face Spaces Deployment

This API is optimized for Hugging Face Spaces with:

  • Automatic C3PO model downloading
  • Proper user permissions (user ID 1000)
  • PyTorch 2.6 compatibility fixes
  • COQUI license agreement handling

Local Development

  1. Install system dependencies:
     # Ubuntu/Debian
     sudo apt-get install espeak-ng ffmpeg git git-lfs

     # macOS
     brew install espeak ffmpeg git git-lfs
  2. Install Python dependencies:
     pip install -r requirements.txt
     python -m unidic download
  3. Clone C3PO model (optional - auto-downloaded on first run):
     git clone https://huggingface.co/Borcherding/XTTS-v2_C3PO XTTS-v2_C3PO
  4. Run the API:
     uvicorn app:app --host 0.0.0.0 --port 7860

Using Docker

# Build and run
docker build -t xtts-c3po-api .
docker run -p 7860:7860 xtts-c3po-api

Reference Audio Guidelines

For custom voice cloning:

  1. Duration: 3-10 seconds of clear speech
  2. Quality: High-quality audio, minimal background noise
  3. Format: WAV format recommended (MP3, M4A also supported)
  4. Content: Natural speech, avoid music or effects
  5. Speaker: Single speaker, clear pronunciation
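
The duration guideline can be checked before uploading. This is a small sketch using Python's standard `wave` module, so it only handles uncompressed WAV files (MP3 and M4A would need a third-party decoder). The function names are hypothetical, and the 3 and 10 second bounds come from the guideline above.

```python
import wave


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(source, min_s=3.0, max_s=10.0):
    """True if the clip falls inside the recommended 3-10 second window."""
    return min_s <= wav_duration_seconds(source) <= max_s
```

Calling `is_good_reference_length("your_reference_voice.wav")` before the upload gives a quick local sanity check. It does not say anything about noise or speaker clarity.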

Model Information

  • Base Model: XTTS-v2
  • Voice: C3PO from Star Wars
  • Source: Borcherding/XTTS-v2_C3PO
  • Languages: 17 supported
  • License: CPML (Coqui Public Model License)

Testing

Run the test suite:

# Test C3PO model functionality
python test.py

# Test API endpoints
python client_example.py

Environment Variables

Automatically configured:

  • COQUI_TOS_AGREED=1 - Agrees to CPML license
  • NUMBA_DISABLE_JIT=1 - Disables Numba JIT compilation
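
When running the model from your own script rather than through the provided app or container, the same variables would need to be set by hand. A minimal sketch, assuming they must be in place before the TTS libraries are imported; `configure_environment` is a hypothetical helper and the `cpu_only` switch is an addition for illustration.

```python
import os


def configure_environment(cpu_only=False):
    """Set the variables listed above without overriding existing values."""
    # setdefault keeps anything already exported by the shell or Dockerfile.
    os.environ.setdefault("COQUI_TOS_AGREED", "1")
    os.environ.setdefault("NUMBA_DISABLE_JIT", "1")
    if cpu_only:
        # An empty value hides all CUDA devices from this process.
        os.environ["CUDA_VISIBLE_DEVICES"] = ""


configure_environment()
# Import torch / TTS only after this point so they see the settings.
```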

API Response Examples

Health Check Response

{
  "status": "healthy",
  "device": "cuda",
  "model": "XTTS-v2 C3PO",
  "default_voice": "C3PO",
  "supported_languages": ["en", "es", "fr", ...]
}

Languages Response

{
  "languages": ["en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru", "nl", "cs", "ar", "zh-cn", "ja", "ko", "hu", "hi"]
}
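
A client can use the languages response to validate a language code before sending a speech request. This is a short sketch that takes the parsed JSON payload shown above. `resolve_language` is a hypothetical helper, and falling back to "en" is a design choice that mirrors the endpoints' documented default.

```python
def resolve_language(requested, payload, fallback="en"):
    """Return a supported language code, or the fallback if unsupported."""
    supported = payload.get("languages", [])
    # Codes in the response are lowercase, so normalise user input first.
    code = requested.strip().lower()
    return code if code in supported else fallback
```

A stricter client might raise an error instead of falling back silently.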

Troubleshooting

PyTorch Loading Issues

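PyTorch 2.6 changed the default of torch.load to weights_only=True, which rejects checkpoints that contain arbitrary pickled Python objects. The API includes a compatibility fix for this. If model loading still fails, confirm that the fix runs before the model is loaded.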
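
This README does not show how its fix is implemented. One common workaround, sketched here as an assumption rather than as the project's actual code, is to wrap `torch.load` so that `weights_only` defaults to `False` unless the caller sets it. `default_weights_only_false` is a hypothetical name.

```python
import functools


def default_weights_only_false(load_fn):
    """Wrap a torch.load-style function so weights_only defaults to False."""

    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        # Respect an explicit weights_only argument; only change the default.
        kwargs.setdefault("weights_only", False)
        return load_fn(*args, **kwargs)

    return wrapper


# Intended use, before the model is loaded:
#   import torch
#   torch.load = default_weights_only_false(torch.load)
```

Loading with `weights_only=False` unpickles arbitrary objects, so this is only reasonable for checkpoints from a source you trust.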

Model Download Issues

If the C3PO model fails to download:

  1. Check internet connection
  2. Verify git and git-lfs are installed
  3. Manually clone: git clone https://huggingface.co/Borcherding/XTTS-v2_C3PO XTTS-v2_C3PO
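
Step 2 can be checked from Python before attempting the clone. This is a minimal sketch using `shutil.which`, where `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("git", "git-lfs")):
    """Return the names of required executables not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_tools()` means both executables are on PATH. It does not confirm that `git lfs install` has been run for the current user.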

Audio Quality Issues

  • Use high-quality reference audio for custom voices
  • Enable voice_cleanup for noisy reference audio
  • Ensure reference audio is 3-10 seconds long

Memory Issues

  • If GPU memory runs out, fall back to CPU by setting CUDA_VISIBLE_DEVICES="" before starting the server
  • Send shorter texts per request when processing many inputs
  • Consider using GPU with sufficient VRAM (4GB+ recommended)

License

This project uses XTTS-v2 which is licensed under the Coqui Public Model License (CPML). The C3PO model is provided by the community. See https://coqui.ai/cpml for license details.

Credits

  • XTTS-v2: Coqui AI
  • C3PO Model: Borcherding
  • Original Character: C-3PO from Star Wars (Lucasfilm/Disney)