

metadata

title: XTTS C3PO Voice Cloning API
emoji: 🤖
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false

XTTS C3PO Voice Cloning API

A FastAPI-based Text-to-Speech API using XTTS-v2 with the iconic C3PO voice from Star Wars.

Features

C3PO Voice: Pre-loaded with the iconic C3PO voice from Star Wars
Custom Voice Cloning: Upload your own reference audio for voice cloning
Multilingual Support: 17 languages with C3PO voice
No Upload Required: Use C3PO voice without any file uploads
RESTful API: Clean API with automatic documentation
Docker Support: Optimized for Hugging Face Spaces deployment
PyTorch 2.6 Compatible: Includes compatibility fixes

About the C3PO Model

This API uses the XTTS-v2 C3PO model from Borcherding/XTTS-v2_C3PO, which provides the iconic voice of C-3PO from Star Wars. The model supports:

High-quality C3PO voice synthesis
Multilingual C3PO speech (17 languages)
Custom voice cloning capabilities
Real-time speech generation

Quick Start

Using C3PO Voice (No Upload Required)

curl -X POST "http://localhost:7860/tts-c3po" \
  -F "text=Hello there! I am C-3PO, human-cyborg relations." \
  -F "language=en" \
  --output c3po_speech.wav

Using Custom Voice Cloning

curl -X POST "http://localhost:7860/tts" \
  -F "text=This will be spoken in your custom voice!" \
  -F "language=en" \
  -F "speaker_file=@your_reference_voice.wav" \
  --output custom_speech.wav

API Endpoints

C3PO Voice Only

POST /tts-c3po - Generate speech using C3PO voice (no file upload needed)
- Parameters:
  - text (form): Text to convert to speech (max 500 characters)
  - language (form): Language code (default: "en")
  - no_lang_auto_detect (form): Disable automatic language detection

Voice Cloning with Fallback

POST /tts - Convert text to speech with optional custom voice
- Parameters:
  - text (form): Text to convert to speech (max 500 characters)
  - language (form): Language code (default: "en")
  - voice_cleanup (form): Apply audio cleanup to reference voice
  - no_lang_auto_detect (form): Disable automatic language detection
  - speaker_file (file, optional): Reference speaker audio file (uses C3PO if not provided)
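The form fields listed above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not the service's own code: `build_tts_form` and `synthesize` are hypothetical helpers, and sending booleans as the strings "true" and "false" is an assumption about how the server parses form values. Optional flags are omitted unless set, so the server's own defaults apply.

```python
MAX_TEXT_CHARS = 500  # limit stated in the parameter list above


def build_tts_form(text, language="en", voice_cleanup=None,
                   no_lang_auto_detect=None):
    """Return form fields for POST /tts (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}")
    form = {"text": text, "language": language}
    # Assumption: the server accepts "true"/"false" for boolean form fields.
    if voice_cleanup is not None:
        form["voice_cleanup"] = "true" if voice_cleanup else "false"
    if no_lang_auto_detect is not None:
        form["no_lang_auto_detect"] = "true" if no_lang_auto_detect else "false"
    return form


def synthesize(form, speaker_path=None, base_url="http://localhost:7860"):
    """Send the form to /tts and return the response body (WAV bytes)."""
    import requests  # imported lazily so build_tts_form has no dependencies

    if speaker_path is None:
        # No speaker file: the API falls back to the C3PO voice.
        response = requests.post(f"{base_url}/tts", data=form)
    else:
        with open(speaker_path, "rb") as speaker:
            response = requests.post(f"{base_url}/tts", data=form,
                                     files={"speaker_file": speaker})
    response.raise_for_status()
    return response.content
```

With a server running locally, `synthesize(build_tts_form("Hello"), "your_reference_voice.wav")` would return audio bytes that can be written to a `.wav` file.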

JSON API

POST /tts-json - Convert text to speech using JSON request body
- Body: JSON object with text, language, voice_cleanup, no_lang_auto_detect
- File: speaker_file (optional) - Reference speaker audio file

Information Endpoints

GET /health - Check API status, device info, and supported languages
GET /languages - Get list of supported languages
GET /docs - Interactive API documentation (Swagger UI)

Usage Examples

Python - C3PO Voice

import requests

# Generate C3PO speech
url = "http://localhost:7860/tts-c3po"
data = {
    "text": "Hello there! I am C-3PO, human-cyborg relations.",
    "language": "en"
}

response = requests.post(url, data=data)

if response.status_code == 200:
    with open("c3po_speech.wav", "wb") as f:
        f.write(response.content)
    print("C3PO speech generated!")

Python - Custom Voice with C3PO Fallback

import requests

url = "http://localhost:7860/tts"
data = {
    "text": "This will use C3PO voice if no speaker file is provided.",
    "language": "en"
}

# No speaker_file provided - will use C3PO voice
response = requests.post(url, data=data)

if response.status_code == 200:
    with open("speech_output.wav", "wb") as f:
        f.write(response.content)

Multilingual C3PO

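import requests

# C3PO speaking Spanish
data = {
    "text": "Hola, soy C-3PO. Domino más de seis millones de formas de comunicación.",
    "language": "es"
}
response = requests.post("http://localhost:7860/tts-c3po", data=data)

if response.status_code == 200:
    with open("c3po_spanish.wav", "wb") as f:
        f.write(response.content)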

Supported Languages

The C3PO model supports all XTTS-v2 languages:

en - English
es - Spanish
fr - French
de - German
it - Italian
pt - Portuguese (Brazilian)
pl - Polish
tr - Turkish
ru - Russian
nl - Dutch
cs - Czech
ar - Arabic
zh-cn - Mandarin Chinese
ja - Japanese
ko - Korean
hu - Hungarian
hi - Hindi

Setup

Hugging Face Spaces Deployment

This API is optimized for Hugging Face Spaces with:

Automatic C3PO model downloading
Proper user permissions (user ID 1000)
PyTorch 2.6 compatibility fixes
COQUI license agreement handling

Local Development

Install system dependencies:

# Ubuntu/Debian
sudo apt-get install espeak-ng ffmpeg git git-lfs

# macOS
brew install espeak ffmpeg git git-lfs

Install Python dependencies:

pip install -r requirements.txt
python -m unidic download

Clone C3PO model (optional - auto-downloaded on first run):

git clone https://huggingface.co/Borcherding/XTTS-v2_C3PO XTTS-v2_C3PO

Run the API:

uvicorn app:app --host 0.0.0.0 --port 7860

Using Docker

# Build and run
docker build -t xtts-c3po-api .
docker run -p 7860:7860 xtts-c3po-api

Reference Audio Guidelines

For custom voice cloning:

Duration: 3-10 seconds of clear speech
Quality: High-quality audio, minimal background noise
Format: WAV format recommended (MP3, M4A also supported)
Content: Natural speech, avoid music or effects
Speaker: Single speaker, clear pronunciation
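The duration guideline can be checked locally before uploading. Below is a minimal sketch using Python's standard `wave` module; it only handles uncompressed WAV files and only checks duration, not noise or content. The function names are illustrative and not part of this project.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 3.0, 10.0  # duration range recommended above


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_wav(source):
    """Return (ok, message) for the duration guideline only."""
    seconds = wav_duration_seconds(source)
    if seconds < MIN_SECONDS:
        return False, f"too short: {seconds:.1f}s"
    if seconds > MAX_SECONDS:
        return False, f"too long: {seconds:.1f}s"
    return True, f"ok: {seconds:.1f}s"
```

For example, `check_reference_wav("your_reference_voice.wav")` returns a pair whose first element says whether the clip falls inside the recommended range.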

Model Information

Base Model: XTTS-v2
Voice: C3PO from Star Wars
Source: Borcherding/XTTS-v2_C3PO
Languages: 17 supported
License: CPML (Coqui Public Model License)

Testing

Run the test suite:

# Test C3PO model functionality
python test.py

# Test API endpoints
python client_example.py

Environment Variables

Automatically configured:

COQUI_TOS_AGREED=1 - Agrees to CPML license
NUMBA_DISABLE_JIT=1 - Disables Numba JIT compilation
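When the API runs outside the provided container, these variables may need to be set by hand. The sketch below shows one way to do that from Python. Placing it at the very top of the entry-point module, before the TTS library is imported, is an assumption about a safe ordering rather than something this README specifies.

```python
import os

# setdefault keeps any value that is already present in the environment.
os.environ.setdefault("COQUI_TOS_AGREED", "1")   # agree to the CPML non-interactively
os.environ.setdefault("NUMBA_DISABLE_JIT", "1")  # disable Numba JIT compilation
```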

API Response Examples

Health Check Response

{
  "status": "healthy",
  "device": "cuda",
  "model": "XTTS-v2 C3PO",
  "default_voice": "C3PO",
  "supported_languages": ["en", "es", "fr", ...]
}
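A client can poll this endpoint before sending synthesis requests, which is useful while the model is still loading. The following is a minimal sketch that relies only on the `status` field shown above; `fetch_health` and `wait_until_healthy` are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:7860"):
    """GET /health and return the decoded JSON object."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as response:
        return json.load(response)


def wait_until_healthy(fetch=fetch_health, attempts=10, delay=3.0):
    """Poll until status is "healthy"; return the payload, or None on timeout."""
    for attempt in range(attempts):
        try:
            payload = fetch()
            if payload.get("status") == "healthy":
                return payload
        except (OSError, ValueError):
            pass  # server not reachable yet, or the body was not valid JSON
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

The `fetch` argument is injectable so the polling logic can be exercised without a running server.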

Languages Response

{
  "languages": ["en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru", "nl", "cs", "ar", "zh-cn", "ja", "ko", "hu", "hi"]
}
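The response above can be used to validate a language code before a synthesis request is made. This is a small sketch that assumes the body has exactly the shape shown; both function names are illustrative.

```python
import json


def supported_languages(body):
    """Parse a /languages response body (str or bytes) into a list of codes."""
    return list(json.loads(body)["languages"])


def require_language(code, languages):
    """Return code unchanged if it is supported, otherwise raise ValueError."""
    if code not in languages:
        raise ValueError(
            f"unsupported language {code!r}; choose one of {sorted(languages)}")
    return code
```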

Troubleshooting

PyTorch Loading Issues

The API includes fixes for PyTorch 2.6's weights_only=True default. If you encounter loading issues, ensure the compatibility fix is applied.
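This README does not show the fix itself. One common workaround, sketched below as an assumption rather than as this project's actual code, is to wrap `torch.load` so that `weights_only` defaults to `False` again. Loading with `weights_only=False` unpickles arbitrary objects, so it should only be used with checkpoints from a trusted source. The wrapper is written generically so it can be read and tested without PyTorch installed.

```python
import functools


def default_weights_only_false(load_fn):
    """Wrap a torch.load-style callable so weights_only defaults to False."""
    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("weights_only", False)  # explicit caller values still win
        return load_fn(*args, **kwargs)
    return wrapper


# Intended use, before the model is loaded (assumes torch is installed):
#   import torch
#   torch.load = default_weights_only_false(torch.load)
```

An alternative that keeps the safer default is to allowlist the required classes with `torch.serialization.add_safe_globals`.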

Model Download Issues

If the C3PO model fails to download:

Check internet connection
Verify git and git-lfs are installed
Manually clone: git clone https://huggingface.co/Borcherding/XTTS-v2_C3PO XTTS-v2_C3PO

Audio Quality Issues

Use high-quality reference audio for custom voices
Enable voice_cleanup for noisy reference audio
Ensure reference audio is 3-10 seconds long

Memory Issues

Force CPU mode if GPU memory is insufficient: set CUDA_VISIBLE_DEVICES="" before starting the server
Send shorter text per request when processing many inputs
For GPU inference, use a card with sufficient VRAM (4GB+ recommended)
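Long inputs can be split into pieces that each respect the 500-character limit on the text parameter, and each piece sent as its own request. The sketch below splits at sentence ends where possible; `split_text` is a hypothetical helper, and the sentence-boundary pattern is a simple heuristic that will not suit every language.

```python
import re

MAX_TEXT_CHARS = 500  # per-request limit stated for the text parameter


def split_text(text, limit=MAX_TEXT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk produces a separate WAV response, so the client is responsible for joining the audio afterwards.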

License

This project uses XTTS-v2, which is licensed under the Coqui Public Model License (CPML). The C3PO model is provided by the community. See https://coqui.ai/cpml for license details.

Credits

XTTS-v2: Coqui AI
C3PO Model: Borcherding
Original Character: C-3PO from Star Wars (Lucasfilm/Disney)