Avinyaa
metadata
title: Tts Api
emoji: 🚀
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false

TTS API

A FastAPI-based Text-to-Speech API using XTTS-v2 for voice cloning.

Features

  • Convert text to speech using voice cloning
  • Upload reference speaker audio files
  • Support for multiple languages
  • RESTful API with automatic documentation
  • Docker support

Setup

Local Development

  1. Install dependencies:
pip install -r requirements.txt
  2. Run the API:
python app.py

The API will be available at http://localhost:8000

Using Docker

  1. Build the Docker image:
docker build -t tts-api .
  2. Run the container:
docker run -p 8000:8000 tts-api

API Endpoints

Health Check

  • GET /health - Check API status
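
A quick way to confirm the server is up before sending synthesis requests is to poll the health endpoint. The sketch below is only an illustration: the helper name `is_healthy` is hypothetical, it uses the standard library instead of `requests` so it runs without extra dependencies, and it treats any HTTP 200 as healthy because the response body of `/health` is not specified here.

```python
import urllib.error
import urllib.request


def is_healthy(base_url="http://localhost:8000", timeout=5.0):
    """Return True if GET /health answers with HTTP 200, False otherwise.

    `is_healthy` is a hypothetical helper, not part of this project.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx statuses all count as unhealthy.
        return False


print("API is up" if is_healthy() else "API is not reachable")
```

Returning a boolean instead of raising keeps the helper easy to use in a startup wait loop.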

Text-to-Speech

  • POST /tts - Convert text to speech using an uploaded reference speaker file
    • Parameters:
      • text (form): Text to convert to speech
      • language (form): Language code (default: "en")
      • speaker_file (file): Reference speaker audio file

API Documentation

  • GET /docs - Interactive API documentation (Swagger UI)
  • GET /redoc - Alternative API documentation
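
FastAPI applications also serve a machine-readable schema at `/openapi.json` by default, which is what the two documentation pages are generated from. Assuming this app keeps that default, the following sketch lists the routes the schema declares; `list_endpoints` and `fetch_schema` are hypothetical helper names.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    return sorted(
        (method.upper(), path)
        for path, operations in schema.get("paths", {}).items()
        for method in operations
        if method in HTTP_METHODS  # skip non-method keys such as "parameters"
    )


def fetch_schema(base_url="http://localhost:8000", timeout=5.0):
    """Download the OpenAPI schema (assumes the default /openapi.json route)."""
    url = base_url.rstrip("/") + "/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage, with the API running locally:
#   for method, path in list_endpoints(fetch_schema()):
#       print(method, path)
```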

Usage Examples

Using Python requests

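import requests

# Prepare the request
url = "http://localhost:8000/tts"
data = {
    "text": "Hello, this is a test of voice cloning!",
    "language": "en",
}

# Open the reference audio in a context manager so the file handle is closed
with open("path/to/speaker.wav", "rb") as speaker:
    files = {"speaker_file": speaker}
    # Make the request (synthesis can be slow, so allow a generous timeout)
    response = requests.post(url, data=data, files=files, timeout=300)

# Save the generated audio, or report why the request failed
if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed with status {response.status_code}: {response.text}")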

Using curl

curl -X POST "http://localhost:8000/tts" \
  -F "text=Hello, this is a test!" \
  -F "language=en" \
  -F "speaker_file=@path/to/speaker.wav" \
  --output generated_speech.wav

Using the provided client example

python client_example.py

Requirements

  • Python 3.8+
  • CUDA-compatible GPU (recommended for faster processing)
  • Reference speaker audio file in a supported format (e.g. WAV or MP3)
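
The requirements above can be checked from Python before starting the API. This is a minimal sketch with a hypothetical helper name, `check_environment`; it assumes the model runs on PyTorch (as XTTS-v2 does) and simply reports CUDA as unavailable when `torch` is not installed.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Report whether the Python version and (optional) GPU requirements are met."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the check works without PyTorch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated the same as "no GPU".
            report["cuda_available"] = False
    return report


print(check_environment())
```

A `False` value for `cuda_available` does not prevent the API from running; the GPU is only recommended for speed.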

Model

This API uses the XTTS-v2_C3PO model for voice cloning, which is automatically downloaded when building the Docker image.