metadata
title: Kokoro TTS API
emoji: 🎤
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false

Kokoro TTS API

A FastAPI-based Text-to-Speech API using Kokoro, an open-weight TTS model with 82 million parameters.

Features

  • Convert text to speech using Kokoro TTS
  • Multiple voice options (af_heart, af_sky, af_bella, etc.)
  • Automatic language detection
  • RESTful API with automatic documentation
  • Docker support
  • Lightweight and fast processing
  • Apache-licensed weights

About Kokoro

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

Setup

Local Development

  1. Install system dependencies:
# On Ubuntu/Debian
sudo apt-get install espeak-ng

# On macOS
brew install espeak
  2. Install Python dependencies:
pip install -r requirements.txt
  3. Run the API:
uvicorn app:app --host 0.0.0.0 --port 7860

The API will be available at http://localhost:7860

Using Docker

  1. Build the Docker image:
docker build -t kokoro-tts-api .
  2. Run the container:
docker run -p 7860:7860 kokoro-tts-api
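
The container may need a moment before the API starts answering. Below is a minimal sketch of a readiness wait, assuming the `/health` endpoint listed under API Endpoints returns HTTP 200 once the service is up. `health_ok` and `wait_until_ready` are hypothetical helper names, not part of this repository.

```python
import time
import urllib.error
import urllib.request


def health_ok(base_url="http://localhost:7860"):
    """Return True if GET /health answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(check=health_ok, attempts=30, delay=2.0):
    """Call `check` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between attempts, not after the last one
    return False
```

Calling `wait_until_ready()` right after `docker run` blocks until the API responds or roughly a minute has passed. The `check` argument is injectable so the loop can be exercised without a running server.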

API Endpoints

Health Check

  • GET /health - Check API status and device information

Available Voices

  • GET /voices - Get list of available voices
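
The two GET endpoints above can be queried with the standard library alone. This is a sketch under two assumptions: the API runs at the local address from the Setup section, and both endpoints return JSON. The response fields are not documented here, so the helper just returns the decoded body. `endpoint_url` and `fetch_json` are illustrative names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed local deployment from the Setup section


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(path, base_url=BASE_URL):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(endpoint_url(path, base_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `fetch_json("/health")` and `fetch_json("/voices")` return whatever structure the API reports.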

Text-to-Speech (Form Data)

  • POST /tts - Convert text to speech using form data
    • Parameters:
      • text (form): Text to convert to speech
      • voice (form): Voice to use (default: "af_heart")
      • lang_code (form): Language code (default: "a"; in the upstream Kokoro library, "a" selects American English)
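
The parameters above map directly onto form fields. The sketch below builds the request with the standard library without sending it. It assumes the server accepts URL-encoded form bodies as well as the multipart bodies that `curl -F` produces. `build_tts_form_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def build_tts_form_request(text, voice="af_heart", lang_code="a",
                           base_url="http://localhost:7860"):
    """Build (but do not send) a POST /tts request with URL-encoded form fields."""
    # Defaults mirror the documented defaults for voice and lang_code.
    fields = {"text": text, "voice": voice, "lang_code": lang_code}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` and reading the response should give the audio bytes, provided the server is running.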

Text-to-Speech (JSON)

  • POST /tts-json - Convert text to speech using JSON request body
    • Body: JSON object with text, voice, and lang_code fields
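
As a sketch of the JSON body, a small client-side dataclass can keep the three fields together. `TTSRequest` is a hypothetical class, not the server's own model. The defaults for `voice` and `lang_code` are assumed to match the `/tts` form defaults, and the empty-text check is a client-side convenience rather than documented server behavior.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TTSRequest:
    """Hypothetical client-side mirror of the /tts-json request body."""
    text: str
    voice: str = "af_heart"  # assumed to match the /tts form default
    lang_code: str = "a"     # assumed to match the /tts form default

    def to_json(self):
        """Serialize to a JSON string, rejecting empty text before any request is made."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        return json.dumps(asdict(self))
```

For example, `TTSRequest("Hello", voice="af_bella").to_json()` yields a string that can be sent as the body of `POST /tts-json`.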

API Documentation

  • GET /docs - Interactive API documentation (Swagger UI)
  • GET /redoc - Alternative API documentation

Available Voices

  • af_heart - Female voice (Heart)
  • af_sky - Female voice (Sky)
  • af_bella - Female voice (Bella)
  • af_sarah - Female voice (Sarah)
  • af_nicole - Female voice (Nicole)
  • am_adam - Male voice (Adam)
  • am_michael - Male voice (Michael)
  • am_edward - Male voice (Edward)
  • am_lewis - Male voice (Lewis)
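
The IDs in this list follow a visible pattern: an `af_` or `am_` prefix followed by the voice name. The sketch below splits an ID on that pattern. The prefix-to-gender mapping is inferred from this list only, so the `/voices` endpoint remains the authoritative source. `describe_voice` is an illustrative helper.

```python
GENDER_BY_PREFIX = {"af": "female", "am": "male"}  # inferred from the list above


def describe_voice(voice_id):
    """Split an ID such as 'af_heart' into (gender, name) per the list above."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or not name or prefix not in GENDER_BY_PREFIX:
        raise ValueError(f"unrecognized voice ID: {voice_id!r}")
    return GENDER_BY_PREFIX[prefix], name.capitalize()
```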

Usage Examples

Using Python requests (Form Data)

import requests

# Prepare the request
url = "http://localhost:7860/tts"
data = {
    "text": "Hello, this is Kokoro TTS in action!",
    "voice": "af_heart",
    "lang_code": "a"
}

# Make the request
response = requests.post(url, data=data)

# Save the generated audio
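if response.status_code == 200:
    with open("kokoro_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    # Surface the server's error instead of failing silently
    print(f"Request failed ({response.status_code}): {response.text}")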

Using Python requests (JSON)

import requests

# Prepare the JSON request
url = "http://localhost:7860/tts-json"
data = {
    "text": "Kokoro delivers high-quality speech synthesis!",
    "voice": "af_bella",
    "lang_code": "a"
}

# Make the request; json= serializes the body and sets the Content-Type header
response = requests.post(url, json=data)

# Save the generated audio
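if response.status_code == 200:
    with open("kokoro_json_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    # Surface the server's error instead of failing silently
    print(f"Request failed ({response.status_code}): {response.text}")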
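
The Python examples above write the response bytes to disk without inspecting them. A quick sanity check, sketched below, parses the bytes with the standard `wave` module. It assumes the API returns PCM-encoded WAV, as the `.wav` file names suggest; `wave` raises `wave.Error` for other encodings. `wav_summary` is a hypothetical helper name.

```python
import io
import wave


def wav_summary(audio_bytes):
    """Return channel count, sample rate, and duration for PCM WAV bytes."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `wav_summary(response.content)` before saving gives an early signal if the server returned an error page or an empty clip instead of audio.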

Using curl (Form Data)

curl -X POST "http://localhost:7860/tts" \
  -F "text=Hello from Kokoro TTS!" \
  -F "voice=af_heart" \
  -F "lang_code=a" \
  --output kokoro_speech.wav

Using curl (JSON)

curl -X POST "http://localhost:7860/tts-json" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from Kokoro TTS!","voice":"af_heart","lang_code":"a"}' \
  --output kokoro_speech.wav

Get Available Voices

curl http://localhost:7860/voices

Using the provided client example

python client_example.py

Requirements

  • Python 3.11+
  • espeak-ng system package
  • CUDA-compatible GPU (optional, for faster processing)
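
The first two requirements can be checked from Python before starting the server. This is a sketch only: `check_environment` is a hypothetical helper, and it looks for either an `espeak-ng` or an `espeak` executable on `PATH`, because the setup commands above use both names. GPU detection is left out, since it depends on the framework the application uses.

```python
import shutil
import sys


def check_environment(min_python=(3, 11), espeak_names=("espeak-ng", "espeak")):
    """Report whether the Python version and eSpeak binary requirements look satisfied."""
    # First matching executable on PATH, or None if none of the names resolve.
    espeak_path = next(
        (path for path in (shutil.which(name) for name in espeak_names) if path),
        None,
    )
    return {
        "python_ok": sys.version_info >= min_python,
        "espeak_path": espeak_path,
    }
```

A result with `python_ok` set to `False` or `espeak_path` set to `None` points to a missing prerequisite from the list above.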

Model Information

This API uses Kokoro TTS, which:

  • Has 82 million parameters
  • Supports multiple voices and languages
  • Provides fast, high-quality speech synthesis
  • Uses Apache-licensed weights
  • Requires minimal system resources compared to larger models

Testing

Run the standalone test:

python test.py

This will generate audio files demonstrating Kokoro's capabilities.