title: Kokoro TTS API
emoji: 🎤
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false

Kokoro TTS API

A FastAPI-based Text-to-Speech API using Kokoro, an open-weight TTS model with 82 million parameters.

Features

  • Convert text to speech using Kokoro TTS
  • Multiple voice options (af_heart, af_sky, af_bella, etc.)
  • Automatic language detection
  • RESTful API with automatic documentation
  • Docker support
  • Lightweight and fast processing
  • Apache-licensed weights
  • Optimized for Hugging Face Spaces deployment

About Kokoro

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient. Because the weights are Apache-licensed, Kokoro can be deployed anywhere from production environments to personal projects.

Setup

Hugging Face Spaces Deployment

This API is optimized for Hugging Face Spaces deployment. The Docker configuration automatically handles:

  • Cache directory setup with proper permissions
  • Environment variable configuration
  • Model downloading and caching

Simply deploy to Hugging Face Spaces using the Docker SDK.

Troubleshooting on HF Spaces

If you encounter permission errors, you can use the diagnostic startup script:

  1. Change the Dockerfile CMD to: CMD ["python", "startup.py"]
  2. On the next start, the script runs diagnostics and prints detailed information about the environment

Local Development

  1. Install system dependencies:
# On Ubuntu/Debian
sudo apt-get install espeak-ng

# On macOS
brew install espeak-ng
  2. Install Python dependencies:
pip install -r requirements.txt
  3. Run the API:
uvicorn app:app --host 0.0.0.0 --port 7860

The API will be available at http://localhost:7860

Using Docker

  1. Build the Docker image:
docker build -t kokoro-tts-api .
  2. Run the container:
docker run -p 7860:7860 kokoro-tts-api

API Endpoints

Health Check

  • GET /health - Check API status and device information
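A quick way to confirm the service is reachable is to poll this endpoint before sending synthesis requests. The sketch below uses only the standard library. The `check_health` helper is hypothetical, and because the response fields are not documented here, it simply returns whatever JSON the endpoint sends back.

```python
import json
import urllib.request


def check_health(base_url="http://localhost:7860", timeout=5.0):
    """Return the parsed /health response, or None if the API is unreachable.

    Hypothetical helper: the response schema is not specified in this README,
    so the JSON body is returned unmodified.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP error statuses
        # (URLError and HTTPError); ValueError covers a non-JSON body.
        return None
```

Calling `check_health()` with no arguments targets the local development address used elsewhere in this README; a `None` result means the server is down or did not return JSON.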

Available Voices

  • GET /voices - Get list of available voices

Text-to-Speech (Form Data)

  • POST /tts - Convert text to speech using form data
    • Parameters:
      • text (form): Text to convert to speech
      • voice (form): Voice to use (default: "af_heart")
      • lang_code (form): Language code (default: "a", described here as auto-detect; note that the upstream Kokoro library uses "a" for American English)

Text-to-Speech (JSON)

  • POST /tts-json - Convert text to speech using JSON request body
    • Body: JSON object with text, voice, and lang_code fields
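Both endpoints take the same three fields, so a client can build the payload once and send it either as form data or as JSON. This is a minimal sketch, assuming the defaults documented for /tts also apply to /tts-json; `build_tts_payload` is a hypothetical helper, not part of this repository.

```python
def build_tts_payload(text, voice="af_heart", lang_code="a"):
    """Build the request fields shared by /tts and /tts-json."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    return {"text": text, "voice": voice, "lang_code": lang_code}


payload = build_tts_payload("Hello from Kokoro TTS!")
# Form data:  requests.post("http://localhost:7860/tts", data=payload)
# JSON body:  requests.post("http://localhost:7860/tts-json", json=payload)
print(payload)
```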

API Documentation

  • GET /docs - Interactive API documentation (Swagger UI)
  • GET /redoc - Alternative API documentation

Available Voices

  • af_heart - Female voice (Heart)
  • af_sky - Female voice (Sky)
  • af_bella - Female voice (Bella)
  • af_sarah - Female voice (Sarah)
  • af_nicole - Female voice (Nicole)
  • am_adam - Male voice (Adam)
  • am_michael - Male voice (Michael)
  • am_edward - Male voice (Edward)
  • am_lewis - Male voice (Lewis)
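The IDs in this list share a two-letter prefix whose second letter matches the gender given in each description (`f` for female, `m` for male). The sketch below turns an ID into a readable label on the assumption that this pattern holds for every voice; `describe_voice` is a hypothetical helper, and the authoritative list is whatever GET /voices returns.

```python
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id):
    """Derive a label such as "Female voice (Heart)" from an ID like "af_heart"."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or not name or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognized voice ID: {voice_id!r}")
    # Assumption: the second prefix letter encodes gender, as in the list above.
    return f"{GENDERS[prefix[1]]} voice ({name.capitalize()})"


for voice in ("af_heart", "am_adam"):
    print(voice, "-", describe_voice(voice))
```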

Usage Examples

Using Python requests (Form Data)

import requests

# Prepare the request
url = "http://localhost:7860/tts"
data = {
    "text": "Hello, this is Kokoro TTS in action!",
    "voice": "af_heart",
    "lang_code": "a"
}

# Make the request
response = requests.post(url, data=data)

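# Save the generated audio, or report why the request failed
if response.status_code == 200:
    with open("kokoro_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed: {response.status_code} {response.text}")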

Using Python requests (JSON)

import requests

# Prepare the JSON request
url = "http://localhost:7860/tts-json"
data = {
    "text": "Kokoro delivers high-quality speech synthesis!",
    "voice": "af_bella",
    "lang_code": "a"
}

# Make the request (json= serializes the body and sets the Content-Type header)
response = requests.post(url, json=data)

# Save the generated audio, or report why the request failed
if response.status_code == 200:
    with open("kokoro_json_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed: {response.status_code} {response.text}")

Using curl (Form Data)

curl -X POST "http://localhost:7860/tts" \
  -F "text=Hello from Kokoro TTS!" \
  -F "voice=af_heart" \
  -F "lang_code=a" \
  --output kokoro_speech.wav

Using curl (JSON)

curl -X POST "http://localhost:7860/tts-json" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from Kokoro TTS!","voice":"af_heart","lang_code":"a"}' \
  --output kokoro_speech.wav

Get Available Voices

curl http://localhost:7860/voices

Using the provided client example

python client_example.py

Requirements

  • Python 3.11+
  • espeak-ng system package
  • CUDA-compatible GPU (optional, for faster processing)
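Since the GPU is optional, code that loads the model usually needs a CPU fallback. The following is a minimal sketch of that choice, not the logic in `app.py` (which is not shown here); `pick_device` is a hypothetical helper that also tolerates PyTorch being absent.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```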

Model Information

This API uses Kokoro TTS, which:

  • Has 82 million parameters
  • Supports multiple voices and languages
  • Provides fast, high-quality speech synthesis
  • Uses Apache-licensed weights
  • Requires minimal system resources compared to larger models

Testing

Run the standalone test:

python test.py

Run the installation test:

python test_kokoro_install.py

For debugging on Hugging Face Spaces:

python startup.py

This will generate audio files demonstrating Kokoro's capabilities.

Environment Variables

The following environment variables are automatically configured:

  • HF_HOME=/tmp/hf_cache - Hugging Face cache directory
  • TRANSFORMERS_CACHE=/tmp/hf_cache - Transformers cache
  • HF_HUB_CACHE=/tmp/hf_cache - HF Hub cache
  • TORCH_HOME=/tmp/torch_cache - PyTorch cache
  • NUMBA_CACHE_DIR=/tmp/numba_cache - Numba cache
  • NUMBA_DISABLE_JIT=1 - Disable Numba JIT compilation

The application sets these at startup so that model and compilation caches land in writable locations on Hugging Face Spaces.
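For running the code outside the provided Docker image, the same configuration can be reproduced by hand. The sketch below shows one way to do it, assuming the variables must be set before the model libraries are imported; `configure_environment` is a hypothetical helper and may differ from what the application actually does.

```python
import os

# Cache locations, using the values listed above.
CACHE_DIRS = {
    "HF_HOME": "/tmp/hf_cache",
    "TRANSFORMERS_CACHE": "/tmp/hf_cache",
    "HF_HUB_CACHE": "/tmp/hf_cache",
    "TORCH_HOME": "/tmp/torch_cache",
    "NUMBA_CACHE_DIR": "/tmp/numba_cache",
}
FLAGS = {"NUMBA_DISABLE_JIT": "1"}


def configure_environment(env=os.environ):
    """Apply the defaults without overriding values that are already set."""
    for name, path in CACHE_DIRS.items():
        env.setdefault(name, path)
        # Create the directory so later writes do not fail on a missing path.
        os.makedirs(env[name], exist_ok=True)
    for name, value in FLAGS.items():
        env.setdefault(name, value)
    return {name: env[name] for name in (*CACHE_DIRS, *FLAGS)}
```

Call `configure_environment()` at the very top of the entry point, before importing any library that reads these variables.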