

title: Kokoro TTS API
emoji: 🎤
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false

Kokoro TTS API

A FastAPI-based Text-to-Speech API using Kokoro, an open-weight TTS model with 82 million parameters.

Features

Convert text to speech using Kokoro TTS
Multiple voice options (af_heart, af_sky, af_bella, etc.)
Automatic language detection
RESTful API with automatic documentation
Docker support
Lightweight and fast processing
Apache-licensed weights
Optimized for Hugging Face Spaces deployment

About Kokoro

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient. Because its weights are Apache-licensed, Kokoro can be deployed anywhere from production environments to personal projects.

Setup

Hugging Face Spaces Deployment

This API is optimized for Hugging Face Spaces deployment. The Docker configuration automatically handles:

Cache directory setup with proper permissions
Environment variable configuration
Model downloading and caching

Simply deploy to Hugging Face Spaces using the Docker SDK.

Troubleshooting on HF Spaces

If you encounter permission errors, you can use the diagnostic startup script:

Change the Dockerfile CMD to: CMD ["python", "startup.py"]
This will run diagnostics and show detailed information about the environment
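
The contents of startup.py are not reproduced in this README. As a rough idea of what a permission diagnostic could look like, here is a minimal sketch. The helpers `check_writable` and `check_cache_dirs` are hypothetical names for illustration and are not taken from the repository.

```python
import os
import tempfile


def check_writable(path: str) -> bool:
    """Return True if `path` can be created (when missing) and written to."""
    try:
        os.makedirs(path, exist_ok=True)
        # Creating and auto-deleting a temporary file proves write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


def check_cache_dirs(paths: list[str]) -> dict[str, bool]:
    """Map each directory to whether it is writable (hypothetical helper)."""
    return {path: check_writable(path) for path in paths}
```

Calling `check_cache_dirs(["/tmp/hf_cache", "/tmp/torch_cache", "/tmp/numba_cache"])` would report on the cache locations listed under Environment Variables.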

Local Development

Install system dependencies:

# On Ubuntu/Debian
sudo apt-get install espeak-ng

# On macOS
brew install espeak-ng

Install Python dependencies:

pip install -r requirements.txt

Run the API:

uvicorn app:app --host 0.0.0.0 --port 7860

The API will be available at http://localhost:7860

Using Docker

Build the Docker image:

docker build -t kokoro-tts-api .

Run the container:

docker run -p 7860:7860 kokoro-tts-api
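
The container may need some time before it answers requests, for example if the model is downloaded on first start (an assumption; startup behaviour is not described here). The sketch below polls the GET /health endpoint until it returns 200. The function names, timeout, and interval are illustrative choices, and the standard library is used so the script has no extra dependencies.

```python
import time
import urllib.request


def health_ok(base_url: str) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_until_healthy(base_url, timeout_s=120.0, interval_s=2.0, probe=health_ok):
    """Call `probe` repeatedly until it succeeds or `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For the container started above, `wait_until_healthy("http://localhost:7860")` would block until the API responds or two minutes pass.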

API Endpoints

Health Check

GET /health - Check API status and device information

Available Voices

GET /voices - Get list of available voices

Text-to-Speech (Form Data)

POST /tts - Convert text to speech using form data
- Parameters:
  - text (form): Text to convert to speech
  - voice (form): Voice to use (default: "af_heart")
  - lang_code (form): Language code (default: "a", described here as auto-detect; note that the upstream Kokoro library uses "a" for American English)

Text-to-Speech (JSON)

POST /tts-json - Convert text to speech using JSON request body
- Body: JSON object with text, voice, and lang_code fields
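
To make the shape of the JSON body concrete, here is a small client-side sketch. The class name `TTSRequest` is hypothetical, and it assumes that /tts-json accepts the same defaults as the /tts form parameters, which this README does not state explicitly.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TTSRequest:
    """Client-side mirror of the three documented fields (hypothetical helper)."""

    text: str
    voice: str = "af_heart"  # assumed default, copied from the /tts form parameters
    lang_code: str = "a"     # assumed default, copied from the /tts form parameters

    def to_json(self) -> str:
        """Serialize to the JSON body, rejecting empty text before any request is sent."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        return json.dumps(asdict(self))
```

`TTSRequest(text="Hello").to_json()` yields a body with all three fields, which can then be posted with the Content-Type header set to application/json.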

API Documentation

GET /docs - Interactive API documentation (Swagger UI)
GET /redoc - Alternative API documentation
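
FastAPI applications also serve the machine-readable OpenAPI schema behind these pages, by default at /openapi.json. Assuming this app keeps that default, the following sketch lists the routes the server actually exposes. `fetch_schema` and `list_routes` are illustrative names.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_schema(base_url: str) -> dict:
    """Download the OpenAPI schema (assumes FastAPI's default /openapi.json path)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_routes(schema: dict) -> list[str]:
    """Return 'METHOD /path' strings for every operation in the schema."""
    routes = []
    for path, item in sorted(schema.get("paths", {}).items()):
        for method in sorted(item):
            # Path items may also hold non-method keys such as "parameters".
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return routes
```

With the API running locally, `list_routes(fetch_schema("http://localhost:7860"))` should include the endpoints described in this section.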

Available Voices

af_heart - Female voice (Heart)
af_sky - Female voice (Sky)
af_bella - Female voice (Bella)
af_sarah - Female voice (Sarah)
af_nicole - Female voice (Nicole)
am_adam - Male voice (Adam)
am_michael - Male voice (Michael)
am_edward - Male voice (Edward)
am_lewis - Male voice (Lewis)
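
The voice IDs above follow a visible pattern: a two-letter prefix, an underscore, and a name, with "f" or "m" as the second letter for female or male voices. The sketch below decodes that pattern. Treating the first letter as the language code is an assumption based on the "a" prefix matching the default lang_code, and `describe_voice` is a hypothetical helper.

```python
GENDER = {"f": "female", "m": "male"}


def describe_voice(voice_id: str) -> dict:
    """Split an ID such as 'af_heart' into language code, gender, and name."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or not name or prefix[1] not in GENDER:
        raise ValueError(f"unexpected voice id format: {voice_id!r}")
    return {
        "lang_code": prefix[0],  # assumption: first letter mirrors lang_code
        "gender": GENDER[prefix[1]],
        "name": name.capitalize(),
    }
```

A client could use this to group the result of GET /voices by gender, or to pick a lang_code that matches the chosen voice.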

Usage Examples

Using Python requests (Form Data)

import requests

# Prepare the request
url = "http://localhost:7860/tts"
data = {
    "text": "Hello, this is Kokoro TTS in action!",
    "voice": "af_heart",
    "lang_code": "a"
}

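# Send the fields as form data; the timeout leaves room for synthesis
response = requests.post(url, data=data, timeout=120)

# Save the generated audio, or report the failure
if response.status_code == 200:
    with open("kokoro_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed ({response.status_code}): {response.text}")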

Using Python requests (JSON)

import requests

# Prepare the JSON request
url = "http://localhost:7860/tts-json"
data = {
    "text": "Kokoro delivers high-quality speech synthesis!",
    "voice": "af_bella",
    "lang_code": "a"
}

# Make the request; json= serializes the body and sets the Content-Type header
response = requests.post(url, json=data, timeout=120)

# Save the generated audio, or report the failure
if response.status_code == 200:
    with open("kokoro_json_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed ({response.status_code}): {response.text}")
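
A 200 status alone does not prove that the payload is playable audio. Assuming the endpoints return PCM-encoded WAV data (the examples save to .wav, but the encoding is not stated here), the standard-library `wave` module can sanity-check the bytes before they are written to disk. `wav_info` is an illustrative helper name.

```python
import io
import wave


def wav_info(audio_bytes: bytes) -> dict:
    """Parse WAV bytes and return basic properties.

    Raises wave.Error (or EOFError for truncated data) if the bytes are not
    a WAV file that the `wave` module can read.
    """
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

In the examples above, `wav_info(response.content)` could be called right after the status check. Note that `wave` only understands PCM data, so a float-encoded WAV would raise an error even though it is valid audio.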

Using curl (Form Data)

curl -X POST "http://localhost:7860/tts" \
  -F "text=Hello from Kokoro TTS!" \
  -F "voice=af_heart" \
  -F "lang_code=a" \
  --output kokoro_speech.wav

Using curl (JSON)

curl -X POST "http://localhost:7860/tts-json" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from Kokoro TTS!","voice":"af_heart","lang_code":"a"}' \
  --output kokoro_speech.wav

Get Available Voices

curl http://localhost:7860/voices

Using the provided client example

python client_example.py

Requirements

Python 3.11+
espeak-ng system package
CUDA-compatible GPU (optional, for faster processing)
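
These requirements can be checked from Python before starting the server. The sketch below is illustrative only: `check_requirements` is a hypothetical helper, it assumes the espeak-ng package installs an executable named `espeak-ng` on the PATH, and it assumes GPU detection goes through PyTorch.

```python
import shutil
import sys


def check_requirements() -> dict:
    """Report whether the documented requirements appear to be met."""
    report = {
        "python_ok": sys.version_info >= (3, 11),
        "espeak_ng_found": shutil.which("espeak-ng") is not None,
    }
    try:
        import torch  # optional; only used here to detect a CUDA device

        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["cuda_available"] = None  # PyTorch not installed, cannot tell
    return report
```

A value of `False` or `None` for `cuda_available` is not an error, since the GPU is optional.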

Model Information

This API uses Kokoro TTS, which:

Has 82 million parameters
Supports multiple voices and languages
Provides fast, high-quality speech synthesis
Uses Apache-licensed weights
Requires minimal system resources compared to larger models

Testing

Run the standalone test:

python test.py

Run the installation test:

python test_kokoro_install.py

For debugging on Hugging Face Spaces:

python startup.py

This will generate audio files demonstrating Kokoro's capabilities.

Environment Variables

The following environment variables are automatically configured:

HF_HOME=/tmp/hf_cache - Hugging Face cache directory
TRANSFORMERS_CACHE=/tmp/hf_cache - Transformers cache
HF_HUB_CACHE=/tmp/hf_cache - HF Hub cache
TORCH_HOME=/tmp/torch_cache - PyTorch cache
NUMBA_CACHE_DIR=/tmp/numba_cache - Numba cache
NUMBA_DISABLE_JIT=1 - Disable Numba JIT compilation

The application sets these automatically so that model and compilation caches land in writable directories under /tmp on Hugging Face Spaces.
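
How the application applies these values is not shown in this README. One common approach, sketched below under that caveat, is to set defaults at the very top of the entry point, before importing libraries that read the variables. `CACHE_ENV` and `configure_environment` are hypothetical names.

```python
import os

# Cache locations as listed above.
CACHE_ENV = {
    "HF_HOME": "/tmp/hf_cache",
    "TRANSFORMERS_CACHE": "/tmp/hf_cache",
    "HF_HUB_CACHE": "/tmp/hf_cache",
    "TORCH_HOME": "/tmp/torch_cache",
    "NUMBA_CACHE_DIR": "/tmp/numba_cache",
}


def configure_environment(env=os.environ) -> None:
    """Set defaults without overriding values that are already present."""
    for name, path in CACHE_ENV.items():
        env.setdefault(name, path)
        # Create the directory so the first write does not fail.
        os.makedirs(env[name], exist_ok=True)
    env.setdefault("NUMBA_DISABLE_JIT", "1")
```

Using `setdefault` keeps any value supplied through the Dockerfile or the Space settings, so the defaults act only as a fallback.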