Commit 9acb9c3 by Avinyaa ("new"), 4 months ago


title: Kokoro TTS API
emoji: 🎤
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false

Kokoro TTS API

A FastAPI-based Text-to-Speech API using Kokoro, an open-weight TTS model with 82 million parameters.

Features

Convert text to speech using Kokoro TTS
Multiple voice options (af_heart, af_sky, af_bella, etc.)
Automatic language detection
RESTful API with automatic documentation
Docker support
Lightweight and fast processing
Apache-licensed weights

About Kokoro

Kokoro is an open-weight TTS model with 82 million parameters. Despite its small size, it delivers quality comparable to larger models while being significantly faster and cheaper to run. Because the weights are Apache-licensed, Kokoro can be deployed anywhere, from production environments to personal projects.

Setup

Local Development

Install system dependencies:

# On Ubuntu/Debian
sudo apt-get install espeak-ng

# On macOS
brew install espeak-ng

Install Python dependencies:

pip install -r requirements.txt

Run the API:

uvicorn app:app --host 0.0.0.0 --port 7860

The API will be available at http://localhost:7860

Using Docker

Build the Docker image:

docker build -t kokoro-tts-api .

Run the container:

docker run -p 7860:7860 kokoro-tts-api
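
After starting the server or container, the API may not answer immediately while the model loads. A minimal readiness check that polls the /health endpoint could look like the sketch below. It uses only the Python standard library; the helper name `wait_until_healthy`, the timeout values, and the assumption that a ready server answers /health with HTTP 200 are illustrative and not taken from this project.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(base_url="http://localhost:7860", timeout=60.0, interval=1.0):
    """Poll GET /health until it returns HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: assumes a ready server answers /health with 200.
    Returns True when the server responded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/health"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet, or it returned an error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_healthy()` before the first /tts request avoids connection errors in scripts that start the container and use it right away.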

API Endpoints

Health Check

GET /health - Check API status and device information

Available Voices

GET /voices - Get list of available voices

Text-to-Speech (Form Data)

POST /tts - Convert text to speech using form data
- Parameters:
  - text (form): Text to convert to speech
  - voice (form): Voice to use (default: "af_heart")
  - lang_code (form): Language code (default: "a", which this API describes as auto-detect; in Kokoro's own pipeline, "a" selects American English)

Text-to-Speech (JSON)

POST /tts-json - Convert text to speech using JSON request body
- Body: JSON object with text, voice, and lang_code fields
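
The JSON body can be modeled on the client side as a small data class. This is a sketch only: `TTSRequest` is a hypothetical name, the defaults for voice and lang_code are copied from the form endpoint described above, and whether /tts-json applies the same defaults on the server is an assumption.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TTSRequest:
    """Hypothetical client-side model of the /tts-json request body."""

    text: str
    voice: str = "af_heart"  # default documented for the form endpoint
    lang_code: str = "a"     # default documented for the form endpoint

    def __post_init__(self):
        # Client-side guard only; the server's own validation is not documented here.
        if not self.text.strip():
            raise ValueError("text must not be empty")

    def to_json(self) -> str:
        """Serialize the three fields to the JSON string sent as the body."""
        return json.dumps(asdict(self))
```

With requests, the same object can be sent as `requests.post(url, json=asdict(TTSRequest(text="Hello")))`.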

API Documentation

GET /docs - Interactive API documentation (Swagger UI)
GET /redoc - Alternative API documentation
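
Swagger UI and ReDoc both render an OpenAPI schema, which FastAPI serves at /openapi.json by default. Assuming this app keeps that default, the schema can be used to list the available operations programmatically. The helper name `list_operations` is hypothetical.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema dict, sorted by path."""
    ops = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # A path item may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                ops.append((key.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))
```

For example, `list_operations(requests.get("http://localhost:7860/openapi.json").json())` would print the routes the running server actually exposes.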

Available Voices

af_heart - Female voice (Heart)
af_sky - Female voice (Sky)
af_bella - Female voice (Bella)
af_sarah - Female voice (Sarah)
af_nicole - Female voice (Nicole)
am_adam - Male voice (Adam)
am_michael - Male voice (Michael)
am_edward - Male voice (Edward)
am_lewis - Male voice (Lewis)
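
The IDs in this list follow a visible pattern: a two-letter prefix whose second letter is f (female) or m (male), an underscore, and the voice name. A small sketch that splits an ID along those lines is shown below. `parse_voice_id` is a hypothetical helper, and since this README does not say what the first prefix letter means, it is returned uninterpreted.

```python
def parse_voice_id(voice_id: str) -> dict:
    """Split an ID such as "af_heart" into prefix, gender, and display name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    genders = {"f": "female", "m": "male"}
    if prefix[1] not in genders:
        raise ValueError(f"unknown gender letter in voice ID: {voice_id!r}")
    return {
        "prefix": prefix,
        "gender": genders[prefix[1]],
        "name": name.capitalize(),
    }
```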

Usage Examples

Using Python requests (Form Data)

import requests

# Prepare the request
url = "http://localhost:7860/tts"
data = {
    "text": "Hello, this is Kokoro TTS in action!",
    "voice": "af_heart",
    "lang_code": "a"
}

# Make the request
response = requests.post(url, data=data)

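# Save the generated audio, or report why the request failed
if response.status_code == 200:
    with open("kokoro_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed ({response.status_code}): {response.text}")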

Using Python requests (JSON)

import requests

# Prepare the JSON request
url = "http://localhost:7860/tts-json"
data = {
    "text": "Kokoro delivers high-quality speech synthesis!",
    "voice": "af_bella",
    "lang_code": "a"
}

# Make the request; json= serializes the body and sets the Content-Type header
response = requests.post(url, json=data)

# Save the generated audio, or report why the request failed
if response.status_code == 200:
    with open("kokoro_json_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed ({response.status_code}): {response.text}")
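
Both Python examples write the response body to a .wav file without inspecting it. A quick sanity check is to parse the bytes with the standard library's wave module, as sketched below. This assumes the endpoint returns uncompressed PCM WAV data, which is the format the wave module reads; `wav_info` is a hypothetical helper name.

```python
import io
import wave


def wav_info(data: bytes) -> dict:
    """Return basic properties of WAV audio held in memory.

    Raises wave.Error (or EOFError for very short input) if `data` is not
    a PCM WAV file, e.g. when the server returned an error body instead.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

Calling `wav_info(response.content)` before saving shows the sample rate and duration of the generated speech.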

Using curl (Form Data)

curl -X POST "http://localhost:7860/tts" \
  -F "text=Hello from Kokoro TTS!" \
  -F "voice=af_heart" \
  -F "lang_code=a" \
  --output kokoro_speech.wav

Using curl (JSON)

curl -X POST "http://localhost:7860/tts-json" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from Kokoro TTS!","voice":"af_heart","lang_code":"a"}' \
  --output kokoro_speech.wav

Get Available Voices

curl http://localhost:7860/voices

Using the provided client example

python client_example.py

Requirements

Python 3.11+
espeak-ng system package
CUDA-compatible GPU (optional, for faster processing)

Model Information

This API uses Kokoro TTS, which:

Has 82 million parameters
Supports multiple voices and languages
Provides fast, high-quality speech synthesis
Uses Apache-licensed weights
Requires minimal system resources compared to larger models

Testing

Run the standalone test:

python test.py

This will generate audio files demonstrating Kokoro's capabilities.