---
title: Kokoro TTS API
emoji: 🎤
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
---
Kokoro TTS API
A FastAPI-based Text-to-Speech API using Kokoro, an open-weight TTS model with 82 million parameters.
Features
- Convert text to speech using Kokoro TTS
- Multiple voice options (af_heart, af_sky, af_bella, etc.)
- Automatic language detection
- RESTful API with automatic documentation
- Docker support
- Lightweight and fast processing
- Apache-licensed weights
- Optimized for Hugging Face Spaces deployment
About Kokoro
Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
Setup
Hugging Face Spaces Deployment
This API is optimized for Hugging Face Spaces deployment. The Docker configuration automatically handles:
- Cache directory setup with proper permissions
- Environment variable configuration
- Model downloading and caching
Simply deploy to Hugging Face Spaces using the Docker SDK.
Troubleshooting on HF Spaces
If you encounter permission errors, you can use the diagnostic startup script:
- Change the Dockerfile CMD to:
CMD ["python", "startup.py"]
- This will run diagnostics and show detailed information about the environment
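Before switching the CMD, you can also check the most common cause of permission errors directly: whether each cache directory is writable by the user the container runs as. The sketch below is a minimal check using the cache variables listed under Environment Variables, falling back to their /tmp defaults; `check_cache_dirs` is a hypothetical helper, not part of startup.py.

```python
import os
import tempfile

# Cache variables and /tmp defaults from the "Environment Variables" section.
CACHE_VARS = {
    "HF_HOME": "/tmp/hf_cache",
    "TRANSFORMERS_CACHE": "/tmp/hf_cache",
    "HF_HUB_CACHE": "/tmp/hf_cache",
    "TORCH_HOME": "/tmp/torch_cache",
    "NUMBA_CACHE_DIR": "/tmp/numba_cache",
}


def check_cache_dirs(env=None):
    """Return {var: (path, writable)} for each cache variable.

    Hypothetical helper: each directory is created if missing, and it
    counts as writable if a temporary file can be created inside it.
    """
    env = os.environ if env is None else env
    results = {}
    for var, default in CACHE_VARS.items():
        path = env.get(var, default)
        try:
            os.makedirs(path, exist_ok=True)
            with tempfile.NamedTemporaryFile(dir=path):
                pass  # the file is removed when the context exits
            writable = True
        except OSError:
            writable = False
        results[var] = (path, writable)
    return results
```

Running `print(check_cache_dirs())` inside the container lists each variable, the path it resolves to, and whether a test file could be written there.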
Local Development
- Install system dependencies:
# On Ubuntu/Debian
sudo apt-get install espeak-ng
# On macOS
brew install espeak-ng
- Install Python dependencies:
pip install -r requirements.txt
- Run the API:
uvicorn app:app --host 0.0.0.0 --port 7860
The API will be available at http://localhost:7860
Using Docker
- Build the Docker image:
docker build -t kokoro-tts-api .
- Run the container:
docker run -p 7860:7860 kokoro-tts-api
API Endpoints
Health Check
- GET /health: Check API status and device information
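After starting the server (locally or in Docker), a script can poll this endpoint before sending TTS requests. The sketch below is a hypothetical standard-library helper; it treats any HTTP 200 from /health as "ready" and does not parse the response body, whose format is not documented here. The default URL assumes the local setup on port 7860.

```python
import time
import urllib.request


def wait_until_healthy(url="http://localhost:7860/health", timeout=60.0, interval=1.0):
    """Poll the health endpoint until it returns HTTP 200 or the timeout expires.

    Returns True once the API answers with 200, False if it never does.
    Only the status code is checked; the body is not parsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError subclass OSError: server not up yet,
            # connection refused, timed out, or a non-2xx response.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```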
Available Voices
- GET /voices: Get the list of available voices
Text-to-Speech (Form Data)
- POST /tts: Convert text to speech using form data
- Parameters:
  - text (form): Text to convert to speech
  - voice (form): Voice to use (default: "af_heart")
  - lang_code (form): Language code (default: "a"; note that in Kokoro's own language codes, "a" selects American English)
Text-to-Speech (JSON)
- POST /tts-json: Convert text to speech using a JSON request body
- Body: JSON object with text, voice, and lang_code fields
API Documentation
- GET /docs: Interactive API documentation (Swagger UI)
- GET /redoc: Alternative API documentation
Available Voices
- af_heart: Female voice (Heart)
- af_sky: Female voice (Sky)
- af_bella: Female voice (Bella)
- af_sarah: Female voice (Sarah)
- af_nicole: Female voice (Nicole)
- am_adam: Male voice (Adam)
- am_michael: Male voice (Michael)
- am_edward: Male voice (Edward)
- am_lewis: Male voice (Lewis)
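The voice IDs follow Kokoro's naming convention: the first letter is the language code ("a" for American English) and the second letter is "f" for a female or "m" for a male voice. Because the voice's first letter is expected to match lang_code, a client can check the pair before calling the API. The helpers `parse_voice` and `matches_lang` below are hypothetical and not part of this API.

```python
def parse_voice(voice_id):
    """Split a Kokoro-style voice ID such as "af_heart" into its parts.

    Returns (lang_code, gender, name), e.g. ("a", "female", "heart").
    Raises ValueError if the ID does not follow <lang><gender>_<name>.
    """
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID: {voice_id!r}")
    genders = {"f": "female", "m": "male"}
    if prefix[1] not in genders:
        raise ValueError(f"unknown gender letter in {voice_id!r}")
    return prefix[0], genders[prefix[1]], name


def matches_lang(voice_id, lang_code):
    """True if the voice's language letter equals the requested lang_code."""
    return parse_voice(voice_id)[0] == lang_code
```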
Usage Examples
Using Python requests (Form Data)
import requests
# Prepare the request
url = "http://localhost:7860/tts"
data = {
    "text": "Hello, this is Kokoro TTS in action!",
    "voice": "af_heart",
    "lang_code": "a"
}
# Make the request (long texts can take a while to synthesize)
response = requests.post(url, data=data, timeout=120)
# Save the generated audio
if response.status_code == 200:
    with open("kokoro_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed ({response.status_code}): {response.text}")
Using Python requests (JSON)
import requests
# Prepare the JSON request
url = "http://localhost:7860/tts-json"
data = {
    "text": "Kokoro delivers high-quality speech synthesis!",
    "voice": "af_bella",
    "lang_code": "a"
}
# json= already sets this header; it is shown here for clarity
headers = {"Content-Type": "application/json"}
# Make the request
response = requests.post(url, json=data, headers=headers, timeout=120)
# Save the generated audio
if response.status_code == 200:
    with open("kokoro_json_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed ({response.status_code}): {response.text}")
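To confirm that a saved response is a valid WAV file, Python's standard-library `wave` module can read its header. This is a minimal sketch assuming the API returns uncompressed PCM WAV; `wave` cannot open other encodings (such as 32-bit float WAV) and raises `wave.Error` for them, as it does for a non-audio error body written to disk. `wav_info` is a hypothetical helper.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return wf.getnchannels(), rate, frames / rate
```

For example, `wav_info("kokoro_output.wav")` or `wav_info("kokoro_json_output.wav")` returns the channel count, sample rate, and duration in seconds of the file written by the corresponding example.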
Using curl (Form Data)
curl -X POST "http://localhost:7860/tts" \
-F "text=Hello from Kokoro TTS!" \
-F "voice=af_heart" \
-F "lang_code=a" \
--output kokoro_speech.wav
Using curl (JSON)
curl -X POST "http://localhost:7860/tts-json" \
-H "Content-Type: application/json" \
-d '{"text":"Hello from Kokoro TTS!","voice":"af_heart","lang_code":"a"}' \
--output kokoro_speech.wav
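If you prefer not to depend on requests, the same calls can be made with the standard library. In this sketch, `build_tts_request` and `synthesize` are hypothetical helpers. Form mode sends the fields URL-encoded rather than as multipart like `curl -F`; FastAPI form parameters accept both encodings, assuming /tts declares its fields with FastAPI's `Form`. All three fields are always sent, so the sketch does not rely on server-side defaults for /tts-json, which are not documented here.

```python
import json
import urllib.parse
import urllib.request


def build_tts_request(text, voice="af_heart", lang_code="a",
                      base_url="http://localhost:7860", use_json=False):
    """Build (but do not send) a POST request for /tts or /tts-json."""
    fields = {"text": text, "voice": voice, "lang_code": lang_code}
    if use_json:
        url = f"{base_url}/tts-json"
        body = json.dumps(fields).encode("utf-8")
        content_type = "application/json"
    else:
        url = f"{base_url}/tts"
        body = urllib.parse.urlencode(fields).encode("utf-8")
        content_type = "application/x-www-form-urlencoded"
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path.

    urlopen raises urllib.error.HTTPError on a non-2xx status, so an
    error response is never written to disk as audio.
    """
    req = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

For example, `synthesize("Hello from Kokoro TTS!", "kokoro_speech.wav", use_json=True)` posts to /tts-json and returns the number of bytes written.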
Get Available Voices
curl http://localhost:7860/voices
Using the provided client example
python client_example.py
Requirements
- Python 3.11+
- espeak-ng system package
- CUDA-compatible GPU (optional, for faster processing)
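A quick way to check these requirements from Python is sketched below. `check_requirements` is a hypothetical helper: it only looks for an `espeak-ng` executable on PATH, and it reports CUDA as available when PyTorch can be imported and `torch.cuda.is_available()` returns True.

```python
import shutil
import sys


def check_requirements(min_python=(3, 11)):
    """Report whether the documented local requirements appear to be met."""
    try:
        import torch  # optional: only used to detect a usable CUDA GPU
        cuda = bool(torch.cuda.is_available())
    except (ImportError, OSError):
        cuda = False
    return {
        "python_ok": sys.version_info >= min_python,
        "espeak_ng": shutil.which("espeak-ng") is not None,
        "cuda": cuda,
    }
```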
Model Information
This API uses Kokoro TTS, which:
- Has 82 million parameters
- Supports multiple voices and languages
- Provides fast, high-quality speech synthesis
- Uses Apache-licensed weights
- Requires minimal system resources compared to larger models
Testing
Run the standalone test:
python test.py
Run the installation test:
python test_kokoro_install.py
For debugging on Hugging Face Spaces:
python startup.py
This will generate audio files demonstrating Kokoro's capabilities.
Environment Variables
The following environment variables are automatically configured:
- HF_HOME=/tmp/hf_cache: Hugging Face cache directory
- TRANSFORMERS_CACHE=/tmp/hf_cache: Transformers cache
- HF_HUB_CACHE=/tmp/hf_cache: HF Hub cache
- TORCH_HOME=/tmp/torch_cache: PyTorch cache
- NUMBA_CACHE_DIR=/tmp/numba_cache: Numba cache
- NUMBA_DISABLE_JIT=1: Disable Numba JIT compilation
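These are set automatically by the application so that model downloads and library caches go to writable directories under /tmp on Hugging Face Spaces. With NUMBA_DISABLE_JIT=1, Numba-decorated functions run as plain Python, which avoids JIT compilation at the cost of speed in code paths that use Numba.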
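When running outside the Docker image, similar settings can be applied from Python, provided this happens before importing torch, transformers, huggingface_hub, or numba, since some of these libraries read the variables when they are first imported. This is a minimal sketch; `apply_cache_env` is a hypothetical helper that uses `setdefault`, so values already exported in the shell or Dockerfile take precedence.

```python
import os

# Values listed under "Environment Variables" above.
CACHE_ENV = {
    "HF_HOME": "/tmp/hf_cache",
    "TRANSFORMERS_CACHE": "/tmp/hf_cache",
    "HF_HUB_CACHE": "/tmp/hf_cache",
    "TORCH_HOME": "/tmp/torch_cache",
    "NUMBA_CACHE_DIR": "/tmp/numba_cache",
    "NUMBA_DISABLE_JIT": "1",
}

# Entries whose values are directories that must exist.
DIR_VARS = ("HF_HOME", "TRANSFORMERS_CACHE", "HF_HUB_CACHE", "TORCH_HOME", "NUMBA_CACHE_DIR")


def apply_cache_env(env=None, create_dirs=True):
    """Fill in missing cache variables and optionally create the directories."""
    env = os.environ if env is None else env
    for key, value in CACHE_ENV.items():
        env.setdefault(key, value)  # keep any value that is already set
    if create_dirs:
        for key in DIR_VARS:
            os.makedirs(env[key], exist_ok=True)
    return env
```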