---
title: Tts Api
emoji: 🚀
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
---
# TTS API
A FastAPI-based Text-to-Speech API using XTTS-v2 for voice cloning.
## Features
- Convert text to speech using voice cloning
- Upload reference speaker audio files
- Support for multiple languages
- RESTful API with automatic documentation
- Docker support
## Setup
### Local Development
1. Install dependencies:
```bash
pip install -r requirements.txt
```
2. Run the API:
```bash
python app.py
```
The API will then be available at `http://localhost:8000`.
### Using Docker
1. Build the Docker image:
```bash
docker build -t tts-api .
```
2. Run the container:
```bash
docker run -p 8000:8000 tts-api
```
## API Endpoints
### Health Check
- **GET** `/health` - Check API status
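A readiness check can be scripted before sending synthesis requests, which is useful because the model may take a while to load. The sketch below uses only the standard library. It assumes `/health` answers with HTTP 200 once the service is up; the response body is not documented here, so it is ignored. The helper names `is_healthy` and `wait_until_healthy` are illustrative and not part of this project.

```python
import http.client
import time
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or malformed URL/response
        return False


def wait_until_healthy(
    base_url: str = "http://localhost:8000", attempts: int = 30, delay: float = 2.0
) -> bool:
    """Poll /health until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if is_healthy(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` right after `docker run` blocks until the API responds or gives up after roughly a minute with the default settings.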
### Text-to-Speech
- **POST** `/tts` - Convert text to speech with uploaded speaker file
  - **Parameters:**
    - `text` (form): Text to convert to speech
    - `language` (form): Language code (default: "en")
    - `speaker_file` (file): Reference speaker audio file
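Validating the form fields on the client side avoids uploading a speaker file only to have the request rejected. The following is a minimal sketch: `build_tts_form` is a hypothetical helper, and the language set is the one listed for the upstream XTTS-v2 model, which is an assumption here since this API and its fine-tuned model may accept a different set.

```python
# Language codes listed for the upstream XTTS-v2 model (assumption: this API
# may accept a different set).
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def build_tts_form(text: str, language: str = "en") -> dict:
    """Validate and return the form fields for POST /tts (hypothetical helper)."""
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    language = language.strip().lower()
    if language not in XTTS_V2_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return {"text": text, "language": language}
```

The returned dictionary can be passed as `data=` to `requests.post`, with the reference audio supplied separately under `files=`.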
### API Documentation
- **GET** `/docs` - Interactive API documentation (Swagger UI)
- **GET** `/redoc` - Alternative API documentation
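Both documentation pages are rendered from a machine-readable OpenAPI schema, which FastAPI serves at `/openapi.json` by default. The sketch below lists the operations found in such a schema. It assumes this app keeps that default location, and the function names are illustrative.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(schema: dict) -> list:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    ops = []
    for path, item in schema.get("paths", {}).items():
        for method in item:
            # Path items can also hold non-method keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                ops.append((method.upper(), path))
    return sorted(ops)


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema from /openapi.json (FastAPI's default location)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)
```

With the API running, `list_operations(fetch_schema())` gives a quick overview of the available routes without opening a browser.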
## Usage Examples
### Using Python requests
```python
import requests

# Prepare the request
url = "http://localhost:8000/tts"
data = {
    "text": "Hello, this is a test of voice cloning!",
    "language": "en",
}

# Open the reference audio in a context manager so the file handle is closed
with open("path/to/speaker.wav", "rb") as speaker:
    # Make the request
    response = requests.post(url, data=data, files={"speaker_file": speaker})

# Save the generated audio
if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
else:
    print(f"Request failed with status {response.status_code}: {response.text}")
```
### Using curl
```bash
curl -X POST "http://localhost:8000/tts" \
  -F "text=Hello, this is a test!" \
  -F "language=en" \
  -F "speaker_file=@path/to/speaker.wav" \
  --output generated_speech.wav
```
### Using the provided client example
```bash
python client_example.py
```
## Requirements
- Python 3.8+
- CUDA-compatible GPU (recommended for faster processing)
- A reference speaker audio file in a supported format (WAV, MP3, etc.)
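To find out whether the GPU recommendation is met on a given machine, a check along these lines can help. It is a sketch that assumes PyTorch is among the installed dependencies (XTTS-v2 runs on PyTorch). How `app.py` actually selects its device is not shown in this README, so `pick_device` is a hypothetical helper.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # assumed to be installed alongside the TTS dependencies
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

A result of `"cpu"` does not prevent the API from working, but synthesis will be noticeably slower.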
## Model
This API uses the XTTS-v2_C3PO model for voice cloning, which is automatically downloaded when building the Docker image.