	---
	title: Tts Api
	emoji: 🚀
	colorFrom: indigo
	colorTo: yellow
	sdk: docker
	pinned: false
	---

	# TTS API

	A FastAPI-based Text-to-Speech API using XTTS-v2 for voice cloning.

	## Features

	- Convert text to speech using voice cloning
	- Upload reference speaker audio files
	- Support for multiple languages
	- RESTful API with automatic documentation
	- Docker support

	## Setup

	### Local Development

	1. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	2. Run the API:
	```bash
	python app.py
	```

	The API will then be available at `http://localhost:8000`.

	### Using Docker

	1. Build the Docker image:
	```bash
	docker build -t tts-api .
	```

	2. Run the container:
	```bash
	docker run -p 8000:8000 tts-api
	```

	## API Endpoints

	### Health Check
	- GET `/health` - Check API status
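
If the service needs some time to start, a client can poll `/health` before sending work. The sketch below uses only the standard library. The helper names `probe_health` and `wait_until_healthy`, the retry defaults, and the assumption that a ready server answers with HTTP 200 are illustrative and not taken from this project.

```python
import time
import urllib.request


def probe_health(base_url="http://localhost:8000"):
    """Return True if GET /health answers with HTTP 200 (assumed to mean 'ready')."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


def wait_until_healthy(probe=probe_health, attempts=10, delay=1.0):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` with no arguments checks `http://localhost:8000/health` up to ten times. Passing a different `probe` callable keeps the retry logic reusable for other hosts.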

	### Text-to-Speech
	- POST `/tts` - Convert text to speech with uploaded speaker file
	- Parameters:
	  - `text` (form): Text to convert to speech
	  - `language` (form): Language code (default: "en")
	  - `speaker_file` (file): Reference speaker audio file

	### API Documentation
	- GET `/docs` - Interactive API documentation (Swagger UI)
	- GET `/redoc` - Alternative API documentation
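
FastAPI normally also serves the machine-readable schema behind these pages at `/openapi.json`. As a rough sketch, assuming this app keeps that default, a script could list the routes described in the schema. `list_routes` and `fetch_schema` are hypothetical helper names, not part of this project.

```python
import json
import urllib.request


def list_routes(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary"
            if method in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url="http://localhost:8000"):
    """Download the schema; assumes the default /openapi.json route is enabled."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

With the server running, `list_routes(fetch_schema())` returns the documented method and path pairs.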

	## Usage Examples

	### Using Python requests

	```python
	import requests

	# Prepare the request
	url = "http://localhost:8000/tts"
	data = {
	    "text": "Hello, this is a test of voice cloning!",
	    "language": "en",
	}

	# Open the reference audio in a context manager so the file handle is closed
	with open("path/to/speaker.wav", "rb") as speaker:
	    files = {"speaker_file": speaker}
	    # Make the request
	    response = requests.post(url, data=data, files=files)

	# Save the generated audio
	if response.status_code == 200:
	    with open("output.wav", "wb") as f:
	        f.write(response.content)
	    print("Speech generated successfully!")
	else:
	    print(f"Request failed with status {response.status_code}: {response.text}")
	```

	### Using curl

	```bash
	curl -X POST "http://localhost:8000/tts" \
	  -F "text=Hello, this is a test!" \
	  -F "language=en" \
	  -F "speaker_file=@path/to/speaker.wav" \
	  --output generated_speech.wav
	```

	### Using the provided client example

	```bash
	python client_example.py
	```

	## Requirements

	- Python 3.8+
	- CUDA-compatible GPU (recommended for faster processing)
	- A reference speaker audio file in a supported format (WAV, MP3, etc.)
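
To see whether a CUDA GPU is visible before starting the API, a quick check along these lines can help. This is a sketch that assumes the environment uses PyTorch. `pick_device` is a hypothetical helper and not necessarily how `app.py` selects its device.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # assumed dependency; not confirmed by this README
    except ImportError:
        # Without PyTorch installed there is nothing to query, so report CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Inference device: {pick_device()}")
```

If this prints `cpu` on a machine with a GPU, the installed PyTorch build or the driver setup is the usual place to look.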

	## Model

	This API uses the XTTS-v2_C3PO model for voice cloning, which is automatically downloaded when building the Docker image.