Configuration error
!!! warning
This feature is not supported on ARM devices, only on x86_64. I was unable to build [piper-phonemize]( [fork](
TODO: add a note about automatic downloads
TODO: add a demo
TODO: add a note about TTS only running on CPU
TODO: add a note about exploring other models
TODO: add a note about performance
!!! note
Before proceeding, make sure you are familiar with the [OpenAI Text-to-Speech]( and the relevant [OpenAI API reference](
Download the Piper voices from the Hugging Face model repository:
# Download all voices (~15 minutes / 7.7 GB)
docker exec -it speaches huggingface-cli download rhasspy/piper-voices
# Download all English voices (~4.5 minutes)
docker exec -it speaches huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'
# Download all qualities of a specific voice (~4 seconds)
docker exec -it speaches huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/**/*' 'voices.json'
# Download specific quality of a specific voice (~2 seconds)
docker exec -it speaches huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/medium/*' 'voices.json'
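The `--include` patterns above follow the repository layout `language/locale/name/quality`. As a rough sketch, and assuming every voice identifier has the form `<locale>-<name>-<quality>` (as `en_US-amy-medium` does), a small helper can derive the pattern for a given voice. The function names `voice_to_include_glob` and `download_command` are hypothetical and not part of any library.

```python
import shlex


def voice_to_include_glob(voice: str) -> str:
    """Map a voice id such as 'en_US-amy-medium' to a repository glob.

    Assumption: ids look like '<locale>-<name>-<quality>' and the repository
    is laid out as '<language>/<locale>/<name>/<quality>/'.
    """
    locale, rest = voice.split("-", 1)
    name, quality = rest.rsplit("-", 1)
    language = locale.split("_")[0]
    return f"{language}/{locale}/{name}/{quality}/*"


def download_command(voice: str) -> list[str]:
    """Build the argv of the download command shown above for one voice."""
    return [
        "docker", "exec", "-it", "speaches",
        "huggingface-cli", "download", "rhasspy/piper-voices",
        "--include", voice_to_include_glob(voice), "voices.json",
    ]


if __name__ == "__main__":
    # shlex.join quotes the glob so the shell does not expand it.
    print(shlex.join(download_command("en_US-ryan-high")))
```

Printing the command rather than running it keeps the sketch free of side effects; the list could be passed to `subprocess.run` if that is preferred.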
!!! note
You can find audio samples of all the available voices [here](
# Generate speech from text using the default values (response_format="mp3", speed=1.0, voice="en_US-amy-medium", etc.)
curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!"}' --output audio.mp3
# Specifying the output format
curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "response_format": "wav"}' --output audio.wav
# Specifying the audio speed
curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "speed": 2.0}' --output audio.mp3
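The JSON bodies in the requests above can also be assembled programmatically. The following is a minimal sketch: `build_speech_payload` is a hypothetical helper, its defaults mirror the ones listed in the first comment above, and the `speed` check is an illustrative client-side guard rather than a documented server limit.

```python
import json


def build_speech_payload(
    text: str,
    voice: str = "en_US-amy-medium",
    response_format: str = "mp3",
    speed: float = 1.0,
) -> dict:
    """Return a JSON-serializable body for POST /v1/audio/speech."""
    if speed <= 0:
        # Illustrative guard only; the server's accepted range is not stated here.
        raise ValueError("speed must be positive")
    return {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


if __name__ == "__main__":
    # Same body as the "audio speed" request, with the defaults made explicit.
    print(json.dumps(build_speech_payload("Hello World!", speed=2.0)))
```

Spelling out the defaults makes each request self-describing, at the cost of no longer following the server if its defaults change.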
# List available (downloaded) voices
curl http://localhost:8000/v1/audio/speech/voices
# List just the voice names
curl http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | .voice'
# List just the voices in your language
curl --silent http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | select(.voice | startswith("en")) | .voice'
# Specifying the voice
curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "voice": "en_US-ryan-high"}' --output audio.mp3
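The `jq` filters above imply that the voices endpoint returns a JSON array of objects, each with a `voice` field. Under that assumption, the same filtering can be sketched in Python. `voice_names` is a hypothetical helper, and the sample list is made-up stand-in data, not real server output.

```python
import json


def voice_names(voices: list[dict], prefix: str = "") -> list[str]:
    """Return the 'voice' field of each entry whose name starts with prefix."""
    return [v["voice"] for v in voices if v["voice"].startswith(prefix)]


if __name__ == "__main__":
    # Stand-in for the parsed response of GET /v1/audio/speech/voices.
    sample = json.loads(
        '[{"voice": "en_US-amy-medium"}, {"voice": "de_DE-thorsten-low"}]'
    )
    print(voice_names(sample, "en"))
```

With an empty prefix the helper behaves like the `.[] | .voice` filter; with a prefix it behaves like the `startswith` variant.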
=== "httpx"
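from pathlib import Path

import httpx

client = httpx.Client(base_url="http://localhost:8000/")
# POST the synthesis request; the JSON body mirrors the curl examples
res = client.post(
    "v1/audio/speech",
    json={
        "model": "piper",
        "voice": "en_US-amy-medium",
        "input": "Hello, world!",
        "response_format": "mp3",
        "speed": 1,
    },
)
res.raise_for_status()  # fail loudly on a non-2xx response
with Path("output.mp3").open("wb") as f:
    f.write(res.content)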
!!! note
Although this project doesn't require an API key, all OpenAI SDKs do, so you will need to set it to a non-empty value. You will also need to override the base URL to point to your server.
This can be done by setting the `OPENAI_API_KEY` and `OPENAI_BASE_URL` environment variables or by passing them as arguments to the SDK.
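As a hedged illustration of the note above, the two settings can be resolved from the environment with local fallbacks. `resolve_client_settings` is a hypothetical helper; the fallback URL and placeholder key are the values used elsewhere on this page.

```python
import os
from typing import Mapping


def resolve_client_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Pick base_url and api_key for an OpenAI SDK client.

    Falls back to the local server URL and a dummy key when a variable is
    unset or empty, because the SDK needs a non-empty key.
    """
    base_url = env.get("OPENAI_BASE_URL") or "http://localhost:8000/v1"
    api_key = env.get("OPENAI_API_KEY") or "cant-be-empty"
    return {"base_url": base_url, "api_key": api_key}


if __name__ == "__main__":
    # An empty mapping simulates an environment with neither variable set.
    print(resolve_client_settings({}))
```

The resulting dictionary can be unpacked into the SDK constructor, for example `OpenAI(**resolve_client_settings())`.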
=== "Python"
from pathlib import Path

from openai import OpenAI

openai = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
res = openai.audio.speech.create(
    model="piper",
    voice="en_US-amy-medium",  # pyright: ignore[reportArgumentType]
    input="Hello, world!",
)
# Write the returned audio bytes to disk
with Path("output.mp3").open("wb") as f:
    f.write(res.read())
=== "Other"
See [OpenAI libraries](