
F5-TTS Model Inference Guide

Welcome! This guide will walk you through the steps to load and run the F5-TTS model for text-to-speech synthesis using reference audio and text inputs.


Did You Know?

Text-to-speech models like F5-TTS can mimic voice characteristics by analyzing just a few seconds of audio input. This adaptability is paving the way for personalized, AI-driven audio content.


Steps to Run the F5-TTS Model

1. Clone the Repository

Start by cloning the F5-TTS repository to your local environment:

git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS

2. Download the Model Weights

Download the Brazilian Portuguese checkpoint into the ckpts/ directory with wget:

wget https://hf.rst.im/ModelsLab/F5-tts-brazilian/resolve/main/Brazilian_Portuguese/model_2600000.pt -P ckpts/
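If wget is not available, the same download can be done from Python. The sketch below uses only the standard library (urllib) and reuses the URL above. The helper names `local_checkpoint_path` and `download_checkpoint` are hypothetical and not part of F5-TTS. The file is written to a temporary `.part` file first, so an interrupted download never leaves a truncated checkpoint behind.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URL copied from the wget command above.
CHECKPOINT_URL = (
    "https://hf.rst.im/ModelsLab/F5-tts-brazilian/resolve/main/"
    "Brazilian_Portuguese/model_2600000.pt"
)


def local_checkpoint_path(url: str, dest_dir: str = "ckpts") -> str:
    """Return the path where `wget -P dest_dir url` would save the file."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(dest_dir, filename)


def download_checkpoint(url: str = CHECKPOINT_URL, dest_dir: str = "ckpts") -> str:
    """Download the checkpoint unless it already exists; return its local path."""
    path = local_checkpoint_path(url, dest_dir)
    if os.path.exists(path):
        return path  # skip re-downloading a large file
    os.makedirs(dest_dir, exist_ok=True)
    tmp_path = path + ".part"
    urllib.request.urlretrieve(url, tmp_path)  # follows HTTP redirects
    os.replace(tmp_path, path)  # expose the file only once it is complete
    return path
```

From the repository root, call `download_checkpoint()`. It returns the local path (ckpts/model_2600000.pt), which is the value to pass to `--ckpt_file` at inference time.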

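3. Install PyTorch and TorchAudio with CUDA Support

Install PyTorch and TorchAudio builds that match your CUDA setup to enable GPU support. The commands below install version 2.3.0 built for CUDA 11.8 (cu118). These pip wheels ship with their own CUDA runtime, so you only need an NVIDIA driver that supports CUDA 11.8. Torch and TorchAudio versions must match each other.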

pip install torch==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
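To confirm that this step produced a GPU-enabled install, you can query PyTorch directly. The following is a minimal sketch; the function name `describe_torch_setup` is hypothetical. It relies on `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`. In an environment without PyTorch it still runs and simply reports that PyTorch is missing.

```python
import importlib.util


def describe_torch_setup() -> dict:
    """Report installed PyTorch/TorchAudio versions and whether CUDA is usable."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info
    import torch  # imported lazily so the check works without PyTorch

    info["torch_version"] = torch.__version__           # e.g. "2.3.0+cu118"
    info["built_with_cuda"] = torch.version.cuda         # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()   # needs a driver and a GPU
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    info["torchaudio_installed"] = importlib.util.find_spec("torchaudio") is not None
    if info["torchaudio_installed"]:
        import torchaudio
        info["torchaudio_version"] = torchaudio.__version__
    return info


for key, value in describe_torch_setup().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` reports 11.8 but `cuda_available` is False, the wheels are probably fine. The NVIDIA driver or GPU visibility is the more likely culprit.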

4. Install Required Python Packages

Install the required dependencies specified in the requirements.txt file to set up your environment:

pip install -r requirements.txt
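After installing, you can check that every package listed in requirements.txt is present. The sketch below is a hypothetical helper, not part of F5-TTS, and it makes two simplifying assumptions:

- It parses names with a simplified rule, not pip's full requirement grammar, and skips options and direct URL lines.
- It checks presence only, not version constraints.

Each name is looked up with `importlib.metadata`.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare distribution names from requirements.txt lines.

    Simplified parser (an assumption, not pip's grammar): skips blank lines,
    comments, option lines such as '-e .', and direct URL lines.
    """
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line.split()[0]:
            continue
        # The name ends at the first extras bracket, version specifier,
        # environment marker or whitespace.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines):
    """Return the requirement names that are not installed in this environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

From the repository root, `missing_packages(open("requirements.txt"))` returns an empty list when every listed package is installed.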

5. System Setup: APT Update and FFmpeg

Before running inference, ensure your system has the necessary dependencies:

Update APT Packages and Install FFmpeg

FFmpeg is required for audio processing. Update your APT packages and install ffmpeg with the following commands (they need root privileges, so prefix them with sudo if you are not root):

apt update
apt install -y ffmpeg
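To confirm that FFmpeg is installed and on your PATH, a small standard-library check such as the following can be used. This is a sketch; `ffmpeg_version` is a hypothetical helper.

```python
import shutil
import subprocess


def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    executable = shutil.which("ffmpeg")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]  # starts with "ffmpeg version"


version = ffmpeg_version()
print(version if version else "FFmpeg not found: run the apt commands above.")
```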

6. Run Inference with the F5-TTS Model

With the environment ready, you can now run the inference script from the repository root. Adjust the paths as needed. The options are:

--model: the name of the model to use for inference.
--ckpt_file: path to the model checkpoint file containing the saved weights (ckpts/model_2600000.pt if you followed step 2).
--ref_audio: path to the reference audio file. The model captures the speaking style and voice characteristics of this clip and tries to mimic them.
--ref_text: the transcript of the reference audio file. It tells the model what is spoken in the clip, which helps it reproduce the speaking style.
--gen_text: the text to synthesize. It is spoken in the voice and style derived from the reference audio and text.

Do not put comments or blank lines between the backslash-continued lines: a blank or comment line ends the shell command early, and the remaining options are never passed to the script.

python inference-cli.py \
  --model "F5-TTS" \
  --ckpt_file "ckpts/model_2600000.pt" \
  --ref_audio "wavs/sample_audio.wav" \
  --ref_text "levantara a mão contra ele e o oficial então arrancara da espada e atravessara o de lado a lado estava direito ah" \
  --gen_text "O Brasil, oficialmente República Federativa do Brasil, é o maior país da América do Sul e da América Latina."
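If you prefer to drive inference from Python, for example to script several runs, one option is to call the same CLI through subprocess with an argument list. This sketch makes three assumptions:

- It runs from the repository root, where inference-cli.py lives.
- It uses only the five flags shown above.
- Its helper names (`build_inference_command`, `run_inference`) are hypothetical.

Passing a list instead of a shell string avoids quoting and line-continuation problems with the Portuguese texts.

```python
import subprocess
import sys


def build_inference_command(ckpt_file, ref_audio, ref_text, gen_text,
                            model="F5-TTS", script="inference-cli.py"):
    """Assemble the argument list for inference-cli.py using the flags shown above."""
    return [
        sys.executable, script,
        "--model", model,
        "--ckpt_file", ckpt_file,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]


def run_inference(**kwargs):
    """Run the CLI; raises subprocess.CalledProcessError if the script fails."""
    cmd = build_inference_command(**kwargs)
    # No shell is involved, so quotes, commas and accented characters
    # in the texts reach the script unchanged.
    return subprocess.run(cmd, check=True)
```

For example, `run_inference(ckpt_file="ckpts/model_2600000.pt", ref_audio="wavs/sample_audio.wav", ref_text=..., gen_text=...)` with the two texts from the command above reproduces that command.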