
Generated output.wav contains only static noise

#14, opened by arshneo

Hi friends! I have 143 short audio files totaling about 70 minutes, each up to 30 seconds long. I split them into training and validation sets at an 80/20 ratio and fine-tuned for about 50 epochs with the following command:
!PYTHONPATH=/content/drive/MyDrive/Hindi/F5-TTS/src python -m f5_tts.train.finetune_cli \
  --dataset_name Makhan_dataset_prepped_char \
  --tokenizer char \
  --learning_rate 1e-7 \
  --batch_size_per_gpu 3200 \
  --epochs 50 \
  --logger tensorboard \
  --num_workers 2
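For a sense of scale, the dataset and batch settings above can be turned into rough numbers. The Python sketch below is a back-of-the-envelope estimate, not part of F5-TTS. It assumes F5-TTS's default mel settings (24 kHz sample rate, hop length 256), that `batch_size_per_gpu` is counted in mel frames, and that an 80/20 split by file count is also roughly 80/20 by duration; it ignores padding and length bucketing. The helper names are hypothetical.

```python
# Back-of-the-envelope numbers for the fine-tuning run above.
# ASSUMPTIONS (not from the post): F5-TTS default mel settings of a
# 24 kHz sample rate and hop length 256, batch_size_per_gpu counted in
# mel frames, and an 80/20 split by file count that is also roughly
# 80/20 by duration. Padding and length bucketing are ignored.
# The helper names below are hypothetical.

SAMPLE_RATE = 24_000  # assumed target sample rate (Hz)
HOP_LENGTH = 256      # assumed mel hop length (samples per frame)
FRAMES_PER_SECOND = SAMPLE_RATE / HOP_LENGTH  # 93.75 mel frames per second


def split_counts(n_files: int, train_frac: float = 0.8) -> tuple[int, int]:
    """Return (train, validation) file counts; the remainder goes to validation."""
    n_train = int(n_files * train_frac)
    return n_train, n_files - n_train


def seconds_per_batch(frames: int) -> float:
    """Seconds of audio that fit in one frame-based batch."""
    return frames / FRAMES_PER_SECOND


def approx_batches(total_minutes: float, train_frac: float,
                   frames: int, epochs: int) -> int:
    """Rough number of training batches over all epochs."""
    train_seconds = total_minutes * 60 * train_frac
    batches_per_epoch = train_seconds / seconds_per_batch(frames)
    return round(batches_per_epoch * epochs)


if __name__ == "__main__":
    n_train, n_val = split_counts(143)
    print(f"train/val files: {n_train}/{n_val}")
    print(f"audio per batch: {seconds_per_batch(3200):.1f} s")
    print(f"approx. batches over 50 epochs: {approx_batches(70, 0.8, 3200, 50)}")
```

Under these assumptions, 143 files split 80/20 gives 114 training and 29 validation files, each 3200-frame batch holds about 34 seconds of audio, and 50 epochs comes to roughly 4,900 batches.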
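This produced model_last.pt, on which I ran infer_cli with the command below. The --ref_text argument is the Hindi transcript of ref_audio.wav (roughly: "There is a Swiss pass available with a validity of three, four, six, eight, or 15 days. It is continuous: your days start counting from the day you begin your first day. I took this same pass, the eight-day one, and paid ₹40,000 for it."):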
!PYTHONPATH=/content/drive/MyDrive/Hindi/F5-TTS/src python -m f5_tts.infer.infer_cli \
  --gen_file /content/drive/MyDrive/Hindi/F5-TTS/data/gen_folder/gen_text.txt \
  --ref_audio /content/drive/MyDrive/Hindi/F5-TTS/data/ref_folder/ref_audio.wav \
  --ref_text 'एक स्विस पास है जो तीन, चार, छह, आठ और 15 दिन की वैलिडिटी के लिए मिलता है। ये कंटीन्यूअस है। आपने जिस दिन से पहला दिन स्टार्ट किया उससे आपके दिन स्टार्ट हो जाएंगे। मैंने यही पास लिया था, आठ दिन वाला और मैंने इसके ₹40,000 दिए थे।' \
  --ckpt_file /content/drive/MyDrive/Hindi/F5-TTS/ckpts/Makhan_dataset_prepped_char/model_last.pt \
  --vocab_file /content/drive/MyDrive/Hindi/F5-TTS/data/Makhan_dataset_prepped_char/train/vocab.txt \
  --output_dir /content/drive/MyDrive/Hindi/F5-TTS/data/output_folder \
  --output_file output.wav \
  --nfe 64
The resulting output.wav contains nothing but static noise; I have uploaded it here.

Can anybody help me? What am I doing wrong?
