
Truncation: audio output limitation

#4
by avneetsingh - opened

Hi, I am trying to use this model to generate audio for about 5 sentences in Bengali and Marathi, but the audio output ends abruptly. Are there any solutions to this problem?

From what I have read about Parler-TTS, the maximum output length is 30 seconds. Can we work around that?
https://github.com/huggingface/parler-tts/blob/main/training/README.md#3-training

AI4Bharat org

Hi, Indic Parler-TTS can consistently generate sequences of up to 10-12 seconds. We are working on improving the model. For your particular use case, I suggest splitting the text into separate sentences and generating them as a batch, as that will give the best quality while remaining consistent with the prompt that you have described.
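
A rough sketch of the split-and-batch suggestion above, not an official recipe. The helper names `split_sentences` and `generate_batch` are hypothetical. The `generate` call and the `audios_length` field follow the batch example in the Parler-TTS inference guide as I understand it, so treat them as assumptions and check them against the version you have installed. The splitter also treats the danda (।) as a sentence terminator, since it is used in Bengali and Marathi.

```python
import re

# Sentence terminators: ASCII . ! ? plus the danda (U+0964).
_SENTENCE_END = re.compile(r"(?<=[.!?\u0964])\s+")


def split_sentences(text):
    """Split text wherever a sentence terminator is followed by whitespace."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def generate_batch(model, prompt_tokenizer, description_tokenizer,
                   sentences, description, device="cpu"):
    """Generate one waveform per sentence in a single batched call.

    Assumes `model` is a loaded ParlerTTSForConditionalGeneration and both
    tokenizers are Hugging Face tokenizers (assumption, not verified here).
    """
    # Reuse the same voice description for every sentence so the speaker
    # stays consistent across the batch.
    descriptions = [description] * len(sentences)
    desc = description_tokenizer(descriptions, return_tensors="pt", padding=True).to(device)
    prompt = prompt_tokenizer(sentences, return_tensors="pt", padding=True).to(device)

    out = model.generate(
        input_ids=desc.input_ids,
        attention_mask=desc.attention_mask,
        prompt_input_ids=prompt.input_ids,
        prompt_attention_mask=prompt.attention_mask,
        do_sample=True,
        return_dict_in_generate=True,
    )
    # Batched outputs are padded; trim each waveform to its reported length.
    return [
        out.sequences[i, : out.audios_length[i]].cpu().numpy()
        for i in range(len(sentences))
    ]
```

The per-sentence waveforms can then be concatenated in order (for example with `numpy.concatenate`) and written out at the model's sampling rate.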

Hey @avneetsingh,
I have used a combination of chunking and batch generation to handle long text.

https://github.com/slabstech/llm-recipes/blob/main/python/notebooklm/audiobook/utils/batch_inference_chunked.py

PS: chunking introduces quality issues because context is lost between chunks.
I am trying a tokenizer-based strategy, or filtering sentences with a text LLM, to choose chunk boundaries instead of the current word-count-based chunking.
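
One possible shape for the tokenizer-based strategy mentioned above, as a sketch only and not the code in the linked script: keep sentences whole and greedily pack them into chunks under a token budget. `pack_sentences`, `max_tokens`, and `count_tokens` are hypothetical names. The right budget has to be found by experiment so that each chunk stays within the duration the model generates reliably.

```python
def pack_sentences(sentences, max_tokens, count_tokens):
    """Greedily group whole sentences into chunks of at most max_tokens.

    count_tokens is any callable returning the token count of a string.
    A single sentence longer than the budget is kept as its own chunk
    rather than being cut mid-sentence.
    """
    chunks, current, used = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def whitespace_count(text):
    """Stand-in counter for illustration: number of whitespace-separated words."""
    return len(text.split())
```

With a Hugging Face tokenizer, the counter could be `lambda s: len(tokenizer(s).input_ids)` in place of `whitespace_count`, so the budget is measured in the same units the model sees.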
