Truncation : Audio OP limitation

by avneetsingh - opened Dec 18, 2024

Dec 18, 2024

Hi, I am trying to use this model to create audio clips of about 5 sentences in Bengali and Marathi, but the audio outputs end abruptly. Is there a solution to this problem?

avneetsingh

Dec 18, 2024

From what I have read about Parler-TTS, the maximum output length is 30 seconds. Can we work around that?
https://github.com/huggingface/parler-tts/blob/main/training/README.md#3-training

avneetsingh changed discussion title from Truncation. to Truncation : Audio OP limitation Dec 18, 2024

AshwinSankar

AI4Bharat org Dec 19, 2024

Hi, Indic Parler-TTS can consistently generate sequences of up to 10-12 seconds. We are working on improving the model. For your particular use case, I suggest splitting the text into individual sentences and running a batched generation instead, as that will give the best quality while remaining consistent with the prompt you have described.
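For anyone landing here, the splitting step could look roughly like the sketch below. It is only an illustration, not code from the model repo: `split_sentences` is a hypothetical helper, and the punctuation set (Latin `.`, `!`, `?` plus the danda `।` and double danda `॥`) is an assumption that may need adjusting for your text.

```python
import re

# Split after sentence-ending punctuation followed by whitespace.
# Assumed punctuation set: . ! ? plus danda (।) and double danda (॥).
_SENTENCE_END = re.compile(r"(?<=[.!?।॥])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the sentences in `text`, keeping their end punctuation."""
    parts = _SENTENCE_END.split(text.strip())
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("One. Two? Three!"):
        print(sentence)
```

Each returned sentence can then be used as one item of the batch, all sharing the same description prompt, and the generated clips concatenated in order afterwards.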

gaganyatri

about 1 month ago

Hey @avneetsingh,
I have used a combination of chunking and batch generation to handle long inputs.

https://github.com/slabstech/llm-recipes/blob/main/python/notebooklm/audiobook/utils/batch_inference_chunked.py

PS: Chunking introduces quality issues because context is lost between chunks.
I am trying a tokenizer-based strategy, or filtering sentences via a text LLM, to chunk the text instead of the current word-count-based approach.
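A minimal sketch of what chunking on sentence boundaries under a size budget might look like is below. This is not the code from the linked script: `pack_chunks`, `max_units` and `count_units` are hypothetical names. The default counter is a plain whitespace word count, and a real tokenizer's token count could be passed in as `count_units` instead.

```python
from typing import Callable, Iterable


def pack_chunks(
    sentences: Iterable[str],
    max_units: int,
    count_units: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Greedily group whole sentences into chunks of at most `max_units`.

    A single sentence larger than the budget is kept as its own chunk
    rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for sentence in sentences:
        n = count_units(sentence)
        # Close the current chunk if this sentence would overflow it.
        if current and used + n > max_units:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    print(pack_chunks(["a b c.", "d e.", "f g h i."], max_units=5))
```

Because chunks only break between sentences, each one stays a complete unit of text, which may reduce some of the context loss mentioned above. The right budget depends on how much audio the model can produce reliably, so it has to be tuned by ear.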
