mrfakename committed on
Commit 1583e1c · verified · 1 Parent(s): 34d3b0e

Sync from GitHub repo

This Space is synced from the GitHub repo: https://github.com/SWivid/F5-TTS. Please submit contributions to the Space there.

src/f5_tts/infer/README.md CHANGED
@@ -4,16 +4,17 @@ The pretrained model checkpoints can be reached at [🤗 Hugging Face](https://h
 
  **More checkpoints contributed by the community, supporting more languages, can be found in [SHARED.md](SHARED.md).**
 
- Currently support **30s for a single** generation, which is the **total length** including both prompt and output audio. However, you can provide `infer_cli` and `infer_gradio` with longer text, will automatically do chunk generation. Long reference audio will be **clip short to ~15s**.
+ A single generation currently supports up to **30s**, which is the **total length** of prompt and output audio combined (the same limit applies when `fix_duration` is set). For longer text input, `infer_cli` and `infer_gradio` automatically generate in chunks. Long reference audio will be **clipped to ~12s**.
 
  To avoid possible inference failures, make sure you have read through the following instructions.
 
- - Use reference audio <15s and leave some silence (e.g. 1s) at the end. Otherwise there is a risk of truncating in the middle of word, leading to suboptimal generation.
- - Uppercased letters will be uttered letter by letter, so use lowercased letters for normal words.
- - Add some spaces (blank: " ") or punctuations (e.g. "," ".") to explicitly introduce some pauses.
- - Preprocess numbers to Chinese letters if you want to have them read in Chinese, otherwise in English.
- - If the generation output is blank (pure silence), check for ffmpeg installation (various tutorials online, blogs, videos, etc.).
- - Try turn off use_ema if using an early-stage finetuned checkpoint (which goes just few updates).
+ - Use reference audio <12s and leave some silence (e.g. 1s) at the end. Otherwise there is a risk of truncating in the middle of a word, leading to suboptimal generation.
+ - **Uppercase** letters (best written in a form like K.F.C.) will be uttered letter by letter; use lowercase letters for common words.
+ - Add spaces (blank: " ") or punctuation (e.g. "," ".") to explicitly introduce **pauses**.
+ - If English punctuation marks the end of a sentence, make sure there is a space " " after it. Otherwise it is not treated as a sentence boundary when chunking.
+ - Preprocess **numbers** into Chinese characters if you want them read in Chinese; otherwise they are read in English.
+ - If the generation output is blank (pure silence), check your **ffmpeg** installation.
+ - Try turning off **use_ema** if using an early-stage finetuned checkpoint (one that has gone through only a few updates).
 
 
  ## Gradio App
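
Note on the chunking tip in the hunk above: the requirement that sentence-ending punctuation be followed by a space can be illustrated with a minimal sketch. This is not the F5-TTS implementation; the function name `split_into_chunks`, the `max_chars` parameter, and the regular expression are assumptions chosen only to show why a missing space keeps two sentences glued together.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 135) -> List[str]:
    """Hypothetical chunker: split at sentence ends, then pack sentences into chunks.

    A sentence end is '.', '!' or '?' followed by whitespace. Punctuation
    without a following space is NOT treated as a boundary.
    """
    # Split only where sentence-ending punctuation is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # "world.This" has no space after the period, so it is never split there.
    for chunk in split_into_chunks("Hello world.This is glued. Next one.", max_chars=20):
        print(repr(chunk))
```

In this sketch a sentence longer than `max_chars` is kept whole rather than cut mid-word, which mirrors the concern about truncation in the tips above.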
src/f5_tts/train/README.md CHANGED
@@ -51,7 +51,9 @@ Discussion board for Finetuning [#57](https://github.com/SWivid/F5-TTS/discussio
 
  Gradio UI training/finetuning with `src/f5_tts/train/finetune_gradio.py`, see [#143](https://github.com/SWivid/F5-TTS/discussions/143).
 
- The `use_ema = True` is harmful for early-stage finetuned checkpoints (which goes just few updates, thus ema weights still dominated by pretrained ones), try turn it off and see if provide better results.
+ **`use_ema = True` might be harmful for early-stage finetuned checkpoints** (which have gone through only a few updates, so the EMA weights are still dominated by the pretrained ones); try turning it off (`load_model(..., use_ema=False)`) and see if it gives better results.
+
+ If using tensorboard as the logger, install it first with `pip install tensorboard`.
 
  ### 3. W&B Logging
 
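
Note on the `use_ema` remark in the hunk above: the claim that EMA weights stay dominated by the pretrained ones after only a few updates follows from the EMA recurrence. The sketch below assumes a constant decay with no warm-up and uses a scalar stand-in for the weights; the decay value 0.999 and both function names are illustrative assumptions, not values taken from the F5-TTS trainer.

```python
def pretrained_share(decay: float, num_updates: int) -> float:
    """Fraction of the EMA weights still contributed by the starting (pretrained) weights.

    With ema <- decay * ema + (1 - decay) * w, the coefficient on the initial
    value after n updates is decay ** n.
    """
    return decay ** num_updates


def simulate_ema(pretrained: float, finetuned: float, decay: float, num_updates: int) -> float:
    """Run the EMA recurrence on a scalar, holding the finetuned weight fixed."""
    ema = pretrained
    for _ in range(num_updates):
        ema = decay * ema + (1.0 - decay) * finetuned
    return ema


if __name__ == "__main__":
    decay = 0.999  # assumed decay, for illustration only
    for n in (10, 100, 1000, 10000):
        share = pretrained_share(decay, n)
        ema = simulate_ema(pretrained=0.0, finetuned=1.0, decay=decay, num_updates=n)
        print(f"updates={n:>5}  pretrained share={share:.4f}  ema value={ema:.4f}")
```

Under these assumptions, about 90% of the EMA weights still come from the pretrained checkpoint after 100 updates, which is why loading the non-EMA weights can be worth trying for early finetuning checkpoints.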