rbgo committed on
Commit 7743176 · verified · 1 Parent(s): ad52334

Update app.py

Files changed (1)
  1. app.py +11 -11
app.py CHANGED
@@ -291,21 +291,21 @@ def create_interface():
  # """)
 
  # gr.Markdown("---")
- gr.Markdown("""
- ## 🔑 Key Findings
+ # gr.Markdown("""
+ # ## 🔑 Key Findings
 
- 1. **Outstanding Speech Quality**
- Several models—namely **Kokoro-82M**, **csm-1b**, **Spark-TTS-0.5B**, **Orpheus-3b-0.1-ft**, **F5-TTS**, and **Llasa-3B**—delivered exceptionally natural, clear, and realistic synthesized speech. Among these, **csm-1b** and **F5-TTS** stood out as the most well-rounded: they combined top-tier naturalness and intelligibility with solid controllability.
+ # 1. **Outstanding Speech Quality**
+ # Several models—namely **Kokoro-82M**, **csm-1b**, **Spark-TTS-0.5B**, **Orpheus-3b-0.1-ft**, **F5-TTS**, and **Llasa-3B**—delivered exceptionally natural, clear, and realistic synthesized speech. Among these, **csm-1b** and **F5-TTS** stood out as the most well-rounded: they combined top-tier naturalness and intelligibility with solid controllability.
 
- 2. **Superior Controllability**
- **Zonos-v0.1-transformer** emerged as the leader in fine-grained control: it offers detailed adjustments for prosody, emotion, and audio quality, making it ideal for use cases that demand precise voice modulation.
+ # 2. **Superior Controllability**
+ # **Zonos-v0.1-transformer** emerged as the leader in fine-grained control: it offers detailed adjustments for prosody, emotion, and audio quality, making it ideal for use cases that demand precise voice modulation.
 
- 3. **Performance vs. Footprint Trade-off**
- Smaller models (e.g., **Kokoro-82M** at 82 million parameters) can still achieve “Good” or “Excellent” ratings in many scenarios, especially when efficient inference or low VRAM usage is critical. Larger models (1 billion–3 billion+ parameters) generally offer more versatility—handling multilingual synthesis, zero-shot voice cloning, and multi-speaker generation—but require heavier compute resources.
+ # 3. **Performance vs. Footprint Trade-off**
+ # Smaller models (e.g., **Kokoro-82M** at 82 million parameters) can still achieve “Good” or “Excellent” ratings in many scenarios, especially when efficient inference or low VRAM usage is critical. Larger models (1 billion–3 billion+ parameters) generally offer more versatility—handling multilingual synthesis, zero-shot voice cloning, and multi-speaker generation—but require heavier compute resources.
 
- 4. **Special Notes on Multilingual & Cloning Capabilities**
- **Spark-TTS-0.5B** and **XTTS-v2** excel at cross-lingual and zero-shot voice cloning, making them strong candidates for projects that need multi-language support or short-clip cloning. **Llama-OuteTTS-1.0-1B** and **MegaTTS3** also offer multilingual input handling, though they may require careful sampling parameter tuning to achieve optimal results.
- """)
+ # 4. **Special Notes on Multilingual & Cloning Capabilities**
+ # **Spark-TTS-0.5B** and **XTTS-v2** excel at cross-lingual and zero-shot voice cloning, making them strong candidates for projects that need multi-language support or short-clip cloning. **Llama-OuteTTS-1.0-1B** and **MegaTTS3** also offer multilingual input handling, though they may require careful sampling parameter tuning to achieve optimal results.
+ # """)
 
  # Search and Filter Section
  with gr.Row():