Spaces: rbgo/Open-Source-TTS-Gallary

rbgo committed 5 days ago

Commit 8445001 (verified) · 1 parent: 452aeee

Update app.py

Files changed (1): app.py +41 -26

@@ -237,33 +237,48 @@ def create_interface():
         # """)
         # Evaluation Criteria
-        with gr.Row():
-            with gr.Column():
-                gr.HTML("""
-                <div style="text-align: center; padding: 1rem; background: rgba(102, 126, 234, 0.1); border-radius: 8px;">
-                    <div style="font-size: 2rem;">🎭</div>
-                    <strong>Naturalness</strong><br>
-                    <small>Human-like quality & emotional expression</small>
-                </div>
-                """)
-            with gr.Column():
-                gr.HTML("""
-                <div style="text-align: center; padding: 1rem; background: rgba(102, 126, 234, 0.1); border-radius: 8px;">
-                    <div style="font-size: 2rem;">🗣️</div>
-                    <strong>Intelligibility</strong><br>
-                    <small>Clarity & pronunciation accuracy</small>
-                </div>
-                """)
-            with gr.Column():
-                gr.HTML("""
-                <div style="text-align: center; padding: 1rem; background: rgba(102, 126, 234, 0.1); border-radius: 8px;">
-                    <div style="font-size: 2rem;">🎛️</div>
-                    <strong>Controllability</strong><br>
-                    <small>Tone, pace & parameter flexibility</small>
-                </div>
-                """)
-        gr.Markdown("---")
+        # with gr.Row():
+        #     with gr.Column():
+        #         gr.HTML("""
+        #         <div style="text-align: center; padding: 1rem; background: rgba(102, 126, 234, 0.1); border-radius: 8px;">
+        #             <div style="font-size: 2rem;">🎭</div>
+        #             <strong>Naturalness</strong><br>
+        #             <small>Human-like quality & emotional expression</small>
+        #         </div>
+        #         """)
+        #     with gr.Column():
+        #         gr.HTML("""
+        #         <div style="text-align: center; padding: 1rem; background: rgba(102, 126, 234, 0.1); border-radius: 8px;">
+        #             <div style="font-size: 2rem;">🗣️</div>
+        #             <strong>Intelligibility</strong><br>
+        #             <small>Clarity & pronunciation accuracy</small>
+        #         </div>
+        #         """)
+        #     with gr.Column():
+        #         gr.HTML("""
+        #         <div style="text-align: center; padding: 1rem; background: rgba(102, 126, 234, 0.1); border-radius: 8px;">
+        #             <div style="font-size: 2rem;">🎛️</div>
+        #             <strong>Controllability</strong><br>
+        #             <small>Tone, pace & parameter flexibility</small>
+        #         </div>
+        #         """)
+        # gr.Markdown("---")
+        gr.Markdown("""
+                    ## 🔑 Key Findings
+                    1. **Outstanding Speech Quality**
+                       Several models—namely **Kokoro-82M**, **csm-1b**, **Spark-TTS-0.5B**, **Orpheus-3b-0.1-ft**, **F5-TTS**, and **Llasa-3B**—delivered exceptionally natural, clear, and realistic synthesized speech. Among these, **csm-1b** and **F5-TTS** stood out as the most well-rounded: they combined top-tier naturalness and intelligibility with solid controllability.
+                    2. **Superior Controllability**
+                       **Zonos-v0.1-transformer** emerged as the leader in fine-grained control: it offers detailed adjustments for prosody, emotion, and audio quality, making it ideal for use cases that demand precise voice modulation.
+                    3. **Performance vs. Footprint Trade-off**
+                       Smaller models (e.g., **Kokoro-82M** at 82 million parameters) can still achieve “Good” or “Excellent” ratings in many scenarios, especially when efficient inference or low VRAM usage is critical. Larger models (1 billion–3 billion+ parameters) generally offer more versatility—handling multilingual synthesis, zero-shot voice cloning, and multi-speaker generation—but require heavier compute resources.
+                    4. **Special Notes on Multilingual & Cloning Capabilities**
+                       **Spark-TTS-0.5B** and **XTTS-v2** excel at cross-lingual and zero-shot voice cloning, making them strong candidates for projects that need multi-language support or short-clip cloning. **Llama-OuteTTS-1.0-1B** and **MegaTTS3** also offer multilingual input handling, though they may require careful sampling parameter tuning to achieve optimal results.
+                            """)
         # Search and Filter Section
         with gr.Row():
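Finding 3 in the new Markdown contrasts an 82-million-parameter model (Kokoro-82M) with models of 1 to 3 billion+ parameters. As a rough, back-of-the-envelope illustration of that footprint gap (not a measurement taken from this gallery), the memory needed just to hold the weights is the parameter count times the bytes per parameter. The dtype table and the weights-only simplification are assumptions of this sketch; activations, caches, codecs/vocoders, and framework overhead are ignored, so real VRAM use will be higher.

```python
# Weight-only memory estimate: num_params * bytes_per_param.
# Assumption: ignores activations, caches, codecs/vocoders and framework
# overhead, so actual VRAM usage will be higher than these figures.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return memory for the weights alone, in decimal GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts as stated in the findings: 82 million, and 1 to 3 billion.
SIZES = {"82M": 82e6, "1B": 1e9, "3B": 3e9}

if __name__ == "__main__":
    for label, n in SIZES.items():
        print(
            f"{label}: {weight_memory_gb(n, 'fp32'):.2f} GB fp32, "
            f"{weight_memory_gb(n, 'fp16'):.2f} GB fp16"
        )
```

If stored in fp16, this puts an 82M-parameter model's weights near 0.16 GB versus about 6 GB for a 3B model, roughly a 36x difference before any runtime overhead is counted.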