Spaces: rbgo/Open-Source-TTS-Gallary

rbgo committed 5 days ago

Commit ad52334 (verified)
1 Parent(s): 8445001

Update app.py

Files changed (1)

app.py +27 -0

@@ -222,6 +222,33 @@ def create_interface():
                 <li><em>...and 6 more incredible models!</em></li>
             </ul>
+            <h3>🔑 Key Findings</h3>
+            <ol>
+                <li><strong>Outstanding Speech Quality</strong><br>
+                    Six models (<strong>Kokoro-82M</strong>, <strong>csm-1b</strong>, <strong>Spark-TTS-0.5B</strong>,
+                    <strong>Orpheus-3b-0.1-ft</strong>, <strong>F5-TTS</strong>, and <strong>Llasa-3B</strong>) delivered exceptionally
+                    natural, clear, and realistic synthesized speech. Of these, <strong>csm-1b</strong> and <strong>F5-TTS</strong>
+                    were the most well-rounded, combining top-tier naturalness and intelligibility with solid controllability.
+                </li>
+                <li><strong>Superior Controllability</strong><br>
+                    <strong>Zonos-v0.1-transformer</strong> emerged as the leader in fine-grained control: it offers detailed
+                    adjustments for prosody, emotion, and audio quality, making it ideal for use cases that demand precise
+                    voice modulation.
+                </li>
+                <li><strong>Performance vs. Footprint Trade-off</strong><br>
+                    Smaller models (e.g., <strong>Kokoro-82M</strong> at 82 million parameters) can still achieve “Good” or
+                    “Excellent” ratings in many scenarios, which makes them attractive when efficient inference or low VRAM usage is critical.
+                    Larger models (roughly 1 billion to 3 billion+ parameters) are generally more versatile, handling multilingual
+                    synthesis, zero-shot voice cloning, and multi-speaker generation, but they require heavier compute resources.
+                </li>
+                <li><strong>Special Notes on Multilingual &amp; Cloning Capabilities</strong><br>
+                    <strong>Spark-TTS-0.5B</strong> and <strong>XTTS-v2</strong> excel at cross-lingual and zero-shot voice
+                    cloning, making them strong candidates for projects that need multi-language support or short-clip cloning.
+                    <strong>Llama-OuteTTS-1.0-1B</strong> and <strong>MegaTTS3</strong> also offer multilingual input handling,
+                    though they may need careful tuning of their sampling parameters to achieve optimal results.
+                </li>
+            </ol>
         </div>
         """)