Update app.py
app.py
CHANGED
@@ -222,6 +222,33 @@ def create_interface():
 <li><em>...and 6 more incredible models!</em></li>
 </ul>
 
+<h3>🔑 Key Findings</h3>
+<ol>
+<li><strong>Outstanding Speech Quality</strong><br>
+Several models—namely <strong>Kokoro-82M</strong>, <strong>csm-1b</strong>, <strong>Spark-TTS-0.5B</strong>,
+<strong>Orpheus-3b-0.1-ft</strong>, <strong>F5-TTS</strong>, and <strong>Llasa-3B</strong>—delivered exceptionally
+natural, clear, and realistic synthesized speech. Among these, <strong>csm-1b</strong> and <strong>F5-TTS</strong>
+stood out as the most well-rounded: they combined top-tier naturalness and intelligibility with solid controllability.
+</li>
+<li><strong>Superior Controllability</strong><br>
+<strong>Zonos-v0.1-transformer</strong> emerged as the leader in fine-grained control: it offers detailed
+adjustments for prosody, emotion, and audio quality, making it ideal for use cases that demand precise
+voice modulation.
+</li>
+<li><strong>Performance vs. Footprint Trade-off</strong><br>
+Smaller models (e.g., <strong>Kokoro-82M</strong> at 82 million parameters) can still achieve “Good” or
+“Excellent” ratings in many scenarios, especially when efficient inference or low VRAM usage is critical.
+Larger models (1 billion–3 billion+ parameters) generally offer more versatility—handling multilingual
+synthesis, zero-shot voice cloning, and multi-speaker generation—but require heavier compute resources.
+</li>
+<li><strong>Special Notes on Multilingual & Cloning Capabilities</strong><br>
+<strong>Spark-TTS-0.5B</strong> and <strong>XTTS-v2</strong> excel at cross-lingual and zero-shot voice
+cloning, making them strong candidates for projects that need multi-language support or short-clip cloning.
+<strong>Llama-OuteTTS-1.0-1B</strong> and <strong>MegaTTS3</strong> also offer multilingual input handling,
+though they may require careful sampling parameter tuning to achieve optimal results.
+</li>
+</ol>
+
 </div>
 """)
 
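Note on the "Performance vs. Footprint Trade-off" finding: the gap between an 82-million-parameter model and a 1 to 3 billion parameter model can be made concrete with a rough, weights-only memory estimate (parameter count times bytes per parameter). The sketch below is illustrative and not part of this commit. Only the Kokoro-82M count comes from the text above. The other counts are assumptions read off the model names, and the helper `weights_memory_gib` is hypothetical. Actual VRAM use will be higher, since activations, caches, and any vocoder or codec components are not counted.

```python
# Rough weights-only memory estimate: parameters x bytes per parameter.
# This ignores activations, caches, and auxiliary components, so treat
# the results as a lower bound rather than a measured VRAM figure.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Approximate parameter counts. Kokoro-82M is stated in the findings;
# the rest are assumptions inferred from the model names.
APPROX_PARAMS = {
    "Kokoro-82M": 82_000_000,
    "Spark-TTS-0.5B": 500_000_000,
    "csm-1b": 1_000_000_000,
    "Llasa-3B": 3_000_000_000,
}


def weights_memory_gib(num_params: int, dtype: str = "fp16") -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for name, count in APPROX_PARAMS.items():
        print(f"{name}: ~{weights_memory_gib(count):.2f} GiB of weights in fp16")
```

Under these assumptions the smallest model needs well under 1 GiB for its weights in fp16, while a 3-billion-parameter model needs several GiB before any runtime overhead, which is consistent with the trade-off the finding describes.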