<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Speech-to-Speech Model Comparison</title>
<link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css">
<style>
body {
background-color: #f0f8ff;
font-family: 'Arial', sans-serif;
}
.container {
background-color: #fff;
border-radius: 15px;
box-shadow: 0 6px 15px rgba(0, 0, 0, 0.15);
padding: 40px;
max-width: 800px;
margin: 30px auto;
}
h3 {
font-size: 2rem;
font-weight: bold;
color: #333;
text-align: center;
margin-bottom: 20px;
}
p {
color: #555;
font-size: 1rem;
line-height: 1.8;
}
.btn {
border-radius: 25px;
font-size: 1.1rem;
padding: 12px 25px;
font-weight: bold;
transition: background-color 0.3s ease, transform 0.2s ease;
}
.btn-primary {
background-color: #007bff;
border: none;
}
.btn-primary:hover {
background-color: #0056b3;
transform: scale(1.05);
}
.icon {
color: #f39c12;
margin-right: 5px;
}
.section-title {
font-size: 1.2rem;
font-weight: bold;
color: #007bff;
display: flex;
align-items: center;
margin-top: 20px;
}
.section-title .fas {
margin-right: 10px;
}
.audio-container {
text-align: center;
margin-top: 20px;
}
.audio-container .audio-item {
display: flex;
justify-content: center;
align-items: center;
margin-bottom: 15px;
}
.audio-container .audio-item span {
margin-right: 10px;
font-weight: bold;
}
audio {
display: inline-block;
}
</style>
</head>
<body>
<div class="container py-5">
<h3><i class="fas fa-microphone-alt icon"></i>Speech-to-Speech Model Comparison</h3>
<div id="evaluation-info" class="mb-5">
<p class="text-start">
<span class="section-title"><i class="fas fa-info-circle"></i> Welcome to the Speech-to-Speech (S2S)
Model Evaluation! 🎤</span>
In this evaluation, you will assess the performance of 6 S2S models:
<strong>ChatGPT-4o</strong>, <strong>FunAudioLLM</strong>, <strong>SpeechGPT</strong>,
<strong>Mini-Omni</strong>, <strong>Cascade</strong>, and <strong>LLaMA-Omni</strong>.
The goal is to evaluate how well these models handle various speech tasks across different domains.
<span class="section-title"><i class="fas fa-tasks"></i> How It Works</span>
Once you select a specific domain and task (e.g., <em>Educational Tutoring</em> and <em>Rhythm
Control</em>),
you will proceed to the evaluation stage. In each round, you will be presented with an audio input. 🎵
For example:
</p>
<div class="audio-container">
<div class="audio-item">
<span>Audio Sample:</span>
<audio controls>
<source src="/static/audio/sample/input_audio.wav" type="audio/wav">
</audio>
</div>
</div>
<p class="text-start">
The corresponding text is:
<em>"Say the following sentence at my speed first, then say it again very slowly:
'Artificial intelligence is changing the world in many ways.'"</em> 🧠
<small>(Note: the audio plays at 1.5x normal speed.)</small>
</p>
<span class="section-title"><i class="fas fa-star"></i> Model Performance</span>
<div class="audio-container">
<div class="audio-item">
<span>ChatGPT-4o:</span>
<audio controls>
<source src="/static/audio/sample/4o_audio.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Partially followed the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Accurately followed the instruction, with no semantic deviation or
missing
information.
</p>
<br>
<div class="audio-item">
<span>FunAudioLLM:</span>
<audio controls>
<source src="/static/audio/sample/FunAudio_audio.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Partially followed the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Accurately followed the instruction, with no semantic deviation or
missing
information.
</p>
<br>
<div class="audio-item">
<span>SpeechGPT:</span>
<audio controls>
<source src="/static/audio/sample/SpeechGPT.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Did not follow the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Partially followed the instruction, with minor semantic deviation and
missing information.
</p>
<br>
<div class="audio-item">
<span>Mini-Omni:</span>
<audio controls>
<source src="/static/audio/sample/mini-omni.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Did not follow the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Did not follow the instruction, with significant semantic deviation
and missing information.
</p>
</div>
<p class="text-start">
After making your choice, you'll proceed to the next round. 🔄
</p>
<p class="text-start">
<strong>Click the button below to start the evaluation! 🚀</strong>
</p>
</div>
<div class="text-center">
<a href="http://71.132.14.167:6002/" target="_blank" rel="noopener noreferrer" class="btn btn-primary"><i class="fas fa-play"></i>
Start Evaluation</a>
</div>
</div>
</body>
</html>