Update README.md
README.md
CHANGED
@@ -226,11 +226,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -329,7 +329,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tbody>
 </table>
 
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
 
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
 
 ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
 
@@ -348,11 +350,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -450,4 +452,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>4838</td>
 </tr>
 </tbody>
-</table>
+</table>
+
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+**QPS: Queries per second.
+
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
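The footnotes define QPD as queries per dollar based on an on-demand hourly price, but the diff does not show the formula behind the table values. A minimal sketch of one plausible conversion follows, assuming QPD is the sustained query rate multiplied by the seconds in an hour and divided by the hourly instance price, and that a single-stream query rate is the reciprocal of latency. The function names and the numbers in the usage lines are hypothetical and are not taken from the benchmark tables.

```python
SECONDS_PER_HOUR = 3600.0


def single_stream_qps(latency_s: float) -> float:
    """Query rate of one synchronous stream: one query completes every latency_s seconds."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return 1.0 / latency_s


def queries_per_dollar(qps: float, hourly_cost_usd: float) -> float:
    """Queries served per dollar, assuming the rate is sustained for a full billed hour."""
    if qps < 0:
        raise ValueError("qps must be non-negative")
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return qps * SECONDS_PER_HOUR / hourly_cost_usd


# Hypothetical inputs, chosen only to illustrate the arithmetic.
example_price = 1.50   # assumed on-demand price in USD per hour
example_latency = 0.5  # assumed single-stream latency in seconds

print(queries_per_dollar(single_stream_qps(example_latency), example_price))  # 4800.0
print(queries_per_dollar(2.0, example_price))                                 # 4800.0
```

Under these assumptions the Latency (s) columns and the Maximum throughput (QPS) columns both reduce to a query rate first, so the same price converts either one to QPD.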