Update README.md
README.md
CHANGED
@@ -233,11 +233,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -337,7 +337,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tbody>
 </table>
 
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
 
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
 
 ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
 
@@ -356,11 +358,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -458,4 +460,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>4838</td>
 </tr>
 </tbody>
-</table>
+</table>
+
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+**QPS: Queries per second.
+
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
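The new QPD footnote ties each "Queries Per Dollar" column to an on-demand hourly GPU price but does not spell out the arithmetic. Below is a minimal sketch under stated assumptions, not the benchmark authors' script. It assumes that QPD is queries served per hour divided by the hourly cost of the hardware. For the single-stream latency tables, it assumes one request is in flight at a time, so throughput is 1 / latency. The function names and the $2.00/hour price are hypothetical placeholders, not Lambda Labs pricing.

```python
"""Hypothetical sketch: converting benchmark numbers into queries per dollar (QPD).

Assumption: QPD = queries served per hour / on-demand hourly cost of the GPUs used.
The price used below is a placeholder, not an actual Lambda Labs quote.
"""

SECONDS_PER_HOUR = 3600.0


def qpd_from_throughput(qps: float, hourly_cost_usd: float) -> float:
    """QPD for the multi-stream tables, from maximum throughput in queries per second."""
    if qps < 0 or hourly_cost_usd <= 0:
        raise ValueError("qps must be >= 0 and hourly_cost_usd must be > 0")
    queries_per_hour = qps * SECONDS_PER_HOUR
    return queries_per_hour / hourly_cost_usd


def qpd_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """QPD for the single-stream tables, assuming one request is served at a time."""
    if latency_s <= 0:
        raise ValueError("latency_s must be > 0")
    # With a single request in flight, throughput (queries per second) is 1 / latency.
    return qpd_from_throughput(1.0 / latency_s, hourly_cost_usd)


if __name__ == "__main__":
    # Placeholder inputs, not values taken from the README tables.
    hypothetical_hourly_cost = 2.0  # assumed USD per hour for the whole deployment
    print(round(qpd_from_throughput(4.0, hypothetical_hourly_cost)))  # 7200
    print(round(qpd_from_latency(2.5, hypothetical_hourly_cost)))     # 720
```

Under this reading, halving the hourly price or doubling throughput each double QPD. That would explain why the README pairs every latency and throughput column with its own QPD column: the same model can rank differently once hardware cost is factored in.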