Update README.md
README.md
CHANGED
@@ -233,11 +233,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
|
|
233 |
<th>Model</th>
|
234 |
<th>Average Cost Reduction</th>
|
235 |
<th>Latency (s)</th>
|
236 |
-
<th>
|
237 |
<th>Latency (s)</th>
|
238 |
-
<th>
|
239 |
<th>Latency (s)</th>
|
240 |
-
<th>
|
241 |
</tr>
|
242 |
</thead>
|
243 |
<tbody style="text-align: center">
|
@@ -306,7 +306,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
|
|
306 |
</tbody>
|
307 |
</table>
|
308 |
|
|
|
309 |
|
|
|
310 |
|
311 |
### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
|
312 |
|
@@ -325,11 +327,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
|
|
325 |
<th>Model</th>
|
326 |
<th>Average Cost Reduction</th>
|
327 |
<th>Maximum throughput (QPS)</th>
|
328 |
-
<th>
|
329 |
<th>Maximum throughput (QPS)</th>
|
330 |
-
<th>
|
331 |
<th>Maximum throughput (QPS)</th>
|
332 |
-
<th>
|
333 |
</tr>
|
334 |
</thead>
|
335 |
<tbody style="text-align: center">
|
@@ -396,4 +398,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
|
|
396 |
<td>4573</td>
|
397 |
</tr>
|
398 |
</tbody>
|
399 |
-
</table>
|
|
233 |
<th>Model</th>
|
234 |
<th>Average Cost Reduction</th>
|
235 |
<th>Latency (s)</th>
|
236 |
+
<th>Queries Per Dollar</th>
|
237 |
<th>Latency (s)</th>
|
238 |
+
<th>Queries Per Dollar</th>
|
239 |
<th>Latency (s)</th>
|
240 |
+
<th>Queries Per Dollar</th>
|
241 |
</tr>
|
242 |
</thead>
|
243 |
<tbody style="text-align: center">
|
|
|
306 |
</tbody>
|
307 |
</table>
|
308 |
|
309 |
+
**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
|
310 |
|
311 |
+
**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
|
312 |
|
313 |
### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
|
314 |
|
|
|
327 |
<th>Model</th>
|
328 |
<th>Average Cost Reduction</th>
|
329 |
<th>Maximum throughput (QPS)</th>
|
330 |
+
<th>Queries Per Dollar</th>
|
331 |
<th>Maximum throughput (QPS)</th>
|
332 |
+
<th>Queries Per Dollar</th>
|
333 |
<th>Maximum throughput (QPS)</th>
|
334 |
+
<th>Queries Per Dollar</th>
|
335 |
</tr>
|
336 |
</thead>
|
337 |
<tbody style="text-align: center">
|
|
|
398 |
<td>4573</td>
|
399 |
</tr>
|
400 |
</tbody>
|
401 |
+
</table>
|
402 |
+
|
403 |
+
**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
|
404 |
+
|
405 |
+
**QPS: Queries per second.
|
406 |
+
|
407 |
+
**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
|
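The QPD footnotes say only that the figure is based on on-demand GPU cost; the README does not give the formula. As a rough sketch, assuming QPD is the number of queries served in one hour divided by the hourly instance price, the conversion from either table's metric could look like the following. The function names and the example prices are hypothetical and are not taken from the README or from Lambda Labs pricing.

```python
SECONDS_PER_HOUR = 3600.0


def qpd_from_qps(qps: float, hourly_cost_usd: float) -> float:
    """Queries per dollar from a sustained throughput (multi-stream tables).

    Assumes the instance is billed per hour and runs at `qps` the whole time.
    """
    if qps <= 0 or hourly_cost_usd <= 0:
        raise ValueError("qps and hourly_cost_usd must be positive")
    return qps * SECONDS_PER_HOUR / hourly_cost_usd


def qpd_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Queries per dollar from a per-query latency (single-stream tables).

    Assumes queries run back to back, one at a time, with no idle gaps.
    """
    if latency_s <= 0 or hourly_cost_usd <= 0:
        raise ValueError("latency_s and hourly_cost_usd must be positive")
    return SECONDS_PER_HOUR / (latency_s * hourly_cost_usd)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the arithmetic easy to follow.
    print(qpd_from_qps(qps=2.0, hourly_cost_usd=1.5))            # 4800.0
    print(qpd_from_latency(latency_s=4.0, hourly_cost_usd=2.0))  # 450.0
```

For a multi-GPU deployment, `hourly_cost_usd` would presumably be the combined price of all GPUs in use, which is again an assumption about how the published numbers were derived.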