Update README.md
README.md
@@ -163,11 +163,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -236,7 +236,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tbody>
 </table>
 
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
 
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
 
 ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
 
@@ -255,11 +257,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -327,3 +329,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 </tbody>
 </table>
+
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+**QPS: Queries per second.
+
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
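The QPD footnote added by this commit names the metric but does not spell out the arithmetic. The sketch below is one plausible reading, not the README's stated method. It assumes QPD is sustained throughput converted to queries per hour and divided by the hourly on-demand GPU price. For the single-stream latency tables, it also assumes one request is in flight at a time, so throughput is 1 / latency. The function names `qpd_from_qps` and `qpd_from_latency` and the sample prices are hypothetical; they are not taken from the README or from Lambda Labs' price list.

```python
SECONDS_PER_HOUR = 3600


def qpd_from_qps(qps: float, hourly_cost_usd: float) -> float:
    """Queries per dollar for a server sustaining `qps` queries/second on
    hardware billed at `hourly_cost_usd` per hour (assumed formula)."""
    if hourly_cost_usd <= 0:
        raise ValueError("hourly cost must be positive")
    # queries/hour divided by dollars/hour gives queries/dollar
    return qps * SECONDS_PER_HOUR / hourly_cost_usd


def qpd_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Single-stream case: assumes one query at a time, so QPS = 1 / latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return qpd_from_qps(1.0 / latency_s, hourly_cost_usd)


if __name__ == "__main__":
    # Hypothetical numbers: 2 QPS on a GPU priced at $1.00/hour
    print(qpd_from_qps(2.0, 1.0))  # 7200.0
    # Hypothetical numbers: 0.5 s latency on a GPU priced at $2.00/hour
    print(qpd_from_latency(0.5, 2.0))  # 3600.0
```

Under this reading, a higher QPD in the new columns means the same workload is served for less money on the same on-demand instance, which is how it relates to the "Average Cost Reduction" column.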