shubhrapandit committed on
Commit 487503b (verified) · 1 Parent(s): 3e83bce

Update README.md

Files changed (1)
  1. README.md +15 -7
README.md CHANGED
@@ -233,11 +233,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
233
  <th>Model</th>
234
  <th>Average Cost Reduction</th>
235
  <th>Latency (s)</th>
236
- <th>QPD</th>
237
  <th>Latency (s)</th>
238
- <th>QPD</th>
239
  <th>Latency (s)</th>
240
- <th>QPD</th>
241
  </tr>
242
  </thead>
243
  <tbody style="text-align: center">
@@ -306,7 +306,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
306
  </tbody>
307
  </table>
308
 
 
309
 
 
310
 
311
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
312
 
@@ -325,11 +327,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
325
  <th>Model</th>
326
  <th>Average Cost Reduction</th>
327
  <th>Maximum throughput (QPS)</th>
328
- <th>QPD</th>
329
  <th>Maximum throughput (QPS)</th>
330
- <th>QPD</th>
331
  <th>Maximum throughput (QPS)</th>
332
- <th>QPD</th>
333
  </tr>
334
  </thead>
335
  <tbody style="text-align: center">
@@ -396,4 +398,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
396
  <td>4573</td>
397
  </tr>
398
  </tbody>
399
- </table>
 
 
 
 
 
 
 
233
  <th>Model</th>
234
  <th>Average Cost Reduction</th>
235
  <th>Latency (s)</th>
236
+ <th>Queries Per Dollar</th>
237
  <th>Latency (s)</th>
238
+ <th>Queries Per Dollar</th>
239
  <th>Latency (s)</th>
240
+ <th>Queries Per Dollar</th>
241
  </tr>
242
  </thead>
243
  <tbody style="text-align: center">
 
306
  </tbody>
307
  </table>
308
 
309
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
310
 
311
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
312
 
313
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
314
 
 
327
  <th>Model</th>
328
  <th>Average Cost Reduction</th>
329
  <th>Maximum throughput (QPS)</th>
330
+ <th>Queries Per Dollar</th>
331
  <th>Maximum throughput (QPS)</th>
332
+ <th>Queries Per Dollar</th>
333
  <th>Maximum throughput (QPS)</th>
334
+ <th>Queries Per Dollar</th>
335
  </tr>
336
  </thead>
337
  <tbody style="text-align: center">
 
398
  <td>4573</td>
399
  </tr>
400
  </tbody>
401
+ </table>
402
+
403
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
404
+
405
+ **QPS: Queries per second.
406
+
407
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
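The footnotes define QPD as queries per dollar at an on-demand hourly price, but the README excerpt does not spell out the arithmetic. The sketch below shows one plausible way to derive it, assuming the GPU is billed by the hour and runs at the measured rate for the whole hour. The function names and the example price are hypothetical illustrations, not values taken from the benchmark or from Lambda Labs.

```python
def queries_per_dollar_from_qps(qps: float, hourly_cost_usd: float) -> float:
    """Multi-stream case: queries served per dollar at a sustained throughput.

    Assumes (not stated in the README) that cost accrues per hour of GPU time.
    """
    if qps < 0 or hourly_cost_usd <= 0:
        raise ValueError("qps must be >= 0 and hourly_cost_usd must be > 0")
    return qps * 3600.0 / hourly_cost_usd


def queries_per_dollar_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Single-stream case: one query at a time, each taking latency_s seconds."""
    if latency_s <= 0 or hourly_cost_usd <= 0:
        raise ValueError("latency_s and hourly_cost_usd must be > 0")
    return 3600.0 / (latency_s * hourly_cost_usd)


if __name__ == "__main__":
    # Hypothetical inputs: 2 queries/s, 4 s per query, $1.50 per GPU-hour.
    print(queries_per_dollar_from_qps(2.0, 1.5))      # 4800.0
    print(queries_per_dollar_from_latency(4.0, 1.5))  # 600.0
```

Under these assumptions, a higher QPD at the same hourly price means either higher throughput or lower latency, which is why the tables pair each QPD column with a throughput or latency column.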