shubhrapandit committed on
Commit
9af07ec
·
verified ·
1 Parent(s): 43aa3c1

Update README.md

  1. README.md +15 -7
README.md CHANGED
@@ -233,11 +233,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
233
  <th>Model</th>
234
  <th>Average Cost Reduction</th>
235
  <th>Latency (s)</th>
236
- <th>QPD</th>
237
  <th>Latency (s)</th>
238
- <th>QPD</th>
239
  <th>Latency (s)</th>
240
- <th>QPD</th>
241
  </tr>
242
  </thead>
243
  <tbody style="text-align: center">
@@ -337,7 +337,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
337
  </tbody>
338
  </table>
339
 
 
340
 
 
341
 
342
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
343
 
@@ -356,11 +358,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
356
  <th>Model</th>
357
  <th>Average Cost Reduction</th>
358
  <th>Maximum throughput (QPS)</th>
359
- <th>QPD</th>
360
  <th>Maximum throughput (QPS)</th>
361
- <th>QPD</th>
362
  <th>Maximum throughput (QPS)</th>
363
- <th>QPD</th>
364
  </tr>
365
  </thead>
366
  <tbody style="text-align: center">
@@ -458,4 +460,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
458
  <td>4838</td>
459
  </tr>
460
  </tbody>
461
- </table>
 
 
 
 
 
 
 
233
  <th>Model</th>
234
  <th>Average Cost Reduction</th>
235
  <th>Latency (s)</th>
236
+ <th>Queries Per Dollar</th>
237
  <th>Latency (s)</th>
238
+ <th>Queries Per Dollar</th>
239
  <th>Latency (s)</th>
240
+ <th>Queries Per Dollar</th>
241
  </tr>
242
  </thead>
243
  <tbody style="text-align: center">
 
337
  </tbody>
338
  </table>
339
 
340
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
341
 
342
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
343
 
344
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
345
 
 
358
  <th>Model</th>
359
  <th>Average Cost Reduction</th>
360
  <th>Maximum throughput (QPS)</th>
361
+ <th>Queries Per Dollar</th>
362
  <th>Maximum throughput (QPS)</th>
363
+ <th>Queries Per Dollar</th>
364
  <th>Maximum throughput (QPS)</th>
365
+ <th>Queries Per Dollar</th>
366
  </tr>
367
  </thead>
368
  <tbody style="text-align: center">
 
460
  <td>4838</td>
461
  </tr>
462
  </tbody>
463
+ </table>
464
+
465
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
466
+
467
+ **QPS: Queries per second.
468
+
469
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
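
The footnotes define QPD as queries per dollar at an on-demand hourly price, but the diff does not show the arithmetic. The sketch below illustrates one plausible reading, assuming QPD is the number of queries served in one hour divided by the hourly instance cost, and that single-stream throughput is the reciprocal of latency. The function names and the input values are hypothetical and are not taken from the README or from Lambda Labs pricing.

```python
SECONDS_PER_HOUR = 3600


def qpd_from_throughput(qps: float, hourly_cost_usd: float) -> float:
    """Queries per dollar for a deployment sustaining `qps` queries per second.

    Assumption: cost is billed per hour, so queries per dollar is
    (queries served in one hour) / (dollars spent in one hour).
    """
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return qps * SECONDS_PER_HOUR / hourly_cost_usd


def qpd_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Queries per dollar for single-stream serving.

    Assumption: one query is in flight at a time, so throughput is 1 / latency.
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return qpd_from_throughput(1.0 / latency_s, hourly_cost_usd)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not benchmark results.
    print(qpd_from_throughput(qps=2.0, hourly_cost_usd=1.5))   # 4800.0
    print(qpd_from_latency(latency_s=4.0, hourly_cost_usd=2.0))  # 450.0
```

Under this reading, a lower latency or a higher maximum QPS on the same hardware raises QPD proportionally, which matches how the tables pair each latency or throughput column with a Queries Per Dollar column.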