shubhrapandit committed on
Commit 4d00594 · verified · 1 Parent(s): 419e580

Update README.md

Files changed (1)
  1. README.md +15 -7
README.md CHANGED
@@ -226,11 +226,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
226
  <th>Model</th>
227
  <th>Average Cost Reduction</th>
228
  <th>Latency (s)</th>
229
- <th>QPD</th>
230
  <th>Latency (s)</th>
231
- <th>QPD</th>
232
  <th>Latency (s)</th>
233
- <th>QPD</th>
234
  </tr>
235
  </thead>
236
  <tbody style="text-align: center">
@@ -329,7 +329,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
329
  </tbody>
330
  </table>
331
 
 
332
 
 
333
 
334
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
335
 
@@ -348,11 +350,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
348
  <th>Model</th>
349
  <th>Average Cost Reduction</th>
350
  <th>Maximum throughput (QPS)</th>
351
- <th>QPD</th>
352
  <th>Maximum throughput (QPS)</th>
353
- <th>QPD</th>
354
  <th>Maximum throughput (QPS)</th>
355
- <th>QPD</th>
356
  </tr>
357
  </thead>
358
  <tbody style="text-align: center">
@@ -450,4 +452,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
450
  <td>4838</td>
451
  </tr>
452
  </tbody>
453
- </table>
 
 
 
 
 
 
 
226
  <th>Model</th>
227
  <th>Average Cost Reduction</th>
228
  <th>Latency (s)</th>
229
+ <th>Queries Per Dollar</th>
230
  <th>Latency (s)</th>
231
+ <th>Queries Per Dollar</th>
232
  <th>Latency (s)</th>
233
+ <th>Queries Per Dollar</th>
234
  </tr>
235
  </thead>
236
  <tbody style="text-align: center">
 
329
  </tbody>
330
  </table>
331
 
332
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
333
 
334
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
335
 
336
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
337
 
 
350
  <th>Model</th>
351
  <th>Average Cost Reduction</th>
352
  <th>Maximum throughput (QPS)</th>
353
+ <th>Queries Per Dollar</th>
354
  <th>Maximum throughput (QPS)</th>
355
+ <th>Queries Per Dollar</th>
356
  <th>Maximum throughput (QPS)</th>
357
+ <th>Queries Per Dollar</th>
358
  </tr>
359
  </thead>
360
  <tbody style="text-align: center">
 
452
  <td>4838</td>
453
  </tr>
454
  </tbody>
455
+ </table>
456
+
457
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
458
+
459
+ **QPS: Queries per second.
460
+
461
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
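
The footnotes define QPD as queries per dollar at an on-demand hourly price, but the README does not spell out the arithmetic. A minimal sketch of one plausible calculation follows, assuming QPD is the number of queries served in one hour divided by the hourly instance price. The function names and the example price are hypothetical and are not taken from the README or from Lambda Labs' price list.

```python
SECONDS_PER_HOUR = 3600.0


def queries_per_dollar_from_qps(qps: float, hourly_cost_usd: float) -> float:
    """Queries served per dollar at a sustained throughput (multi-stream case).

    Assumption: QPD = (queries per second * seconds per hour) / hourly price.
    """
    if qps < 0:
        raise ValueError("qps must be non-negative")
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return qps * SECONDS_PER_HOUR / hourly_cost_usd


def queries_per_dollar_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Queries served per dollar when requests run one at a time (single-stream case).

    Assumption: one request completes every `latency_s` seconds, so the
    effective throughput is 1 / latency_s queries per second.
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return queries_per_dollar_from_qps(1.0 / latency_s, hourly_cost_usd)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the arithmetic easy to follow.
    example_price = 1.50  # USD per hour, not an actual quoted rate
    print(queries_per_dollar_from_qps(2.0, example_price))      # 2 QPS sustained
    print(queries_per_dollar_from_latency(4.0, example_price))  # 4 s per request
```

Under these assumptions, a higher QPD means a cheaper deployment for the same workload, which is why it appears in the tables next to latency and maximum throughput.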