Commit 66d9c70 by shubhrapandit · verified · 1 Parent(s): ab5aa65

Update README.md

Files changed (1): README.md +14 -6
README.md CHANGED
@@ -163,11 +163,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  <th>Model</th>
  <th>Average Cost Reduction</th>
  <th>Latency (s)</th>
- <th>QPD</th>
+ <th>Queries Per Dollar</th>
  <th>Latency (s)</th>
- <th>QPD</th>
+ <th>Queries Per Dollar</th>
  <th>Latency (s)</th>
- <th>QPD</th>
+ <th>Queries Per Dollar</th>
  </tr>
  </thead>
  <tbody style="text-align: center">
@@ -236,7 +236,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  </tbody>
  </table>

+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens

+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).

  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)

@@ -255,11 +257,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  <th>Model</th>
  <th>Average Cost Reduction</th>
  <th>Maximum throughput (QPS)</th>
- <th>QPD</th>
+ <th>Queries Per Dollar</th>
  <th>Maximum throughput (QPS)</th>
- <th>QPD</th>
+ <th>Queries Per Dollar</th>
  <th>Maximum throughput (QPS)</th>
- <th>QPD</th>
+ <th>Queries Per Dollar</th>
  </tr>
  </thead>
  <tbody style="text-align: center">
@@ -327,3 +329,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  </tr>
  </tbody>
  </table>
+
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+ **QPS: Queries per second.
+
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
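
The footnotes added in this commit define QPD as queries per dollar at an on-demand hourly price, but the README does not spell out the arithmetic. The sketch below shows one plausible way to derive it from the two table metrics, assuming the instance is billed per hour and kept fully busy. The function names and the hourly price used in the usage note are hypothetical and are not taken from the README or from Lambda Labs pricing.

```python
SECONDS_PER_HOUR = 3600


def qpd_from_throughput(qps: float, hourly_cost_usd: float) -> float:
    """Queries per dollar for a server sustaining `qps` queries per second.

    Assumption: the server runs at this throughput for the full billed hour.
    """
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return qps * SECONDS_PER_HOUR / hourly_cost_usd


def qpd_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Queries per dollar when queries run one at a time.

    Assumption: single-stream serving, so throughput is 1 / latency.
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return qpd_from_throughput(1.0 / latency_s, hourly_cost_usd)
```

For instance, with an illustrative price of 2.00 USD per hour, `qpd_from_throughput(2.0, 2.0)` and `qpd_from_latency(0.5, 2.0)` describe the same workload (two queries per second) and therefore return the same value.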