shubhrapandit committed · verified
Commit 7f609fb · 1 Parent(s): e351f1e

Update README.md

Files changed (1):
  1. README.md +64 -3
README.md CHANGED
@@ -334,6 +334,37 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  </tr>
  </thead>
  <tbody style="text-align: center">
+ <tr>
+ <th rowspan="3" valign="top">A6000x1</th>
+ <th>Qwen/Qwen2.5-VL-7B-Instruct</th>
+ <td></td>
+ <td>4.9</td>
+ <td>912</td>
+ <td>3.2</td>
+ <td>1386</td>
+ <td>3.1</td>
+ <td>1431</td>
+ </tr>
+ <tr>
+ <th>neuralmagic/Qwen2.5-VL-7B-Instruct-quantized.w8a8</th>
+ <td>1.50</td>
+ <td>3.6</td>
+ <td>1248</td>
+ <td>2.1</td>
+ <td>2163</td>
+ <td>2.0</td>
+ <td>2237</td>
+ </tr>
+ <tr>
+ <th>neuralmagic/Qwen2.5-VL-7B-Instruct-quantized.w4a16</th>
+ <td>2.05</td>
+ <td>3.3</td>
+ <td>1351</td>
+ <td>1.4</td>
+ <td>3252</td>
+ <td>1.4</td>
+ <td>3321</td>
+ </tr>
  <tr>
  <th rowspan="3" valign="top">A100x1</th>
  <th>Qwen/Qwen2.5-VL-7B-Instruct</th>
@@ -398,12 +429,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  </tr>
  </tbody>
  </table>
-
+
 **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
 
 **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
 
-
 ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
 
 <table border="1" class="dataframe">
@@ -429,9 +459,40 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  </tr>
  </thead>
  <tbody style="text-align: center">
+ <tr>
+ <th rowspan="3" valign="top">A6000x1</th>
+ <th>Qwen/Qwen2.5-VL-7B-Instruct</th>
+ <td></td>
+ <td>0.4</td>
+ <td>1837</td>
+ <td>1.5</td>
+ <td>6846</td>
+ <td>1.7</td>
+ <td>7638</td>
+ </tr>
+ <tr>
+ <th>neuralmagic/Qwen2.5-VL-7B-Instruct-quantized.w8a8</th>
+ <td>1.41</td>
+ <td>0.5</td>
+ <td>2297</td>
+ <td>2.3</td>
+ <td>10137</td>
+ <td>2.5</td>
+ <td>11472</td>
+ </tr>
+ <tr>
+ <th>neuralmagic/Qwen2.5-VL-7B-Instruct-quantized.w4a16</th>
+ <td>1.60</td>
+ <td>0.4</td>
+ <td>1828</td>
+ <td>2.7</td>
+ <td>12254</td>
+ <td>3.4</td>
+ <td>15477</td>
+ </tr>
  <tr>
  <th rowspan="3" valign="top">A100x1</th>
- <th>Qwen/Qwen2.5-VL-7B-Instruct-quantized.</th>
+ <th>Qwen/Qwen2.5-VL-7B-Instruct</th>
  <td></td>
  <td>0.7</td>
  <td>1347</td>
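
The **QPD footnote in the changed README defines queries per dollar from latency and on-demand GPU cost. A minimal sketch of that arithmetic, assuming a hypothetical $1.00/hr rate (`usd_per_hour` is a placeholder, not Lambda Labs' actual pricing):

```python
# Hedged sketch: deriving a queries-per-dollar (QPD) figure from a
# per-query latency and an assumed on-demand hourly GPU price.
# The $1.00/hr rate below is a placeholder assumption for illustration.

def queries_per_dollar(latency_s: float, usd_per_hour: float) -> int:
    """Queries completed per dollar of GPU time at a given per-query latency."""
    queries_per_hour = 3600.0 / latency_s
    return round(queries_per_hour / usd_per_hour)

# Example with assumed numbers: 4.9 s/query at $1.00/hr.
print(queries_per_dollar(4.9, 1.00))  # → 735
```

At a fixed hourly price, QPD is simply inversely proportional to latency, which is why the quantized rows pair lower latency with higher QPD.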