shubhrapandit committed
Commit 0c4123a · verified · 1 Parent(s): 67ab6e0

Update README.md

Files changed (1):
  1. README.md (+13 -13)
README.md CHANGED

@@ -654,7 +654,7 @@ lm_eval \
 ## Inference Performance
 
 
-This model achieves up to 1.84x speedup in single-stream deployment and up to 1.78x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario.
+This model achieves up to 1.9x speedup in single-stream deployment and up to 1.78x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario.
 The following performance benchmarks were conducted with [vLLM](https://docs.vllm.ai/en/latest/) version 0.7.2, and [GuideLLM](https://github.com/neuralmagic/guidellm).
 
 <details>
@@ -806,21 +806,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
 <td>1.70</td>
-<td>1.6</td>
+<td>0.8</td>
 <td>766</td>
-<td>2.2</td>
+<td>1.1</td>
 <td>1142</td>
-<td>2.6</td>
+<td>1.3</td>
 <td>1348</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.48</td>
-<td>1.0</td>
+<td>0.5</td>
 <td>552</td>
-<td>2.0</td>
+<td>1.0</td>
 <td>1010</td>
-<td>2.8</td>
+<td>1.4</td>
 <td>1360</td>
 </tr>
 <tr>
@@ -837,21 +837,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
 <td>1.61</td>
-<td>3.4</td>
+<td>1.7</td>
 <td>905</td>
-<td>5.2</td>
+<td>2.6</td>
 <td>1406</td>
-<td>6.4</td>
+<td>3.2</td>
 <td>1759</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.33</td>
-<td>2.8</td>
+<td>1.4</td>
 <td>761</td>
-<td>4.4</td>
+<td>2.2</td>
 <td>1228</td>
-<td>5.4</td>
+<td>2.7</td>
 <td>1480</td>
 </tr>
 </tbody>
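The speedup figures edited above (e.g. "up to 1.9x in single-stream deployment") are latency ratios, typically computed as baseline latency divided by optimized latency. A minimal sketch of that arithmetic; the example latencies below are hypothetical illustrations, not values taken from the benchmark table:

```python
def speedup(baseline_latency: float, optimized_latency: float) -> float:
    """Speedup factor of an optimized deployment over a baseline.

    Both arguments are per-query latencies in the same unit (e.g. seconds);
    a result of 1.9 means the optimized deployment answers 1.9x faster.
    """
    return baseline_latency / optimized_latency


# Hypothetical numbers: an unquantized baseline at 1.52 s per query
# versus a quantized model at 0.80 s per query.
print(f"{speedup(1.52, 0.80):.2f}x")  # -> 1.90x
```

Note that when latency measurements are corrected (as the halved per-query latencies in this diff were), the derived speedup headline must be recomputed from the new values, which is why the prose figure changed alongside the table cells.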