Update README.md

README.md

@@ -295,6 +295,7 @@ Note the result of latency (benchmark_latency) is in seconds, and serving (bench
 Int4 weight-only quantization is optimized for batch size 1 and short input and output token lengths. Please stay tuned for models optimized for larger batch sizes or longer token lengths.
 <details>
 <summary> Reproduce Model Performance Results </summary>
 ## Setup
 Get vllm source code:
