Update README.md
README.md
CHANGED
@@ -295,6 +295,7 @@ Note the result of latency (benchmark_latency) is in seconds, and serving (bench
 Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.
 <details>
 <summary> Reproduce Model Performance Results </summary>
+
 ## Setup
 
 Get vllm source code: