Update README.md
--- a/README.md
+++ b/README.md
@@ -13,4 +13,6 @@ AWQ of the DeepSeek R1 model.
 
 This quant modified some of the model code to fix the overflow issue when using float16.
 
-Tested on vLLM with 8x H100, inference speed 5 tokens/s with batch size 1 and short prompts.
+Tested on vLLM with 8x H100, inference speed 5 tokens/s with batch size 1 and short prompts.
+
+If you are serving with vLLM, please either add `--dtype float16` or use the new `moe_wna16` kernel by using `--quantization moe_wna16`.
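
The line added to the README names two alternative vLLM options. As a minimal sketch of how the two launch commands might be assembled, the helper below builds the argument list for `vllm serve`. The function name `build_vllm_serve_args`, the placeholder model path, and the default tensor-parallel size of 8 (a guess based on the 8x H100 test setup) are assumptions for illustration and do not come from the README.

```python
import shlex


def build_vllm_serve_args(model, use_moe_wna16=False, tensor_parallel_size=8):
    """Build a `vllm serve` argument list using one of the two README options.

    Hypothetical helper: `model` is whatever path or repo id you serve, and
    tensor_parallel_size=8 is an assumption based on the 8x H100 test setup.
    """
    args = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
    if use_moe_wna16:
        # Second option from the README: select the moe_wna16 kernel.
        args += ["--quantization", "moe_wna16"]
    else:
        # First option from the README: run with float16.
        args += ["--dtype", "float16"]
    return args


if __name__ == "__main__":
    # "path/to/model" is a placeholder, not the actual repository id.
    print(shlex.join(build_vllm_serve_args("path/to/model")))
    print(shlex.join(build_vllm_serve_args("path/to/model", use_moe_wna16=True)))
```

The returned list can be passed to `subprocess.run` as is. Building a list rather than a single string avoids shell quoting problems when the model path contains spaces.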