cicdatopea committed
Commit fcb46ec · verified · 1 Parent(s): e70fa45

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -17,7 +17,7 @@ Please follow the license of the original model.

  **INT4 Inference on CUDA**

- For CUDA inference, we recommend using the moe_wna16 kernel in vLLM, which supports the BF16 compute dtype to prevent overflow issues. However, due to limited resources, we have not been able to test it ourselves. For more details, you may refer to models from other teams, such as [cognitivecomputation/DeepSeek-V3-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-AWQ), or simply use their model.
+ For CUDA inference, due to limited resources, we have not been able to test it ourselves. For more details, you may refer to models from other teams, such as [cognitivecomputation/DeepSeek-V3-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-AWQ), or simply use their model.

  **INT4 Inference on CPU**
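The removed line referenced vLLM's moe_wna16 kernel with a BF16 compute dtype. For readers who still want to try that path, a minimal launch sketch follows; this is an untested assumption based on vLLM's quantization options, and the model path is a placeholder, not the actual checkpoint name.

```shell
# Untested sketch: serve an INT4 model with vLLM's moe_wna16 MoE kernel.
# <your-int4-model-path> is a placeholder; --dtype bfloat16 selects the
# BF16 compute dtype the removed text said prevents overflow issues.
vllm serve <your-int4-model-path> \
    --quantization moe_wna16 \
    --dtype bfloat16
```

Whether moe_wna16 is picked up also depends on the checkpoint's quantization config, so verify kernel selection in the vLLM startup logs.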