cicdatopea committed
Commit fcb46ec · verified · 1 Parent(s): e70fa45

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -17,7 +17,7 @@ Please follow the license of the original model.

  **INT4 Inference on CUDA**

- For CUDA inference, we recommend using the moe_wna16 kernel in vLLM, which supports the BF16 compute dtype to prevent overflow issues. However, due to limited resources, we have not been able to test it ourselves. For more details, you may refer to models from other teams, such as [cognitivecomputation/DeepSeek-V3-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-AWQ), or simply use their model.
+ For CUDA inference, due to limited resources, we have not been able to test it ourselves. For more details, you may refer to models from other teams, such as [cognitivecomputation/DeepSeek-V3-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-AWQ), or simply use their model.

  **INT4 Inference on CPU**
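The removed line referenced vLLM's moe_wna16 kernel with a BF16 compute dtype. For readers who still want to try that path, a minimal launch sketch follows; this is an untested assumption based on vLLM's quantization options, and the model path is a placeholder, not the actual checkpoint name.

```shell
# Untested sketch: serve an INT4 model with vLLM's moe_wna16 MoE kernel.
# <your-int4-model-path> is a placeholder; --dtype bfloat16 selects the
# BF16 compute dtype the removed text said prevents overflow issues.
vllm serve <your-int4-model-path> \
    --quantization moe_wna16 \
    --dtype bfloat16
```

Whether moe_wna16 is picked up also depends on the checkpoint's quantization config, so verify kernel selection in the vLLM startup logs.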