MLA is not supported with moe_wna16 quantization. Disabling MLA.
2
#7 opened about 5 hours ago by AMOSE
skips the thinking process
6
#5 opened 4 days ago by muzizon
The inference performance of the DeepSeek-R1-AWQ model is weak compared to the DeepSeek-R1 model
8
#3 opened 6 days ago by qingqingz916
Deployment framework
24
#2 opened 19 days ago by xro7