When using with Ollama, does it support kv_cache_type=q4_0 and flash_attention=1?
#28 opened about 11 hours ago
by
leonzy04
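A minimal sketch for the question in #28 above, assuming the server-side environment variables documented in the Ollama FAQ (OLLAMA_FLASH_ATTENTION and OLLAMA_KV_CACHE_TYPE); quantized KV cache requires flash attention to be enabled:

```python
# Sketch: launch the Ollama server with flash attention and a
# 4-bit quantized KV cache. Equivalent to exporting the variables
# in the shell before running `ollama serve`.
import os
import subprocess

env = dict(os.environ)
env["OLLAMA_FLASH_ATTENTION"] = "1"   # enable flash attention
env["OLLAMA_KV_CACHE_TYPE"] = "q4_0"  # 4-bit quantized KV cache

subprocess.run(["ollama", "serve"], env=env)  # assumes `ollama` is on PATH
```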
How to handle multiple HTTP requests concurrently
1
#27 opened about 14 hours ago
by
007hao
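A minimal sketch for #27 above: fanning several prompts out to a local Ollama server with a thread pool. The model tag is an assumption (substitute your own); server-side concurrency is governed by Ollama's OLLAMA_NUM_PARALLEL setting.

```python
# Sketch: send several generation requests to Ollama in parallel
# using only the standard library.
from concurrent.futures import ThreadPoolExecutor
import json
import urllib.request

URL = "http://localhost:11434/api/generate"

def generate(prompt: str) -> str:
    body = json.dumps({
        "model": "deepseek-r1",  # assumed model tag
        "prompt": prompt,
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

prompts = ["What is 2+2?", "Name a prime number.", "Say hello."]
with ThreadPoolExecutor(max_workers=3) as pool:
    for answer in pool.map(generate, prompts):
        print(answer)
```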
After merging the IQ1_S model and deploying it on ollama, inference quality is poor
#26 opened about 18 hours ago
by
gaozj
The model seems to have been fine-tuned
1
#25 opened 1 day ago
by
mogazheng
I tested dynamic 1.58-bit and 2.22-bit; all thoughts are empty?
3
#24 opened 1 day ago
by
SongXiaoMao
What is the base precision type(FP32/FP16) used in Q2/Q1 quantization?
#23 opened 3 days ago
by
ArYuZzz1
any benchmark results?
2
#22 opened 4 days ago
by
Wei-Wu
Accuracy of the dynamic quants compared to usual quants?
12
#21 opened 4 days ago
by
inputout
8bits quantization
4
#20 opened 5 days ago
by
ramkumarkoppu
New research paper: R1-type reasoning models can be drastically improved in quality
2
#19 opened 8 days ago
by
krustik
MD5 / SHA256 hashes, please
1
#18 opened 10 days ago
by
ivanvolosyuk
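On #18 above, a minimal sketch for verifying a downloaded shard locally; the filename is a hypothetical example. Hugging Face also shows a SHA256 for each LFS file under "Git LFS Details" on the file page.

```python
# Sketch: compute a SHA-256 checksum of a (large) GGUF file in chunks,
# so the whole file never has to fit in memory.
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256sum("DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf"))  # hypothetical shard name
```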
Is there a model removing non-shared MoE experts?
4
#17 opened 11 days ago
by
ghostplant
A Step-by-step deployment guide with ollama
3
#16 opened 12 days ago
by
snowkylin
No think tokens visible
5
#15 opened 12 days ago
by
sudkamath
Over 2 tok/sec aggregate, backed by NVMe SSD on a 96GB RAM + 24GB VRAM AM5 rig with llama.cpp
9
#13 opened 13 days ago
by
ubergarm
Running the model with vLLM does not actually work
8
#12 opened 13 days ago
by
aikitoria
DeepSeek-R1-GGUF not available on LM Studio
2
#11 opened 13 days ago
by
32SkyDive
Where did the BF16 come from?
8
#10 opened 13 days ago
by
gshpychka
Inference speed
2
#9 opened 14 days ago
by
Iker
Running this model using vLLM Docker
2
#8 opened 14 days ago
by
moficodes
UD-IQ1_M models for distilled R1 versions?
3
#6 opened 14 days ago
by
SamPurkis
Llama.cpp server chat template
3
#4 opened 17 days ago
by
softwareweaver
Are the Q4 and Q5 models R1 or R1-Zero?
18
#2 opened 22 days ago
by
gng2info
What is the VRAM requirement to run this?
5
#1 opened 22 days ago
by
RageshAntony
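On #1 above, a rough back-of-envelope sketch, not a measured requirement: weight memory is roughly parameter count times bits per weight divided by 8, before KV cache and runtime overhead. The bits-per-weight figures below are assumptions for illustration.

```python
# Sketch: estimate weight memory for a 671B-parameter model at a few
# quantization levels. Decimal GB; excludes KV cache and overhead.
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for name, bpw in [("1.58-bit dynamic", 1.58),
                  ("~2.6 bpw (Q2-class)", 2.6),
                  ("~4.8 bpw (Q4-class)", 4.8)]:
    print(f"{name}: ~{weight_gb(671, bpw):.0f} GB of weights")
```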