When using with Ollama, does it support kv_cache_type=q4_0 and flash_attention=1?
#28 opened about 11 hours ago
by
leonzy04
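A minimal sketch for the question in #28 above, assuming the server-side environment variables documented in the Ollama FAQ (OLLAMA_FLASH_ATTENTION and OLLAMA_KV_CACHE_TYPE); quantized KV cache requires flash attention to be enabled:

```python
# Sketch: launch the Ollama server with flash attention and a
# 4-bit quantized KV cache. Equivalent to exporting the variables
# in the shell before running `ollama serve`.
import os
import subprocess

env = dict(os.environ)
env["OLLAMA_FLASH_ATTENTION"] = "1"   # enable flash attention
env["OLLAMA_KV_CACHE_TYPE"] = "q4_0"  # 4-bit quantized KV cache

subprocess.run(["ollama", "serve"], env=env)  # assumes `ollama` is on PATH
```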
How to handle multiple HTTP requests concurrently
1
#27 opened about 14 hours ago
by
007hao
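A minimal sketch for #27 above: fanning several prompts out to a local Ollama server with a thread pool. The model tag is an assumption (substitute your own); server-side concurrency is governed by Ollama's OLLAMA_NUM_PARALLEL setting.

```python
# Sketch: send several generation requests to Ollama in parallel
# using only the standard library.
from concurrent.futures import ThreadPoolExecutor
import json
import urllib.request

URL = "http://localhost:11434/api/generate"

def generate(prompt: str) -> str:
    body = json.dumps({
        "model": "deepseek-r1",  # assumed model tag
        "prompt": prompt,
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

prompts = ["What is 2+2?", "Name a prime number.", "Say hello."]
with ThreadPoolExecutor(max_workers=3) as pool:
    for answer in pool.map(generate, prompts):
        print(answer)
```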
After merging the IQ1_S model and deploying it on ollama, inference quality is poor
#26 opened about 18 hours ago
by
gaozj
The model seems to have been fine-tuned
1
#25 opened 1 day ago
by
mogazheng
I tested dynamic 1.58-bit and 2.22-bit; all thoughts are empty?
3
#24 opened 1 day ago
by
SongXiaoMao
What is the base precision type(FP32/FP16) used in Q2/Q1 quantization?
#23 opened 3 days ago
by
ArYuZzz1
any benchmark results?
2
#22 opened 4 days ago
by
Wei-Wu
Accuracy of the dynamic quants compared to usual quants?
12
#21 opened 4 days ago
by
inputout
8bits quantization
4
#20 opened 5 days ago
by
ramkumarkoppu
New research paper: R1-type reasoning models can be drastically improved in quality
2
#19 opened 8 days ago
by
krustik
MD5 / SHA256 hashes, please
1
#18 opened 10 days ago
by
ivanvolosyuk
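On #18 above, a minimal sketch for verifying a downloaded shard locally; the filename is a hypothetical example. Hugging Face also shows a SHA256 for each LFS file under "Git LFS Details" on the file page.

```python
# Sketch: compute a SHA-256 checksum of a (large) GGUF file in chunks,
# so the whole file never has to fit in memory.
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256sum("DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf"))  # hypothetical shard name
```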
Is there a model removing non-shared MoE experts?
4
#17 opened 11 days ago
by
ghostplant
A Step-by-step deployment guide with ollama
3
#16 opened 12 days ago
by
snowkylin
No think tokens visible
5
#15 opened 12 days ago
by
sudkamath
Over 2 tok/sec aggregate, backed by NVMe SSD on a 96GB RAM + 24GB VRAM AM5 rig with llama.cpp
9
#13 opened 13 days ago
by
ubergarm
Running the model with vLLM does not actually work
8
#12 opened 13 days ago
by
aikitoria
DeepSeek-R1-GGUF not available on LM Studio
2
#11 opened 13 days ago
by
32SkyDive
Where did the BF16 come from?
8
#10 opened 13 days ago
by
gshpychka
Inference speed
2
#9 opened 14 days ago
by
Iker
Running this model using vLLM Docker
2
#8 opened 14 days ago
by
moficodes
UD-IQ1_M models for distilled R1 versions?
3
#6 opened 14 days ago
by
SamPurkis
Llama.cpp server chat template
3
#4 opened 17 days ago
by
softwareweaver
Are the Q4 and Q5 models R1 or R1-Zero?
18
#2 opened 22 days ago
by
gng2info
What is the VRAM requirement to run this?
5
#1 opened 22 days ago
by
RageshAntony
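On #1 above, a rough back-of-envelope sketch, not a measured requirement: weight memory is roughly parameter count times bits per weight divided by 8, before KV cache and runtime overhead. The bits-per-weight figures below are assumptions for illustration.

```python
# Sketch: estimate weight memory for a 671B-parameter model at a few
# quantization levels. Decimal GB; excludes KV cache and overhead.
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for name, bpw in [("1.58-bit dynamic", 1.58),
                  ("~2.6 bpw (Q2-class)", 2.6),
                  ("~4.8 bpw (Q4-class)", 4.8)]:
    print(f"{name}: ~{weight_gb(671, bpw):.0f} GB of weights")
```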