base_model:
- deepseek-ai/DeepSeek-Coder-V2-Instruct
Custom quantizations of deepseek-coder-v2-instruct optimized for cpu inference.
This one uses GGML TYPE IQ_4_XS in combination with q8_0, so it runs fast with minimal loss and takes advantage of int8 optimizations on most newer server CPUs.
While it required custom code to make, it is standard compatible with plain llama.cpp.
The following 4bit version is the one I use myself; it gets 17 tps on 64 ARM cores.
You don't need to consolidate the files anymore, just point llama-cli to the first one and it'll handle the rest fine.
Then to run, just do:
./llama-cli --temp 0.4 -m deepseek_coder_v2_cpu_iq4xm.gguf-00001-of-00004.gguf -c 32000 -co -cnv -i -f bigprompt.txt
deepseek_coder_v2_cpu_iq4xm.gguf-00001-of-00004.gguf
deepseek_coder_v2_cpu_iq4xm.gguf-00002-of-00004.gguf
deepseek_coder_v2_cpu_iq4xm.gguf-00003-of-00004.gguf
deepseek_coder_v2_cpu_iq4xm.gguf-00004-of-00004.gguf
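Since llama-cli is only given the first file, it presumably needs the other shards sitting next to it under exactly these names. A small sketch of a pre-flight check follows; `missing_shards` is a hypothetical helper name, not part of llama.cpp, and it assumes the `<prefix>-NNNNN-of-NNNNN.gguf` naming shown in the list above.

```shell
# Hypothetical helper (not part of llama.cpp): print the name of every
# expected shard that is NOT present in the current directory.
# Usage: missing_shards <prefix> <total_shards>
missing_shards() {
  prefix=$1
  total=$2
  i=1
  while [ "$i" -le "$total" ]; do
    # Build the shard name, e.g. <prefix>-00001-of-00004.gguf
    name=$(printf '%s-%05d-of-%05d.gguf' "$prefix" "$i" "$total")
    [ -f "$name" ] || printf '%s\n' "$name"
    i=$((i + 1))
  done
}
```

Running `missing_shards deepseek_coder_v2_cpu_iq4xm.gguf 4` in the download directory prints nothing when all four files are there, and one line per missing shard otherwise.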
To download the 4bit version much faster, install aria2 first. On Linux: apt install aria2, on Mac: brew install aria2
sudo apt install -y aria2
aria2c -x 8 -o deepseek_coder_v2_cpu_iq4xm.gguf-00001-of-00004.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_iq4xm.gguf-00001-of-00004.gguf
aria2c -x 8 -o deepseek_coder_v2_cpu_iq4xm.gguf-00002-of-00004.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_iq4xm.gguf-00002-of-00004.gguf
aria2c -x 8 -o deepseek_coder_v2_cpu_iq4xm.gguf-00003-of-00004.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_iq4xm.gguf-00003-of-00004.gguf
aria2c -x 8 -o deepseek_coder_v2_cpu_iq4xm.gguf-00004-of-00004.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_iq4xm.gguf-00004-of-00004.gguf
And for downloading the Q8_0 version, converted in the most lossless way possible from the HF bf16 model:
aria2c -x 8 -o deepseek_coder_v2_cpu_q8_0-00001-of-00006.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_q8_0-00001-of-00006.gguf
aria2c -x 8 -o deepseek_coder_v2_cpu_q8_0-00002-of-00006.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_q8_0-00002-of-00006.gguf
aria2c -x 8 -o deepseek_coder_v2_cpu_q8_0-00003-of-00006.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_q8_0-00003-of-00006.gguf
aria2c -x 8 -o deepseek_coder_v2_cpu_q8_0-00004-of-00006.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_q8_0-00004-of-00006.gguf
aria2c -x 8 -o deepseek_coder_v2_cpu_q8_0-00005-of-00006.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_q8_0-00005-of-00006.gguf
aria2c -x 8 -o deepseek_coder_v2_cpu_q8_0-00006-of-00006.gguf \
https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main/deepseek_coder_v2_cpu_q8_0-00006-of-00006.gguf
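If typing one command per shard gets tedious, the downloads above could be wrapped in a loop. This is only a sketch: `shard_names` and `download_shards` are hypothetical helper names, the base URL and the `-x 8 -o` flags are copied from the commands above, and the shard naming pattern is assumed to stay `<prefix>-NNNNN-of-NNNNN.gguf`.

```shell
# Base URL taken from the download commands above.
BASE_URL="https://huggingface.co/nisten/deepseek-coder-v2-inst-cpu-optimized-gguf/resolve/main"

# Hypothetical helper: print one shard file name per line.
# Usage: shard_names <prefix> <total_shards>
shard_names() {
  prefix=$1
  total=$2
  i=1
  while [ "$i" -le "$total" ]; do
    printf '%s-%05d-of-%05d.gguf\n' "$prefix" "$i" "$total"
    i=$((i + 1))
  done
}

# Hypothetical helper: fetch every shard in turn, 8 connections each,
# using the same aria2c flags as the commands above.
# Usage: download_shards <prefix> <total_shards>
download_shards() {
  for name in $(shard_names "$1" "$2"); do
    aria2c -x 8 -o "$name" "$BASE_URL/$name"
  done
}

# Nothing is downloaded until one of these is run:
#   download_shards deepseek_coder_v2_cpu_iq4xm.gguf 4
#   download_shards deepseek_coder_v2_cpu_q8_0 6
```

Keeping the name generation separate from the download makes it easy to print the list first and eyeball it against the file names on the repo page before pulling anything.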
Enjoy and remember to accelerate!
-Nisten