This is the [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) model, converted to OpenVINO with INT4 weight compression. This model is optimized for CPU and GPU. See [helenai/DeepSeek-R1-Distill-Qwen-7B-ov-int4-npu](https://huggingface.co/helenai/DeepSeek-R1-Distill-Qwen-7B-ov-int4-npu) for a version that works on NPU.

To run inference on this model, install openvino-genai (`pip install openvino-genai`) and run [llm_chat_deepseek.py](https://gist.github.com/helena-intel/554fba91f380df590ecc9245abdad33f).

Step-by-step instructions for best results:

```
pip install --pre --upgrade openvino openvino-genai openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
pip install huggingface-hub
huggingface-cli download helenai/DeepSeek-R1-Distill-Qwen-7B-ov-int4 --local-dir DeepSeek-R1-Distill-Qwen-7B-ov-int4
curl -O https://gist.githubusercontent.com/helena-intel/554fba91f380df590ecc9245abdad33f/raw/04f495164482823aa7e6ba1119a5c43e275d08f5/llm_chat_deepseek.py
python llm_chat_deepseek.py DeepSeek-R1-Distill-Qwen-7B-ov-int4 GPU
```

> [!NOTE]
> The last line specifies the device to run inference on. GPU is recommended for recent Intel laptops with integrated graphics, or for Intel discrete graphics. Change to CPU if you do not have an Intel GPU.

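If you would rather let a script choose between GPU and CPU, the sketch below shows one possible approach. It is not part of `llm_chat_deepseek.py`: the names `pick_device` and `main`, the prompt, and the `max_new_tokens` value are illustrative assumptions. It relies on `openvino.Core().available_devices` to list devices and on `openvino_genai.LLMPipeline` to load the model, and falls back to CPU when no GPU is reported. To try it, call `main("DeepSeek-R1-Distill-Qwen-7B-ov-int4")` after downloading the model as described above.

```python
def pick_device(available, preferred="GPU"):
    """Return the first device whose name starts with `preferred`, else "CPU".

    Hypothetical helper for illustration. OpenVINO may report a single GPU
    as "GPU" and multiple GPUs as "GPU.0", "GPU.1", and so on.
    """
    for name in available:
        if name.startswith(preferred):
            return name
    return "CPU"


def main(model_dir):
    # Imported here so pick_device can be used without OpenVINO installed.
    import openvino as ov
    import openvino_genai

    device = pick_device(ov.Core().available_devices)
    print(f"Running inference on {device}")

    # Load the downloaded model directory on the chosen device.
    pipe = openvino_genai.LLMPipeline(model_dir, device)

    # Prompt and token limit are arbitrary example values.
    print(pipe.generate("What is 2 + 2?", max_new_tokens=256))
```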
Gradio chatbot notebook using this model: https://gist.github.com/helena-intel/69e1c2921a2bcb618fdd7cdfb0bd0202