---
library_name: transformers
license: bsd-3-clause
base_model:
- jakiAJK/DeepSeek-R1-Distill-Qwen-1.5B_GPTQ-int4
tags:
- DeepSeek
- DeepSeek-R1-Distill-Qwen-1.5B
- DeepSeek-R1-Distill-Qwen-1.5B-GPTQ-Int4
- GPTQ
- Int4
---

# DeepSeek-R1-Distill-Qwen-1.5B-GPTQ-Int4

This version of DeepSeek-R1-Distill-Qwen-1.5B has been converted to run on the Axera NPU using **w4a16** quantization.

This model has been optimized with the following LoRA: 

Compatible with Pulsar2 version: 3.4 (not released yet)

## Conversion tool links

If you are interested in model conversion, you can try exporting the axmodel yourself from the original repo: https://huggingface.co/jakiAJK/DeepSeek-R1-Distill-Qwen-1.5B_GPTQ-int4

[Pulsar2 documentation: how to convert an LLM from Hugging Face to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html) 

[AXera NPU LLM Runtime](https://github.com/AXERA-TECH/ax-llm) 

## Supported platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
  - *in development*
 
|Chips|w8a16|w4a16|
|--|--|--|
|AX650| 11 tokens/sec|19 tokens/sec|
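
To put the table in perspective, here is a rough sketch of how decode throughput translates into generation time. It assumes a constant decode rate and ignores prefill time. The 512-token reply length and the `decode_seconds` helper are illustrative choices, not measured values or part of the runtime.

```python
# Decode throughput from the table above (tokens per second on AX650).
THROUGHPUT = {"w8a16": 11.0, "w4a16": 19.0}


def decode_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Estimated time to generate num_tokens at a constant decode rate."""
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    reply_len = 512  # hypothetical reply length, chosen only for illustration
    for scheme, tps in THROUGHPUT.items():
        secs = decode_seconds(reply_len, tps)
        print(f"{scheme}: about {secs:.1f} s for {reply_len} tokens")
    ratio = THROUGHPUT["w4a16"] / THROUGHPUT["w8a16"]
    print(f"w4a16 vs w8a16 speedup: {ratio:.2f}x")
```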

## How to use

Download all files from this repository to the device. The working directory should then look like this:

```
root@ax650:/mnt/qtang/llm-test/deepseek-r1-1.5b# tree -L 1
.
├── deepseek-r1-1.5b-gptq-int4-ax650
├── deepseek-r1_tokenizer
├── deepseek-r1_tokenizer.py
├── main_axcl_aarch64
├── main_axcl_x86
├── main_prefill
├── post_config.json
├── run_deepseek-r1_1.5b_gptq_int4_ax650.sh
├── run_deepseek-r1_1.5b_gptq_int4_axcl_aarch64.sh
└── run_deepseek-r1_1.5b_gptq_int4_axcl_x86.sh
```
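
Before going further, it can help to confirm that the download is complete. The following is a minimal sketch that checks a directory against the entries in the listing above; the `missing_entries` helper is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List

# Entries shown in the `tree -L 1` listing above.
EXPECTED = [
    "deepseek-r1-1.5b-gptq-int4-ax650",
    "deepseek-r1_tokenizer",
    "deepseek-r1_tokenizer.py",
    "main_axcl_aarch64",
    "main_axcl_x86",
    "main_prefill",
    "post_config.json",
    "run_deepseek-r1_1.5b_gptq_int4_ax650.sh",
    "run_deepseek-r1_1.5b_gptq_int4_axcl_aarch64.sh",
    "run_deepseek-r1_1.5b_gptq_int4_axcl_x86.sh",
]


def missing_entries(root: str) -> List[str]:
    """Return the expected files or directories that are absent from root."""
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).exists()]


if __name__ == "__main__":
    import sys

    # Check the directory given on the command line, or the current one.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_entries(root)
    print("all entries present" if not missing else f"missing: {missing}")
```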

#### Install transformers

```
pip install transformers==4.41.1
```

#### Start the Tokenizer service

```
root@ax650:/mnt/qtang/llm-test/deepseek-r1-1.5b# python3 deepseek-r1_tokenizer.py --port 12345
151646 <｜begin▁of▁sentence｜> 151643 <｜end▁of▁sentence｜>
<｜begin▁of▁sentence｜>You are DeepSeek-R1, You are a helpful assistant.<｜User｜>hello world<｜Assistant｜>
[151646, 151646, 2610, 525, 18183, 39350, 10911, 16, 11, 1446, 525, 264, 10950, 17847, 13, 151644, 14990, 1879, 151645]
http://localhost:12345
```
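
The second line of the output above shows the prompt layout used for a single-turn request. The sketch below reproduces that string. It mirrors only what the log prints; how `deepseek-r1_tokenizer.py` assembles the prompt internally is not shown here, and `build_prompt` is a hypothetical helper name.

```python
# Special tokens and system prompt exactly as printed in the tokenizer log.
BOS = "<｜begin▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"
SYSTEM_PROMPT = "You are DeepSeek-R1, You are a helpful assistant."


def build_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Assemble a single-turn prompt in the layout shown in the log."""
    return f"{BOS}{system_prompt}{USER}{user_message}{ASSISTANT}"


if __name__ == "__main__":
    print(build_prompt("hello world"))
```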

#### Inference on an AX650 host, such as the M4N-Dock(爱芯派Pro) or the AX650N DEMO Board

Open another terminal and run `run_deepseek-r1_1.5b_gptq_int4_ax650.sh`

```
root@ax650:/mnt/qtang/llm-test/deepseek-r1-1.5b# ./run_deepseek-r1_1.5b_gptq_int4_ax650.sh
[I][                            Init][ 125]: LLM init start
bos_id: 151646, eos_id: 151643
100% | ████████████████████████████████ |  31 /  31 [1.62s<1.62s, 19.14 count/s] init post axmodel ok,remain_cmm(2731 MB)
[I][                            Init][ 241]: max_token_len : 1023
[I][                            Init][ 246]: kv_cache_size : 256, kv_cache_num: 1023
[I][                            Init][ 254]: prefill_token_num : 128
[I][                     load_config][ 281]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][                            Init][ 268]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
>> who are you
[I][                             Run][ 466]: ttft: 281.62 ms
<think>
Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek.
I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.
</think>

Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek.
I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.

[N][                             Run][ 605]: hit eos,avg 18.17 token/s

>> 
```
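
The config printed in the log enables temperature scaling and top-k sampling (`temperature` 0.9, `top_k` 10). The sketch below illustrates what those two settings generally mean for choosing the next token. It is a generic illustration under that assumption, not the ax-llm implementation, and `sample_token` is a hypothetical helper.

```python
import json
import math
import random

# Relevant values copied from the config printed in the log above.
CONFIG = json.loads("""
{
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "temperature": 0.9,
    "top_k": 10
}
""")


def sample_token(logits, config, rng=random):
    """Pick a token index using temperature scaling and top-k filtering."""
    scaled = list(logits)
    if config.get("enable_temperature"):
        # Temperatures below 1 sharpen the distribution, above 1 flatten it.
        scaled = [x / config["temperature"] for x in scaled]

    # Candidate indices, highest logit first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if config.get("enable_top_k_sampling"):
        # Keep only the k most likely candidates.
        order = order[: config["top_k"]]

    # Softmax weights over the candidates (shifted by the max for stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    return rng.choices(order, weights=weights, k=1)[0]


if __name__ == "__main__":
    fake_logits = [float(i) for i in range(50)]  # made-up logits for the demo
    print(sample_token(fake_logits, CONFIG))
```

With `top_k` set to 10, only the ten highest-scoring candidates can ever be drawn, however many entries the logit vector has.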

#### Inference with the M.2 Accelerator card

[What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.

```
(base) axera@raspberrypi:~/samples/deepseek-r1-1.5b $ ./run_deepseek-r1_1.5b_gptq_int4_axcl_aarch64.sh
build time: Feb 13 2025 15:44:57
[I][                            Init][ 111]: LLM init start
bos_id: 151646, eos_id: 151643
100% | ████████████████████████████████ |  31 /  31 [22.80s<22.80s, 1.36 count/s] init post axmodel okremain_cmm(6219 MB)
[I][                            Init][ 226]: max_token_len : 1023
[I][                            Init][ 231]: kv_cache_size : 256, kv_cache_num: 1023
[I][                     load_config][ 282]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][                            Init][ 288]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running

>> who are you
<think>
I'm DeepSeek-R1, an AI assistant created exclusively by DeepSeek. I specialize in helping you answer complex questions,
providing detailed solutions, and strong recommendations tailored to meet specific needs.
For comprehensive details about our models and products, we invite you to explore our official website.
</think>

I'm DeepSeek-R1, an AI assistant created exclusively by DeepSeek. I specialize in helping you answer complex questions,
providing detailed solutions, and strong recommendations tailored to meet specific needs.
For comprehensive details about our models and products, we invite you to explore our official website.

[N][                             Run][ 610]: hit eos,avg 15.08 token/s

>> ^Cq

(base) axera@raspberrypi:~/samples/deepseek-r1-1.5b $ axcl-smi
+------------------------------------------------------------------------------------------------+
| AXCL-SMI  V2.26.0_20250205130139                                Driver  V2.26.0_20250205130139 |
+-----------------------------------------+--------------+---------------------------------------+
| Card  Name                     Firmware | Bus-Id       |                          Memory-Usage |
| Fan   Temp                Pwr:Usage/Cap | CPU      NPU |                             CMM-Usage |
|=========================================+==============+=======================================|
|    0  AX650N                    V2.26.0 | 0000:01:00.0 |                170 MiB /      945 MiB |
|   --   39C                      -- / -- | 0%        0% |               1063 MiB /     7040 MiB |
+-----------------------------------------+--------------+---------------------------------------+

+------------------------------------------------------------------------------------------------+
| Processes:                                                                                     |
| Card      PID  Process Name                                                   NPU Memory Usage |
|================================================================================================|
|    0    17325  /home/axera/samples/deepseek-r1-1.5b/main_axcl_aarch64               1037736 KiB |
+------------------------------------------------------------------------------------------------+
```