---
library_name: transformers
license: bsd-3-clause
base_model:
- jakiAJK/DeepSeek-R1-Distill-Qwen-1.5B_GPTQ-int4
tags:
- DeepSeek
- DeepSeek-R1-Distill-Qwen-1.5B
- DeepSeek-R1-Distill-Qwen-1.5B-GPTQ-Int4
- GPTQ
- Int4
---
# DeepSeek-R1-Distill-Qwen-1.5B-GPTQ-Int4
This version of DeepSeek-R1-Distill-Qwen-1.5B has been converted to run on the Axera NPU using **w4a16** quantization.
Compatible with Pulsar2 version: 3.4 (not yet released)
## Convert tools links:
If you are interested in model conversion, you can export the axmodel yourself from the original repo: https://huggingface.co/jakiAJK/DeepSeek-R1-Distill-Qwen-1.5B_GPTQ-int4
[Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
[AXera NPU LLM Runtime](https://github.com/AXERA-TECH/ax-llm)
## Support Platform
- AX650
- AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
- [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
  - *developing*

| Chips | w8a16 | w4a16 |
|--|--|--|
| AX650 | 11 tokens/sec | 19 tokens/sec |
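From the table above, the w4a16 build decodes roughly 1.7× faster than w8a16 on AX650. A quick back-of-envelope check (the rates are taken from the table; the 256-token reply length is just an illustrative figure):

```python
# Decode throughput on AX650, from the table above.
w8a16_tps = 11.0  # tokens/sec
w4a16_tps = 19.0  # tokens/sec

speedup = w4a16_tps / w8a16_tps
print(f"w4a16 speedup over w8a16: {speedup:.2f}x")

# Time to generate an illustrative 256-token reply at each rate:
for name, tps in [("w8a16", w8a16_tps), ("w4a16", w4a16_tps)]:
    print(f"{name}: {256 / tps:.1f} s for 256 tokens")
```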
## How to use
Download all files from this repository to the device.
```
root@ax650:/mnt/qtang/llm-test/deepseek-r1-1.5b# tree -L 1
.
├── deepseek-r1-1.5b-gptq-int4-ax650
├── deepseek-r1_tokenizer
├── deepseek-r1_tokenizer.py
├── main_axcl_aarch64
├── main_axcl_x86
├── main_prefill
├── post_config.json
├── run_deepseek-r1_1.5b_gptq_int4_ax650.sh
├── run_deepseek-r1_1.5b_gptq_int4_axcl_aarch64.sh
└── run_deepseek-r1_1.5b_gptq_int4_axcl_x86.sh
```
#### Install transformers
```
pip install transformers==4.41.1
```
#### Start the Tokenizer service
```
root@ax650:/mnt/qtang/llm-test/deepseek-r1-1.5b# python3 deepseek-r1_tokenizer.py --port 12345
151646 <｜begin▁of▁sentence｜> 151643 <｜end▁of▁sentence｜>
<｜begin▁of▁sentence｜>You are DeepSeek-R1, You are a helpful assistant.<｜User｜>hello world<｜Assistant｜>
[151646, 151646, 2610, 525, 18183, 39350, 10911, 16, 11, 1446, 525, 264, 10950, 17847, 13, 151644, 14990, 1879, 151645]
http://localhost:12345
```
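The tokenizer service wraps the conversation in DeepSeek-R1's chat template: the BOS token, the system text, then the user turn and assistant marker. A minimal sketch of that prompt layout (plain string assembly, not the actual service code; the special-token strings and ids come from the tokenizer log above):

```python
# DeepSeek-R1-Distill-Qwen special tokens (ids from the log above).
BOS = "<｜begin▁of▁sentence｜>"   # id 151646
USER = "<｜User｜>"                # id 151644
ASSISTANT = "<｜Assistant｜>"      # id 151645

def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn chat prompt in DeepSeek-R1's template."""
    return f"{BOS}{system}{USER}{user}{ASSISTANT}"

prompt = build_prompt("You are DeepSeek-R1, You are a helpful assistant.",
                      "hello world")
print(prompt)
```

The runtime then feeds the token ids of this string to the prefill model; generation stops when the model emits `eos_id` 151643.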
#### Inference with AX650 Host, such as M4N-Dock(爱芯派Pro) or AX650N DEMO Board
Open another terminal and run `run_deepseek-r1_1.5b_gptq_int4_ax650.sh`
```
root@ax650:/mnt/qtang/llm-test/deepseek-r1-1.5b# ./run_deepseek-r1_1.5b_gptq_int4_ax650.sh
[I][ Init][ 125]: LLM init start
bos_id: 151646, eos_id: 151643
100% | ████████████████████████████████ |  31 /  31 [1.62s<1.62s, 19.14 count/s] init post axmodel ok,remain_cmm(2731 MB)
[I][ Init][ 241]: max_token_len : 1023
[I][ Init][ 246]: kv_cache_size : 256, kv_cache_num: 1023
[I][ Init][ 254]: prefill_token_num : 128
[I][ load_config][ 281]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}
[I][ Init][ 268]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
>> who are you
[I][ Run][ 466]: ttft: 281.62 ms
<think>
Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek.
I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.
</think>
Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek.
I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.
[N][ Run][ 605]: hit eos,avg 18.17 token/s
>>
```
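The `post_config.json` printed during init enables temperature scaling and top-k sampling (temperature 0.9, top_k 10) while leaving top-p and the repetition penalty off. An illustrative pure-Python sketch of that sampling scheme (the actual runtime implements this natively; the toy logits are made up):

```python
import math
import random

def sample_next(logits, temperature=0.9, top_k=10):
    """Top-k sampling with temperature, mirroring the config above."""
    # 1. Temperature scaling: T < 1 sharpens, T > 1 flattens the distribution.
    scaled = [l / temperature for l in logits]
    # 2. Keep only the top-k candidates; everything else gets probability 0.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # 3. Softmax over the survivors (subtract max for numerical stability).
    m = max(scaled[i] for i in top)
    exps = {i: math.exp(scaled[i] - m) for i in top}
    z = sum(exps.values())
    probs = {i: e / z for i, e in exps.items()}
    # 4. Draw a token id from the renormalized distribution.
    r, acc = random.random(), 0.0
    for i, p in probs.items():
        acc += p
        if r < acc:
            return i, probs
    return top[-1], probs

random.seed(0)
toy_logits = [0.1 * i for i in range(100)]  # made-up logits over a toy vocab
token_id, dist = sample_next(toy_logits)
print(token_id, len(dist))
```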
#### Inference with M.2 Accelerator card
[What is an M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.
```
(base) axera@raspberrypi:~/samples/deepseek-r1-1.5b $ ./run_deepseek-r1_1.5b_gptq_int4_axcl_aarch64.sh
build time: Feb 13 2025 15:44:57
[I][ Init][ 111]: LLM init start
bos_id: 151646, eos_id: 151643
100% | ████████████████████████████████ |  31 /  31 [22.80s<22.80s, 1.36 count/s] init post axmodel ok,remain_cmm(6219 MB)
[I][ Init][ 226]: max_token_len : 1023
[I][ Init][ 231]: kv_cache_size : 256, kv_cache_num: 1023
[I][ load_config][ 282]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}
[I][ Init][ 288]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
>> who are you
<think>
I'm DeepSeek-R1, an AI assistant created exclusively by DeepSeek. I specialize in helping you answer complex questions,
providing detailed solutions, and strong recommendations tailored to meet specific needs.
For comprehensive details about our models and products, we invite you to explore our official website.
</think>
I'm DeepSeek-R1, an AI assistant created exclusively by DeepSeek. I specialize in helping you answer complex questions,
providing detailed solutions, and strong recommendations tailored to meet specific needs.
For comprehensive details about our models and products, we invite you to explore our official website.
[N][ Run][ 610]: hit eos,avg 15.08 token/s
>> ^Cq
(base) axera@raspberrypi:~/samples/deepseek-r1-1.5b $ axcl-smi
+------------------------------------------------------------------------------------------------+
| AXCL-SMI V2.26.0_20250205130139 Driver V2.26.0_20250205130139 |
+-----------------------------------------+--------------+---------------------------------------+
| Card Name Firmware | Bus-Id | Memory-Usage |
| Fan Temp Pwr:Usage/Cap | CPU NPU | CMM-Usage |
|=========================================+==============+=======================================|
| 0 AX650N V2.26.0 | 0000:01:00.0 | 170 MiB / 945 MiB |
| -- 39C -- / -- | 0% 0% | 1063 MiB / 7040 MiB |
+-----------------------------------------+--------------+---------------------------------------+
+------------------------------------------------------------------------------------------------+
| Processes: |
| Card PID Process Name NPU Memory Usage |
|================================================================================================|
| 0 17325 /home/axera/samples/deepseek-r1-1.5b/main_axcl_aarch64 1037736 KiB |
+------------------------------------------------------------------------------------------------+
``` |