Update README.md
README.md CHANGED
```diff
@@ -1,24 +1,26 @@
 ---
+base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+library_name: transformers
 license: apache-2.0
 tags:
-- GGUF
 - deepseek
 - qwen
 - qwen2
 - transformers
-
-- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+- GGUF
 ---
 
 # DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant
 
-##
-**DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant** is a ... (TODO)
+## Background + Overview
 
+DeepSeek-R1 has been making headlines for rivaling OpenAI's o1 reasoning model while remaining fully open-source. Many users want to run it locally to ensure data privacy, reduce latency, and maintain offline access. However, fitting such a large model onto personal devices typically requires quantization (e.g. Q4_K_M), which often sacrifices accuracy (up to ~22%) and undermines the benefits of a local reasoning model.
+
+We've solved this trade-off by quantizing the DeepSeek-R1 distilled model to one-fourth of its original size, without losing any accuracy. This lets you run powerful on-device reasoning wherever you are, with no compromises. Tests on an **HP OmniBook AI PC** with an **AMD Ryzen™ AI 9 HX 370 processor** showed a decoding speed of **66.40 tokens per second** and peak RAM usage of just **1228 MB** for the NexaQuant version, compared to only **25.28 tokens per second** and **3788 MB** of RAM for the unquantized version, while **maintaining full-precision model accuracy**.
+
+## How to run locally
 
-
-
-Below, we outline multiple ways to run the model locally.
+NexaQuant is compatible with **Nexa-SDK**, **Ollama**, **LM Studio**, **Llama.cpp**, and any llama.cpp-based project. Below, we outline multiple ways to run the model locally.
 
 #### Option 1: Using Nexa SDK
 
```
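To put the "one-fourth of its original size" claim in the overview above into numbers: a dense 1.5B-parameter model stored at FP16 takes roughly 2.8 GiB, while a ~4-bit quantization lands near a quarter of that. Here is a rough back-of-envelope sketch; the bits-per-weight figures are assumptions, not measurements published for NexaQuant:

```python
# Back-of-envelope model-size estimate. The bits-per-weight values are
# rough assumptions, not numbers published for this repo.
PARAMS = 1.5e9  # DeepSeek-R1-Distill-Qwen-1.5B parameter count

def size_gib(bits_per_weight: float) -> float:
    """Approximate on-disk size in GiB for a dense model."""
    return PARAMS * bits_per_weight / 8 / 2**30

fp16 = size_gib(16)   # unquantized baseline
q4 = size_gib(4.5)    # ~4-bit quant (Q4_K_M averages roughly 4.5 bpw)

print(f"FP16: {fp16:.2f} GiB, ~4-bit: {q4:.2f} GiB, ratio: {fp16 / q4:.1f}x")
# -> FP16: 2.79 GiB, ~4-bit: 0.79 GiB, ratio: 3.6x (about one-fourth)
```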
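The hunk above also states that NexaQuant works with Llama.cpp and any llama.cpp-based project. As one concrete illustration, a minimal sketch using the `llama-cpp-python` bindings; the `filename` glob is an assumption, so check the repo's file list for the exact GGUF name:

```python
# Minimal llama.cpp-based runner via llama-cpp-python
# (pip install llama-cpp-python huggingface-hub).
# The repo id comes from the README; the "*.gguf" filename
# filter is an assumption about the file naming in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant",
    filename="*.gguf",  # pick the NexaQuant GGUF file from the repo
    n_ctx=4096,         # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```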
```diff
@@ -56,7 +58,7 @@ Get the latest version from the [official website](https://lmstudio.ai/).
 
 **Step 2: Load and Run the Model**
 
-
-
-
+1. In LM Studio's top panel, search for and select `NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant`.
+2. Click `Download` (if not already downloaded) and wait for the model to load.
+3. Once loaded, go to the chat window and start a conversation.
 ---
```
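Beyond the chat-window steps added in this hunk, LM Studio can also serve a loaded model through its OpenAI-compatible local server (default port 1234). A minimal sketch, assuming the server is enabled; the model id below is hypothetical, so copy the exact id LM Studio displays for the loaded model:

```python
# Query LM Studio's OpenAI-compatible local server
# (default http://localhost:1234/v1). Requires `pip install openai`.
# LM Studio ignores the api_key, but the client requires one.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-1.5b-nexaquant",  # hypothetical id
    messages=[{"role": "user", "content": "Explain why the sky is blue in one paragraph."}],
)
print(resp.choices[0].message.content)
```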