Davidqian123 committed
Commit b544b79 · verified · 1 Parent(s): 2fcf31e

Update README.md

Files changed (1)
  1. README.md +13 -11
README.md CHANGED
@@ -1,24 +1,26 @@
  ---
  license: apache-2.0
  tags:
- - GGUF
  - deepseek
  - qwen
  - qwen2
  - transformers
- base_model:
- - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  ---

  # DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant

- ## Introduction
- **DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant** is a ... (TODO)

- ---

- ## How to Use on Your Device
- Below, we outline multiple ways to run the model locally.

  #### Option 1: Using Nexa SDK

@@ -56,7 +58,7 @@ Get the latest version from the [official website](https://lmstudio.ai/).

  **Step 2: Load and Run the Model**

- 2. In LM Studio's top panel, search for and select `NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant`.
- 3. Click `Download` (if not already downloaded) and wait for the model to load.
- 4. Once loaded, go to the chat window and start a conversation.
  ---
 
  ---
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ library_name: transformers
  license: apache-2.0
  tags:
  - deepseek
  - qwen
  - qwen2
  - transformers
+ - GGUF
  ---

  # DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant

+ ## Background + Overview

+ DeepSeek-R1 has been making headlines for rivaling OpenAI's o1 reasoning model while remaining fully open-source. Many users want to run it locally to ensure data privacy, reduce latency, and maintain offline access. However, fitting such a large model onto personal devices typically requires quantization (e.g., Q4_K_M), which often sacrifices accuracy (up to ~22% loss) and undermines the benefits of running a reasoning model locally.
+
+ We've solved that trade-off by quantizing the DeepSeek-R1 distilled model to one-fourth of its original size without losing accuracy. This lets you run powerful on-device reasoning wherever you are, with no compromises. Tests on an **HP OmniBook AI PC** with an **AMD Ryzen™ AI 9 HX 370 processor** showed a decoding speed of **66.40 tokens per second** and peak RAM usage of just **1228 MB** for the NexaQuant version, compared to only **25.28 tokens per second** and **3788 MB** for the unquantized version, while **maintaining full-precision model accuracy**.
+
+ ## How to run locally

+ NexaQuant is compatible with **Nexa-SDK**, **Ollama**, **LM Studio**, **Llama.cpp**, and any llama.cpp-based project. Below, we outline multiple ways to run the model locally.

  #### Option 1: Using Nexa SDK

 

  **Step 2: Load and Run the Model**

+ 1. In LM Studio's top panel, search for and select `NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant`.
+ 2. Click `Download` (if not already downloaded) and wait for the model to load.
+ 3. Once loaded, go to the chat window and start a conversation.
  ---
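
The "How to run locally" section added in this commit names llama.cpp and llama.cpp-based projects as compatible, but the corresponding commands are outside the changed lines shown above. As a minimal sketch (not part of the commit), and assuming an illustrative file name for the GGUF downloaded from the NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant repository, a llama.cpp CLI run might look like:

```bash
# Minimal sketch: run the NexaQuant GGUF with llama.cpp's llama-cli.
# The .gguf file name below is an assumption; substitute the file you
# actually downloaded from NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant.
./llama-cli \
  -m DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant.gguf \
  -p "A clock shows 3:15. What is the angle between the hour and minute hands?" \
  -n 512
```

For an interactive chat loop rather than a one-shot prompt, llama-cli also offers a conversation mode (`-cnv`).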