Davidqian123 committed
Commit b544b79 · verified · 1 Parent(s): 2fcf31e

Update README.md

Files changed (1)
  1. README.md +13 -11
README.md CHANGED
@@ -1,24 +1,26 @@
  ---
  license: apache-2.0
  tags:
- - GGUF
  - deepseek
  - qwen
  - qwen2
  - transformers
- base_model:
- - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  ---

  # DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant

- ## Introduction
- **DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant** is a ... (TODO)

- ---

- ## How to Use on Your Device
- Below, we outline multiple ways to run the model locally.

  #### Option 1: Using Nexa SDK

@@ -56,7 +58,7 @@ Get the latest version from the [official website](https://lmstudio.ai/).

  **Step 2: Load and Run the Model**

- 2. In LM Studio's top panel, search for and select `NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant`.
- 3. Click `Download` (if not already downloaded) and wait for the model to load.
- 4. Once loaded, go to the chat window and start a conversation.
  ---
 
  ---
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ library_name: transformers
  license: apache-2.0
  tags:
  - deepseek
  - qwen
  - qwen2
  - transformers
+ - GGUF
  ---

  # DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant

+ ## Background + Overview

+ DeepSeek-R1 has been making headlines for rivaling OpenAI's o1 reasoning model while remaining fully open-source. Many users want to run it locally to ensure data privacy, reduce latency, and maintain offline access. However, fitting such a large model onto personal devices typically requires quantization (e.g., Q4_K_M), which often sacrifices accuracy (up to ~22% loss) and undermines the benefits of running a reasoning model locally.
+
+ We've solved that trade-off by quantizing the DeepSeek-R1 distilled model to one-fourth of its original size without losing accuracy. This lets you run powerful on-device reasoning wherever you are, with no compromises. Tests on an **HP OmniBook AI PC** with an **AMD Ryzen™ AI 9 HX 370 processor** showed a decoding speed of **66.40 tokens per second** and peak RAM usage of just **1228 MB** for the NexaQuant version, compared to only **25.28 tokens per second** and **3788 MB** for the unquantized version, while **maintaining full-precision model accuracy**.
+
+ ## How to run locally

+ NexaQuant is compatible with **Nexa-SDK**, **Ollama**, **LM Studio**, **Llama.cpp**, and any llama.cpp-based project. Below, we outline multiple ways to run the model locally.

  #### Option 1: Using Nexa SDK

 

  **Step 2: Load and Run the Model**

+ 1. In LM Studio's top panel, search for and select `NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant`.
+ 2. Click `Download` (if not already downloaded) and wait for the model to load.
+ 3. Once loaded, go to the chat window and start a conversation.
  ---
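
The "How to run locally" section added in this commit names llama.cpp and llama.cpp-based projects as compatible, but the corresponding commands are outside the changed lines shown above. As a minimal sketch (not part of the commit), and assuming an illustrative file name for the GGUF downloaded from the NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant repository, a llama.cpp CLI run might look like:

```bash
# Minimal sketch: run the NexaQuant GGUF with llama.cpp's llama-cli.
# The .gguf file name below is an assumption; substitute the file you
# actually downloaded from NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant.
./llama-cli \
  -m DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant.gguf \
  -p "A clock shows 3:15. What is the angle between the hour and minute hands?" \
  -n 512
```

For an interactive chat loop rather than a one-shot prompt, llama-cli also offers a conversation mode (`-cnv`).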