SandLogicTechnologies committed on
Commit
ea280b8
·
verified ·
1 Parent(s): a034120

Create README.md

  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
---
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- Llama
- EdgeAI
---
# DeepSeek-R1-Distill-Llama-8B Quantized Models

This repository contains Q4_KM and Q5_KM quantized versions of the [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) model, quantized to reduce memory use and simplify deployment while preserving as much of the original model's performance as possible.

Discover our full range of quantized language models on the [SandLogic Lexicon Hugging Face page](https://huggingface.co/SandLogicTechnologies). To learn more about our company and services, visit the [SandLogic](https://www.sandlogic.com/) website.

## Model Description

These models are quantized versions of DeepSeek-R1-Distill-Llama-8B, a distilled 8B-parameter model based on the Llama architecture. The original model demonstrates that reasoning patterns from larger models can be effectively distilled into smaller architectures.

### Available Quantized Versions

1. **Q4_KM Version**
   - 4-bit quantization using llama.cpp's k-quant method (`Q4_K_M`)
   - Approximately 4 GB model size
   - Good balance between model size and output quality
   - Recommended for resource-constrained environments

2. **Q5_KM Version**
   - 5-bit quantization using llama.cpp's k-quant method (`Q5_K_M`)
   - Approximately 5 GB model size
   - Higher precision than Q4_KM while still much smaller than the unquantized model
   - Recommended when higher accuracy is needed

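As a rough sanity check on the sizes listed above, the nominal file size can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes exactly 8 billion parameters and exactly 4 or 5 bits per weight; `estimate_size_gb` is a hypothetical helper, not part of any library. Real quantized files are typically somewhat larger, because they also store scaling factors and keep some tensors at higher precision.

```python
def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Nominal size in GB (10^9 bytes), ignoring metadata and quantization scales."""
    total_bits = n_params * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


N_PARAMS = 8e9  # assumption: "8B" taken as exactly 8 billion parameters

# Nominal bit widths only; actual k-quant files use slightly more bits per weight.
for label, bits in [("Q4_KM", 4), ("Q5_KM", 5)]:
    print(f"{label}: ~{estimate_size_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the estimates come out to about 4 GB and 5 GB, in line with the approximate sizes listed above.
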
## Usage

Install the `llama-cpp-python` package:

```bash
pip install llama-cpp-python
```

Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) to install with GPU support.

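To confirm that the installation succeeded before loading a model, a quick check along these lines can help. This is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name, and it assumes the package was installed under the distribution name `llama-cpp-python`, as in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "llama-cpp-python") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("llama-cpp-python is not installed; run the pip command above.")
else:
    print(f"llama-cpp-python {version} is installed.")
```
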
### Basic Text Completion

Here's an example demonstrating how to use the high-level API for basic text completion:

```python
from llama_cpp import Llama

# Load the quantized model; replace the placeholder with the path to your downloaded model file
llm = Llama(
    model_path="model/path/",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,  # Uncomment to increase the context window
)

output = llm(
    "Q: Name the planets in the solar system? A: ",  # Prompt
    max_tokens=32,  # Generate up to 32 tokens
    stop=["Q:", "\n"],  # Stop generating just before a new question
    echo=False,  # Don't echo the prompt in the output
)

# The generated text is stored in the first choice
print(output["choices"][0]["text"])
```

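DeepSeek-R1-style models typically emit their reasoning inside `<think>...</think>` tags before the final answer, especially when prompted through their chat template, so the raw completion text may need post-processing. The helper below is a sketch and not part of llama-cpp-python: `split_reasoning` is a hypothetical name, and it assumes the output contains at most one tagged reasoning block. It also covers two edge cases that can occur in practice: generation cut off before the closing tag (for example by a small `max_tokens`), and output that contains only the closing tag because the opening tag was part of the prompt.

```python
import re
from typing import Tuple

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split model output into (reasoning, answer).

    Assumes at most one <think>...</think> block; text without any
    tags is returned unchanged as the answer.
    """
    match = THINK_BLOCK.search(text)
    if match:
        reasoning = match.group(1).strip()
        answer = (text[:match.start()] + text[match.end():]).strip()
        return reasoning, answer
    if "</think>" in text:
        # Only the closing tag is present: the opening tag was in the prompt.
        reasoning, answer = text.split("</think>", 1)
        return reasoning.strip(), answer.strip()
    if "<think>" in text:
        # Generation stopped before </think>: treat the rest as reasoning.
        return text.split("<think>", 1)[1].strip(), ""
    return "", text.strip()


sample = "<think>Eight planets orbit the Sun.</think>Mercury, Venus, Earth, ..."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

With the completion example above, the helper would be applied to `output["choices"][0]["text"]`.
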
## License

These models inherit the license of the original DeepSeek-R1-Distill-Llama-8B model. Please refer to the original model's license for usage terms and conditions.

## Acknowledgments

We thank the DeepSeek AI team for open-sourcing their distilled models and demonstrating that smaller models can achieve impressive performance through effective distillation techniques.