---
datasets:
- NeelNanda/pile-10k
base_model:
- deepseek-ai/DeepSeek-V3
---
## Model Details
This GGUF model is an int4 quantization of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3), using symmetric quantization with group_size 32, generated by [intel/auto-round](https://github.com/intel/auto-round).
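For intuition, here is a minimal sketch of what symmetric group-wise int4 quantization does. This is illustrative only: AutoRound additionally tunes the rounding via signed gradient descent, and the on-disk GGUF `q4_0` layout differs from this toy version.

```python
import torch

def fake_quant_int4_sym(w: torch.Tensor, group_size: int = 32) -> torch.Tensor:
    """Quantize-dequantize in groups of `group_size` values sharing one symmetric scale."""
    g = w.reshape(-1, group_size)
    scale = g.abs().amax(dim=1, keepdim=True) / 7.0    # int4 symmetric range [-8, 7]
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    q = torch.clamp(torch.round(g / scale), -8, 7)     # the stored 4-bit codes
    return (q * scale).reshape(w.shape)                # dequantized approximation

w = torch.randn(128, 64)
print((w - fake_quant_int4_sym(w)).abs().max())        # worst-case per-element error
```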
## How To Use
### Requirements
Please follow the [Build llama.cpp locally](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md) guide to install the necessary dependencies.
### INT4 Inference
```bash
>>> text="9.11和9.8哪个数字大"
>>> ./llama-cli -m DeepSeek-V3-bf16-256x20B-Q4_0.gguf-00001-of-00009.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## To compare **9.11** and **9.8**, write them in the same decimal form.
## 1. **9.11** is already in decimal form.
## 2. **9.8** can be written as **9.80**.
## Now compare the digits after the decimal point:
## - the fractional part of **9.11** is **0.11**
## - the fractional part of **9.80** is **0.80**
## Since **0.80** > **0.11**, **9.8** is greater than **9.11**.
## The final answer is:
## \boxed{9.8} [end of text]
>>> text="strawberry中有几个r?"
>>> ./llama-cli -m DeepSeek-V3-bf16-256x20B-Q4_0.gguf-00001-of-00009.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## The word "strawberry" contains two 'r' characters. Here's the breakdown:
## - **S**
## - **T**
## - **R**
## - **A**
## - **W**
## - **B**
## - **E**
## - **R**
## - **R**
## - **Y**
## So, there are **2** 'r' in "strawberry". [end of text]
>>> text="There is a girl who likes adventure,"
>>> ./llama-cli -m DeepSeek-V3-bf16-256x20B-Q4_0.gguf-00001-of-00009.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## That’s great! Adventures can be thrilling and enriching experiences. Here are a few ideas to inspire her adventurous spirit:
## ### Outdoor Adventures:
## 1. **Hiking**: Explore national parks or local trails to connect with nature.
## 2. **Camping**: Spend a night under the stars or in a forest.
## 3. **Rock Climbing**: Challenge yourself with cliffs or indoor climbing walls.
## 4. **Kayaking or Canoeing**: Explore rivers, lakes, or even the ocean.
## ### Travel Adventures:
## 5. **Backpacking**: Travel to new countries or regions with minimal luggage.
## 6. **Road Trips**: Explore nearby towns or cities by driving or biking.
## 7. **Volunteering Abroad**: Combine adventure with helping others in foreign countries.
##
## ### Thrilling Activities:
## 8. **Skydiving**: Experience the thrill of free-falling.
## 9. **Scuba Diving**: Discover underwater worlds and marine life.
## 10. **Zip-lining**: Feel the rush of flying through the air.
##
## ### Creative Adventures:
## 11. **Urban Exploration**: Discover hidden gems in your city or town.
## 12. **Photography Expeditions**: Capture unique landscapes or cultures.
## 13. **Learning Something New**: Try a hobby like surfing, pottery, or archery.
##
## ### Nature Adventures:
## 14. **Wildlife Safaris**: Observe animals in their natural habitats.
## 15. **Forest Bathing**: Immerse yourself in nature for relaxation and mindfulness.
## 16. **Gardening**: Explore growing your own plants or creating a garden.
##
## ### Cultural Adventures:
## 17. **Festivals**: Attend cultural events to learn about traditions.
## 18. **Historical Sites**: Visit museums, ruins, or ancient landmarks.
## 19. **Language Learning**: Learn a new language and immerse yourself in its culture.
##
## No matter the adventure, it’s important to stay safe, prepared, and open-minded. Adventure is about exploring, learning, and embracing the unknown! 🌟 [end of text]
>>> text="Please give a brief introduction of DeepSeek company."
>>> ./llama-cli -m DeepSeek-V3-bf16-256x20B-Q4_0.gguf-00001-of-00009.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## DeepSeek is a Chinese company specializing in artificial intelligence (AI) technologies and applications. Founded in 2023, DeepSeek focuses on developing advanced AI solutions for various industries, including finance, healthcare, education, and entertainment. The company emphasizes innovation in natural language processing (NLP), machine learning, and data analytics to create intelligent systems that enhance decision-making and efficiency. DeepSeek aims to bridge the gap between cutting-edge AI research and practical applications, contributing to technological advancements and digital transformation across sectors. [end of text]
```
### Generate the model
**5×80 GB GPUs are needed (this could be optimized), and about 1.4 TB of CPU memory is needed.**
**1. Add metadata to the bf16 model** https://huggingface.co/opensourcerelease/DeepSeek-V3-bf16
```python
import safetensors
from safetensors.torch import save_file

# Re-save each shard in place, adding the {'format': 'pt'} metadata that
# the transformers/safetensors loading path expects.
for i in range(1, 164):
    safetensors_path = f"model-{i:05d}-of-000163.safetensors"
    print(safetensors_path)
    tensors = {}
    with safetensors.safe_open(safetensors_path, framework="pt") as f:
        for key in f.keys():
            tensors[key] = f.get_tensor(key)
    save_file(tensors, safetensors_path, metadata={"format": "pt"})
```
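As a quick sanity check (a hypothetical snippet, not part of the original recipe), you can confirm the metadata was written back:

```python
import safetensors

# The first shard should now report the 'format' metadata.
with safetensors.safe_open("model-00001-of-000163.safetensors", framework="pt") as f:
    print(f.metadata())  # expected: {'format': 'pt'}
```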
**2. Replace modeling_deepseek.py with the following file.** It mainly aligns devices and removes `torch.no_grad`, since AutoRound needs gradients for tuning.
https://github.com/intel/auto-round/blob/deepseekv3/modeling_deepseek.py
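For context on the `torch.no_grad` removal: AutoRound tunes rounding offsets by gradient descent, and anything computed under `torch.no_grad()` is detached from the autograd graph, so no tuning signal can flow. A minimal illustration (generic PyTorch, not DeepSeek-specific):

```python
import torch

w = torch.randn(4, 4, requires_grad=True)
x = torch.randn(4)

# Under no_grad, the forward pass records no graph, so nothing can be tuned.
with torch.no_grad():
    y = (w @ x).sum()
print(y.requires_grad)  # False

# With autograd enabled, gradients reach the tunable weights.
y = (w @ x).sum()
y.backward()
print(w.grad is not None)  # True
```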
Then install AutoRound from source: `pip3 install git+https://github.com/intel/auto-round.git`
**3. Tuning**
```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "DeepSeek-V3-hf"  # local path to the bf16 checkpoint with the patched modeling file
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype="auto")

# Spread the routed experts across cuda:1..cuda:4 by expert index; everything
# else (attention, gating, shared experts) stays on cuda:0.
device_map = {}
for n, m in model.model.layers.named_modules():
    if not isinstance(m, (torch.nn.Linear, transformers.modeling_utils.Conv1D)):
        continue
    if "experts" in n and "shared_experts" not in n:
        expert_idx = int(n.split('.')[-2])
        if expert_idx < 63:
            device = "cuda:1"
        elif expert_idx < 128:
            device = "cuda:2"
        elif expert_idx < 192:
            device = "cuda:3"
        else:
            device = "cuda:4"
    else:
        device = "cuda:0"
    # Strip the leading layer index so the same mapping applies to every block.
    device_map[n.split('.', 1)[-1]] = device

autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map,
                      iters=200, batch_size=8, seqlen=512, enable_torch_compile=False)
autoround.quantize()
autoround.save_quantized(format="gguf:q4_0", output_dir="tmp_autoround")
```
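Once quantization finishes, one way to inspect the exported file is the `gguf` Python package; this is a hypothetical check, and the exact filename AutoRound writes under `tmp_autoround/` may differ:

```python
from gguf import GGUFReader  # pip install gguf

# Hypothetical path: adjust to the file actually produced in tmp_autoround/.
reader = GGUFReader("tmp_autoround/DeepSeek-V3-Q4_0.gguf")
print(len(reader.tensors), "tensors")
print(list(reader.fields)[:5])  # a few of the GGUF metadata keys
```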
## Ethical Considerations and Limitations
The model can produce factually incorrect output and should not be relied on for factual accuracy. Because of the limitations of the pretrained model and the finetuning datasets, it may generate lewd, biased, or otherwise offensive outputs.
Therefore, developers should perform safety testing before deploying any applications of the model.
## Caveats and Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
Here is a useful link to learn more about Intel's AI software:
- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
## Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
## Cite
    @article{cheng2023optimize,
      title={Optimize weight rounding via signed gradient descent for the quantization of llms},
      author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
      journal={arXiv preprint arXiv:2309.05516},
      year={2023}
    }
[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round) |