---
datasets:
- NeelNanda/pile-10k
base_model:
- deepseek-ai/DeepSeek-V3
---
## Model Details
This GGUF model is an int4 quantization of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3), using symmetric quantization with group_size 32, generated by [intel/auto-round](https://github.com/intel/auto-round).
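For intuition, here is a minimal sketch of what symmetric group-wise int4 quantization does. This is illustrative only: AutoRound additionally tunes the rounding via signed gradient descent, and the on-disk GGUF `q4_0` layout differs from this toy version.

```python
import torch

def fake_quant_int4_sym(w: torch.Tensor, group_size: int = 32) -> torch.Tensor:
    """Quantize-dequantize in groups of `group_size` values sharing one symmetric scale."""
    g = w.reshape(-1, group_size)
    scale = g.abs().amax(dim=1, keepdim=True) / 7.0    # int4 symmetric range [-8, 7]
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    q = torch.clamp(torch.round(g / scale), -8, 7)     # the stored 4-bit codes
    return (q * scale).reshape(w.shape)                # dequantized approximation

w = torch.randn(128, 64)
print((w - fake_quant_int4_sym(w)).abs().max())        # worst-case per-element error
```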
## How To Use
### Requirements
Please follow the [Build llama.cpp locally](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md) guide to install the necessary dependencies.
### INT4 Inference
```bash
>>> text="9.11和9.8哪个数字大"
>>> ./llama-cli -m DeepSeek-V3-bf16-256x20B-Q4_0.gguf-00001-of-00009.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## To compare **9.11** and **9.8**, write them in the same decimal form.
## 1. **9.11** is already in decimal form.
## 2. **9.8** can be written as **9.80**.
## Now compare the digits after the decimal point:
## - the fractional part of **9.11** is **0.11**
## - the fractional part of **9.80** is **0.80**
## Since **0.80** > **0.11**, **9.8** is greater than **9.11**.
## The final answer is:
## \boxed{9.8} [end of text]
>>> text="strawberry中有几个r?"
>>> ./llama-cli -m DeepSeek-V3-bf16-256x20B-Q4_0.gguf-00001-of-00009.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## The word "strawberry" contains two 'r' characters. Here's the breakdown:
## - **S**
## - **T**
## - **R**
## - **A**
## - **W**
## - **B**
## - **E**
## - **R**
## - **R**
## - **Y**
## So, there are **2** 'r' in "strawberry". [end of text]
>>> text="There is a girl who likes adventure,"
>>> ./llama-cli -m DeepSeek-V3-bf16-256x20B-Q4_0.gguf-00001-of-00009.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## That’s great! Adventures can be thrilling and enriching experiences. Here are a few ideas to inspire her adventurous spirit:
## ### Outdoor Adventures:
## 1. **Hiking**: Explore national parks or local trails to connect with nature.
## 2. **Camping**: Spend a night under the stars or in a forest.
## 3. **Rock Climbing**: Challenge yourself with cliffs or indoor climbing walls.
## 4. **Kayaking or Canoeing**: Explore rivers, lakes, or even the ocean.
## ### Travel Adventures:
## 5. **Backpacking**: Travel to new countries or regions with minimal luggage.
## 6. **Road Trips**: Explore nearby towns or cities by driving or biking.
## 7. **Volunteering Abroad**: Combine adventure with helping others in foreign countries.
##
## ### Thrilling Activities:
## 8. **Skydiving**: Experience the thrill of free-falling.
## 9. **Scuba Diving**: Discover underwater worlds and marine life.
## 10. **Zip-lining**: Feel the rush of flying through the air.
##
## ### Creative Adventures:
## 11. **Urban Exploration**: Discover hidden gems in your city or town.
## 12. **Photography Expeditions**: Capture unique landscapes or cultures.
## 13. **Learning Something New**: Try a hobby like surfing, pottery, or archery.
##
## ### Nature Adventures:
## 14. **Wildlife Safaris**: Observe animals in their natural habitats.
## 15. **Forest Bathing**: Immerse yourself in nature for relaxation and mindfulness.
## 16. **Gardening**: Explore growing your own plants or creating a garden.
##
## ### Cultural Adventures:
## 17. **Festivals**: Attend cultural events to learn about traditions.
## 18. **Historical Sites**: Visit museums, ruins, or ancient landmarks.
## 19. **Language Learning**: Learn a new language and immerse yourself in its culture.
##
## No matter the adventure, it’s important to stay safe, prepared, and open-minded. Adventure is about exploring, learning, and embracing the unknown! 🌟 [end of text]
>>> text="Please give a brief introduction of DeepSeek company."
>>> ./llama-cli -m DeepSeek-V3-bf16-256x20B-Q4_0.gguf-00001-of-00009.gguf -p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>$text\n<|Assistant|>" -n 512 --threads 16 -no-cnv
## Generated:
## DeepSeek is a Chinese company specializing in artificial intelligence (AI) technologies and applications. Founded in 2023, DeepSeek focuses on developing advanced AI solutions for various industries, including finance, healthcare, education, and entertainment. The company emphasizes innovation in natural language processing (NLP), machine learning, and data analytics to create intelligent systems that enhance decision-making and efficiency. DeepSeek aims to bridge the gap between cutting-edge AI research and practical applications, contributing to technological advancements and digital transformation across sectors. [end of text]
```
### Generate the model
**5×80 GB GPUs are needed (this could be optimized), and about 1.4 TB of CPU memory is needed.**
**1. Add metadata to the bf16 model** https://huggingface.co/opensourcerelease/DeepSeek-V3-bf16
```python
import safetensors
from safetensors.torch import save_file

# Re-save each shard in place, adding the {'format': 'pt'} metadata that
# the transformers/safetensors loading path expects.
for i in range(1, 164):
    safetensors_path = f"model-{i:05d}-of-000163.safetensors"
    print(safetensors_path)
    tensors = {}
    with safetensors.safe_open(safetensors_path, framework="pt") as f:
        for key in f.keys():
            tensors[key] = f.get_tensor(key)
    save_file(tensors, safetensors_path, metadata={"format": "pt"})
```
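As a quick sanity check (a hypothetical snippet, not part of the original recipe), you can confirm the metadata was written back:

```python
import safetensors

# The first shard should now report the 'format' metadata.
with safetensors.safe_open("model-00001-of-000163.safetensors", framework="pt") as f:
    print(f.metadata())  # expected: {'format': 'pt'}
```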
**2. Replace modeling_deepseek.py with the following file.** It mainly aligns devices and removes `torch.no_grad`, since AutoRound needs gradients for tuning.
https://github.com/intel/auto-round/blob/deepseekv3/modeling_deepseek.py
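For context on the `torch.no_grad` removal: AutoRound tunes rounding offsets by gradient descent, and anything computed under `torch.no_grad()` is detached from the autograd graph, so no tuning signal can flow. A minimal illustration (generic PyTorch, not DeepSeek-specific):

```python
import torch

w = torch.randn(4, 4, requires_grad=True)
x = torch.randn(4)

# Under no_grad, the forward pass records no graph, so nothing can be tuned.
with torch.no_grad():
    y = (w @ x).sum()
print(y.requires_grad)  # False

# With autograd enabled, gradients reach the tunable weights.
y = (w @ x).sum()
y.backward()
print(w.grad is not None)  # True
```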
Then install AutoRound from source: `pip3 install git+https://github.com/intel/auto-round.git`
**3. Tuning**
```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "DeepSeek-V3-hf"  # local path to the bf16 checkpoint with the patched modeling file
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype="auto")

# Spread the routed experts across cuda:1..cuda:4 by expert index; everything
# else (attention, gating, shared experts) stays on cuda:0.
device_map = {}
for n, m in model.model.layers.named_modules():
    if not isinstance(m, (torch.nn.Linear, transformers.modeling_utils.Conv1D)):
        continue
    if "experts" in n and "shared_experts" not in n:
        expert_idx = int(n.split('.')[-2])
        if expert_idx < 63:
            device = "cuda:1"
        elif expert_idx < 128:
            device = "cuda:2"
        elif expert_idx < 192:
            device = "cuda:3"
        else:
            device = "cuda:4"
    else:
        device = "cuda:0"
    # Strip the leading layer index so the same mapping applies to every block.
    device_map[n.split('.', 1)[-1]] = device

autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map,
                      iters=200, batch_size=8, seqlen=512, enable_torch_compile=False)
autoround.quantize()
autoround.save_quantized(format="gguf:q4_0", output_dir="tmp_autoround")
```
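Once quantization finishes, one way to inspect the exported file is the `gguf` Python package; this is a hypothetical check, and the exact filename AutoRound writes under `tmp_autoround/` may differ:

```python
from gguf import GGUFReader  # pip install gguf

# Hypothetical path: adjust to the file actually produced in tmp_autoround/.
reader = GGUFReader("tmp_autoround/DeepSeek-V3-Q4_0.gguf")
print(len(reader.tensors), "tensors")
print(list(reader.fields)[:5])  # a few of the GGUF metadata keys
```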
## Ethical Considerations and Limitations
The model can produce factually incorrect output and should not be relied on for factual accuracy. Because of the limitations of the pretrained model and the finetuning datasets, it may generate lewd, biased, or otherwise offensive outputs.
Therefore, developers should perform safety testing before deploying any applications of the model.
## Caveats and Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
Here is a useful link to learn more about Intel's AI software:
- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
## Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
## Cite
    @article{cheng2023optimize,
      title={Optimize weight rounding via signed gradient descent for the quantization of llms},
      author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
      journal={arXiv preprint arXiv:2309.05516},
      year={2023}
    }
[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round) |