We do not have enough resources to evaluate the model fully.
We discovered that the inputs and outputs of certain layers in this model are very large, even exceeding the FP16 range when tested with a few prompts. We recommend excluding these layers from quantization (particularly the `down_proj` in layer 60) and running them in BF16 precision instead. However, we have not done this in this INT4 model because, on CPU, the compute dtype for INT4 is BF16 or FP32, which do not overflow at these magnitudes.
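As a quick check of why these magnitudes matter: the largest finite FP16 value is 65504, so an activation magnitude like the 116224 listed below cannot be represented in FP16 at all, while BF16 (which has 8 exponent bits, like FP32) handles it easily. A small stdlib-only sketch:

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE-754 half-precision (fp16) value

def fits_in_fp16(x: float) -> bool:
    """True if x is representable as a finite fp16 number."""
    try:
        struct.pack('<e', x)  # 'e' is the IEEE-754 half-precision format
        return True
    except OverflowError:
        return False

# amax values taken from the layer dump below
print(fits_in_fp16(55192.4180))  # True: within the fp16 range
print(fits_in_fp16(116224.0))    # False: overflows fp16 (BF16/FP32 are fine)
```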

```
model.layers.60.mlp.experts.150.down_proj tensor(1144.) tensor(2122.9451)
model.layers.60.mlp.experts.231.down_proj tensor(25856.) tensor(12827.9980)
model.layers.60.mlp.shared_experts.down_proj tensor(1880.) tensor(3156.7344)
model.layers.59.mlp.experts.138.down_proj tensor(1568.) tensor(190.8769)
model.layers.60.mlp.experts.81.down_proj tensor(7360.) tensor(10024.4531)
model.layers.60.mlp.experts.92.down_proj tensor(116224.) tensor(55192.4180)
```

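Per-layer magnitudes like those in the dump above can be collected with forward hooks. A minimal sketch of the idea (not the exact script used for this model; the toy model at the end is purely illustrative):

```python
import torch
from torch import nn

FP16_MAX = 65504.0  # largest finite fp16 value

def find_fp16_overflow_layers(model, example_input):
    """Return {layer_name: (input_amax, output_amax)} for every Linear layer
    whose input or output magnitude exceeds the FP16 range."""
    stats, hooks = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            stats[name] = (inputs[0].abs().max().item(),
                           output.abs().max().item())
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(example_input)
    for h in hooks:
        h.remove()
    return {n: s for n, s in stats.items() if max(s) > FP16_MAX}

# toy model: the second layer is given huge weights so its output overflows fp16
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
with torch.no_grad():
    toy[0].weight.copy_(torch.eye(4))
    toy[0].bias.zero_()
    toy[1].weight.fill_(50000.0)
    toy[1].bias.zero_()
flagged = find_fp16_overflow_layers(toy, torch.ones(1, 4))
print(flagged)  # {'1': (1.0, 200000.0)}: layer '1' would overflow fp16
```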
**1. Add metadata to the BF16 model** https://huggingface.co/opensourcerelease/DeepSeek-V3-bf16

```python
import safetensors
from safetensors.torch import save_file

for i in range(1, 164):
    # ... open shard i here (elided): `f` is the open shard handle and
    # `safetensors_path` is that shard's file path
    tensors = {}
    for key in f.keys():
        tensors[key] = f.get_tensor(key)
    save_file(tensors, safetensors_path, metadata={'format': 'pt'})
```
