We do not have enough resources to evaluate the model fully.
We discovered that the inputs and outputs of certain layers in this model are very large, even exceeding the FP16 range when tested with a few prompts. We recommend excluding these layers from quantization (particularly the `down_proj` in layer 60) and running them in BF16 precision instead. However, we have not done this in this INT4 model because, on CPU, the compute dtype for INT4 is BF16 or FP32, which do not overflow at these magnitudes.
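As a quick check of why these magnitudes matter: the largest finite FP16 value is 65504, so an activation magnitude like the 116224 listed below cannot be represented in FP16 at all, while BF16 (which has 8 exponent bits, like FP32) handles it easily. A small stdlib-only sketch:

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE-754 half-precision (fp16) value

def fits_in_fp16(x: float) -> bool:
    """True if x is representable as a finite fp16 number."""
    try:
        struct.pack('<e', x)  # 'e' is the IEEE-754 half-precision format
        return True
    except OverflowError:
        return False

# amax values taken from the layer dump below
print(fits_in_fp16(55192.4180))  # True: within the fp16 range
print(fits_in_fp16(116224.0))    # False: overflows fp16 (BF16/FP32 are fine)
```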

```
model.layers.60.mlp.experts.150.down_proj tensor(1144.) tensor(2122.9451)
model.layers.60.mlp.experts.231.down_proj tensor(25856.) tensor(12827.9980)
model.layers.60.mlp.shared_experts.down_proj tensor(1880.) tensor(3156.7344)
model.layers.59.mlp.experts.138.down_proj tensor(1568.) tensor(190.8769)
model.layers.60.mlp.experts.81.down_proj tensor(7360.) tensor(10024.4531)
model.layers.60.mlp.experts.92.down_proj tensor(116224.) tensor(55192.4180)
```

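Per-layer magnitudes like those in the dump above can be collected with forward hooks. A minimal sketch of the idea (not the exact script used for this model; the toy model at the end is purely illustrative):

```python
import torch
from torch import nn

FP16_MAX = 65504.0  # largest finite fp16 value

def find_fp16_overflow_layers(model, example_input):
    """Return {layer_name: (input_amax, output_amax)} for every Linear layer
    whose input or output magnitude exceeds the FP16 range."""
    stats, hooks = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            stats[name] = (inputs[0].abs().max().item(),
                           output.abs().max().item())
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(example_input)
    for h in hooks:
        h.remove()
    return {n: s for n, s in stats.items() if max(s) > FP16_MAX}

# toy model: the second layer is given huge weights so its output overflows fp16
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
with torch.no_grad():
    toy[0].weight.copy_(torch.eye(4))
    toy[0].bias.zero_()
    toy[1].weight.fill_(50000.0)
    toy[1].bias.zero_()
flagged = find_fp16_overflow_layers(toy, torch.ones(1, 4))
print(flagged)  # {'1': (1.0, 200000.0)}: layer '1' would overflow fp16
```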
**1. Add metadata to the BF16 model** https://huggingface.co/opensourcerelease/DeepSeek-V3-bf16

```python
import safetensors
from safetensors.torch import save_file

for i in range(1, 164):
    # ... open shard i here (elided): `f` is the open shard handle and
    # `safetensors_path` is that shard's file path
    tensors = {}
    for key in f.keys():
        tensors[key] = f.get_tensor(key)
    save_file(tensors, safetensors_path, metadata={'format': 'pt'})
```
