n1ck-guo committed · verified
Commit 1ef21dc · 1 Parent(s): b556675

Update README.md

Files changed (1): README.md +4 -4

README.md CHANGED
@@ -318,7 +318,7 @@ we do not have enough resources to evaluate the model
 
 We discovered that the inputs and outputs of certain layers in this model are very large, even exceeding the FP16 range when tested with a few prompts. It is recommended to exclude these layers from quantization, particularly the 'down_proj' in layer 60, and run them in BF16 precision instead. However, we have not implemented this in this INT4 model because on CPU the compute dtype for INT4 is BF16 or FP32.
 
-~~~python
+```python
 model.layers.60.mlp.experts.150.down_proj tensor(1144.) tensor(2122.9451)
 model.layers.60.mlp.experts.231.down_proj tensor(25856.) tensor(12827.9980)
 model.layers.60.mlp.shared_experts.down_proj tensor(1880.) tensor(3156.7344)
@@ -328,11 +328,11 @@ model.layers.59.mlp.experts.138.down_proj tensor(1568.) tensor(190.8769)
 model.layers.60.mlp.experts.81.down_proj tensor(7360.) tensor(10024.4531)
 model.layers.60.mlp.experts.92.down_proj tensor(116224.) tensor(55192.4180)
 
-~~~
+```
 
 **1. Add metadata to the BF16 model** https://huggingface.co/opensourcerelease/DeepSeek-V3-bf16
 
-~~~python
+```python
 import safetensors
 from safetensors.torch import save_file
 
@@ -345,7 +345,7 @@ for i in range(1, 164):
     for key in f.keys():
         tensors[key] = f.get_tensor(key)
     save_file(tensors, safetensors_path, metadata={'format': 'pt'})
-~~~
+```
 
 
 