p1atdev/CogView4-6B-quanto_int8


p1atdev committed on Mar 8

Commit 7bcbaa9 · verified · 1 Parent(s): ddd497a

Update README.md

Files changed (1)

README.md +23 -3


@@ -1,3 +1,23 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+base_model:
+- THUDM/CogView4-6B
+base_model_relation: quantized
+tags:
+- quanto
+---
+## Quantization settings
+- `vae.`: `torch.bfloat16`. No quantization.
+- `text_encoder.layers.`:
+  - Int8 with [Optimum Quanto](https://github.com/huggingface/optimum-quanto)
+  - Target layers: `["q_proj", "k_proj", "v_proj", "o_proj", "mlp.down_proj", "mlp.gate_up_proj"]`
+- `diffusion_model.`:
+  - Int8 with [Optimum Quanto](https://github.com/huggingface/optimum-quanto)
+  - Target layers: `["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]`
+## VRAM consumption
+- Text encoder (`text_encoder.`): about 11 GB
+- Denoiser (`diffusion_model.`): about 10 GB
+- VAE (`vae.`): about 1.5 GB
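
The README lists which layers are quantized but does not include the script that produced the checkpoint. As a rough sketch only, the selection could be expressed with Optimum Quanto's `quantize` and `freeze` functions, assuming the target layer names are passed as glob-style `include` patterns. The helper names (`include_patterns`, `is_target`, `quantize_int8`) and the example module paths are hypothetical and not taken from this repository.

```python
from fnmatch import fnmatchcase

# Target layer suffixes copied from the README.
TEXT_ENCODER_TARGETS = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "mlp.down_proj", "mlp.gate_up_proj",
]
DIFFUSION_TARGETS = [
    "to_q", "to_k", "to_v", "to_out.0",
    "ff.net.0.proj", "ff.net.2",
]


def include_patterns(targets):
    """Turn layer-name suffixes into glob patterns (hypothetical helper)."""
    return [f"*{name}" for name in targets]


def is_target(module_name, targets):
    """Return True if a module path ends with one of the target suffixes."""
    return any(fnmatchcase(module_name, p) for p in include_patterns(targets))


def quantize_int8(model, targets):
    """Quantize only the matching submodules to int8 weights.

    Assumption: optimum-quanto is installed and `model` is a torch.nn.Module.
    The import is deferred so the helpers above work without the library.
    """
    from optimum.quanto import freeze, qint8, quantize

    quantize(model, weights=qint8, include=include_patterns(targets))
    freeze(model)  # replace float weights with their quantized versions
    return model


if __name__ == "__main__":
    # Hypothetical module paths, used only to show which names are selected.
    for name in [
        "layers.0.self_attn.q_proj",
        "layers.0.input_layernorm",
        "transformer_blocks.1.attn1.to_out.0",
    ]:
        targets = (
            DIFFUSION_TARGETS if name.startswith("transformer") else TEXT_ENCODER_TARGETS
        )
        print(name, is_target(name, targets))
```

Under these assumptions, the VAE would simply be left out of the `quantize_int8` call and kept in `torch.bfloat16`, matching the settings listed in the README.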