---
license: apache-2.0
base_model:
- THUDM/CogView4-6B
base_model_relation: quantized
tags:
- quanto
---
## Quantization settings
- `vae.`: kept in `torch.bfloat16`; not quantized.
- `text_encoder.layers.`:
- Int8 with [Optimum Quanto](https://github.com/huggingface/optimum-quanto)
- Target layers: `["q_proj", "k_proj", "v_proj", "o_proj", "mlp.down_proj", "mlp.gate_up_proj"]`
- `diffusion_model.`:
- Int8 with [Optimum Quanto](https://github.com/huggingface/optimum-quanto)
- Target layers: `["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]`
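Below is a minimal sketch of how a comparable setup could be reproduced with Optimum Quanto and diffusers. It is not the exact export script for this checkpoint: it applies a blanket int8 weight quantization to the text encoder and the diffusion transformer rather than restricting it to the layer lists above, and it assumes a diffusers version that ships `CogView4Pipeline`.

```python
# Sketch (not the exact export script): int8-quantize the CogView4 text encoder
# and denoiser with Optimum Quanto, keeping the VAE in bfloat16.
# Assumes diffusers >= 0.33 (CogView4Pipeline) and optimum-quanto are installed.
import torch
from diffusers import CogView4Pipeline
from optimum.quanto import quantize, freeze, qint8

pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
)

# Int8 weights for the text encoder and the diffusion transformer.
# (This checkpoint only quantizes the attention/MLP layers listed above;
# a blanket quantize() call is shown here for brevity.)
quantize(pipe.text_encoder, weights=qint8)
freeze(pipe.text_encoder)

quantize(pipe.transformer, weights=qint8)
freeze(pipe.transformer)

# The VAE stays in torch.bfloat16, matching the settings above.
pipe.to("cuda")
```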
## VRAM consumption
- Text encoder (`text_encoder.`): about 11 GB
- Denoiser (`diffusion_model.`): about 10 GB
## Samples
|`torch.bfloat16` | Quanto Int8 |
| - | - |
| <img src="./images/sample_bf16_01.jpg" width="320px" /> | <img src="./images/sample_quanto_01.jpg" width="320px" /> |
| VRAM: 40 GB (without offloading) | VRAM: 28 GB (without offloading) |
<details><summary>Generation parameters</summary>

- prompt: `""" A photo of a nendoroid figure of hatsune miku holding a sign that says "CogView4" """`
- negative_prompt: `"blurry, low quality, horror"`
- height: `1152`
- width: `1152`
- cfg_scale: `3.5`
- num_inference_steps: `20`
</details>
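For reference, a minimal sketch of how the sample parameters above map onto a diffusers call, assuming `pipe` is the quantized `CogView4Pipeline` from the earlier snippet (the output filename is illustrative):

```python
# Sketch: reproduce the sample generation settings listed above.
image = pipe(
    prompt='A photo of a nendoroid figure of hatsune miku holding a sign that says "CogView4"',
    negative_prompt="blurry, low quality, horror",
    height=1152,
    width=1152,
    guidance_scale=3.5,        # cfg_scale above
    num_inference_steps=20,
).images[0]
image.save("sample_quanto_01.jpg")
```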