Lmxyy committed · Commit 19e21f3 (verified) · Parent: d043303

Create README.md

Files changed (1): README.md (+125, −0)

---
license: other
license_name: flux-1-dev-non-commercial-license
tags:
- image-to-image
- SVDQuant
- INT4
- FLUX.1
- Diffusion
- Quantization
- ControlNet
- depth-to-image
- image-generation
- text-to-image
- ICLR2025
- FLUX.1-Canny-dev
language:
- en
base_model:
- black-forest-labs/FLUX.1-Canny-dev
base_model_relation: quantized
pipeline_tag: image-to-image
datasets:
- mit-han-lab/svdquant-datasets
library_name: diffusers
---

<p align="center" style="border-radius: 10px">
    <img src="https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/logo.svg" width="50%" alt="logo"/>
</p>
<h4 style="display: flex; justify-content: center; align-items: center; text-align: center;">Quantization Library:&nbsp;<a href='https://github.com/mit-han-lab/deepcompressor'>DeepCompressor</a> &ensp; Inference Engine:&nbsp;<a href='https://github.com/mit-han-lab/nunchaku'>Nunchaku</a>
</h4>

<div style="display: flex; justify-content: center; align-items: center; text-align: center;">
  <a href="https://arxiv.org/abs/2411.05007">[Paper]</a>&ensp;
  <a href='https://github.com/mit-han-lab/nunchaku'>[Code]</a>&ensp;
  <a href='https://svdquant.mit.edu'>[Demo]</a>&ensp;
  <a href='https://hanlab.mit.edu/projects/svdquant'>[Website]</a>&ensp;
  <a href='https://hanlab.mit.edu/blog/svdquant'>[Blog]</a>
</div>

![teaser](https://huggingface.co/mit-han-lab/svdq-int4-flux.1-depth-dev/resolve/main/demo.jpg)
`svdq-int4-flux.1-canny-dev` is an INT4-quantized version of [`FLUX.1-Canny-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev), which generates an image from a text description while following the Canny edges of a given input image. It offers approximately 4× memory savings while also running 2–3× faster than the original BF16 model.

## Method
#### Quantization Method -- SVDQuant

![intuition](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/intuition.gif)
Overview of SVDQuant. Stage 1: Originally, both the activation ***X*** and the weights ***W*** contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from the activations to the weights, yielding updated activations and weights. The activations become easier to quantize, but the weights are now harder. Stage 3: SVDQuant further decomposes the weights into a low-rank component and a residual with SVD. The quantization difficulty is thereby alleviated by the low-rank branch, which runs at 16-bit precision.

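For intuition only, here is a minimal PyTorch sketch of the Stage 3 decomposition. The helper names (`svd_lowrank_split`, `fake_quant_int4`) are illustrative, the 4-bit step is fake quantization rather than DeepCompressor's real INT4 kernels, and the Stage 2 outlier migration is omitted: the weight is split into a rank-32 branch obtained from an SVD, kept at 16-bit, plus a residual that is quantized to 4 bits.

```python
import torch

def svd_lowrank_split(W: torch.Tensor, rank: int = 32):
    # Conceptual Stage 3 of SVDQuant: W ≈ L1 @ L2 + R, where L1 @ L2 is the
    # rank-`rank` SVD approximation kept at 16-bit and R is the residual that
    # gets quantized to 4 bits.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]   # (out_features, rank)
    L2 = Vh[:rank, :]             # (rank, in_features)
    R = W - L1 @ L2               # residual: much easier to quantize than W
    return L1, L2, R

def fake_quant_int4(x: torch.Tensor) -> torch.Tensor:
    # Toy symmetric per-tensor 4-bit fake quantization, for illustration only.
    scale = x.abs().max() / 7.0
    return (x / scale).round().clamp(-8, 7) * scale

W = torch.randn(4096, 4096)
L1, L2, R = svd_lowrank_split(W, rank=32)
W_hat = L1 @ L2 + fake_quant_int4(R)  # reconstructed weight used at inference
print("mean abs error:", (W - W_hat).abs().mean().item())
```
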
#### Nunchaku Engine Design

![engine](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/engine.jpg)
(a) Naïvely running the low-rank branch with rank 32 introduces 57% latency overhead due to the extra read of 16-bit inputs in *Down Projection* and the extra write of 16-bit outputs in *Up Projection*. Nunchaku eliminates this overhead with kernel fusion. (b) The *Down Projection* and *Quantize* kernels use the same input, while the *Up Projection* and *4-Bit Compute* kernels share the same output. To reduce data movement, we fuse the first pair and the second pair of kernels together.

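In PyTorch-like terms, one quantized linear layer computes the sum of a 4-bit main branch and the rank-32 low-rank branch. The sketch below uses hypothetical helper names and fake quantization in place of Nunchaku's real INT4 CUDA kernels, and performs no actual fusion; the comments only mark where the fused kernel boundaries sit:

```python
import torch

def fake_quant_int4(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for the real INT4 quantize kernel (illustration only).
    scale = x.abs().max() / 7.0
    return (x / scale).round().clamp(-8, 7) * scale

def svdquant_linear(x, W_residual_q, L1, L2):
    # In Nunchaku, "Quantize" is fused with the low-rank Down Projection
    # (they share the read of the 16-bit input x), and the 4-bit GEMM is
    # fused with the Up Projection (they share the 16-bit output write).
    main = fake_quant_int4(x) @ W_residual_q.T
    lowrank = (x @ L2.T) @ L1.T   # rank-32 branch, kept at 16-bit precision
    return main + lowrank

x = torch.randn(1, 4096)
W_q = fake_quant_int4(torch.randn(4096, 4096))
L1, L2 = torch.randn(4096, 32), torch.randn(32, 4096)
print(svdquant_linear(x, W_q, L1, L2).shape)  # torch.Size([1, 4096])
```
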
## Model Description

- **Developed by:** MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, and Pika Labs
- **Model type:** INT W4A4 model
- **Model size:** 6.64GB
- **Model resolution:** The number of pixels needs to be a multiple of 65,536 (e.g., 1024×1024 = 16 × 65,536; see the quick check below).
- **License:** FLUX.1-dev Non-Commercial License

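For convenience, here is a small hypothetical helper (not part of the model or library API) that checks whether a resolution satisfies this constraint:

```python
def is_valid_resolution(width: int, height: int) -> bool:
    # The total pixel count must be a multiple of 65,536 (= 256 * 256).
    return (width * height) % 65536 == 0

print(is_valid_resolution(1024, 1024))  # True:  1,048,576 = 16 * 65,536
print(is_valid_resolution(1000, 1000))  # False: 1,000,000 is not a multiple of 65,536
```
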
## Usage

### Diffusers

Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment. Also, install some ControlNet dependencies:

```shell
pip install git+https://github.com/asomoza/image_gen_aux.git
pip install controlnet_aux mediapipe
```

Then you can run the model with:

```python
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

# Load the INT4-quantized transformer and plug it into the standard FLUX.1-Canny-dev pipeline.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-canny-dev")
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Extract Canny edges from the reference image to use as the control signal.
processor = CannyDetector()
control_image = processor(
    control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024
)

image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=50, guidance_scale=30.0
).images[0]
image.save("flux.1-canny-dev.png")
```

### ComfyUI

Work in progress. Stay tuned!

## Limitations

- The model only runs on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this [issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details. You can check your GPU as sketched below.
- You may observe slight differences in detail compared to the BF16 model.

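A minimal sketch (assuming PyTorch with CUDA is available) to check whether the current GPU reports one of the supported compute capabilities:

```python
import torch

# Supported architectures from the list above:
# sm_80 (A100), sm_86 (Ampere RTX 3090 / A6000), sm_89 (Ada RTX 4090).
SUPPORTED = {(8, 0), (8, 6), (8, 9)}

major, minor = torch.cuda.get_device_capability()
status = "supported" if (major, minor) in SUPPORTED else "not supported"
print(f"Compute capability sm_{major}{minor}: {status}")
```
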
## Citation

If you find this model useful or relevant to your research, please cite:

```bibtex
@inproceedings{
  li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```