Lmxyy committed · verified
Commit 59b71bb · 1 Parent(s): ed9bb57

Update README.md

Files changed (1):
  1. README.md  +18 -104
README.md CHANGED
@@ -1,118 +1,28 @@
  ---
+ base_model: black-forest-labs/FLUX.1-Canny-dev
+ base_model_relation: quantized
+ datasets:
+ - mit-han-lab/svdquant-datasets
+ language:
+ - en
+ library_name: diffusers
  license: other
+ license_link: https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev/blob/main/LICENSE.md
  license_name: flux-1-dev-non-commercial-license
+ pipeline_tag: image-to-image
  tags:
  - image-to-image
  - SVDQuant
- - INT4
+ - FLUX.1-Canny-dev
  - FLUX.1
  - Diffusion
  - Quantization
- - ControlNet
- - depth-to-image
- - image-generation
- - text-to-image
  - ICLR2025
- - FLUX.1-Canny-dev
- language:
- - en
- base_model:
- - black-forest-labs/FLUX.1-Canny-dev
- base_model_relation: quantized
- pipeline_tag: image-to-image
- datasets:
- - mit-han-lab/svdquant-datasets
- library_name: diffusers
- ---
-
- <p align="center" style="border-radius: 10px">
-     <img src="https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/logo.svg" width="50%" alt="logo"/>
- </p>
- <h4 style="display: flex; justify-content: center; align-items: center; text-align: center;">Quantization Library:&nbsp;<a href='https://github.com/mit-han-lab/deepcompressor'>DeepCompressor</a> &ensp; Inference Engine:&nbsp;<a href='https://github.com/mit-han-lab/nunchaku'>Nunchaku</a>
- </h4>
-
-
- <div style="display: flex; justify-content: center; align-items: center; text-align: center;">
-   <a href="https://arxiv.org/abs/2411.05007">[Paper]</a>&ensp;
-   <a href='https://github.com/mit-han-lab/nunchaku'>[Code]</a>&ensp;
-   <a href='https://svdquant.mit.edu'>[Demo]</a>&ensp;
-   <a href='https://hanlab.mit.edu/projects/svdquant'>[Website]</a>&ensp;
-   <a href='https://hanlab.mit.edu/blog/svdquant'>[Blog]</a>
- </div>
-
- ![teaser](https://huggingface.co/mit-han-lab/svdq-int4-flux.1-canny-dev/resolve/main/demo.jpg)
- `svdq-int4-flux.1-canny-dev` is an INT4-quantized version of [`FLUX.1-Canny-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev), which can generate an image based on a text description while following the Canny edge of a given input image. It offers approximately 4× memory savings while also running 2–3× faster than the original BF16 model.
-
- ## Method
- #### Quantization Method -- SVDQuant
-
- ![intuition](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/intuition.gif)
- Overview of SVDQuant. Stage 1: Originally, both the activation ***X*** and the weight ***W*** contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from the activation to the weight, which makes the activation easier to quantize but the weight harder. Stage 3: SVDQuant further decomposes the weight into a low-rank component and a residual with SVD, so the quantization difficulty is absorbed by the low-rank branch, which runs at 16-bit precision.
-
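For intuition, the decomposition above can be sketched in a few lines of PyTorch. This is a toy illustration only, not the DeepCompressor implementation: the per-tensor 4-bit quantizer and the fixed `rank=32` here are simplifying assumptions, and the real pipeline quantizes activations as well.

```python
import torch

def svd_decompose(W: torch.Tensor, rank: int = 32):
    # Low-rank branch via truncated SVD; kept at 16-bit precision.
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    L1 = U[:, :rank] * S[:rank]          # (out_features, rank)
    L2 = Vh[:rank, :]                    # (rank, in_features)

    # Residual after removing the low-rank part; this is what gets 4-bit quantized.
    R = W.float() - L1 @ L2
    scale = R.abs().max() / 7            # toy symmetric per-tensor INT4 scale
    R_q = (R / scale).round().clamp(-8, 7)
    return L1, L2, R_q, scale

def approx_linear(x: torch.Tensor, L1, L2, R_q, scale):
    # x @ W.T ≈ 16-bit low-rank branch + dequantized 4-bit residual branch.
    return x @ L2.T @ L1.T + x @ (R_q * scale).T
```

Calling `approx_linear` with the outputs of `svd_decompose` reproduces `x @ W.T` up to quantization error; in the released model the residual branch runs through Nunchaku's INT4 kernels while the rank-32 branch stays in 16-bit.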
- #### Nunchaku Engine Design
-
- ![engine](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/engine.jpg) (a) Naïvely running the low-rank branch with rank 32 introduces a 57% latency overhead, due to the extra read of 16-bit inputs in *Down Projection* and the extra write of 16-bit outputs in *Up Projection*. Nunchaku removes this overhead with kernel fusion. (b) *Down Projection* and *Quantize* use the same input, while *Up Projection* and *4-Bit Compute* share the same output; to reduce data movement, we fuse the first pair and the latter pair of kernels.
-
- ## Model Description
-
- - **Developed by:** MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU and Pika Labs
- - **Model type:** INT W4A4 model
- - **Model size:** 6.64GB
- - **Model resolution:** The number of pixels needs to be a multiple of 65,536 (see the quick check below).
- - **License:** Apache-2.0
-
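As a quick illustration of the resolution constraint above (a convenience snippet, not part of the original card):

```python
def satisfies_pixel_constraint(height: int, width: int) -> bool:
    # The total pixel count must be a multiple of 65,536 (i.e., 256 * 256).
    return (height * width) % 65536 == 0

print(satisfies_pixel_constraint(1024, 1024))  # True: 1,048,576 = 16 * 65,536
print(satisfies_pixel_constraint(1000, 1000))  # False
```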
- ## Usage
-
- ### Diffusers
-
- Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment. Also, install the ControlNet dependencies:
-
- ```shell
- pip install git+https://github.com/asomoza/image_gen_aux.git
- pip install controlnet_aux mediapipe
- ```
-
- Then you can run the model with
-
- ```python
- import torch
- from controlnet_aux import CannyDetector
- from diffusers import FluxControlPipeline
- from diffusers.utils import load_image
-
- from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel
-
- transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-canny-dev")
- pipe = FluxControlPipeline.from_pretrained(
-     "black-forest-labs/FLUX.1-Canny-dev", transformer=transformer, torch_dtype=torch.bfloat16
- ).to("cuda")
-
- prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
- control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
-
- processor = CannyDetector()
- control_image = processor(
-     control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024
- )
-
- image = pipe(
-     prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=50, guidance_scale=30.0
- ).images[0]
- image.save("flux.1-canny-dev.png")
- ```
-
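If you run out of GPU memory, one option worth trying is Diffusers' standard CPU offloading. This is an editorial suggestion that has not been verified against the INT4 engine, and it replaces the `.to("cuda")` call in the script above:

```python
# Untested with the quantized transformer: offload idle sub-models to CPU
# (requires `accelerate`). Use this instead of `.to("cuda")`.
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
```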
- ### ComfyUI
-
- Work in progress. Stay tuned!
-
- ## Limitations
-
- - The model only runs on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this [issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details, and the quick capability check below.
- - You may observe slight differences in details compared to the BF16 model.
-
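A minimal way to check the GPU-architecture requirement from the first limitation (illustrative only):

```python
import torch

# get_device_capability() returns (major, minor), e.g. (8, 9) for sm_89.
major, minor = torch.cuda.get_device_capability()
if (major, minor) not in {(8, 0), (8, 6), (8, 9)}:
    print(f"sm_{major}{minor} detected; this model needs sm_80, sm_86, or sm_89.")
```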
- ### Citation
-
- If you find this model useful or relevant to your research, please cite
-
+
+ ---
+ **This repository has been deprecated and will be hidden in December 2025. Please use https://huggingface.co/nunchaku-tech/nunchaku-flux.1-canny-dev.**
+
+ ## Citation
+
  ```bibtex
  @inproceedings{
@@ -122,4 +32,8 @@ If you find this model useful or relevant to your research, please cite
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
  }
- ```
+ ```
+
+ ## Attribution Notice
+
+ The FLUX.1 [dev] Model is licensed by Black Forest Labs Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs Inc. IN NO EVENT SHALL BLACK FOREST LABS INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.