Lmxyy committed (verified)
Commit 0c824f1 · Parent(s): 3af964b

Update README.md
Files changed (1): README.md (+18, -77)
README.md CHANGED
@@ -1,95 +1,32 @@
  ---
  license: other
  license_name: flux-1-dev-non-commercial-license
  tags:
  - text-to-image
  - SVDQuant
  - FLUX.1-dev
- - INT4
  - FLUX.1
  - Diffusion
  - Quantization
  - ICLR2025
- language:
- - en
- base_model:
- - black-forest-labs/FLUX.1-dev
- base_model_relation: quantized
- pipeline_tag: text-to-image
- datasets:
- - mit-han-lab/svdquant-datasets
- library_name: diffusers
  ---
 
  <p align="center" style="border-radius: 10px">
- <img src="https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/logo.svg" width="50%" alt="logo"/>
  </p>
- <h4 style="display: flex; justify-content: center; align-items: center; text-align: center;">Quantization Library:&nbsp;<a href='https://github.com/mit-han-lab/deepcompressor'>DeepCompressor</a> &ensp; Inference Engine:&nbsp;<a href='https://github.com/mit-han-lab/nunchaku'>Nunchaku</a>
- </h4>
-
-
- <div style="display: flex; justify-content: center; align-items: center; text-align: center;">
- <a href="https://arxiv.org/abs/2411.05007">[Paper]</a>&ensp;
- <a href='https://github.com/mit-han-lab/nunchaku'>[Code]</a>&ensp;
- <a href='https://svdquant.mit.edu'>[Demo]</a>&ensp;
- <a href='https://hanlab.mit.edu/projects/svdquant'>[Website]</a>&ensp;
- <a href='https://hanlab.mit.edu/blog/svdquant'>[Blog]</a>
- </div>
-
- ![teaser](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/teaser.jpg)
- SVDQuant is a post-training quantization technique that quantizes both weights and activations to 4 bits while maintaining visual fidelity. On the 12B FLUX.1-dev model, it achieves a 3.6× memory reduction over the BF16 model. By eliminating CPU offloading, it delivers an 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU and is 3× faster than the NF4 W4A16 baseline. On PixArt-Σ, it shows significantly better visual quality than other W4A4 and even W4A8 baselines. "E2E" denotes end-to-end latency, including the text encoder and VAE decoder.
-
- ## Method
- #### Quantization Method -- SVDQuant
-
- ![intuition](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/intuition.gif)
- Overview of SVDQuant. Stage 1: Originally, both the activation ***X*** and the weights ***W*** contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from the activations to the weights, producing the updated activation and weight. While the activation becomes easier to quantize, the weight becomes more difficult. Stage 3: SVDQuant further decomposes the weight into a low-rank component and a residual via SVD, so the quantization difficulty is absorbed by the low-rank branch, which runs at 16-bit precision.
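To make the three stages concrete, the snippet below is a minimal PyTorch sketch of the decomposition for a single linear layer. The smoothing factor, the naive per-tensor INT4 quantizer, and all function names here are illustrative stand-ins, not the DeepCompressor implementation.

```python
import torch

def svdquant_decompose(W: torch.Tensor, act_scale: torch.Tensor, rank: int = 32):
    """Illustrative SVDQuant-style split: W_hat = L1 @ L2 (16-bit) + R (4-bit residual)."""
    # Stage 2: migrate activation outliers into the weights with a per-channel smoothing factor.
    lam = act_scale.clamp(min=1e-5)            # shape (in_features,), illustrative choice
    W_hat = W * lam                            # weights absorb outliers; activations get divided by lam
    # Stage 3: peel off a low-rank branch with SVD; it will run at 16-bit precision.
    U, S, Vh = torch.linalg.svd(W_hat, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]                # (out_features, rank)
    L2 = Vh[:rank]                             # (rank, in_features)
    R = W_hat - L1 @ L2                        # residual, now much easier to quantize
    # Stand-in symmetric per-tensor INT4 quantization of the residual.
    scale = R.abs().max() / 7
    R_q = torch.clamp((R / scale).round(), -8, 7)
    return L1, L2, R_q, scale, lam

def svdquant_linear(x, L1, L2, R_q, scale, lam):
    x_hat = x / lam                            # smoothed activation (also quantized to 4 bits in the real kernel)
    return x_hat @ (L1 @ L2).T + x_hat @ (R_q * scale).T
```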
 
- #### Nunchaku Engine Design
-
- ![engine](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/engine.jpg) (a) Naïvely running the low-rank branch with rank 32 introduces a 57% latency overhead, due to the extra read of 16-bit inputs in *Down Projection* and the extra write of 16-bit outputs in *Up Projection*. Nunchaku removes this overhead with kernel fusion. (b) The *Down Projection* and *Quantize* kernels use the same input, while the *Up Projection* and *4-Bit Compute* kernels share the same output. To reduce data-movement overhead, we fuse the first two and the latter two kernels together.
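As a rough illustration of why the fusion in (b) helps, the plain-PyTorch sketch below spells out which kernels touch the same tensors; the helper names are hypothetical, and the actual optimization happens inside Nunchaku's fused CUDA kernels, not at this level.

```python
import torch

def quantize_int4(x):                      # hypothetical per-tensor quantizer
    s = x.abs().amax() / 7
    return torch.clamp((x / s).round(), -8, 7), s

def int4_gemm(xq, wq, sx, sw):             # stand-in for the real 4-bit GEMM kernel
    return (xq * sx) @ (wq * sw).T

def naive_low_rank_forward(x16, L1, L2, Wq, sw):
    h = x16 @ L2.T                         # Down Projection: extra read of the 16-bit input
    xq, sx = quantize_int4(x16)            # Quantize: reads the same input -> fused with Down Projection
    y = int4_gemm(xq, Wq, sx, sw)          # 4-Bit Compute: produces the 16-bit output
    return y + h @ L1.T                    # Up Projection: writes the same output -> fused with 4-Bit Compute
```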
-
- ## Model Description
-
- - **Developed by:** MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, and Pika Labs
- - **Model type:** INT W4A4 model
- - **Model size:** 6.64GB
- - **Model resolution:** The number of pixels needs to be a multiple of 65,536 (e.g., 1024×1024 = 1,048,576 = 16 × 65,536); see the check below.
- - **License:** Apache-2.0
-
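The resolution constraint above can be verified in a couple of lines (an illustrative helper, not part of the released code):

```python
def is_valid_resolution(height: int, width: int) -> bool:
    # The pixel count must be a multiple of 65,536, e.g. 1024 x 1024 = 16 x 65,536.
    return (height * width) % 65536 == 0

assert is_valid_resolution(1024, 1024)      # 1,048,576 pixels -> OK
assert not is_valid_resolution(1000, 1000)  # 1,000,000 pixels -> not a multiple of 65,536
```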
- ## Usage
-
- ### Diffusers
-
- Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment. Then you can run the model with:
-
- ```python
- import torch
- from diffusers import FluxPipeline
-
- from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel
-
- # Load the INT4 SVDQuant transformer and plug it into the standard FLUX.1-dev pipeline.
- transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
- pipeline = FluxPipeline.from_pretrained(
-     "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
- ).to("cuda")
- image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
- image.save("example.png")
- ```
-
- ### Comfy UI
-
- ![comfyui](https://github.com/mit-han-lab/nunchaku/blob/main/assets/comfyui.jpg?raw=true)
- Please check [comfyui/README.md](comfyui/README.md) for usage instructions.
-
- ## Limitations
-
- - The model only runs on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100); a quick capability check is sketched below. See this [issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details.
- - You may observe slight differences from the BF16 model in fine details.
-
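One quick way to verify the GPU requirement before loading the model (an illustrative check, not part of the released code):

```python
import torch

# The INT4 kernels target sm_80 (A100), sm_86 (RTX 3090 / A6000), and sm_89 (RTX 4090).
major, minor = torch.cuda.get_device_capability()
if (major, minor) not in {(8, 0), (8, 6), (8, 9)}:
    raise RuntimeError(f"Unsupported GPU architecture sm_{major}{minor}; see the issue linked above.")
```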
- ### Citation
-
- If you find this model useful or relevant to your research, please cite:
 
  ```bibtex
  @inproceedings{
@@ -99,4 +36,8 @@ If you find this model useful or relevant to your research, please cite
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
  }
- ```
  ---
+ base_model: black-forest-labs/FLUX.1-dev
+ base_model_relation: quantized
+ datasets:
+ - mit-han-lab/svdquant-datasets
+ language:
+ - en
+ library_name: diffusers
  license: other
+ license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
  license_name: flux-1-dev-non-commercial-license
+ pipeline_tag: text-to-image
  tags:
  - text-to-image
  - SVDQuant
  - FLUX.1-dev
  - FLUX.1
  - Diffusion
  - Quantization
  - ICLR2025
+
  ---
+ **This repository has been deprecated and will be hidden in December 2025. Please use https://huggingface.co/nunchaku-tech/nunchaku-flux.1-dev.**

  <p align="center" style="border-radius: 10px">
+ <img src="https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/nunchaku.svg" width="30%" alt="Nunchaku Logo"/>
  </p>

+ ## Citation

  ```bibtex
  @inproceedings{
 
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
  }
+ ```
+
+ ## Attribution Notice
+
+ The FLUX.1 [dev] Model is licensed by Black Forest Labs Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs Inc. IN NO EVENT SHALL BLACK FOREST LABS INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.