logo

Quantization Library: DeepCompressor   Inference Engine: Nunchaku

teaser svdq-int4-flux.1-depth-dev is an INT4-quantized version of FLUX.1-Depth-dev, which can generate an image based on a text description while following the structure of a given input image. It offers approximately 4× memory savings while also running 2–3× faster than the original BF16 model.

Method

Quantization Method -- SVDQuant

intuition Overview of SVDQuant. Stage1: Originally, both the activation X and weights W contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from activations to weights, resulting in the updated activation and weight. While the activation becomes easier to quantize, the weight now becomes more difficult. Stage 3: SVDQuant further decomposes the weight into a low-rank component and a residual with SVD. Thus, the quantization difficulty is alleviated by the low-rank branch, which runs at 16-bit precision.

Nunchaku Engine Design

engine (a) Naïvely running low-rank branch with rank 32 will introduce 57% latency overhead due to extra read of 16-bit inputs in Down Projection and extra write of 16-bit outputs in Up Projection. Nunchaku optimizes this overhead with kernel fusion. (b) Down Projection and Quantize kernels use the same input, while Up Projection and 4-Bit Compute kernels share the same output. To reduce data movement overhead, we fuse the first two and the latter two kernels together.

Model Description

  • Developed by: MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU and Pika Labs
  • Model type: INT W4A4 model
  • Model size: 6.64GB
  • Model resolution: The number of pixels need to be a multiple of 65,536.
  • License: Apache-2.0

Usage

Diffusers

Please follow the instructions in mit-han-lab/nunchaku to set up the environment. Also, install some ControlNet dependencies:

pip install git+https://github.com/asomoza/image_gen_aux.git
pip install controlnet_aux mediapipe

Then you can run the model with

import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-depth-dev")

pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=30, guidance_scale=10.0
).images[0]
image.save("flux.1-depth-dev.png")

Comfy UI

Work in progress. Stay tuned!

Limitations

  • The model is only runnable on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this issue for more details.
  • You may observe some slight differences from the BF16 models in detail.

Citation

If you find this model useful or relevant to your research, please cite

@inproceedings{
  li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
Downloads last month
27
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.

Model tree for mit-han-lab/svdq-int4-flux.1-depth-dev

Quantized
(1)
this model

Dataset used to train mit-han-lab/svdq-int4-flux.1-depth-dev

Collection including mit-han-lab/svdq-int4-flux.1-depth-dev