File size: 2,993 Bytes
f1f9265
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
<!-- Copyright 2024 NVIDIA CORPORATION & AFFILIATES

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

SPDX-License-Identifier: Apache-2.0 -->

## 🔥 ControlNet

We incorporate a ControlNet-like(https://github.com/lllyasviel/ControlNet) module enables fine-grained control over text-to-image diffusion models. We implement a ControlNet-Transformer architecture, specifically tailored for Transformers, achieving explicit controllability alongside high-quality image generation.

<p align="center">
  <img src="https://raw.githubusercontent.com/NVlabs/Sana/refs/heads/page/asset/content/controlnet/sana_controlnet.jpg"  height=480>
</p>

## Inference of `Sana + ControlNet`

### 1). Gradio Interface

```bash
python app/app_sana_controlnet_hed.py \
        --config configs/sana_controlnet_config/Sana_1600M_1024px_controlnet_bf16.yaml \
        --model_path hf://Efficient-Large-Model/Sana_1600M_1024px_BF16_ControlNet_HED/checkpoints/Sana_1600M_1024px_BF16_ControlNet_HED.pth
```

<p align="center" border-raduis="10px">
  <img src="https://nvlabs.github.io/Sana/asset/content/controlnet/controlnet_app.jpg" width="90%" alt="teaser_page2"/>
</p>

### 2). Inference with JSON file

```bash
python tools/controlnet/inference_controlnet.py \
        --config configs/sana_controlnet_config/Sana_1600M_1024px_controlnet_bf16.yaml \
        --model_path hf://Efficient-Large-Model/Sana_1600M_1024px_BF16_ControlNet_HED/checkpoints/Sana_1600M_1024px_BF16_ControlNet_HED.pth \
        --json_file asset/controlnet/samples_controlnet.json
```

### 3). Inference code snap

```python
import torch
from PIL import Image
from app.sana_controlnet_pipeline import SanaControlNetPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = SanaControlNetPipeline("configs/sana_controlnet_config/Sana_1600M_1024px_controlnet_bf16.yaml")
pipe.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px_BF16_ControlNet_HED/checkpoints/Sana_1600M_1024px_BF16_ControlNet_HED.pth")

ref_image = Image.open("asset/controlnet/ref_images/A transparent sculpture of a duck made out of glass. The sculpture is in front of a painting of a la.jpg")
prompt = "A transparent sculpture of a duck made out of glass. The sculpture is in front of a painting of a landscape."

images = pipe(
    prompt=prompt,
    ref_image=ref_image,
    guidance_scale=4.5,
    num_inference_steps=10,
    sketch_thickness=2,
    generator=torch.Generator(device=device).manual_seed(0),
)
```

## Training of `Sana + ControlNet`

### Coming soon