---
language:
- en
tags:
- art
- stable-diffusion-xl-diffusers
- stable-diffusion-xl
- controlnet
- lineart
---
# ControlNet Standard Lineart for SDXL
SDXL offers strong content generation and excellent LoRA performance, but ControlNet support has always been its weak point, keeping many users away. Given the computational constraints of a personal GPU, one cannot easily train and tune a high-quality ControlNet model.


**This model attempts to fill that gap in ControlNet support for SDXL and lower the barrier to SDXL for personal users.**

## Environment Setup and Usage

The training [script](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet_sdxl.py) is from the official Diffusers library.

Environment setup instructions can be found in the [official Diffusers guide](https://github.com/huggingface/diffusers/tree/main).

Usage example:
```python
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL
import torch
from PIL import Image

controlnet_conditioning_scale = 0.9

# Load the lineart ControlNet weights from this repository
controlnet = ControlNetModel.from_pretrained(
    "path/to/this/directory", torch_dtype=torch.float16
)
# The fp16-fixed VAE avoids numerical issues when running the SDXL VAE in half precision
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, vae=vae, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "Your prompt"
negative_prompt = "Your negative prompt"
line = Image.open("path/to/your/controlling/image")

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    image=line,
).images[0]
```
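
The pipeline above expects a lineart conditioning image rather than a raw photo. Below is a minimal sketch of extracting one with ***LineartStandardDetector*** from ***controlnet_aux*** (the same detector used during training; see Training Setup); the paths are placeholders, and the `detect_resolution` argument is an assumption based on the controlnet_aux detector interface:
```python
from controlnet_aux import LineartStandardDetector
from PIL import Image

# The standard lineart detector is purely algorithmic; no pretrained weights are downloaded
detector = LineartStandardDetector()

source = Image.open("path/to/your/source/image")  # placeholder path
# Extract a standard-lineart map to use as the conditioning image
line = detector(source, detect_resolution=1024)
line.save("path/to/your/controlling/image")
```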

## Training Setup:

 - **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
 - **Dataset**: [cc12m](https://github.com/rom1504/img2dataset), filtered to images of 1024 resolution and above, yielding over 300k image pairs. Images were cropped, or resized with [image restoration](https://github.com/xinntao/Real-ESRGAN), to 1024x1024 squares before being fed into the script (see the preprocessing sketch after this list).
 - **Lineart**: Used ***LineartStandardDetector*** from ***controlnet_aux*** to extract the conditioning images (as in the usage sketch above).
 - **Total Batch Size**: 16 (4 gradient accumulation steps * 4 GPUs in parallel)
 - **Steps**: 50k
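
A minimal sketch of the square preprocessing mentioned above, assuming plain PIL center-cropping; the Real-ESRGAN restoration step for low-resolution images is not shown, and the paths are placeholders:
```python
from PIL import Image

def to_square_1024(image: Image.Image) -> Image.Image:
    # Center-crop to a square, then resize to the 1024x1024 training resolution
    side = min(image.size)
    left = (image.width - side) // 2
    top = (image.height - side) // 2
    return image.crop((left, top, left + side, top + side)).resize((1024, 1024), Image.LANCZOS)

img = to_square_1024(Image.open("path/to/raw/image"))  # placeholder path
img.save("path/to/dataset/image")
```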

## Result:

Compared to a simple line interpretation, this model can understand depth relations, as shown below:

![Example Image](https://github.com/ShermanGu/ControlNet-Standard-Lineart-for-Diffuser-XL/blob/main/Published%20Picture/1.png?raw=true)
![Example Image](https://github.com/ShermanGu/ControlNet-Standard-Lineart-for-Diffuser-XL/blob/main/Published%20Picture/2.png?raw=true)
![Example Image](https://github.com/ShermanGu/ControlNet-Standard-Lineart-for-Diffuser-XL/blob/main/Published%20Picture/3.png?raw=true)
![Example Image](https://github.com/ShermanGu/ControlNet-Standard-Lineart-for-Diffuser-XL/blob/main/Published%20Picture/4.png?raw=true)
![Example Image](https://github.com/ShermanGu/ControlNet-Standard-Lineart-for-Diffuser-XL/blob/main/Published%20Picture/image.png?raw=true)

## Note:

 1. Loading a custom dataset through Hugging Face requires modifying the script for full automation. In ***train_controlnet_sdxl.py***, change line 650 to:
	```python
	if args.train_data_dir is not None:
	    dataset = load_dataset(
	        args.train_data_dir,
	        cache_dir=args.cache_dir,
	        trust_remote_code=True,
	    )
	```
	As for the dataset, organize its structure as demonstrated in the [dataset_example](https://civitai.com/articles/2078/play-in-control-controlnet-training-setup-guide), and point the script at it with:
	```
	--train_data_dir="/path/to/your/dataset_example"
	```

2. In our experiments, this ControlNet sometimes does not handle colorization well on xl-base-1.0, even though it captures the lines faithfully. I therefore suspect the mis-colorization comes from the base model chosen; more experiments are needed.