<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Improve generation quality with FreeU

[[open-in-colab]]

The UNet is responsible for denoising during the reverse diffusion process, and its architecture produces two distinct kinds of features:

1. Backbone features primarily contribute to the denoising process.
2. Skip features mainly introduce high-frequency features into the decoder module and can make the network overlook the semantics in the backbone features.

In practice, the skip connections can sometimes introduce unnatural image details. [FreeU](https://hf.co/papers/2309.11497) is a technique for improving image quality by rebalancing the contributions from the UNet’s skip connections and backbone feature maps.

FreeU is applied during inference and does not require any additional training. The technique works for different tasks such as text-to-image, image-to-image, and text-to-video.

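
To make the rebalancing idea concrete, here is a minimal pure-Python sketch on 1D toy signals. It is not the Diffusers implementation: the function names (`scale_low_frequencies`, `freeu_rebalance`) are hypothetical, a naive DFT stands in for the 2D FFT applied to real feature maps, and it assumes that only the first half of the backbone channels is amplified by a factor `b` while the low-frequency part of the skip features is scaled by a factor `s`.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a list of numbers."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part, which is enough for real inputs."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def scale_low_frequencies(signal, scale, threshold=1):
    """Multiply the lowest-frequency DFT bins of a real signal by `scale`.

    Bin k and bin n - k carry the same frequency magnitude, so
    min(k, n - k) is the distance of a bin from the zero-frequency (DC) bin.
    """
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if min(k, n - k) < threshold:
            spectrum[k] *= scale
    return idft(spectrum)


def freeu_rebalance(backbone, skip, b, s):
    """Toy FreeU-style rebalancing (illustrative assumption, not library code).

    `backbone` and `skip` are lists of channels; each channel is a list of floats.
    The first half of the backbone channels is amplified by `b`, the low
    frequencies of every skip channel are scaled by `s`, and the two are then
    concatenated along the channel axis, as a UNet decoder block would do.
    """
    half = len(backbone) // 2
    scaled_backbone = [
        [value * b for value in channel] if index < half else list(channel)
        for index, channel in enumerate(backbone)
    ]
    filtered_skip = [scale_low_frequencies(channel, s) for channel in skip]
    return scaled_backbone + filtered_skip
```

With `b > 1` and `s < 1`, the backbone contribution grows while the smooth, low-frequency content of the skip features shrinks. The high-frequency content of the skip features passes through unchanged.
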
In this guide, you will apply FreeU to the [`StableDiffusionPipeline`], [`StableDiffusionXLPipeline`], and [`TextToVideoSDPipeline`]. You need to install Diffusers from source to run the examples below.

## StableDiffusionPipeline

Load the pipeline:

```py
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
```

Then enable the FreeU mechanism with the FreeU-specific hyperparameters. `b1` and `b2` are scaling factors for the backbone features, and `s1` and `s2` are scaling factors for the skip features.

```py
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)
```

The values above are from the official FreeU [code repository](https://github.com/ChenyangSi/FreeU), where you can also find [reference hyperparameters](https://github.com/ChenyangSi/FreeU#range-for-more-parameters) for different models.

<Tip>

Disable the FreeU mechanism by calling `disable_freeu()` on a pipeline.

</Tip>

Then run inference:

```py
prompt = "A squirrel eating a burger"
seed = 2023
image = pipeline(prompt, generator=torch.manual_seed(seed)).images[0]
image
```

The figure below compares non-FreeU and FreeU results, respectively, for the same `prompt` and `seed` used above:

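
A side-by-side comparison like this can be produced with a small helper that runs the same prompt and seed twice, once with FreeU disabled and once with it enabled. The sketch below is only an illustration: `compare_freeu` and its `make_generator` argument are hypothetical names, and the helper assumes the pipeline exposes `enable_freeu()` and `disable_freeu()` and returns an object with an `images` list, as the pipelines in this guide do.

```python
def compare_freeu(pipeline, prompt, seed, make_generator, **freeu_kwargs):
    """Return (baseline_image, freeu_image) for the same prompt and seed.

    `make_generator` is any callable that turns a seed into a generator,
    for example `torch.manual_seed`.
    """
    # Baseline: make sure FreeU is off before the first call.
    pipeline.disable_freeu()
    baseline = pipeline(prompt, generator=make_generator(seed)).images[0]

    # FreeU: same prompt, and a generator re-created from the same seed.
    pipeline.enable_freeu(**freeu_kwargs)
    try:
        with_freeu = pipeline(prompt, generator=make_generator(seed)).images[0]
    finally:
        # Leave the pipeline in its default (non-FreeU) state.
        pipeline.disable_freeu()
    return baseline, with_freeu


# Example call, assuming `pipeline`, `prompt`, `seed`, and `torch` from above:
# baseline, with_freeu = compare_freeu(
#     pipeline, prompt, seed, torch.manual_seed, s1=0.9, s2=0.2, b1=1.2, b2=1.4
# )
```

Re-creating the generator from the same seed for both calls keeps the initial noise identical, so any visible difference comes from FreeU alone.
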
Let's see how Stable Diffusion 2 results are impacted:

```py
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, safety_checker=None
).to("cuda")

prompt = "A squirrel eating a burger"
seed = 2023

pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.1, b2=1.2)
image = pipeline(prompt, generator=torch.manual_seed(seed)).images[0]
image
```

## Stable Diffusion XL

Finally, let's take a look at how FreeU affects Stable Diffusion XL results:

```py
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16,
).to("cuda")

prompt = "A squirrel eating a burger"
seed = 2023

# Comes from
# https://wandb.ai/nasirk24/UNET-FreeU-SDXL/reports/FreeU-SDXL-Optimal-Parameters--Vmlldzo1NDg4NTUw
pipeline.enable_freeu(s1=0.6, s2=0.4, b1=1.1, b2=1.2)
image = pipeline(prompt, generator=torch.manual_seed(seed)).images[0]
image
```

## Text-to-video generation

FreeU can also be used to improve video quality:

```python
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

model_id = "cerspense/zeroscope_v2_576w"
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "an astronaut riding a horse on mars"
seed = 2023

# The values come from
# https://github.com/lyn-rgb/FreeU_Diffusers#video-pipelines
pipe.enable_freeu(b1=1.2, b2=1.4, s1=0.9, s2=0.2)
video_frames = pipe(prompt, height=320, width=576, num_frames=30, generator=torch.manual_seed(seed)).frames[0]
export_to_video(video_frames, "astronaut_rides_horse.mp4")
```
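
Since each checkpoint in this guide uses its own FreeU values, it can help to keep them in one lookup table. The sketch below collects only the values that appear in the examples above. `FREEU_PRESETS` and `freeu_kwargs_for` are hypothetical names for illustration, not part of Diffusers.

```python
# FreeU hyperparameters used in this guide, keyed by checkpoint ID.
# These are the values from the examples above, not tuned recommendations.
FREEU_PRESETS = {
    "runwayml/stable-diffusion-v1-5": {"s1": 0.9, "s2": 0.2, "b1": 1.2, "b2": 1.4},
    "stabilityai/stable-diffusion-2-1": {"s1": 0.9, "s2": 0.2, "b1": 1.1, "b2": 1.2},
    "stabilityai/stable-diffusion-xl-base-1.0": {"s1": 0.6, "s2": 0.4, "b1": 1.1, "b2": 1.2},
    "cerspense/zeroscope_v2_576w": {"s1": 0.9, "s2": 0.2, "b1": 1.2, "b2": 1.4},
}


def freeu_kwargs_for(model_id):
    """Return a copy of the FreeU keyword arguments recorded for `model_id`."""
    if model_id not in FREEU_PRESETS:
        raise KeyError(f"No FreeU preset recorded for {model_id!r}")
    # Copy so callers can tweak the values without changing the table.
    return dict(FREEU_PRESETS[model_id])


# Example call, assuming `pipe` and `model_id` from the block above:
# pipe.enable_freeu(**freeu_kwargs_for(model_id))
```
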

Thanks to [kadirnar](https://github.com/kadirnar/) for helping to integrate the feature, and to [justindujardin](https://github.com/justindujardin) for the helpful discussions.