<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Accelerated PyTorch 2.0 support in Diffusers
Starting from version `0.13.0`, Diffusers supports the latest optimizations from [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/). These include:
1. Support for an accelerated transformers implementation with memory-efficient attention, with no extra dependencies such as `xformers` required.
2. [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) support for an extra performance boost when individual models are compiled.
## Installation
To benefit from the accelerated attention implementation and `torch.compile()`, make sure you have the latest version of PyTorch 2.0 installed from pip and that you are on diffusers 0.13.0 or later. As explained below, diffusers uses the optimized attention processor ([`AttnProcessor2_0`](https://github.com/huggingface/diffusers/blob/1a5797c6d4491a879ea5285c4efc377664e0332d/src/diffusers/models/attention_processor.py#L798)) when PyTorch 2.0 is available.
```bash
pip install --upgrade torch diffusers
```
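A quick way to confirm that the installed versions meet these minimums is sketched below. It assumes the packages are installed under the distribution names `torch` and `diffusers`; the helper names `meets_minimum` and `report` are illustrative and not part of either library. Pre-release suffixes such as `.dev0` are ignored, so a nightly build is treated like its base version.

```python
import re
from importlib import metadata


def meets_minimum(version, minimum):
    """Return True if a version string such as '2.0.1+cu118' is >= minimum."""
    numbers = []
    # Drop local build tags ("+cu118") and read the leading numeric components.
    for piece in version.split("+")[0].split(".")[: len(minimum)]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad so that "2.0" compares equal to (2, 0, 0).
    numbers += [0] * (len(minimum) - len(numbers))
    return tuple(numbers) >= tuple(minimum)


def report(package, minimum):
    """Describe whether an installed package satisfies the minimum version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    status = "OK" if meets_minimum(installed, minimum) else "too old"
    return f"{package} {installed}: {status}"


if __name__ == "__main__":
    print(report("torch", (2, 0, 0)))
    print(report("diffusers", (0, 13, 0)))
```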
## Using accelerated transformers and `torch.compile`
1. **Accelerated Transformers implementation**
PyTorch 2.0 includes an optimized and memory-efficient attention implementation through the [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) function, which automatically enables several optimizations depending on the inputs and the GPU type. This is similar to the `memory_efficient_attention` from [xFormers](https://github.com/facebookresearch/xformers), but built natively into PyTorch.
These optimizations are enabled by default in Diffusers if PyTorch 2.0 is installed and `torch.nn.functional.scaled_dot_product_attention` is available. To use them, install `torch 2.0` and simply use the pipeline. For example:
```Python
import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
If you want to enable it explicitly (which is not required), you can do so as shown below.
```diff
import torch
from diffusers import DiffusionPipeline
+ from diffusers.models.attention_processor import AttnProcessor2_0
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
+ pipe.unet.set_attn_processor(AttnProcessor2_0())
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
This should be as fast and memory efficient as `xFormers`. See the [benchmark](#benchmark) for more details.
It is possible to revert to the vanilla attention processor ([`AttnProcessor`](https://github.com/huggingface/diffusers/blob/1a5797c6d4491a879ea5285c4efc377664e0332d/src/diffusers/models/attention_processor.py#L402)), which can be helpful to make the pipeline more deterministic, or if you need to convert a fine-tuned model to other formats such as [Core ML](https://huggingface.co/docs/diffusers/v0.16.0/en/optimization/coreml#how-to-run-stable-diffusion-with-core-ml). To use the normal attention processor you can use the [`~diffusers.UNet2DConditionModel.set_default_attn_processor`] function:
```Python
import torch
from diffusers import DiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.unet.set_default_attn_processor()
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
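For reference, the computation behind scaled dot-product attention can be sketched in plain Python. This is an illustration only and not how PyTorch implements it: the real function dispatches to fused GPU kernels and also supports masking and dropout, which are omitted here. The name `reference_attention` and the list-of-rows input format are assumptions made for this sketch.

```python
import math


def reference_attention(query, key, value):
    """Compute softmax(Q K^T / sqrt(d)) V for 2D inputs given as lists of rows.

    query: [L, d], key: [S, d], value: [S, dv]  ->  output: [L, dv]
    """
    scale = 1.0 / math.sqrt(len(query[0]))
    output = []
    for q in query:
        # Scaled similarity between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        # Numerically stable softmax over the keys.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the value rows.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, value)) for j in range(len(value[0]))]
        )
    return output


# Two identical keys receive equal weight, so the output is the mean of the values.
print(reference_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
# [[2.0, 3.0]]
```

A memory-efficient implementation produces the same result without materializing the full score matrix at once, which is where the savings come from.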
2. **torch.compile**
To get an additional speed-up, we can use the new `torch.compile` feature. Since the UNet of the pipeline is usually the most computationally expensive component, we wrap the `unet` with `torch.compile` and leave the rest of the sub-models (text encoder and VAE) as they are. For more information and other options, refer to the [torch compile docs](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html).
```python
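# Assumes `pipe` and `prompt` from the examples above; `steps` and `batch_size` are values you choose.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
images = pipe(prompt, num_inference_steps=steps, num_images_per_prompt=batch_size).images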
```
Depending on the type of GPU, `compile()` can yield an _additional speed-up_ of **5% - 300%** on top of the accelerated transformer optimizations. Note, however, that compilation is able to squeeze out more performance on recent GPU architectures such as Ampere (A100, 3090), Ada (4090) and Hopper (H100).
Compilation takes some time to complete, so it is best suited for situations where you prepare the pipeline once and then run the same type of inference operation multiple times. Calling the compiled pipeline on a different image size triggers compilation again, which can be expensive.
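Because the first call pays the compilation cost, it helps to time warm-up calls separately from steady-state calls. The helper below is a minimal sketch of that idea; `time_calls` is a hypothetical name, and the stand-in workload only simulates a slow first call. With a real pipeline you would pass something like `lambda: pipe(prompt)`. When timing code that returns raw GPU tensors, you would likely also need `torch.cuda.synchronize()` before reading the clock.

```python
import time


def time_calls(fn, warmup=1, iterations=3):
    """Time fn() separately for warm-up calls and steady-state calls."""
    warmup_seconds = []
    for _ in range(warmup):
        start = time.perf_counter()
        fn()
        warmup_seconds.append(time.perf_counter() - start)

    steady_seconds = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        steady_seconds.append(time.perf_counter() - start)
    return warmup_seconds, steady_seconds


# Stand-in workload: the first call is slow (like compilation), later calls are fast.
state = {"first": True}


def fake_inference():
    time.sleep(0.05 if state["first"] else 0.005)
    state["first"] = False


warm, steady = time_calls(fake_inference, warmup=1, iterations=3)
print(f"warm-up: {warm[0]:.3f}s, steady-state mean: {sum(steady) / len(steady):.3f}s")
```

Compilation pays off only if the steady-state savings, multiplied by the number of calls you expect to make, exceed the warm-up cost.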
## Benchmark
We conducted a comprehensive benchmark with PyTorch 2.0's efficient attention implementation and `torch.compile` across different GPUs and batch sizes for five of our most used pipelines. We used `diffusers 0.17.0.dev0`, which [makes sure `torch.compile()` is leveraged optimally](https://github.com/huggingface/diffusers/pull/3313).
### Benchmarking code
#### Stable Diffusion text-to-image
```python
from diffusers import DiffusionPipeline
import torch
path = "runwayml/stable-diffusion-v1-5"
run_compile = True # Set True / False
pipe = DiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
if run_compile:
    print("Run torch compile")
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
prompt = "ghibli style, a fantasy landscape with castles"
for _ in range(3):
    images = pipe(prompt=prompt).images
```
#### Stable Diffusion image-to-image
```python
from diffusers import StableDiffusionImg2ImgPipeline
import requests
import torch
from PIL import Image
from io import BytesIO
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
path = "runwayml/stable-diffusion-v1-5"
run_compile = True # Set True / False
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(path, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
if run_compile:
    print("Run torch compile")
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
prompt = "ghibli style, a fantasy landscape with castles"
for _ in range(3):
    image = pipe(prompt=prompt, image=init_image).images[0]
```
#### Stable Diffusion - inpainting
```python
from diffusers import StableDiffusionInpaintPipeline
import requests
import torch
from PIL import Image
from io import BytesIO
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
path = "runwayml/stable-diffusion-inpainting"
run_compile = True # Set True / False
pipe = StableDiffusionInpaintPipeline.from_pretrained(path, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
if run_compile:
    print("Run torch compile")
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
prompt = "ghibli style, a fantasy landscape with castles"
for _ in range(3):
    image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
```
#### ControlNet
```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import requests
import torch
from PIL import Image
from io import BytesIO
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
path = "runwayml/stable-diffusion-v1-5"
run_compile = True # Set True / False
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    path, controlnet=controlnet, torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
pipe.controlnet.to(memory_format=torch.channels_last)
if run_compile:
    print("Run torch compile")
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    pipe.controlnet = torch.compile(pipe.controlnet, mode="reduce-overhead", fullgraph=True)
prompt = "ghibli style, a fantasy landscape with castles"
for _ in range(3):
    image = pipe(prompt=prompt, image=init_image).images[0]
```
#### IF text-to-image + upscaling
```python
from diffusers import DiffusionPipeline
import torch
run_compile = True # Set True / False
pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16)
pipe.to("cuda")
pipe_2 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-II-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16)
pipe_2.to("cuda")
pipe_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16)
pipe_3.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
pipe_2.unet.to(memory_format=torch.channels_last)
pipe_3.unet.to(memory_format=torch.channels_last)
if run_compile:
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    pipe_2.unet = torch.compile(pipe_2.unet, mode="reduce-overhead", fullgraph=True)
    pipe_3.unet = torch.compile(pipe_3.unet, mode="reduce-overhead", fullgraph=True)
prompt = "the blue hulk"
prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)
neg_prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)
for _ in range(3):
    image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
    image_2 = pipe_2(image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
    image_3 = pipe_3(prompt=prompt, image=image, noise_level=100).images
```
To highlight the speed-ups that PyTorch 2.0 and `torch.compile()` can provide, the following chart shows the relative speed-up for the [Stable Diffusion text-to-image pipeline](StableDiffusionPipeline) across five different GPU families (with a batch size of 4):
![t2i_speedup](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/pt2_benchmarks/t2i_speedup.png)
To give you an even better idea of how this speed-up holds for the other pipelines presented above, consider the following plot that shows the benchmarking numbers from an A100 across three different batch sizes (with PyTorch 2.0 nightly and `torch.compile()`):
![a100_numbers](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/pt2_benchmarks/a100_numbers.png)
_(The benchmarking metric for the plot above is the **number of iterations/second**)_
In the interest of transparency, we also report all the benchmarking numbers!
In the following tables, we report our findings in terms of the **_number of iterations processed per second_**.
### A100 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 21.66 | 23.13 | 44.03 | 49.74 |
| SD - img2img | 21.81 | 22.40 | 43.92 | 46.32 |
| SD - inpaint | 22.24 | 23.23 | 43.76 | 49.25 |
| SD - controlnet | 15.02 | 15.82 | 32.13 | 36.08 |
| IF | 20.21 / <br>13.84 / <br>24.00 | 20.12 / <br>13.70 / <br>24.03 | ❌ | 97.34 / <br>27.23 / <br>111.66 |
### A100 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 11.6 | 13.12 | 14.62 | 17.27 |
| SD - img2img | 11.47 | 13.06 | 14.66 | 17.25 |
| SD - inpaint | 11.67 | 13.31 | 14.88 | 17.48 |
| SD - controlnet | 8.28 | 9.38 | 10.51 | 12.41 |
| IF | 25.02 | 18.04 | ❌ | 48.47 |
### A100 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 3.04 | 3.6 | 3.83 | 4.68 |
| SD - img2img | 2.98 | 3.58 | 3.83 | 4.67 |
| SD - inpaint | 3.04 | 3.66 | 3.9 | 4.76 |
| SD - controlnet | 2.15 | 2.58 | 2.74 | 3.35 |
| IF | 8.78 | 9.82 | ❌ | 16.77 |
### V100 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 18.99 | 19.14 | 20.95 | 22.17 |
| SD - img2img | 18.56 | 19.18 | 20.95 | 22.11 |
| SD - inpaint | 19.14 | 19.06 | 21.08 | 22.20 |
| SD - controlnet | 13.48 | 13.93 | 15.18 | 15.88 |
| IF | 20.01 / <br>9.08 / <br>23.34 | 19.79 / <br>8.98 / <br>24.10 | ❌ | 55.75 / <br>11.57 / <br>57.67 |
### V100 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 5.96 | 5.89 | 6.83 | 6.86 |
| SD - img2img | 5.90 | 5.91 | 6.81 | 6.82 |
| SD - inpaint | 5.99 | 6.03 | 6.93 | 6.95 |
| SD - controlnet | 4.26 | 4.29 | 4.92 | 4.93 |
| IF | 15.41 | 14.76 | ❌ | 22.95 |
### V100 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 1.66 | 1.66 | 1.92 | 1.90 |
| SD - img2img | 1.65 | 1.65 | 1.91 | 1.89 |
| SD - inpaint | 1.69 | 1.69 | 1.95 | 1.93 |
| SD - controlnet | 1.19 | 1.19 | OOM after warmup | 1.36 |
| IF | 5.43 | 5.29 | ❌ | 7.06 |
### T4 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 6.9 | 6.95 | 7.3 | 7.56 |
| SD - img2img | 6.84 | 6.99 | 7.04 | 7.55 |
| SD - inpaint | 6.91 | 6.7 | 7.01 | 7.37 |
| SD - controlnet | 4.89 | 4.86 | 5.35 | 5.48 |
| IF | 17.42 / <br>2.47 / <br>18.52 | 16.96 / <br>2.45 / <br>18.69 | ❌ | 24.63 / <br>2.47 / <br>23.39 |
### T4 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 1.79 | 1.79 | 2.03 | 1.99 |
| SD - img2img | 1.77 | 1.77 | 2.05 | 2.04 |
| SD - inpaint | 1.81 | 1.82 | 2.09 | 2.09 |
| SD - controlnet | 1.34 | 1.27 | 1.47 | 1.46 |
| IF | 5.79 | 5.61 | ❌ | 7.39 |
### T4 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 2.34s | 2.30s | OOM after 2nd iteration | 1.99s |
| SD - img2img | 2.35s | 2.31s | OOM after warmup | 2.00s |
| SD - inpaint | 2.30s | 2.26s | OOM after 2nd iteration | 1.95s |
| SD - controlnet | OOM after 2nd iteration | OOM after 2nd iteration | OOM after warmup | OOM after warmup |
| IF * | 1.44 | 1.44 | ❌ | 1.94 |
### RTX 3090 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 22.56 | 22.84 | 23.84 | 25.69 |
| SD - img2img | 22.25 | 22.61 | 24.1 | 25.83 |
| SD - inpaint | 22.22 | 22.54 | 24.26 | 26.02 |
| SD - controlnet | 16.03 | 16.33 | 17.38 | 18.56 |
| IF | 27.08 / <br>9.07 / <br>31.23 | 26.75 / <br>8.92 / <br>31.47 | ❌ | 68.08 / <br>11.16 / <br>65.29 |
### RTX 3090 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 6.46 | 6.35 | 7.29 | 7.3 |
| SD - img2img | 6.33 | 6.27 | 7.31 | 7.26 |
| SD - inpaint | 6.47 | 6.4 | 7.44 | 7.39 |
| SD - controlnet | 4.59 | 4.54 | 5.27 | 5.26 |
| IF | 16.81 | 16.62 | ❌ | 21.57 |
### RTX 3090 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 1.7 | 1.69 | 1.93 | 1.91 |
| SD - img2img | 1.68 | 1.67 | 1.93 | 1.9 |
| SD - inpaint | 1.72 | 1.71 | 1.97 | 1.94 |
| SD - controlnet | 1.23 | 1.22 | 1.4 | 1.38 |
| IF | 5.01 | 5.00 | ❌ | 6.33 |
### RTX 4090 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 40.5 | 41.89 | 44.65 | 49.81 |
| SD - img2img | 40.39 | 41.95 | 44.46 | 49.8 |
| SD - inpaint | 40.51 | 41.88 | 44.58 | 49.72 |
| SD - controlnet | 29.27 | 30.29 | 32.26 | 36.03 |
| IF | 69.71 / <br>18.78 / <br>85.49 | 69.13 / <br>18.80 / <br>85.56 | ❌ | 124.60 / <br>26.37 / <br>138.79 |
### RTX 4090 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 12.62 | 12.84 | 15.32 | 15.59 |
| SD - img2img | 12.61 | 12.79 | 15.35 | 15.66 |
| SD - inpaint | 12.65 | 12.81 | 15.3 | 15.58 |
| SD - controlnet | 9.1 | 9.25 | 11.03 | 11.22 |
| IF | 31.88 | 31.14 | ❌ | 43.92 |
### RTX 4090 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 3.17 | 3.2 | 3.84 | 3.85 |
| SD - img2img | 3.16 | 3.2 | 3.84 | 3.85 |
| SD - inpaint | 3.17 | 3.2 | 3.85 | 3.85 |
| SD - controlnet | 2.23 | 2.3 | 2.7 | 2.75 |
| IF | 9.26 | 9.2 | ❌ | 13.31 |
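
Since the tables report iterations per second, a relative speed-up is simply the ratio of two entries in the same row. The snippet below is a small worked example using the A100, batch size 1, SD txt2img row (torch 2.0 without and with compilation); the function name `speedup` is illustrative.

```python
def speedup(baseline_its, optimized_its):
    """Ratio of two throughput figures given in iterations per second."""
    return optimized_its / baseline_its


# A100, batch size 1, SD - txt2img: torch 2.0 no compile vs. torch 2.0 compile.
baseline = 21.66
compiled = 44.03

ratio = speedup(baseline, compiled)
print(f"{ratio:.2f}x faster ({(ratio - 1) * 100:.0f}% additional speed-up)")
# 2.03x faster (103% additional speed-up)
```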
## Notes
* Follow [this PR](https://github.com/huggingface/diffusers/pull/3313) for more details on the environment used for conducting the benchmarks.
* For the IF pipeline with batch sizes > 1, only the first IF pipeline (text-to-image generation) used a batch size > 1. The two upscaling pipelines received a batch size of 1.
*Thanks to [Horace He](https://github.com/Chillee) from the PyTorch team for their support in improving our support of `torch.compile()` in Diffusers.*