<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Load adapters

[[open-in-colab]]

There are several [training](../training/overview) techniques for personalizing diffusion models to generate images of a specific subject or in a specific style. Each training method produces a different type of adapter. Some adapters generate an entirely new model, while others modify only a small set of embeddings or weights. As a result, the loading process differs for each adapter.

This guide shows how to load DreamBooth, textual inversion, LoRA, and IP-Adapter weights.

<Tip>

Browse the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer), [LoRA the Explorer](https://huggingface.co/spaces/multimodalart/LoraTheExplorer), and the [Diffusers Models Gallery](https://huggingface.co/spaces/huggingface-projects/diffusers-gallery) for checkpoints and embeddings to use.

</Tip>

## DreamBooth

[DreamBooth](https://dreambooth.github.io/) fine-tunes an *entire diffusion model* on just a few images of a subject so that it can generate that subject in new styles and settings. It works by placing a special word in the prompt that the model learns to associate with the subject's images. Of all the training methods, DreamBooth produces the largest files (usually a few GB) because the result is a full model checkpoint.

Let's load the [herge_style](https://huggingface.co/sd-dreambooth-library/herge-style) checkpoint, which was trained on just 10 images drawn by Hergé, and generate images in that style. For it to work, the prompt must include the special word `herge_style`, which triggers the checkpoint:

```py
from diffusers import AutoPipelineForText2Image
import torch

# Load the full DreamBooth checkpoint in half precision and move it to the GPU
pipeline = AutoPipelineForText2Image.from_pretrained("sd-dreambooth-library/herge-style", torch_dtype=torch.float16).to("cuda")
# The prompt contains the trigger word herge_style
prompt = "A cute herge_style brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration"
image = pipeline(prompt).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_dreambooth.png" />
</div>

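Because the checkpoint only responds when its special word is present, it can help to check a prompt before spending GPU time on it. The following is a minimal, pure-Python sketch. `has_trigger_word` is a hypothetical helper, not part of 🤗 Diffusers, and it assumes the trigger appears as a whitespace-separated word:

```python
def has_trigger_word(prompt: str, trigger: str) -> bool:
    """Return True if the trigger appears in the prompt as a whole word."""
    # Strip surrounding punctuation so that "herge_style," still matches.
    words = [word.strip(".,;:!?\"'()") for word in prompt.split()]
    return trigger in words


prompt = "A cute herge_style brown bear eating a slice of pizza"
print(has_trigger_word(prompt, "herge_style"))
print(has_trigger_word("A cute brown bear", "herge_style"))
```

The same check applies to any adapter in this guide that relies on a trigger word in the prompt.
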
## Textual inversion

[Textual inversion](https://textual-inversion.github.io/) is very similar to DreamBooth: it also personalizes a diffusion model to generate specific concepts (styles, objects) from just a few images. It works by learning new embeddings that represent the images you provide and binding them to a special word in the prompt. As a result, the diffusion model weights stay the same, and training produces a relatively tiny file (a few KB).

Because textual inversion produces only embeddings, it cannot be used on its own like DreamBooth and requires another model.

```py
from diffusers import AutoPipelineForText2Image
import torch

# Base model that the embeddings will be added to
pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
```

Now load the textual inversion embeddings with the [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`] method and generate an image. Let's load the [sd-concepts-library/gta5-artwork](https://huggingface.co/sd-concepts-library/gta5-artwork) embeddings. To trigger them, include the special word `<gta5-artwork>` in the prompt:

```py
pipeline.load_textual_inversion("sd-concepts-library/gta5-artwork")
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style"
image = pipeline(prompt).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_txt_embed.png" />
</div>

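To make the idea concrete, here is a toy sketch of what loading a textual inversion embedding amounts to: a new token is registered and a learned vector is appended, while everything that already existed stays untouched. The vocabulary, the 2-dimensional vectors, and `add_learned_token` are all hypothetical stand-ins, not the 🤗 Diffusers implementation:

```python
# Toy vocabulary and embedding table standing in for a text encoder's token embeddings.
vocab = {"a": 0, "cute": 1, "bear": 2}
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]


def add_learned_token(vocab, embeddings, token, vector):
    """Register a new token and append its learned embedding; existing rows are untouched."""
    if token in vocab:
        raise ValueError(f"token {token!r} already exists")
    vocab[token] = len(embeddings)
    embeddings.append(list(vector))
    return vocab[token]


before = [row[:] for row in embeddings]
new_id = add_learned_token(vocab, embeddings, "<gta5-artwork>", [0.9, -0.7])

print(new_id)                    # index assigned to the new token
print(embeddings[:3] == before)  # whether the original embeddings are unchanged
```

Only the appended row is new, which is consistent with the very small file size mentioned above.
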
Textual inversion can also be trained on undesirable things to create *negative embeddings*, which discourage a model from generating images with those undesirable features, such as blurry images or extra fingers on a hand. This can be an easy way to quickly improve your prompt. You load the embeddings with [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`] as before, but this time you need two more parameters:

- `weight_name`: specifies the weight file to load if the file was saved in the 🤗 Diffusers format under a specific name, or if the file is stored in the A1111 format
- `token`: specifies the special word to use in the prompt to trigger the embeddings

Let's load the [sayakpaul/EasyNegative-test](https://huggingface.co/sayakpaul/EasyNegative-test) embeddings:

```py
pipeline.load_textual_inversion(
    "sayakpaul/EasyNegative-test", weight_name="EasyNegative.safetensors", token="EasyNegative"
)
```

Now you can use the `token` to generate an image with the negative embeddings:

```py
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, EasyNegative"
# The token in the negative prompt triggers the negative embeddings
negative_prompt = "EasyNegative"

image = pipeline(prompt, negative_prompt=negative_prompt, num_inference_steps=50).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png" />
</div>

## LoRA

[Low-Rank Adaptation (LoRA)](https://huggingface.co/papers/2106.09685) is a popular training technique because it is fast and produces small files (a couple hundred MB). Like the other methods in this guide, LoRA can train a model to learn new styles from just a few images. It works by inserting new weights into the diffusion model and then training only those new weights instead of the entire model. This makes LoRAs faster to train and easier to store.

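A quick back-of-the-envelope calculation shows why the inserted weights are so much smaller than the model itself. The sketch assumes, as the name Low-Rank Adaptation suggests, that the new weights for a layer of shape `d_out x d_in` are two factors of rank `r`. The layer size and rank below are hypothetical values chosen for illustration:

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 1280, 1280, 4  # hypothetical layer size and rank
full = full_param_count(d_out, d_in)
lora = lora_param_count(d_out, d_in, rank)
print(f"full layer: {full} parameters, LoRA factors: {lora} parameters")
```

For this layer the LoRA factors hold well under one percent of the parameters of the dense matrix, and only those factors need to be trained and saved.
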
<Tip>

LoRA is a very general training technique that can be combined with other training methods. For example, it is common to train a model with DreamBooth and LoRA together. It is also increasingly common to load and merge multiple LoRAs to create new and unique images. Merging is outside the scope of this loading guide, so see the in-depth [Merge LoRAs](merge_loras) guide to learn more.

</Tip>

LoRAs also need to be used with another model:

```py
from diffusers import AutoPipelineForText2Image
import torch

# Base model that the LoRA weights will be loaded into
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
```

Then use the [`~loaders.LoraLoaderMixin.load_lora_weights`] method to load the [ostris/super-cereal-sdxl-lora](https://huggingface.co/ostris/super-cereal-sdxl-lora) weights, and specify the weights filename from the repository:

```py
pipeline.load_lora_weights("ostris/super-cereal-sdxl-lora", weight_name="cereal_box_sdxl_v1.safetensors")
prompt = "bears, pizza bites"
image = pipeline(prompt).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_lora.png" />
</div>

The [`~loaders.LoraLoaderMixin.load_lora_weights`] method loads LoRA weights into both the UNet and the text encoder. It is the preferred way to load LoRAs because it handles both of these cases:

- the LoRA weights don't have separate identifiers for the UNet and text encoder
- the LoRA weights have separate identifiers for the UNet and text encoder

If you only need to load LoRA weights into the UNet, you can use the [`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method instead. Let's load the [jbilcke-hf/sdxl-cinematic-1](https://huggingface.co/jbilcke-hf/sdxl-cinematic-1) LoRA:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
# Load the LoRA weights into the UNet only
pipeline.unet.load_attn_procs("jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors")

# use cnmt in the prompt to trigger the LoRA
prompt = "A cute cnmt eating a slice of pizza, stunning color scheme, masterpiece, illustration"
image = pipeline(prompt).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_attn_proc.png" />
</div>

To unload the LoRA weights, use the [`~loaders.LoraLoaderMixin.unload_lora_weights`] method, which discards the LoRA weights and restores the model to its original weights:

```py
pipeline.unload_lora_weights()
```

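Conceptually, unloading can restore the original model because the adapter is kept separate from the base weights. The toy class below illustrates that idea only. `ToyLayer` and its methods are hypothetical and are not how 🤗 Diffusers stores weights internally:

```python
class ToyLayer:
    """A stand-in for a model layer whose base weights are never modified."""

    def __init__(self, weight):
        self.base_weight = list(weight)
        self.lora_delta = None  # no adapter loaded yet

    def load_lora(self, delta):
        """Attach an adapter as a separate delta instead of overwriting the base weights."""
        if len(delta) != len(self.base_weight):
            raise ValueError("delta must match the weight shape")
        self.lora_delta = list(delta)

    def unload_lora(self):
        """Discard the adapter; the untouched base weights become active again."""
        self.lora_delta = None

    def effective_weight(self):
        """Weights actually used: base plus the adapter delta, if one is loaded."""
        if self.lora_delta is None:
            return list(self.base_weight)
        return [w + d for w, d in zip(self.base_weight, self.lora_delta)]


layer = ToyLayer([1.0, 2.0, 3.0])
layer.load_lora([0.5, -0.5, 0.25])
with_lora = layer.effective_weight()
layer.unload_lora()
restored = layer.effective_weight()
print(with_lora)
print(restored)
```
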
### Adjust LoRA weight scale

With both [`~loaders.LoraLoaderMixin.load_lora_weights`] and [`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`], you can pass `cross_attention_kwargs={"scale": 0.5}` when calling the pipeline to adjust how much of the LoRA weights to use. A value of `0` is the same as using only the base model weights, and a value of `1` is equivalent to using the fully fine-tuned LoRA.

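The effect of the scale can be sketched with plain lists. The example assumes the standard LoRA formulation, in which the adapted weight is the base weight plus `scale` times the product of two low-rank factors. The matrices are toy values, and `apply_lora` is a hypothetical helper rather than a 🤗 Diffusers function:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(base, lora_b, lora_a, scale):
    """Return base + scale * (lora_b @ lora_a) without modifying base."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(base, delta)]


base = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 base weight
lora_b = [[1.0], [2.0]]          # 2x1 factor (rank 1)
lora_a = [[0.5, 0.25]]           # 1x2 factor

print(apply_lora(base, lora_b, lora_a, 0.0))  # scale 0: same as the base weights
print(apply_lora(base, lora_b, lora_a, 0.5))  # scale 0.5: half of the LoRA update
print(apply_lora(base, lora_b, lora_a, 1.0))  # scale 1: the full LoRA update
```

Values between `0` and `1` interpolate between the base model and the fully fine-tuned LoRA.
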
For more granular control over how much of the LoRA weights each layer uses, pass [`~loaders.LoraLoaderMixin.set_adapters`] a dictionary that specifies how much to scale the weights in each layer.

```python
pipe = ...  # create pipeline
pipe.load_lora_weights(..., adapter_name="my_adapter")
scales = {
    "text_encoder": 0.5,
    "text_encoder_2": 0.5,  # only usable if pipe has a second text encoder
    "unet": {
        "down": 0.9,  # all transformers in the down-part use scale 0.9
        # "mid"  # "mid" is not given in this example, so all transformers in the mid part use the default scale 1.0
        "up": {
            "block_0": 0.6,  # all 3 transformers in the 0th block of the up-part use scale 0.6
            "block_1": [0.4, 0.8, 1.0],  # the 3 transformers in the 1st block of the up-part use scales 0.4, 0.8, and 1.0 respectively
        }
    }
}
pipe.set_adapters("my_adapter", scales)
```

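The dictionary above mixes three kinds of entries: a single number, a list with one value per transformer, and a missing key that falls back to `1.0`. The sketch below shows one way such a specification could be expanded into explicit per-transformer scales. `expand_block_scales` and the block layout are hypothetical and only illustrate the rules described in the comments, not the library's internal code:

```python
def expand_block_scales(spec, blocks, default=1.0):
    """Expand a scale spec into one list of scales per block.

    `blocks` maps a block name to its number of transformers.
    `spec` may be None, a single number, or a dict keyed by block name.
    """
    expanded = {}
    for name, n_layers in blocks.items():
        value = default
        if isinstance(spec, dict):
            value = spec.get(name, default)
        elif spec is not None:
            value = spec
        if isinstance(value, (int, float)):
            # A single number applies to every transformer in the block.
            expanded[name] = [float(value)] * n_layers
        else:
            if len(value) != n_layers:
                raise ValueError(f"{name}: expected {n_layers} scales, got {len(value)}")
            expanded[name] = [float(v) for v in value]
    return expanded


up_blocks = {"block_0": 3, "block_1": 3}  # assumed layout: 3 transformers per block
up_spec = {"block_0": 0.6, "block_1": [0.4, 0.8, 1.0]}

print(expand_block_scales(up_spec, up_blocks))
print(expand_block_scales(0.9, up_blocks))   # a single number applies everywhere
print(expand_block_scales(None, up_blocks))  # a missing part falls back to 1.0
```
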
This also works with multiple adapters. See [this guide](https://huggingface.co/docs/diffusers/tutorials/using_peft_for_inference#customize-adapters-strength) for how to do it.

<Tip warning={true}>

Currently, [`~loaders.LoraLoaderMixin.set_adapters`] only supports scaling attention weights. If a LoRA has other parts (e.g., ResNets or down-/upsamplers), they keep a scale of 1.0.

</Tip>

### Kohya and TheLastBen

Other popular LoRA trainers from the community include those by [Kohya](https://github.com/kohya-ss/sd-scripts/) and [TheLastBen](https://github.com/TheLastBen/fast-stable-diffusion). These trainers create LoRA checkpoints that differ from those trained by 🤗 Diffusers, but they can be loaded in the same way.

<hfoptions id="other-trainers">
<hfoption id="Kohya">

To load a Kohya LoRA, let's download the [Blueprintify SD XL 1.0](https://civitai.com/models/150986/blueprintify-sd-xl-10) checkpoint from [Civitai](https://civitai.com/) as an example (the leading `!` runs the command from a notebook cell):

```sh
!wget https://civitai.com/api/download/models/168776 -O blueprintify-sd-xl-10.safetensors
```

Load the LoRA checkpoint with the [`~loaders.LoraLoaderMixin.load_lora_weights`] method, and specify the filename in the `weight_name` parameter:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
# Replace "path/to/weights" with the directory that contains the downloaded file
pipeline.load_lora_weights("path/to/weights", weight_name="blueprintify-sd-xl-10.safetensors")
```

Generate an image:

```py
# use bl3uprint in the prompt to trigger the LoRA
prompt = "bl3uprint, a highly detailed blueprint of the eiffel tower, explaining how to build all parts, many txt, blueprint grid backdrop"
image = pipeline(prompt).images[0]
image
```

<Tip warning={true}>

Using Kohya LoRAs with 🤗 Diffusers has some limitations:

- Images may not look like those generated by UIs such as ComfyUI, for several reasons explained [here](https://github.com/huggingface/diffusers/pull/4287/#issuecomment-1655110736).
- [LyCORIS checkpoints](https://github.com/KohakuBlueleaf/LyCORIS) aren't fully supported. The [`~loaders.LoraLoaderMixin.load_lora_weights`] method can load LyCORIS checkpoints with LoRA and LoCon modules, but Hada and LoKR are not supported.

</Tip>

</hfoption>
<hfoption id="TheLastBen">

Loading a checkpoint from TheLastBen is very similar. For example, to load the [TheLastBen/William_Eggleston_Style_SDXL](https://huggingface.co/TheLastBen/William_Eggleston_Style_SDXL) checkpoint:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("TheLastBen/William_Eggleston_Style_SDXL", weight_name="wegg.safetensors")

# use "by william eggleston" in the prompt to trigger the LoRA
prompt = "a house by william eggleston, sunrays, beautiful, sunlight, sunrays, beautiful"
image = pipeline(prompt=prompt).images[0]
image
```

</hfoption>
</hfoptions>

## IP-Adapter

[IP-Adapter](https://ip-adapter.github.io/) is a lightweight adapter that enables image prompting for any diffusion model. It works by decoupling the cross-attention layers for the image and text features. All other model components are frozen, and only the embedded image features in the UNet are trained. As a result, IP-Adapter files are typically only about 100 MB.

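The decoupling can be pictured as two separate attention passes that share one query: one over the text features and one over the image features, with the results added together. The following pure-Python sketch uses tiny hand-written vectors. The function names and the `image_scale` parameter are hypothetical, and real layers also apply learned projections that are omitted here:

```python
import math


def attention(query, keys, values):
    """Single-query scaled dot-product attention over lists of key/value vectors."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def decoupled_cross_attention(query, text_kv, image_kv, image_scale=1.0):
    """Attend to text and image features separately, then sum the two results."""
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return [t + image_scale * i for t, i in zip(text_out, image_out)]


query = [1.0, 0.0]
text_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])     # (keys, values) from the text prompt
image_kv = ([[0.0, 1.0], [1.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])  # (keys, values) from the image prompt

text_only = decoupled_cross_attention(query, text_kv, image_kv, image_scale=0.0)
combined = decoupled_cross_attention(query, text_kv, image_kv, image_scale=1.0)
print(text_only)
print(combined)
```

With `image_scale=0.0` the output equals plain text cross-attention, so the image branch is purely additive and the text path is left as it was.
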
You can learn more about how to use IP-Adapter for different tasks and specific use cases in the [IP-Adapter](../using-diffusers/ip_adapter) guide.

> [!TIP]
> Diffusers currently supports IP-Adapter only for some of the most popular pipelines. Feel free to open a feature request if you have a cool use case and want to integrate IP-Adapter with an unsupported pipeline!
> Official IP-Adapter checkpoints are available from [h94/IP-Adapter](https://huggingface.co/h94/IP-Adapter).

To start, load a Stable Diffusion checkpoint.

```py
from diffusers import AutoPipelineForText2Image
import torch
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
```

Then load the IP-Adapter weights and add them to the pipeline with the [`~loaders.IPAdapterMixin.load_ip_adapter`] method.

```py
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
```

Once loaded, you can use the pipeline with an image and a text prompt to guide the image generation process.

```py
# Image prompt that guides the generation alongside the text prompt
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")
# Fixed seed for reproducible results
generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality, wearing sunglasses',
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]
images
```

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip-bear.png" />
</div>

### IP-Adapter Plus

IP-Adapter relies on an image encoder to generate image features. If the IP-Adapter repository contains an `image_encoder` subfolder, the image encoder is automatically loaded and registered to the pipeline. Otherwise, you need to explicitly load the image encoder with a [`~transformers.CLIPVisionModelWithProjection`] model and pass it to the pipeline.

This is the case for *IP-Adapter Plus* checkpoints, which use the ViT-H image encoder.

```py
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection
import torch

# Load the ViT-H image encoder explicitly
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16
)

# Pass the image encoder to the pipeline
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter-plus_sdxl_vit-h.safetensors")
```

### IP-Adapter Face ID models

The IP-Adapter FaceID models are experimental IP-Adapters that use image embeddings generated by `insightface` instead of CLIP image embeddings. Some of these models also use LoRA to improve ID consistency.
To use these models, you need to install `insightface` and all of its requirements.

<Tip warning={true}>
Because the InsightFace pretrained models are available only for non-commercial research purposes, the IP-Adapter-FaceID models are released exclusively for research purposes and are not intended for commercial use.
</Tip>

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

# image_encoder_folder=None: no CLIP image encoder is loaded for this checkpoint
pipeline.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder=None, weight_name="ip-adapter-faceid_sdxl.bin", image_encoder_folder=None)
```

If you want to use one of the two IP-Adapter FaceID Plus models, you must also load the CLIP image encoder, because these models use both `insightface` and CLIP image embeddings to achieve better photorealism.

```py
from transformers import CLIPVisionModelWithProjection

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder=None, weight_name="ip-adapter-faceid-plus_sd15.bin")
```