<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Load adapters
[[open-in-colab]]
ํŠน์ • ๋ฌผ์ฒด์˜ ์ด๋ฏธ์ง€ ๋˜๋Š” ํŠน์ • ์Šคํƒ€์ผ์˜ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋„๋ก diffusion ๋ชจ๋ธ์„ ๊ฐœ์ธํ™”ํ•˜๊ธฐ ์œ„ํ•œ ๋ช‡ ๊ฐ€์ง€ [ํ•™์Šต](../training/overview) ๊ธฐ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ํ•™์Šต ๋ฐฉ๋ฒ•์€ ๊ฐ๊ฐ ๋‹ค๋ฅธ ์œ ํ˜•์˜ ์–ด๋Œ‘ํ„ฐ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ถ€ ์–ด๋Œ‘ํ„ฐ๋Š” ์™„์ „ํžˆ ์ƒˆ๋กœ์šด ๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜๋Š” ๋ฐ˜๋ฉด, ๋‹ค๋ฅธ ์–ด๋Œ‘ํ„ฐ๋Š” ์ž„๋ฒ ๋”ฉ ๋˜๋Š” ๊ฐ€์ค‘์น˜์˜ ์ž‘์€ ๋ถ€๋ถ„๋งŒ ์ˆ˜์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๊ฐ ์–ด๋Œ‘ํ„ฐ์˜ ๋กœ๋”ฉ ํ”„๋กœ์„ธ์Šค๋„ ๋‹ค๋ฅด๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” DreamBooth, textual inversion ๋ฐ LoRA ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.
<Tip>
Feel free to browse the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer), [LoRA the Explorer](https://huggingface.co/spaces/multimodalart/LoraTheExplorer), and the [Diffusers Models Gallery](https://huggingface.co/spaces/huggingface-projects/diffusers-gallery) for checkpoints and embeddings to use.
</Tip>
## DreamBooth
[DreamBooth](https://dreambooth.github.io/) finetunes an *entire diffusion model* on just several images of a subject to generate images of that subject in new styles and settings. This method works by using a special word in the prompt that the model learns to associate with the subject images. Of all the training methods, DreamBooth produces the largest file size (usually a few GBs) because it is a full checkpoint model.
Let's load the [herge_style](https://huggingface.co/sd-dreambooth-library/herge-style) checkpoint, which is trained on just 10 images drawn by Hergé, to generate images in that style. For it to work, you need to include the special word `herge_style` in your prompt to trigger the checkpoint:
```py
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained("sd-dreambooth-library/herge-style", torch_dtype=torch.float16).to("cuda")
prompt = "A cute herge_style brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration"
image = pipeline(prompt).images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_dreambooth.png" />
</div>
## Textual inversion
[Textual inversion](https://textual-inversion.github.io/) is very similar to DreamBooth, and it can also personalize a diffusion model to generate certain concepts (styles, objects) from just a few images. This method works by training and finding new embeddings that represent the images you provide with a special word in the prompt. As a result, the diffusion model weights stay the same and the training process produces a relatively tiny (a few KBs) file.
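To see why the resulting file is so small, consider a rough sketch (illustrative only, assuming Stable Diffusion v1's 768-dimensional text embeddings): training optimizes a single new embedding vector for the placeholder token while the model itself stays frozen.

```py
import torch

# Conceptual sketch, not Diffusers internals: textual inversion learns
# one new embedding vector for a placeholder token; everything else is frozen.
embedding_dim = 768  # CLIP text embedding width in Stable Diffusion v1
learned_embedding = torch.randn(embedding_dim)  # the only trained parameter

# Stored as float32, this is about 3 KB on disk
size_in_bytes = learned_embedding.numel() * learned_embedding.element_size()
```

Real textual inversion files may contain a few such vectors (multi-vector tokens), but they remain orders of magnitude smaller than a full checkpoint.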
Textual inversion์€ ์ž„๋ฒ ๋”ฉ์„ ์ƒ์„ฑํ•˜๊ธฐ ๋•Œ๋ฌธ์— DreamBooth์ฒ˜๋Ÿผ ๋‹จ๋…์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์œผ๋ฉฐ ๋˜ ๋‹ค๋ฅธ ๋ชจ๋ธ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
```py
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
```
Now you can load the textual inversion embeddings with the [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`] method and generate some images. Let's load the [sd-concepts-library/gta5-artwork](https://huggingface.co/sd-concepts-library/gta5-artwork) embeddings; you need to include the special word `<gta5-artwork>` in your prompt to trigger it:
```py
pipeline.load_textual_inversion("sd-concepts-library/gta5-artwork")
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style"
image = pipeline(prompt).images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_txt_embed.png" />
</div>
Textual inversion์€ ๋˜ํ•œ ๋ฐ”๋žŒ์งํ•˜์ง€ ์•Š์€ ์‚ฌ๋ฌผ์— ๋Œ€ํ•ด *๋„ค๊ฑฐํ‹ฐ๋ธŒ ์ž„๋ฒ ๋”ฉ*์„ ์ƒ์„ฑํ•˜์—ฌ ๋ชจ๋ธ์ด ํ๋ฆฟํ•œ ์ด๋ฏธ์ง€๋‚˜ ์†์˜ ์ถ”๊ฐ€ ์†๊ฐ€๋ฝ๊ณผ ๊ฐ™์€ ๋ฐ”๋žŒ์งํ•˜์ง€ ์•Š์€ ์‚ฌ๋ฌผ์ด ํฌํ•จ๋œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜์ง€ ๋ชปํ•˜๋„๋ก ํ•™์Šตํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ํ”„๋กฌํ”„ํŠธ๋ฅผ ๋น ๋ฅด๊ฒŒ ๊ฐœ์„ ํ•˜๋Š” ๊ฒƒ์ด ์‰ฌ์šด ๋ฐฉ๋ฒ•์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์ด์ „๊ณผ ๊ฐ™์ด ์ž„๋ฒ ๋”ฉ์„ [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`]์œผ๋กœ ๋ถˆ๋Ÿฌ์˜ค์ง€๋งŒ ์ด๋ฒˆ์—๋Š” ๋‘ ๊ฐœ์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๊ฐ€ ๋” ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค:
- `weight_name`: ํŒŒ์ผ์ด ํŠน์ • ์ด๋ฆ„์˜ ๐Ÿค— Diffusers ํ˜•์‹์œผ๋กœ ์ €์žฅ๋œ ๊ฒฝ์šฐ์ด๊ฑฐ๋‚˜ ํŒŒ์ผ์ด A1111 ํ˜•์‹์œผ๋กœ ์ €์žฅ๋œ ๊ฒฝ์šฐ, ๋ถˆ๋Ÿฌ์˜ฌ ๊ฐ€์ค‘์น˜ ํŒŒ์ผ์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.
- `token`: ์ž„๋ฒ ๋”ฉ์„ ํŠธ๋ฆฌ๊ฑฐํ•˜๊ธฐ ์œ„ํ•ด ํ”„๋กฌํ”„ํŠธ์—์„œ ์‚ฌ์šฉํ•  ํŠน์ˆ˜ ๋‹จ์–ด๋ฅผ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.
[sayakpaul/EasyNegative-test](https://huggingface.co/sayakpaul/EasyNegative-test) ์ž„๋ฒ ๋”ฉ์„ ๋ถˆ๋Ÿฌ์™€ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค:
```py
pipeline.load_textual_inversion(
    "sayakpaul/EasyNegative-test", weight_name="EasyNegative.safetensors", token="EasyNegative"
)
```
Now you can use the `token` to generate an image with the negative embeddings:
```py
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, EasyNegative"
negative_prompt = "EasyNegative"
image = pipeline(prompt, negative_prompt=negative_prompt, num_inference_steps=50).images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png" />
</div>
## LoRA
[Low-Rank Adaptation (LoRA)](https://huggingface.co/papers/2106.09685) is a popular training technique because it is fast and generates smaller file sizes (a couple hundred MBs). Like the other methods in this guide, LoRA can train a model to learn new styles from just a few images. It works by inserting new weights into the diffusion model and then training only the new weights instead of the entire model. This makes LoRAs faster to train and easier to store.
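The small file sizes follow directly from the low-rank factorization. A back-of-the-envelope sketch (illustrative numbers, not actual layer sizes):

```py
# Instead of storing a dense d x d update for each adapted weight matrix,
# LoRA stores only two thin rank-r factors.
d, r = 1024, 8                  # hypothetical layer width and LoRA rank
dense_params = d * d            # parameters in a fully finetuned update
lora_params = 2 * d * r         # parameters in the two low-rank factors
ratio = dense_params // lora_params  # the LoRA factors are 64x smaller here
```

At typical ranks (4 to 128), this shrinkage is why LoRA checkpoints fit in a couple hundred MBs where a full checkpoint takes GBs.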
<Tip>
LoRA is a very general training technique that can be used with other training methods. For example, it is common to train a model with DreamBooth and LoRA together. It is also increasingly common to load and merge multiple LoRAs to create new and unique images. Merging is outside the scope of this loading guide; you can learn more about it in the in-depth [Merge LoRAs](merge_loras) guide.
</Tip>
LoRA๋Š” ๋‹ค๋ฅธ ๋ชจ๋ธ๊ณผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:
```py
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
```
Then use the [`~loaders.LoraLoaderMixin.load_lora_weights`] method to load the [ostris/super-cereal-sdxl-lora](https://huggingface.co/ostris/super-cereal-sdxl-lora) weights and specify the weights filename from the repository:
```py
pipeline.load_lora_weights("ostris/super-cereal-sdxl-lora", weight_name="cereal_box_sdxl_v1.safetensors")
prompt = "bears, pizza bites"
image = pipeline(prompt).images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_lora.png" />
</div>
The [`~loaders.LoraLoaderMixin.load_lora_weights`] method loads LoRA weights into both the UNet and the text encoder. It is the preferred way to load LoRAs because it can handle cases where:
- the LoRA weights don't have separate identifiers for the UNet and the text encoder
- the LoRA weights do have separate identifiers for the UNet and the text encoder
But if you only need to load LoRA weights into the UNet, you can use the [`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method. Let's load the [jbilcke-hf/sdxl-cinematic-1](https://huggingface.co/jbilcke-hf/sdxl-cinematic-1) LoRA:
```py
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.unet.load_attn_procs("jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors")
# use cnmt in the prompt to trigger the LoRA
prompt = "A cute cnmt eating a slice of pizza, stunning color scheme, masterpiece, illustration"
image = pipeline(prompt).images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_attn_proc.png" />
</div>
To unload the LoRA weights, use the [`~loaders.LoraLoaderMixin.unload_lora_weights`] method to discard the LoRA weights and restore the model to its original weights:
```py
pipeline.unload_lora_weights()
```
### Adjust LoRA weight scale
For both [`~loaders.LoraLoaderMixin.load_lora_weights`] and [`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`], you can pass the `cross_attention_kwargs={"scale": 0.5}` parameter to adjust how much of the LoRA weights to use. A value of `0` is the same as only using the base model weights, and a value of `1` is equivalent to using the fully finetuned LoRA.
๋ ˆ์ด์–ด๋‹น ์‚ฌ์šฉ๋˜๋Š” LoRA ๊ฐ€์ค‘์น˜์˜ ์–‘์„ ๋ณด๋‹ค ์„ธ๋ฐ€ํ•˜๊ฒŒ ์ œ์–ดํ•˜๋ ค๋ฉด [`~loaders.LoraLoaderMixin.set_adapters`]๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ๋ ˆ์ด์–ด์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์–ผ๋งˆ๋งŒํผ ์กฐ์ •ํ• ์ง€ ์ง€์ •ํ•˜๋Š” ๋”•์…”๋„ˆ๋ฆฌ๋ฅผ ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
pipe = ...  # create a pipeline
pipe.load_lora_weights(..., adapter_name="my_adapter")
scales = {
    "text_encoder": 0.5,
    "text_encoder_2": 0.5,  # only usable if the pipeline has a second text encoder
    "unet": {
        "down": 0.9,  # all transformers in the down-part will use scale 0.9
        # "mid"  # "mid" is not given here, so all transformers in the mid-part will use the default scale 1.0
        "up": {
            "block_0": 0.6,  # all 3 transformers in block 0 of the up-part will use scale 0.6
            "block_1": [0.4, 0.8, 1.0],  # the 3 transformers in block 1 of the up-part will use scales 0.4, 0.8, and 1.0 respectively
        }
    }
}
pipe.set_adapters("my_adapter", scales)
```
This also works with multiple adapters; see [this guide](https://huggingface.co/docs/diffusers/tutorials/using_peft_for_inference#customize-adapters-strength) for how to do it.
<Tip warning={true}>
Currently, [`~loaders.LoraLoaderMixin.set_adapters`] only supports scaling attention weights. If a LoRA has other parts (e.g., resnets or down-/upsamplers), they will keep a scale of 1.0.
</Tip>
### Kohya and TheLastBen
Other popular LoRA trainers from the community include those by [Kohya](https://github.com/kohya-ss/sd-scripts/) and [TheLastBen](https://github.com/TheLastBen/fast-stable-diffusion). These trainers create different LoRA checkpoints than those trained by 🤗 Diffusers, but they can still be loaded in the same way.
<hfoptions id="other-trainers">
<hfoption id="Kohya">
To load a Kohya LoRA, let's download the [Blueprintify SD XL 1.0](https://civitai.com/models/150986/blueprintify-sd-xl-10) checkpoint from [Civitai](https://civitai.com/) as an example:
```sh
!wget https://civitai.com/api/download/models/168776 -O blueprintify-sd-xl-10.safetensors
```
LoRA ์ฒดํฌํฌ์ธํŠธ๋ฅผ [`~loaders.LoraLoaderMixin.load_lora_weights`] ๋ฉ”์„œ๋“œ๋กœ ๋ถˆ๋Ÿฌ์˜ค๊ณ  `weight_name` ํŒŒ๋ผ๋ฏธํ„ฐ์— ํŒŒ์ผ๋ช…์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค:
```py
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("path/to/weights", weight_name="blueprintify-sd-xl-10.safetensors")
```
์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค:
```py
# use bl3uprint in the prompt to trigger the LoRA
prompt = "bl3uprint, a highly detailed blueprint of the eiffel tower, explaining how to build all parts, many txt, blueprint grid backdrop"
image = pipeline(prompt).images[0]
image
```
<Tip warning={true}>
Using Kohya LoRAs with 🤗 Diffusers comes with some limitations:
- Images may not look like those generated in UIs such as ComfyUI for multiple reasons, which are explained [here](https://github.com/huggingface/diffusers/pull/4287/#issuecomment-1655110736).
- [LyCORIS checkpoints](https://github.com/KohakuBlueleaf/LyCORIS) aren't fully supported. The [`~loaders.LoraLoaderMixin.load_lora_weights`] method loads LyCORIS checkpoints with LoRA and LoCon modules, but Hada and LoKR are not supported.
</Tip>
</hfoption>
<hfoption id="TheLastBen">
TheLastBen์—์„œ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ฐฉ๋ฒ•์€ ๋งค์šฐ ์œ ์‚ฌํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, [TheLastBen/William_Eggleston_Style_SDXL](https://huggingface.co/TheLastBen/William_Eggleston_Style_SDXL) ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋ ค๋ฉด:
```py
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("TheLastBen/William_Eggleston_Style_SDXL", weight_name="wegg.safetensors")
# use william eggleston in the prompt to trigger the LoRA
prompt = "a house by william eggleston, sunrays, beautiful, sunlight, sunrays, beautiful"
image = pipeline(prompt=prompt).images[0]
image
```
</hfoption>
</hfoptions>
## IP-Adapter
[IP-Adapter](https://ip-adapter.github.io/) is a lightweight adapter that enables image prompting for any diffusion model. This adapter works by decoupling the cross-attention layers of the image and text features. All the other model components are frozen and only the embedded image features in the UNet are trained. As a result, IP-Adapter files are typically only ~100MBs.
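The decoupling idea can be sketched as follows (a simplified illustration of the mechanism described in the IP-Adapter paper, not the Diffusers implementation; in the real model, separate key and value projections are learned for the image features):

```py
import torch
import torch.nn.functional as F

def decoupled_cross_attention(query, text_kv, image_kv, ip_scale=1.0):
    # Text and image features are attended to separately, then summed;
    # only the image branch's projections are trained, the rest is frozen.
    text_out = F.scaled_dot_product_attention(query, text_kv, text_kv)
    image_out = F.scaled_dot_product_attention(query, image_kv, image_kv)
    return text_out + ip_scale * image_out

q = torch.randn(1, 4, 64)          # latent queries (hypothetical shapes)
text_kv = torch.randn(1, 77, 64)   # projected text features (frozen path)
image_kv = torch.randn(1, 4, 64)   # projected image features (trained path)
out = decoupled_cross_attention(q, text_kv, image_kv)
```

Because only the small image-branch projections are trained, the adapter file stays compact.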
You can learn more about how to use IP-Adapter for different tasks and specific use cases in the [IP-Adapter](../using-diffusers/ip_adapter) guide.
> [!TIP]
> Diffusers currently only supports IP-Adapter for some of the most popular pipelines. Feel free to open a feature request if you have a cool use case and would like to integrate IP-Adapter with an unsupported pipeline!
> Official IP-Adapter checkpoints are available from [h94/IP-Adapter](https://huggingface.co/h94/IP-Adapter).
To start, load a Stable Diffusion checkpoint.
```py
from diffusers import AutoPipelineForText2Image
import torch
from diffusers.utils import load_image
pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
```
๊ทธ๋Ÿฐ ๋‹ค์Œ IP-Adapter ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์™€ [`~loaders.IPAdapterMixin.load_ip_adapter`] ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ์— ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
```py
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
```
Once loaded, you can use the pipeline with an image and a text prompt to guide the image generation process.
```py
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")
generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality, wearing sunglasses',
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]
images
```
<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip-bear.png" />
</div>
### IP-Adapter Plus
IP-Adapter relies on an image encoder to generate image features. If the IP-Adapter repository contains an `image_encoder` subfolder, the image encoder is automatically loaded and registered to the pipeline. Otherwise, you'll need to explicitly load the image encoder with a [`~transformers.CLIPVisionModelWithProjection`] model and pass it to the pipeline.
This is the case for *IP-Adapter Plus* checkpoints, which use the ViT-H image encoder.
```py
from transformers import CLIPVisionModelWithProjection
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter-plus_sdxl_vit-h.safetensors")
```
### IP-Adapter Face ID models
The IP-Adapter FaceID models are experimental IP Adapters that use image embeddings generated by `insightface` instead of CLIP image embeddings. Some of these models also use LoRA to improve ID consistency.
You need to install `insightface` and all its requirements to use these models.
<Tip warning={true}>
As InsightFace pretrained models are available for non-commercial research purposes only, IP-Adapter-FaceID models are released exclusively for research purposes and are not intended for commercial use.
</Tip>
```py
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder=None, weight_name="ip-adapter-faceid_sdxl.bin", image_encoder_folder=None)
```
If you want to use one of the two IP-Adapter FaceID Plus models, you also need to load the CLIP image encoder, as those models use both `insightface` and CLIP image embeddings to achieve better photorealism.
```py
from transformers import CLIPVisionModelWithProjection
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    torch_dtype=torch.float16,
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder=None, weight_name="ip-adapter-faceid-plus_sd15.bin")
```