<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# DreamBooth
[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views.
![ν”„λ‘œμ νŠΈ λΈ”λ‘œκ·Έμ—μ„œμ˜ DreamBooth μ˜ˆμ‹œ](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg)
<small>μ—μ„œμ˜ Dreambooth μ˜ˆμ‹œ <a href="https://dreambooth.github.io">project's blog.</a></small>
이 κ°€μ΄λ“œλŠ” λ‹€μ–‘ν•œ GPU, Flax 사양에 λŒ€ν•΄ [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) λͺ¨λΈλ‘œ DreamBoothλ₯Ό νŒŒμΈνŠœλ‹ν•˜λŠ” 방법을 λ³΄μ—¬μ€λ‹ˆλ‹€. 더 깊이 νŒŒκ³ λ“€μ–΄ μž‘λ™ 방식을 ν™•μΈν•˜λŠ” 데 관심이 μžˆλŠ” 경우, 이 κ°€μ΄λ“œμ— μ‚¬μš©λœ DreamBooth의 λͺ¨λ“  ν•™μŠ΅ 슀크립트λ₯Ό [μ—¬κΈ°](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth)μ—μ„œ 찾을 수 μžˆμŠ΅λ‹ˆλ‹€.
슀크립트λ₯Ό μ‹€ν–‰ν•˜κΈ° 전에 라이브러리의 ν•™μŠ΅μ— ν•„μš”ν•œ dependenciesλ₯Ό μ„€μΉ˜ν•΄μ•Ό ν•©λ‹ˆλ‹€. λ˜ν•œ `main` GitHub λΈŒλžœμΉ˜μ—μ„œ 🧨 Diffusersλ₯Ό μ„€μΉ˜ν•˜λŠ” 것이 μ’‹μŠ΅λ‹ˆλ‹€.
```bash
pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```
xFormers is not part of the training requirements, but we recommend you [install](../optimization/xformers) it if you can, because it can make training faster and less memory intensive.
After all the dependencies have been set up, initialize a [πŸ€— Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
To set up a default πŸ€— Accelerate environment without choosing any configurations:
```bash
accelerate config default
```
λ˜λŠ” ν˜„μž¬ ν™˜κ²½μ΄ λ…ΈνŠΈλΆκ³Ό 같은 λŒ€ν™”ν˜• 셸을 μ§€μ›ν•˜μ§€ μ•ŠλŠ” 경우 λ‹€μŒμ„ μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€:
```py
from accelerate.utils import write_basic_config
write_basic_config()
```
## νŒŒμΈνŠœλ‹
<Tip warning={true}>
DreamBooth νŒŒμΈνŠœλ‹μ€ ν•˜μ΄νΌνŒŒλΌλ―Έν„°μ— 맀우 λ―Όκ°ν•˜κ³  κ³Όμ ν•©λ˜κΈ° μ‰½μŠ΅λ‹ˆλ‹€. μ μ ˆν•œ ν•˜μ΄νΌνŒŒλΌλ―Έν„°λ₯Ό μ„ νƒν•˜λŠ” 데 도움이 λ˜λ„λ‘ λ‹€μ–‘ν•œ ꢌμž₯ 섀정이 ν¬ν•¨λœ [심측 뢄석](https://huggingface.co/blog/dreambooth)을 μ‚΄νŽ΄λ³΄λŠ” 것이 μ’‹μŠ΅λ‹ˆλ‹€.
</Tip>
<frameworkcontent>
<pt>
Let's try DreamBooth with a [few images of a dog](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ).
Download and save them to a directory, and then set the `INSTANCE_DIR` environment variable to that path:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export OUTPUT_DIR="path_to_saved_model"
```
그런 λ‹€μŒ, λ‹€μŒ λͺ…령을 μ‚¬μš©ν•˜μ—¬ ν•™μŠ΅ 슀크립트λ₯Ό μ‹€ν–‰ν•  수 μžˆμŠ΅λ‹ˆλ‹€ (전체 ν•™μŠ΅ μŠ€ν¬λ¦½νŠΈλŠ” [μ—¬κΈ°](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)μ—μ„œ 찾을 수 μžˆμŠ΅λ‹ˆλ‹€):
```bash
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=400
```
</pt>
<jax>
If you have access to TPUs or want to train even faster, you can try out the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py). The Flax training script doesn't support gradient checkpointing or gradient accumulation, so you'll need a GPU with at least 30GB of memory.
Before running the script, make sure you have the requirements installed:
```bash
pip install -U -r requirements.txt
```
Then you can launch the training script with the following command:
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--max_train_steps=400
```
</jax>
</frameworkcontent>
### Fine-tuning with prior-preserving loss
Prior preservation is used to avoid overfitting and language drift (check out the [paper](https://arxiv.org/abs/2208.12242) to learn more if you're interested). For prior preservation, other images of the same class are used as part of the training process. The nice thing is that you can generate those images using the Stable Diffusion model itself! The training script will save the generated images to a local path you specify.
According to the authors, it's recommended to generate `num_epochs * num_samples` images for prior preservation. In most cases, 200-300 images work well.
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--num_class_images=200 \
--max_train_steps=800
```
</jax>
</frameworkcontent>
## ν…μŠ€νŠΈ 인코더와 and UNet둜 νŒŒμΈνŠœλ‹ν•˜κΈ°
ν•΄λ‹Ή 슀크립트λ₯Ό μ‚¬μš©ν•˜λ©΄ `unet`κ³Ό ν•¨κ»˜ `text_encoder`λ₯Ό νŒŒμΈνŠœλ‹ν•  수 μžˆμŠ΅λ‹ˆλ‹€. μ‹€ν—˜μ—μ„œ(μžμ„Έν•œ λ‚΄μš©μ€ [🧨 Diffusersλ₯Ό μ‚¬μš©ν•΄ DreamBooth둜 Stable Diffusion ν•™μŠ΅ν•˜κΈ°](https://huggingface.co/blog/dreambooth) κ²Œμ‹œλ¬Όμ„ ν™•μΈν•˜μ„Έμš”), 특히 μ–Όκ΅΄ 이미지λ₯Ό 생성할 λ•Œ 훨씬 더 λ‚˜μ€ κ²°κ³Όλ₯Ό 얻을 수 μžˆμŠ΅λ‹ˆλ‹€.
<Tip warning={true}>
ν…μŠ€νŠΈ 인코더λ₯Ό ν•™μŠ΅μ‹œν‚€λ €λ©΄ μΆ”κ°€ λ©”λͺ¨λ¦¬κ°€ ν•„μš”ν•΄ 16GB GPUλ‘œλŠ” λ™μž‘ν•˜μ§€ μ•ŠμŠ΅λ‹ˆλ‹€. 이 μ˜΅μ…˜μ„ μ‚¬μš©ν•˜λ €λ©΄ μ΅œμ†Œ 24GB VRAM이 ν•„μš”ν•©λ‹ˆλ‹€.
</Tip>
Pass the `--train_text_encoder` argument to the training script to enable fine-tuning the `text_encoder` and `unet`:
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--use_8bit_adam \
--gradient_checkpointing \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=2e-6 \
--num_class_images=200 \
--max_train_steps=800
```
</jax>
</frameworkcontent>
## Fine-tuning with LoRA
You can also use LoRA (Low-Rank Adaptation of Large Language Models), a fine-tuning technique for accelerating the training of large models, with DreamBooth. For more details, take a look at the [LoRA training](training/lora#dreambooth) guide.
### Saving checkpoints while training
It's easy to overfit while training with DreamBooth, so sometimes it's useful to save regular checkpoints during the training process. One of the intermediate checkpoints may actually work better than the final model! Pass the following argument to the training script to enable saving checkpoints:
```bash
--checkpointing_steps=500
```
μ΄λ ‡κ²Œ ν•˜λ©΄ `output_dir`의 ν•˜μœ„ 폴더에 전체 ν•™μŠ΅ μƒνƒœκ°€ μ €μž₯λ©λ‹ˆλ‹€. ν•˜μœ„ 폴더 이름은 접두사 `checkpoint-`둜 μ‹œμž‘ν•˜κ³  μ§€κΈˆκΉŒμ§€ μˆ˜ν–‰λœ step μˆ˜μž…λ‹ˆλ‹€. μ˜ˆμ‹œλ‘œ `checkpoint-1500`은 1500 ν•™μŠ΅ step 후에 μ €μž₯된 μ²΄ν¬ν¬μΈνŠΈμž…λ‹ˆλ‹€.
#### Resuming training from a saved checkpoint
To resume training from a saved checkpoint, pass the `--resume_from_checkpoint` argument and specify the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last saved checkpoint (the one with the largest number of steps). For example, the following resumes training from the checkpoint saved after 1500 steps:
```bash
--resume_from_checkpoint="checkpoint-1500"
```
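For reference, `"latest"` simply resolves to the checkpoint subfolder with the highest step count. A minimal sketch of that logic, assuming the `checkpoint-<step>` naming described above:
```python
import os

output_dir = "path_to_saved_model"  # same directory as --output_dir

# Pick the checkpoint subfolder with the highest step count
checkpoints = [d for d in os.listdir(output_dir) if d.startswith("checkpoint-")]
latest = max(checkpoints, key=lambda name: int(name.split("-")[1]))
print(latest)  # e.g. "checkpoint-1500"
```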
μ›ν•˜λŠ” 경우 일뢀 ν•˜μ΄νΌνŒŒλΌλ―Έν„°λ₯Ό μ‘°μ •ν•  수 μžˆμŠ΅λ‹ˆλ‹€.
#### Performing inference using a saved checkpoint
Saved checkpoints are stored in a format suitable for resuming training. They include not only the model weights, but also the state of the optimizer, data loaders, and learning rate.
If you have **`"accelerate>=0.16.0"`** installed, use the following code to run inference from an intermediate checkpoint:
```python
from diffusers import DiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel
import torch
# Load the pipeline with the same arguments (model, revision) that were used for training
model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet")
# if you trained with `--train_text_encoder`, make sure to also load the text encoder
text_encoder = CLIPTextModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder")
pipeline = DiffusionPipeline.from_pretrained(model_id, unet=unet, text_encoder=text_encoder, torch_dtype=torch.float16)
pipeline.to("cuda")
# Perform inference, or save, or push to the hub
pipeline.save_pretrained("dreambooth-pipeline")
```
If you have **`"accelerate<0.16.0"`** installed, you first need to convert the checkpoint to an inference pipeline:
```python
from accelerate import Accelerator
from diffusers import DiffusionPipeline
# Load the pipeline with the same arguments (model, revision) that were used for training
model_id = "CompVis/stable-diffusion-v1-4"
pipeline = DiffusionPipeline.from_pretrained(model_id)
accelerator = Accelerator()
# Use text_encoder if `--train_text_encoder` was used for the initial training
unet, text_encoder = accelerator.prepare(pipeline.unet, pipeline.text_encoder)
# Restore state from a checkpoint path. You have to use the absolute path here.
accelerator.load_state("/sddata/dreambooth/daruma-v2-1/checkpoint-100")
# Rebuild the pipeline with the unwrapped models (assignment to .unet and .text_encoder should work too)
pipeline = DiffusionPipeline.from_pretrained(
model_id,
unet=accelerator.unwrap_model(unet),
text_encoder=accelerator.unwrap_model(text_encoder),
)
# Perform inference, or save, or push to the hub
pipeline.save_pretrained("dreambooth-pipeline")
```
## Optimizations for different GPU sizes
Depending on your hardware, there are a few different ways to optimize DreamBooth on GPUs from 16GB all the way down to 8GB!
### xFormers
[xFormers](https://github.com/facebookresearch/xformers) is a toolbox for optimizing Transformers, and it includes a [memory-efficient attention](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops) mechanism that is used in 🧨 Diffusers. [Install xFormers](./optimization/xformers) and then add the following argument to your training script:
```bash
--enable_xformers_memory_efficient_attention
```
xFormers is not available in Flax.
### Set gradients to none
Another way to lower your memory footprint is to [set the gradients](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) to `None` instead of zero. However, this may change certain behaviors, so if you run into any issues, try removing this argument. Add the following argument to your training script to set the gradients to `None`:
```bash
--set_grads_to_none
```
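Under the hood, this corresponds to calling PyTorch's [`Optimizer.zero_grad`](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) with `set_to_none=True`, which frees the gradient tensors instead of zero-filling them. A minimal sketch of the difference:
```python
import torch

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

model(torch.randn(1, 4)).sum().backward()

# set_to_none=True frees the gradient tensors entirely...
optimizer.zero_grad(set_to_none=True)
print(model.weight.grad)  # None

# ...whereas set_to_none=False would keep them allocated and fill them with
# zeros, which uses more memory but preserves the original in-place behavior.
```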
### 16GB GPU
With the help of gradient checkpointing and the 8-bit optimizer from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), it's possible to train DreamBooth on a 16GB GPU. Make sure bitsandbytes is installed:
```bash
pip install bitsandbytes
```
κ·Έ λ‹€μŒ, ν•™μŠ΅ μŠ€ν¬λ¦½νŠΈμ— `--use_8bit_adam` μ˜΅μ…˜μ„ λͺ…μ‹œν•©λ‹ˆλ‹€:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=2 --gradient_checkpointing \
--use_8bit_adam \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800
```
### 12GB GPU
12GB GPUμ—μ„œ DreamBoothλ₯Ό μ‹€ν–‰ν•˜λ €λ©΄ gradient checkpointing, 8λΉ„νŠΈ μ˜΅ν‹°λ§ˆμ΄μ €, xFormersλ₯Ό ν™œμ„±ν™”ν•˜κ³  κ·Έλž˜λ””μ–ΈνŠΈλ₯Ό `None`으둜 μ„€μ •ν•΄μ•Ό ν•©λ‹ˆλ‹€.
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 --gradient_checkpointing \
--use_8bit_adam \
--enable_xformers_memory_efficient_attention \
--set_grads_to_none \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800
```
### Training on an 8GB GPU
For 8GB GPUs, you can use [DeepSpeed](https://www.deepspeed.ai/) to offload some tensors from VRAM to either the CPU or NVME, allowing training with less GPU memory.
Run the following command to configure your πŸ€— Accelerate environment:
```bash
accelerate config
```
During configuration, confirm that you want to use DeepSpeed.
By combining DeepSpeed stage 2, fp16 mixed precision, and offloading both the model parameters and the optimizer state to the CPU, it's possible to train on under 8GB of VRAM. The drawback is that this requires more system RAM (about 25GB). See the [DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options.
You should also change the default Adam optimizer to DeepSpeed's optimized version of Adam, [`deepspeed.ops.adam.DeepSpeedCPUAdam`](https://deepspeed.readthedocs.io/en/latest/optimizers.html#adam-cpu), for a substantial speedup. Enabling `DeepSpeedCPUAdam` requires your system's CUDA toolchain version to be the same as the one installed with PyTorch. 8-bit optimizers don't currently seem to be compatible with DeepSpeed.
λ‹€μŒ λͺ…λ ΉμœΌλ‘œ ν•™μŠ΅μ„ μ‹œμž‘ν•©λ‹ˆλ‹€:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--sample_batch_size=1 \
--gradient_accumulation_steps=1 --gradient_checkpointing \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800 \
--mixed_precision=fp16
```
## Inference
Once you have trained a model, specify the path to where the model was saved and run inference with the [`StableDiffusionPipeline`]. Make sure your prompts include the special `identifier` used during training (`sks` in the previous examples).
If you have **`"accelerate>=0.16.0"`** installed, you can use the following code to run inference:
```python
from diffusers import StableDiffusionPipeline
import torch
model_id = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
prompt = "A photo of sks dog in a bucket"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket.png")
```
You can also run inference from any of the [saved training checkpoints](#inference-from-a-saved-checkpoint).