<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DreamBooth

[DreamBooth](https://arxiv.org/abs/2208.12242) is a method for personalizing text-to-image models such as Stable Diffusion from just a few (3-5) images of a subject. It lets the model generate contextualized images of the subject in different scenes, poses, and views.

![DreamBooth examples from the project's blog](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg)
<small>DreamBooth examples from the <a href="https://dreambooth.github.io">project's blog.</a></small>

This guide shows how to finetune DreamBooth with the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model on GPUs of various sizes and with Flax. If you want to dig deeper and see how things work, all the DreamBooth training scripts used in this guide can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth).

Before running the scripts, install the library's training dependencies. We also recommend installing 🧨 Diffusers from the `main` GitHub branch:

```bash
pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```

xFormers is not a training requirement, but we recommend [installing](../optimization/xformers) it if you can, because it can speed up training and reduce memory usage.

After setting up all the dependencies, initialize a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

To set up a default 🤗 Accelerate environment without choosing any configuration options, run:

```bash
accelerate config default
```

Or, if your environment does not support an interactive shell (a notebook, for example), you can use:

```py
from accelerate.utils import write_basic_config

write_basic_config()
```

## Finetuning

<Tip warning={true}>

DreamBooth finetuning is very sensitive to hyperparameters and overfits easily. To help you choose appropriate hyperparameters, we recommend reading our [in-depth analysis](https://huggingface.co/blog/dreambooth), which includes a range of recommended settings.

</Tip>

<frameworkcontent>
<pt>
Let's try DreamBooth with a [few images of a dog](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ). Download them, save them to a directory, and set the `INSTANCE_DIR` environment variable to that path:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export OUTPUT_DIR="path_to_saved_model"
```

Then launch the training script with the following command (the full training script is available [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)):

```bash
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400
```
</pt>
<jax>

If you have access to TPUs or want to train even faster, you can try the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py). The Flax training script does not support gradient checkpointing or gradient accumulation, so you need a GPU with at least 30GB of memory.

Before running the script, make sure the requirements are installed:

```bash
pip install -U -r requirements.txt
```

Now you can launch the training script with the following command:

```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --max_train_steps=400
```
</jax>
</frameworkcontent>

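To get a feel for what `--max_train_steps=400` means with only a handful of instance images, the arithmetic can be sketched as follows. This is a minimal sketch rather than the training script's own code: `passes_over_data` is a hypothetical helper, and it assumes a single process and no prior preservation.

```python
import math


def passes_over_data(num_images, max_train_steps, train_batch_size=1, gradient_accumulation_steps=1):
    """Approximate number of passes over the instance images (hypothetical helper).

    Assumes a single process and no prior preservation.
    """
    # Batches needed for one pass over the dataset.
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    # Optimizer updates per pass, after gradient accumulation.
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    return math.ceil(max_train_steps / updates_per_epoch)


# 5 instance images with the settings from the PyTorch command above.
print(passes_over_data(num_images=5, max_train_steps=400))
```

With so few images, each one is revisited many times, which is consistent with the warning above that DreamBooth overfits easily.
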
### Finetuning with prior-preserving loss

Prior preservation is used to avoid overfitting and language drift (see the [paper](https://arxiv.org/abs/2208.12242) if you are interested). For prior preservation, other images of the same class are used as part of the training process. The nice thing is that you can generate those images with the Stable Diffusion model itself! The training script saves the generated images to a local path you specify.

The authors recommend generating `num_epochs * num_samples` images for prior preservation. In most cases, 200-300 images work well.

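How many class images still need to be generated depends on what is already in `CLASS_DIR`. The sketch below illustrates the idea, assuming that existing files in the class directory are reused and only the shortfall up to `--num_class_images` is generated. `images_to_generate` is a hypothetical helper, not part of the training script.

```python
import tempfile
from pathlib import Path


def images_to_generate(class_dir, num_class_images):
    """Return how many class images are still missing (hypothetical helper)."""
    class_dir = Path(class_dir)
    # Count whatever is already stored in the class directory, if it exists.
    existing = len(list(class_dir.iterdir())) if class_dir.exists() else 0
    # Only the shortfall needs to be generated; never a negative number.
    return max(num_class_images - existing, 0)


# Demo with a temporary directory that already holds 50 placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(50):
        (Path(tmp) / f"class-{i}.jpg").touch()
    missing = images_to_generate(tmp, num_class_images=200)
    print(missing)
```
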

<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --num_class_images=200 \
  --max_train_steps=800
```
</jax>
</frameworkcontent>

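The `--prior_loss_weight` flag controls how the instance objective and the prior-preservation objective are combined. As a rough illustration, assuming each term is a mean squared error and the total is `instance_loss + prior_loss_weight * prior_loss`, the combination can be sketched in plain Python. The function names and toy numbers below are made up for illustration and are not taken from the training script.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target, class_pred, class_target, prior_loss_weight=1.0):
    """Instance loss plus weighted prior-preservation loss (illustrative only)."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


# Toy values standing in for predictions and targets.
total = dreambooth_loss([1.0, 2.0], [1.0, 4.0], [0.0, 0.0], [1.0, 1.0], prior_loss_weight=1.0)
print(total)
```

A weight of `1.0`, as in the commands above, gives both terms equal influence; a smaller weight would let the instance images dominate.
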
## Finetuning the text encoder and UNet

The script also lets you finetune the `text_encoder` along with the `unet`. In our experiments (see the [Training Stable Diffusion with DreamBooth using 🧨 Diffusers](https://huggingface.co/blog/dreambooth) post for details), this gives much better results, especially when generating images of faces.

<Tip warning={true}>

Training the text encoder requires additional memory, so it will not fit on a 16GB GPU. You need at least 24GB of VRAM to use this option.

</Tip>

Pass the `--train_text_encoder` argument to the training script to finetune both the `text_encoder` and the `unet`:

<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_checkpointing \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=2e-6 \
  --num_class_images=200 \
  --max_train_steps=800
```
</jax>
</frameworkcontent>

## Finetuning with LoRA

You can also use LoRA (Low-Rank Adaptation of Large Language Models), a finetuning technique for accelerating the training of large models, with DreamBooth. See the [LoRA training](training/lora#dreambooth) guide for details.

### Saving checkpoints while training

It is easy to overfit while training with DreamBooth, so it is sometimes useful to save regular checkpoints during training. One of the intermediate checkpoints may work better than the final model! To enable checkpoint saving, pass the following argument to the training script:

```bash
  --checkpointing_steps=500
```

This saves the full training state in subfolders of your `output_dir`. Subfolder names start with the prefix `checkpoint-`, followed by the number of steps performed so far. For example, `checkpoint-1500` is a checkpoint saved after 1500 training steps.

#### Resuming training from a saved checkpoint

To resume training from a saved checkpoint, pass the `--resume_from_checkpoint` argument with the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last saved checkpoint (that is, the one with the largest number of steps). For example, the following resumes training from the checkpoint saved after 1500 steps:

```bash
  --resume_from_checkpoint="checkpoint-1500"
```

When resuming, you can adjust some of the hyperparameters if you wish.

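As an illustration of what `"latest"` refers to, the following sketch picks the checkpoint subfolder with the largest step number from a list of names. It relies only on the `checkpoint-<step>` naming described above. `latest_checkpoint` is a hypothetical helper and not necessarily how the training script resolves the name.

```python
def latest_checkpoint(names):
    """Return the `checkpoint-<step>` name with the largest step, or None."""
    checkpoints = [n for n in names if n.startswith("checkpoint-") and n.split("-", 1)[1].isdigit()]
    if not checkpoints:
        return None
    # Compare numerically: as strings, "checkpoint-500" would sort after "checkpoint-1500".
    return max(checkpoints, key=lambda n: int(n.split("-", 1)[1]))


print(latest_checkpoint(["checkpoint-500", "logs", "checkpoint-1500", "checkpoint-1000"]))
```

In practice, `names` could come from something like `os.listdir(output_dir)`.
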
#### Inference from a saved checkpoint

Saved checkpoints are stored in a format suitable for resuming training. They include not only the model weights but also the state of the optimizer, data loaders, and learning rate.

If you have **`"accelerate>=0.16.0"`** installed, use the following code to run inference from an intermediate checkpoint:

```python
from diffusers import DiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel
import torch

# Load the pipeline with the same arguments (model, revision) that were used for training
model_id = "CompVis/stable-diffusion-v1-4"

# Load the finetuned UNet in the same dtype as the rest of the pipeline
unet = UNet2DConditionModel.from_pretrained(
    "/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet", torch_dtype=torch.float16
)

# If you trained with `args.train_text_encoder`, make sure to also load the text encoder
text_encoder = CLIPTextModel.from_pretrained(
    "/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder", torch_dtype=torch.float16
)

pipeline = DiffusionPipeline.from_pretrained(
    model_id, unet=unet, text_encoder=text_encoder, torch_dtype=torch.float16
)
pipeline.to("cuda")

# Perform inference, or save, or push to the hub
pipeline.save_pretrained("dreambooth-pipeline")
```

If you have **`"accelerate<0.16.0"`** installed, you need to convert the checkpoint to an inference pipeline first:

```python
from accelerate import Accelerator
from diffusers import DiffusionPipeline

# Load the pipeline with the same arguments (model, revision) that were used for training
model_id = "CompVis/stable-diffusion-v1-4"
pipeline = DiffusionPipeline.from_pretrained(model_id)

accelerator = Accelerator()

# Use text_encoder if `--train_text_encoder` was used for the initial training
unet, text_encoder = accelerator.prepare(pipeline.unet, pipeline.text_encoder)

# Restore the state from a checkpoint path. You have to use an absolute path here.
accelerator.load_state("/sddata/dreambooth/daruma-v2-1/checkpoint-100")

# Rebuild the pipeline with the unwrapped models (assigning to .unet and .text_encoder should also work)
pipeline = DiffusionPipeline.from_pretrained(
    model_id,
    unet=accelerator.unwrap_model(unet),
    text_encoder=accelerator.unwrap_model(text_encoder),
)

# Perform inference, or save, or push to the hub
pipeline.save_pretrained("dreambooth-pipeline")
```

## Optimizations for different GPU sizes

Depending on your hardware, there are several ways to optimize DreamBooth on GPUs from 16GB down to 8GB!

### xFormers

[xFormers](https://github.com/facebookresearch/xformers) is a toolbox for optimizing Transformers, and it includes the [memory-efficient attention](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops) mechanism used in 🧨 Diffusers. [Install xFormers](./optimization/xformers), then add the following argument to your training script:

```bash
  --enable_xformers_memory_efficient_attention
```

xFormers is not available in Flax.

### Set gradients to none

Another way to lower memory usage is to [set the gradients](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) to `None` instead of zero. However, this may change certain behaviors, so if you run into issues, try removing this argument. Add the following argument to your training script to set the gradients to `None`:

```bash
  --set_grads_to_none
```

### 16GB GPU

With gradient checkpointing and the 8-bit optimizer from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), you can train DreamBooth on a 16GB GPU. Make sure bitsandbytes is installed:

```bash
pip install bitsandbytes
```

Then pass the `--use_8bit_adam` option to the training script:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
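The 16GB command sets `--gradient_accumulation_steps=2`, so gradients from two batches are accumulated before each optimizer update. A small sketch of the resulting effective batch size follows, assuming it is the product of the per-device batch size, the accumulation steps, and the number of processes. `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps=1, num_processes=1):
    """Per-device batches folded into one optimizer update (hypothetical helper)."""
    return train_batch_size * gradient_accumulation_steps * num_processes


# 16GB example: batch size 1 with 2 accumulation steps on a single GPU.
print(effective_batch_size(1, gradient_accumulation_steps=2))
```

Accumulation trades extra forward and backward passes for a larger effective batch without holding more samples in GPU memory at once.
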

### 12GB GPU

To run DreamBooth on a 12GB GPU, you need to enable gradient checkpointing, the 8-bit optimizer, and xFormers, and set the gradients to `None`:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --set_grads_to_none \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```

### 8GB GPU

For 8GB GPUs, you can use [DeepSpeed](https://www.deepspeed.ai/) to offload some tensors from VRAM to either the CPU or NVME, which allows training with less GPU memory.

Run the following command to configure your 🤗 Accelerate environment:

```bash
accelerate config
```

During configuration, confirm that you want to use DeepSpeed. By combining DeepSpeed stage 2, fp16 mixed precision, and offloading both the model parameters and the optimizer state to the CPU, you can then train with under 8GB of VRAM. The drawback is that this requires more system RAM (about 25GB). See the [DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options.

You should also change the default Adam optimizer to DeepSpeed's optimized version of Adam, [`deepspeed.ops.adam.DeepSpeedCPUAdam`](https://deepspeed.readthedocs.io/en/latest/optimizers.html#adam-cpu), for a substantial speedup. Enabling `DeepSpeedCPUAdam` requires your system's CUDA toolchain version to match the one installed with PyTorch.

8-bit optimizers do not currently seem to be compatible with DeepSpeed.

Launch training with the following command:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --sample_batch_size=1 \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --mixed_precision=fp16
```

## Inference

Once you have trained a model, specify the path where it was saved and run inference with the [`StableDiffusionPipeline`]. Make sure your prompt includes the special `identifier` used during training (`sks` in the previous examples).

If you have **`"accelerate>=0.16.0"`** installed, you can use the following code to run inference from an intermediate checkpoint:

```python
from diffusers import StableDiffusionPipeline
import torch

model_id = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A photo of sks dog in a bucket"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("dog-bucket.png")
```
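Because a prompt that omits the identifier will not refer to the trained subject, a small guard can catch the mistake before the pipeline runs. The sketch below is one plain-Python way to check for it. `contains_identifier` is a hypothetical helper, and `sks` is the identifier used in the examples in this guide.

```python
import re


def contains_identifier(prompt, identifier="sks"):
    """Return True if `identifier` appears as a whole word in `prompt`."""
    return re.search(rf"\b{re.escape(identifier)}\b", prompt) is not None


print(contains_identifier("A photo of sks dog in a bucket"))
print(contains_identifier("A photo of a dog in a bucket"))
```

Matching on word boundaries avoids false positives from words that merely contain the identifier as a substring.
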

You can also run inference from any of the [saved training checkpoints](#inference-from-a-saved-checkpoint).