|
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# DreamBooth

[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views.

This guide shows you how to finetune the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model with DreamBooth for various GPU sizes, and with Flax.

## Finetuning

<frameworkcontent>
<pt>
Gather a few (3-5) images of the subject you want to train on, download and save them to a directory, and then set the `INSTANCE_DIR` environment variable to that path:
|
|
|
```bash |
|
export MODEL_NAME="CompVis/stable-diffusion-v1-4" |
|
export INSTANCE_DIR="path_to_training_images" |
|
export OUTPUT_DIR="path_to_saved_model" |
|
``` |
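If you haven't already configured your 🤗 Accelerate environment (a one-time, interactive setup that the `accelerate launch` command below relies on), you can do so with:

```bash
accelerate config
```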
|
|
|
Then you can launch the training script (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)) with the following command:
|
|
|
|
|
|
|
```bash
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
|
--instance_data_dir=$INSTANCE_DIR \ |
|
--output_dir=$OUTPUT_DIR \ |
|
--instance_prompt="a photo of sks dog" \ |
|
--resolution=512 \ |
|
--train_batch_size=1 \ |
|
--gradient_accumulation_steps=1 \ |
|
--learning_rate=5e-6 \ |
|
--lr_scheduler="constant" \ |
|
--lr_warmup_steps=0 \ |
|
--max_train_steps=400 |
|
``` |
|
</pt> |
|
<jax> |
|
If you have access to TPUs or want to train even faster, you can try out the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py). The Flax training script doesn't support gradient checkpointing or gradient accumulation, so you'll need a GPU with at least 30GB of memory. Set the environment variables and launch the training script:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
|
--instance_data_dir=$INSTANCE_DIR \ |
|
--output_dir=$OUTPUT_DIR \ |
|
--instance_prompt="a photo of sks dog" \ |
|
--resolution=512 \ |
|
--train_batch_size=1 \ |
|
--learning_rate=5e-6 \ |
|
--max_train_steps=400 |
|
``` |
|
</jax> |
|
</frameworkcontent> |
|
|
|
## Finetuning with prior-preserving loss |
|
|
|
Prior preservation is used to avoid overfitting and language-drift (check out the [paper](https://arxiv.org/abs/2208.12242) to learn more if you're interested). For prior preservation, you use other images of the same class as part of the training process. The nice thing is that you can generate those images using the Stable Diffusion model itself! The training script will save the generated class images to a local path you specify.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
|
--instance_data_dir=$INSTANCE_DIR \ |
|
--class_data_dir=$CLASS_DIR \ |
|
--output_dir=$OUTPUT_DIR \ |
|
--with_prior_preservation --prior_loss_weight=1.0 \ |
|
--instance_prompt="a photo of sks dog" \ |
|
--class_prompt="a photo of dog" \ |
|
--resolution=512 \ |
|
--train_batch_size=1 \ |
|
--gradient_accumulation_steps=1 \ |
|
--learning_rate=5e-6 \ |
|
--lr_scheduler="constant" \ |
|
--lr_warmup_steps=0 \ |
|
--num_class_images=200 \ |
|
--max_train_steps=800 |
|
``` |
|
</pt> |
|
<jax> |
|
```bash |
|
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" |
|
export INSTANCE_DIR="path-to-instance-images" |
|
export CLASS_DIR="path-to-class-images" |
|
export OUTPUT_DIR="path-to-save-model" |
|
|
|
python train_dreambooth_flax.py \ |
|
--pretrained_model_name_or_path=$MODEL_NAME \ |
|
--instance_data_dir=$INSTANCE_DIR \ |
|
--class_data_dir=$CLASS_DIR \ |
|
--output_dir=$OUTPUT_DIR \ |
|
--with_prior_preservation --prior_loss_weight=1.0 \ |
|
--instance_prompt="a photo of sks dog" \ |
|
--class_prompt="a photo of dog" \ |
|
--resolution=512 \ |
|
--train_batch_size=1 \ |
|
--learning_rate=5e-6 \ |
|
--num_class_images=200 \ |
|
--max_train_steps=800 |
|
``` |
|
</jax> |
|
</frameworkcontent> |
|
|
|
## Finetuning the text encoder and UNet |
|
|
|
The script also allows you to finetune the `text_encoder` along with the `unet`. In our experiments (check out the [Training Stable Diffusion with DreamBooth using 🧨 Diffusers](https://huggingface.co/blog/dreambooth) post for more details), this gives much better results, especially when generating images of faces. Training the text encoder requires additional memory and it won't fit on a 16GB GPU; you'll need at least 24GB VRAM to use this option. Pass the `--train_text_encoder` argument to the training script to enable finetuning the `text_encoder` and `unet`:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
|
--train_text_encoder \ |
|
--instance_data_dir=$INSTANCE_DIR \ |
|
--class_data_dir=$CLASS_DIR \ |
|
--output_dir=$OUTPUT_DIR \ |
|
--with_prior_preservation --prior_loss_weight=1.0 \ |
|
--instance_prompt="a photo of sks dog" \ |
|
--class_prompt="a photo of dog" \ |
|
--resolution=512 \ |
|
--train_batch_size=1 \ |
|
--use_8bit_adam \
|
--gradient_checkpointing \ |
|
--learning_rate=2e-6 \ |
|
--lr_scheduler="constant" \ |
|
--lr_warmup_steps=0 \ |
|
--num_class_images=200 \ |
|
--max_train_steps=800 |
|
``` |
|
</pt> |
|
<jax> |
|
```bash |
|
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" |
|
export INSTANCE_DIR="path-to-instance-images" |
|
export CLASS_DIR="path-to-class-images" |
|
export OUTPUT_DIR="path-to-save-model" |
|
|
|
python train_dreambooth_flax.py \ |
|
--pretrained_model_name_or_path=$MODEL_NAME \ |
|
--train_text_encoder \ |
|
--instance_data_dir=$INSTANCE_DIR \ |
|
--class_data_dir=$CLASS_DIR \ |
|
--output_dir=$OUTPUT_DIR \ |
|
--with_prior_preservation --prior_loss_weight=1.0 \ |
|
--instance_prompt="a photo of sks dog" \ |
|
--class_prompt="a photo of dog" \ |
|
--resolution=512 \ |
|
--train_batch_size=1 \ |
|
--learning_rate=2e-6 \ |
|
--num_class_images=200 \ |
|
--max_train_steps=800 |
|
``` |
|
</jax> |
|
</frameworkcontent> |
|
|
|
## Finetuning with LoRA |
|
|
|
You can also use Low-Rank Adaptation of Large Language Models (LoRA), a fine-tuning technique for accelerating training large models, on DreamBooth. For more details, take a look at the [LoRA training](./lora#dreambooth) guide. |
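As a rough sketch, a LoRA DreamBooth run uses the separate `train_dreambooth_lora.py` script from the same examples folder, and LoRA typically tolerates a much higher learning rate than full finetuning. The flag values below are illustrative assumptions rather than recommended settings, so refer to the linked guide for details:

```bash
accelerate launch train_dreambooth_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=400
```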
|
|
|
## Saving checkpoints while training |
|
|
|
It's easy to overfit while training with DreamBooth, so sometimes it's useful to save regular checkpoints during the training process. One of the intermediate checkpoints might actually work better than the final model! Pass the `--checkpointing_steps=500` argument to the training script to save a checkpoint every 500 steps. This saves the full training state in subfolders of your `output_dir`. Subfolder names begin with the prefix `checkpoint-`, followed by the number of steps performed so far; for example, `checkpoint-1500` would be a checkpoint saved after 1500 training steps.
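For illustration, here's a minimal Python sketch (assuming the checkpoint subfolders live directly under your `output_dir`) of how the most recent checkpoint can be identified, which is essentially what the `"latest"` shortcut described below does:

```python
import os

output_dir = "path_to_saved_model"  # the same directory passed as --output_dir

# Collect the checkpoint subfolders, e.g. checkpoint-500, checkpoint-1000, ...
checkpoints = [d for d in os.listdir(output_dir) if d.startswith("checkpoint-")]

# The step count follows the "checkpoint-" prefix, so compare numerically
latest = max(checkpoints, key=lambda name: int(name.split("-")[1]))
print(latest)  # e.g. "checkpoint-1500"
```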
|
|
|
### Resume training from a saved checkpoint |
|
|
|
If you want to resume training from any of the saved checkpoints, you can pass the argument `--resume_from_checkpoint` to the script and specify the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last saved checkpoint (the one with the largest number of steps). For example, the following would resume training from the checkpoint saved after 1500 steps: |
|
|
|
```bash |
|
--resume_from_checkpoint="checkpoint-1500" |
|
``` |
|
|
|
This is a good opportunity to tweak some of your hyperparameters if you wish. |
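For example, a resumed run that lowers the learning rate might look like the following sketch; the flags are simply appended to the same `accelerate launch` command used for the initial run, and the values shown here are illustrative:

```bash
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--checkpointing_steps=500 \
--resume_from_checkpoint="latest" \
--max_train_steps=800
```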
|
|
|
### Inference from a saved checkpoint |
|
|
|
Saved checkpoints are stored in a format suitable for resuming training. They include not only the model weights but also the state of the optimizer, data loaders, and learning rate scheduler.
|
|
|
If you have **`"accelerate>=0.16.0"`** installed, use the following code to run |
|
inference from an intermediate checkpoint. |
|
|
|
```python |
|
from diffusers import DiffusionPipeline, UNet2DConditionModel |
|
from transformers import CLIPTextModel |
|
import torch |
|
|
|
# Load the pipeline with the same arguments (model, revision) that were used for training |
|
model_id = "CompVis/stable-diffusion-v1-4" |
|
|
|
unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet") |
|
|
|
# if you have trained with `--train_text_encoder` make sure to also load the text encoder
|
text_encoder = CLIPTextModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder") |
|
|
|
pipeline = DiffusionPipeline.from_pretrained(model_id, unet=unet, text_encoder=text_encoder, torch_dtype=torch.float16)
|
pipeline.to("cuda") |
|
|
|
# Perform inference, or save, or push to the hub |
|
pipeline.save_pretrained("dreambooth-pipeline") |
|
``` |
|
|
|
If you have **`"accelerate<0.16.0"`** installed, you need to convert it to an inference pipeline first: |
|
|
|
```python |
|
from accelerate import Accelerator |
|
from diffusers import DiffusionPipeline |
|
|
|
# Load the pipeline with the same arguments (model, revision) that were used for training |
|
model_id = "CompVis/stable-diffusion-v1-4" |
|
pipeline = DiffusionPipeline.from_pretrained(model_id) |
|
|
|
accelerator = Accelerator() |
|
|
|
# Use text_encoder if `--train_text_encoder` was used for the initial training |
|
unet, text_encoder = accelerator.prepare(pipeline.unet, pipeline.text_encoder) |
|
|
|
# Restore state from a checkpoint path. You have to use the absolute path here. |
|
accelerator.load_state("/sddata/dreambooth/daruma-v2-1/checkpoint-100") |
|
|
|
# Rebuild the pipeline with the unwrapped models (assignment to .unet and .text_encoder should work too) |
|
pipeline = DiffusionPipeline.from_pretrained( |
|
model_id, |
|
unet=accelerator.unwrap_model(unet), |
|
text_encoder=accelerator.unwrap_model(text_encoder), |
|
) |
|
|
|
# Perform inference, or save, or push to the hub |
|
pipeline.save_pretrained("dreambooth-pipeline") |
|
``` |
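In either case, the rebuilt pipeline behaves like any other `DiffusionPipeline`; for instance, a quick generation sketch (the prompt and output filename are illustrative):

```python
image = pipeline("A photo of sks dog in a bucket", num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket.png")
```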
|
|
|
## Optimizations for different GPU sizes |
|
|
|
Depending on your hardware, there are a few different ways to optimize DreamBooth on GPUs from 16GB to just 8GB!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### 16GB GPU

With the help of gradient checkpointing and the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) 8-bit optimizer, it's possible to train DreamBooth on a 16GB GPU. Make sure bitsandbytes is installed (`pip install bitsandbytes`), then pass the `--use_8bit_adam` option to the training script:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
|
--instance_data_dir=$INSTANCE_DIR \ |
|
--class_data_dir=$CLASS_DIR \ |
|
--output_dir=$OUTPUT_DIR \ |
|
--with_prior_preservation --prior_loss_weight=1.0 \ |
|
--instance_prompt="a photo of sks dog" \ |
|
--class_prompt="a photo of dog" \ |
|
--resolution=512 \ |
|
--train_batch_size=1 \ |
|
--gradient_accumulation_steps=2 --gradient_checkpointing \ |
|
--use_8bit_adam \ |
|
--learning_rate=5e-6 \ |
|
--lr_scheduler="constant" \ |
|
--lr_warmup_steps=0 \ |
|
--num_class_images=200 \ |
|
--max_train_steps=800 |
|
``` |
|
|
|
### 12GB GPU |
|
|
|
To run DreamBooth on a 12GB GPU, you'll need to enable gradient checkpointing and the 8-bit optimizer, use xFormers memory-efficient attention, and set the gradients to `None`:
|
|
|
```bash |
|
export MODEL_NAME="CompVis/stable-diffusion-v1-4" |
|
export INSTANCE_DIR="path-to-instance-images" |
|
export CLASS_DIR="path-to-class-images" |
|
export OUTPUT_DIR="path-to-save-model" |
|
|
|
accelerate launch train_dreambooth.py \ |
|
--pretrained_model_name_or_path=$MODEL_NAME \ |
|
--instance_data_dir=$INSTANCE_DIR \ |
|
--class_data_dir=$CLASS_DIR \ |
|
--output_dir=$OUTPUT_DIR \ |
|
--with_prior_preservation --prior_loss_weight=1.0 \ |
|
--instance_prompt="a photo of sks dog" \ |
|
--class_prompt="a photo of dog" \ |
|
--resolution=512 \ |
|
--train_batch_size=1 \ |
|
--gradient_accumulation_steps=1 --gradient_checkpointing \ |
|
--use_8bit_adam \ |
|
--enable_xformers_memory_efficient_attention \ |
|
--set_grads_to_none \ |
|
--learning_rate=2e-6 \ |
|
--lr_scheduler="constant" \ |
|
--lr_warmup_steps=0 \ |
|
--num_class_images=200 \ |
|
--max_train_steps=800 |
|
``` |
|
|
|
### 8GB GPU
|
|
|
For 8GB GPUs, you'll need the help of [DeepSpeed](https://www.deepspeed.ai/) to offload some tensors from the VRAM to either the CPU or NVME, enabling training with less GPU memory.

Run `accelerate config` and enable DeepSpeed when prompted. By combining DeepSpeed stage 2, fp16 mixed precision, and offloading the model parameters and optimizer state to the CPU, it's possible to train on under 8GB VRAM. The drawback is that this requires more system RAM (about 25 GB), and 8-bit optimizers don't currently seem to be compatible with DeepSpeed.

Launch training with the following command:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
|
--instance_data_dir=$INSTANCE_DIR \ |
|
--class_data_dir=$CLASS_DIR \ |
|
--output_dir=$OUTPUT_DIR \ |
|
--with_prior_preservation --prior_loss_weight=1.0 \ |
|
--instance_prompt="a photo of sks dog" \ |
|
--class_prompt="a photo of dog" \ |
|
--resolution=512 \ |
|
--train_batch_size=1 \ |
|
--sample_batch_size=1 \ |
|
--gradient_accumulation_steps=1 --gradient_checkpointing \ |
|
--learning_rate=5e-6 \ |
|
--lr_scheduler="constant" \ |
|
--lr_warmup_steps=0 \ |
|
--num_class_images=200 \ |
|
--max_train_steps=800 \ |
|
--mixed_precision=fp16 |
|
``` |
|
|
|
## Inference |
|
|
|
Once you have trained a model, specify the path to where the model is saved, and use it for inference in the [`StableDiffusionPipeline`]. Make sure your prompts include the special `identifier` used during training (`sks` in the previous examples). |
|
|
|
If you have **`"accelerate>=0.16.0"`** installed, you can use the following code to run
inference with your newly trained model:
|
|
|
```python |
|
from diffusers import DiffusionPipeline |
|
import torch |
|
|
|
model_id = "path_to_saved_model" |
|
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda") |
|
|
|
prompt = "A photo of sks dog in a bucket" |
|
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0] |
|
|
|
image.save("dog-bucket.png") |
|
``` |
|
|
|
You may also run inference from any of the [saved training checkpoints](#inference-from-a-saved-checkpoint). |
|
|