|
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DreamBooth fine-tuning with LoRA

This guide demonstrates how to use LoRA, a low-rank approximation technique, to fine-tune DreamBooth with the
`CompVis/stable-diffusion-v1-4` model.

## Fine-tuning DreamBooth

Set the environment variables used by the training command: the base model, the directory with your instance images
(for example, photos of your dog), a directory for the prior-preservation class images, and an output directory for
the fine-tuned weights.

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
```

Launch the training script with `accelerate`, passing the usual DreamBooth hyperparameters along with the LoRA-specific options:

```bash
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --train_text_encoder \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 27 \
  --lora_text_encoder_r 16 \
  --lora_text_encoder_alpha 17 \
  --learning_rate=1e-4 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --max_train_steps=800
```
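Under the hood, the `--use_lora`, `--lora_r`, and `--lora_alpha` flags configure a `LoraConfig` and wrap the UNet (and, with `--train_text_encoder`, the text encoder) in a `PeftModel`. The sketch below shows roughly what that corresponds to for the UNet; the `target_modules` listed are an illustrative assumption, not necessarily the exact layers the script targets:

```python
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Load only the UNet of the base model.
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")

# Roughly what --use_lora --lora_r 16 --lora_alpha 27 sets up; the attention
# projections named here are illustrative only.
config = LoraConfig(r=16, lora_alpha=27, target_modules=["to_q", "to_k", "to_v", "to_out.0"])
unet = get_peft_model(unet, config)
unet.print_trainable_parameters()  # only the small LoRA fraction is trainable
```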
|
|
|
## Inference with a single adapter |
|
|
|
To run inference with the fine-tuned model, first specify the base model with which the fine-tuned LoRA weights will be combined: |
|
|
|
```python
import os
from pathlib import Path  # used below to point at the saved checkpoint directory

import torch

from diffusers import StableDiffusionPipeline
from peft import PeftModel, LoraConfig

MODEL_NAME = "CompVis/stable-diffusion-v1-4"
```
|
|
|
Next, add a function that will create a Stable Diffusion pipeline for image generation. It will combine the weights of the base model with the fine-tuned LoRA weights using `LoraConfig`.
|
|
|
```python
def get_lora_sd_pipeline(
    ckpt_dir, base_model_name_or_path=None, dtype=torch.float16, device="cuda", adapter_name="default"
):
    unet_sub_dir = os.path.join(ckpt_dir, "unet")
    text_encoder_sub_dir = os.path.join(ckpt_dir, "text_encoder")
    if os.path.exists(text_encoder_sub_dir) and base_model_name_or_path is None:
        config = LoraConfig.from_pretrained(text_encoder_sub_dir)
        base_model_name_or_path = config.base_model_name_or_path

    if base_model_name_or_path is None:
        raise ValueError("Please specify the base model name or path")

    pipe = StableDiffusionPipeline.from_pretrained(base_model_name_or_path, torch_dtype=dtype).to(device)
    pipe.unet = PeftModel.from_pretrained(pipe.unet, unet_sub_dir, adapter_name=adapter_name)

    if os.path.exists(text_encoder_sub_dir):
        pipe.text_encoder = PeftModel.from_pretrained(
            pipe.text_encoder, text_encoder_sub_dir, adapter_name=adapter_name
        )

    if dtype in (torch.float16, torch.bfloat16):
        pipe.unet.half()
        pipe.text_encoder.half()

    pipe.to(device)
    return pipe
```
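Note that `get_lora_sd_pipeline()` expects the checkpoint directory written during fine-tuning to contain a `unet` subfolder and, if the text encoder was also trained, a `text_encoder` subfolder, each holding a PEFT adapter (an `adapter_config.json` plus the adapter weights). A quick sanity check of that layout, assuming the path matches the `OUTPUT_DIR` used for training:

```python
ckpt_dir = Path("path-to-saved-model")  # same directory as OUTPUT_DIR from the training command
print((ckpt_dir / "unet").exists(), (ckpt_dir / "text_encoder").exists())
```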
|
|
|
Now you can use the function above to create a Stable Diffusion pipeline with the LoRA weights you created during the fine-tuning step. Note that if you're running inference on the same machine, the path you specify here will be the same as `OUTPUT_DIR`.
|
|
|
```python
pipe = get_lora_sd_pipeline(Path("path-to-saved-model"), adapter_name="dog")
```
|
|
|
Once you have the pipeline with your fine-tuned model, you can use it to generate images: |
|
|
|
```python
prompt = "sks dog playing fetch in the park"
negative_prompt = "low quality, blurry, unfinished"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7, negative_prompt=negative_prompt).images[0]
image.save("DESTINATION_PATH_FOR_THE_IMAGE")
```
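Generation is stochastic by default, so each call returns a different image. If you want reproducible outputs, you can pass a seeded `torch.Generator` to the pipeline call; the seed value here is arbitrary:

```python
generator = torch.Generator(device="cuda").manual_seed(0)  # fixes the initial latent noise
image = pipe(
    prompt,
    num_inference_steps=50,
    guidance_scale=7,
    negative_prompt=negative_prompt,
    generator=generator,
).images[0]
```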
|
|
|
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_dreambooth_dog_park.png" alt="Generated image of a dog in a park"/>
</div>
|
|
|
|
|
## Multi-adapter inference |
|
|
|
With PEFT you can combine multiple adapters for inference. In the previous example, you fine-tuned Stable Diffusion on
some dog images and created a pipeline from those weights under the name `adapter_name="dog"`. Now, suppose you also fine-tuned
this base model on images of a crochet toy. Let's see how you can use both adapters.
|
|
|
First, you'll need to perform all the steps as in the single adapter inference example: |
|
|
|
1. Specify the base model.
2. Add a function that creates a Stable Diffusion pipeline for image generation using LoRA weights.
3. Create a `pipe` with `adapter_name="dog"` based on the model fine-tuned on dog images.
|
|
|
Next, you're going to need a few more helper functions.
To load another adapter, create a `load_adapter()` function that leverages the `load_adapter()` method of `PeftModel` (e.g. `pipe.unet.load_adapter(peft_model_path, adapter_name)`):
|
|
|
```python
def load_adapter(pipe, ckpt_dir, adapter_name):
    unet_sub_dir = os.path.join(ckpt_dir, "unet")
    text_encoder_sub_dir = os.path.join(ckpt_dir, "text_encoder")
    pipe.unet.load_adapter(unet_sub_dir, adapter_name=adapter_name)
    if os.path.exists(text_encoder_sub_dir):
        pipe.text_encoder.load_adapter(text_encoder_sub_dir, adapter_name=adapter_name)
```
|
|
|
To switch between adapters, write a function that uses the `set_adapter()` method of `PeftModel` (e.g. `pipe.unet.set_adapter(adapter_name)`). The text encoder is only wrapped in a `PeftModel` if it was fine-tuned, hence the `isinstance()` check:
|
|
|
```python
def set_adapter(pipe, adapter_name):
    pipe.unet.set_adapter(adapter_name)
    if isinstance(pipe.text_encoder, PeftModel):
        pipe.text_encoder.set_adapter(adapter_name)
```
|
|
|
Finally, add a function that creates a new weighted LoRA adapter by combining existing adapters with the given weights:
|
|
|
```python
def create_weighted_lora_adapter(pipe, adapters, weights, adapter_name="default"):
    pipe.unet.add_weighted_adapter(adapters, weights, adapter_name)
    if isinstance(pipe.text_encoder, PeftModel):
        pipe.text_encoder.add_weighted_adapter(adapters, weights, adapter_name)

    return pipe
```
|
|
|
Let's load the second adapter from the model fine-tuned on images of a crochet toy, and give it a unique name: |
|
|
|
```python
load_adapter(pipe, Path("path-to-the-second-saved-model"), adapter_name="crochet")
```
|
|
|
Create a pipeline using weighted adapters: |
|
|
|
```python
pipe = create_weighted_lora_adapter(pipe, ["crochet", "dog"], [1.0, 1.05], adapter_name="crochet_dog")
```
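To verify which adapters are registered at this point, you can inspect the `peft_config` dictionary that `PeftModel` keeps, keyed by adapter name:

```python
# After the calls above, this should list "dog", "crochet", and "crochet_dog".
print(list(pipe.unet.peft_config.keys()))
```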
|
|
|
Now you can switch between adapters. If you'd like to generate more dog images, set the adapter to `"dog"`: |
|
|
|
```python
set_adapter(pipe, adapter_name="dog")
prompt = "sks dog in a supermarket aisle"
negative_prompt = "low quality, blurry, unfinished"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7, negative_prompt=negative_prompt).images[0]
image
```
|
|
|
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_dreambooth_dog_supermarket.png" alt="Generated image of a dog in a supermarket"/>
</div>
|
|
|
In the same way, you can switch to the second adapter: |
|
|
|
```python
set_adapter(pipe, adapter_name="crochet")
prompt = "a fish rendered in the style of <1>"
negative_prompt = "low quality, blurry, unfinished"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7, negative_prompt=negative_prompt).images[0]
image
```
|
|
|
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_dreambooth_fish.png" alt="Generated image of a crochet fish"/>
</div>
|
|
|
Finally, you can use the combined weighted adapter:
|
|
|
```python
set_adapter(pipe, adapter_name="crochet_dog")
prompt = "sks dog rendered in the style of <1>, close up portrait, 4K HD"
negative_prompt = "low quality, blurry, unfinished"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7, negative_prompt=negative_prompt).images[0]
image
```
|
|
|
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_dreambooth_crochet_dog.png" alt="Generated image of a crochet dog"/>
</div>
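If you want to reuse the combined adapter in a later session, you can persist it with `save_pretrained()`; each named adapter is written into its own subfolder. The output path below is a placeholder:

```python
pipe.unet.save_pretrained("path-to-weighted-adapter/unet")
if isinstance(pipe.text_encoder, PeftModel):
    pipe.text_encoder.save_pretrained("path-to-weighted-adapter/text_encoder")
```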
|
|
|
|
|
|
|
|