InstructPix2Pix text-to-edit-image fine-tuning

This extended LoRA training script was authored by Aiden-Frost. It is an experimental LoRA extension of the InstructPix2Pix training example and adds support for LoRA layers on the UNet model, as sketched below.
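
As a rough illustration of what the script does under the hood, the following sketch attaches LoRA adapters to the UNet's attention projections with PEFT. The module names and hyperparameters here are assumptions based on how other diffusers LoRA scripts work; the training script handles this step for you.

from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Load only the UNet from the base InstructPix2Pix checkpoint
unet = UNet2DConditionModel.from_pretrained(
    "timbrooks/instruct-pix2pix", subfolder="unet"
)

# Attach LoRA adapters to the attention projections (r matches the --rank flag)
unet_lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
unet.add_adapter(unet_lora_config)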

Running locally with PyTorch

Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

Important

To make sure you can successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the install up to date, since we update the example scripts frequently and they pull in some example-specific requirements. To do this, execute the following steps in a new virtual environment:

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .

Then cd into the example folder and run:

pip install -r requirements.txt

And initialize an 🤗 Accelerate environment with:

accelerate config

Note also that we use the PEFT library as the backend for LoRA training, so make sure you have peft>=0.6.0 installed in your environment. You can quickly verify this with the sketch below.
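
A minimal version check (an illustrative helper, not part of the training script; it assumes the packaging library, which ships with most Python environments):

import peft
from packaging import version

# The training script requires peft>=0.6.0
assert version.parse(peft.__version__) >= version.parse("0.6.0"), (
    f"Found peft {peft.__version__}; upgrade with: pip install -U peft"
)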

Training script example

export MODEL_ID="timbrooks/instruct-pix2pix"
export DATASET_ID="instruction-tuning-sd/cartoonization"
export OUTPUT_DIR="instructPix2Pix-cartoonization"

accelerate launch train_instruct_pix2pix_lora.py \
  --pretrained_model_name_or_path=$MODEL_ID \
  --dataset_name=$DATASET_ID \
  --enable_xformers_memory_efficient_attention \
  --resolution=256 --random_flip \
  --train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing \
  --max_train_steps=15000 \
  --checkpointing_steps=5000 --checkpoints_total_limit=1 \
  --learning_rate=5e-05 --lr_warmup_steps=0 \
  --val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \
  --validation_prompt="Generate a cartoonized version of the natural image" \
  --seed=42 \
  --rank=4 \
  --output_dir=$OUTPUT_DIR \
  --report_to=wandb \
  --push_to_hub \
  --original_image_column="original_image" \
  --edited_image_column="cartoonized_image" \
  --edit_prompt_column="edit_prompt"
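
The three --*_column flags tell the script which dataset columns hold the source image, the edited image, and the edit instruction. If you adapt the command to your own dataset, you can sanity-check the column names first; a quick sketch, assuming the cartoonization dataset layout:

from datasets import load_dataset

ds = load_dataset("instruction-tuning-sd/cartoonization", split="train")
print(ds.column_names)
# Expect something like: ['original_image', 'edit_prompt', 'cartoonized_image']
print(ds[0]["edit_prompt"])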

Inference

After training, the LoRA weights of the model are stored in $OUTPUT_DIR. The snippet below loads the base pipeline, applies the trained LoRA weights, and runs an edit on an input image.

import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# load the base model pipeline
pipe_lora = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights from the provided path
output_dir = "path/to/lora_weight_directory"
pipe_lora.unet.load_attn_procs(output_dir)

# Open the image to edit and run the pipeline with an edit instruction
input_image_path = "/path/to/input_image"
input_image = Image.open(input_image_path)
edit_prompt = "Generate a cartoonized version of the natural image"
edited_images = pipe_lora(num_images_per_prompt=1, prompt=edit_prompt, image=input_image, num_inference_steps=1000).images
edited_images[0].show()
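
With attention-processor LoRA weights loaded this way, diffusers also lets you dial the LoRA contribution up or down at inference time via cross_attention_kwargs; a sketch (the 0.8 scale is an arbitrary example):

# scale=1.0 applies the full LoRA effect; scale=0.0 falls back to the base model
edited_images = pipe_lora(
    prompt=edit_prompt,
    image=input_image,
    cross_attention_kwargs={"scale": 0.8},
).images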

Results

Here is an example of using the script to train an InstructPix2Pix model, trained on a Google Colab T4 GPU:

MODEL_ID="timbrooks/instruct-pix2pix"
DATASET_ID="instruction-tuning-sd/cartoonization"
TRAIN_EPOCHS=100

Below are a few examples showing the input image, the edit_prompt, and the edited image (the model's output):

[Image: instructpix2pix-inputs]

Here are some rough statistics from training the model with this script:

[Image: instructpix2pix-inputs]
