# InstructPix2Pix text-to-edit-image fine-tuning
This extended LoRA training script was authored by Aiden-Frost. It is an experimental LoRA extension of the InstructPix2Pix training example and adds support for attaching LoRA layers to the UNet model.
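As a rough illustration of what "attaching LoRA layers to the UNet" means in practice, the sketch below shows how a PEFT `LoraConfig` can be added to the attention projections of a diffusers UNet. The target module names, rank, and alpha here are illustrative assumptions, not the script's exact configuration:

```python
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Load the UNet from the base InstructPix2Pix checkpoint
unet = UNet2DConditionModel.from_pretrained(
    "timbrooks/instruct-pix2pix", subfolder="unet"
)

# Attach low-rank adapters to the attention projections; only these
# adapter weights are trained while the base UNet stays frozen.
unet_lora_config = LoraConfig(
    r=4,  # matches the --rank=4 flag used in the training command below
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
unet.add_adapter(unet_lora_config)
```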
## Running locally with PyTorch
### Installing the dependencies
Before running the scripts, make sure to install the library's training dependencies:
**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the install up to date, as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
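If you don't already have one, a fresh virtual environment can be created and activated first (standard Python tooling, not specific to this project):

```bash
python -m venv .env
source .env/bin/activate
```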
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```
Then cd into the example folder and run:
```bash
pip install -r requirements.txt
```
And initialize an 🤗 Accelerate environment with:
```bash
accelerate config
```
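If you'd rather skip the interactive prompts, Accelerate can also write a default configuration for your environment:

```bash
accelerate config default
```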
Note also that we use the PEFT library as the backend for LoRA training, so make sure to have `peft>=0.6.0` installed in your environment.
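For example, to install a compatible version and confirm what's in your environment (the version pin mirrors the requirement above):

```bash
pip install "peft>=0.6.0"
python -c "import peft; print(peft.__version__)"
```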
## Training script example
```bash
export MODEL_ID="timbrooks/instruct-pix2pix"
export DATASET_ID="instruction-tuning-sd/cartoonization"
export OUTPUT_DIR="instructPix2Pix-cartoonization"

accelerate launch train_instruct_pix2pix_lora.py \
  --pretrained_model_name_or_path=$MODEL_ID \
  --dataset_name=$DATASET_ID \
  --enable_xformers_memory_efficient_attention \
  --resolution=256 --random_flip \
  --train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing \
  --max_train_steps=15000 \
  --checkpointing_steps=5000 --checkpoints_total_limit=1 \
  --learning_rate=5e-05 --lr_warmup_steps=0 \
  --val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \
  --validation_prompt="Generate a cartoonized version of the natural image" \
  --seed=42 \
  --rank=4 \
  --output_dir=$OUTPUT_DIR \
  --report_to=wandb \
  --push_to_hub \
  --original_image_column="original_image" \
  --edited_image_column="cartoonized_image" \
  --edit_prompt_column="edit_prompt"
```
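The `--original_image_column`, `--edited_image_column`, and `--edit_prompt_column` flags must match the dataset schema. As a quick sanity check before launching training, you can stream a single example and inspect its fields (the expected names below simply follow from the flags above):

```python
from datasets import load_dataset

# Stream one example to confirm the column names passed to the script,
# without downloading the whole dataset
ds = load_dataset("instruction-tuning-sd/cartoonization", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # expect: original_image, cartoonized_image, edit_prompt
```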
## Inference
After training, the LoRA weights of the model are stored in `$OUTPUT_DIR`. Load them on top of the base pipeline to edit images:
```python
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# load the base model pipeline
pipe_lora = StableDiffusionInstructPix2PixPipeline.from_pretrained("timbrooks/instruct-pix2pix")

# Load LoRA weights from the provided path
output_dir = "path/to/lora_weight_directory"
pipe_lora.unet.load_attn_procs(output_dir)

# load the input image and run the edit instruction through the pipeline
input_image_path = "/path/to/input_image"
input_image = Image.open(input_image_path)
edit_prompt = "Generate a cartoonized version of the natural image"  # replace with your own edit instruction
edited_images = pipe_lora(num_images_per_prompt=1, prompt=edit_prompt, image=input_image, num_inference_steps=1000).images
edited_images[0].show()
```
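For faster inference on a CUDA GPU, the pipeline can be loaded in half precision before applying the LoRA weights. This is an optional optimization, not something the snippet above requires:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

# Same pipeline as above, but loaded in fp16 and moved to the GPU
pipe_lora = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")
pipe_lora.unet.load_attn_procs("path/to/lora_weight_directory")
```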
## Results
Here is an example of using the script to train an InstructPix2Pix model, trained on a Google Colab T4 GPU:
```bash
MODEL_ID="timbrooks/instruct-pix2pix"
DATASET_ID="instruction-tuning-sd/cartoonization"
TRAIN_EPOCHS=100
```
Below are a few examples showing, for a given input image, the edit_prompt and the edited_image produced by the model.
Here are some rough statistics from training the model with this script:
## References

- InstructPix2Pix: https://github.com/timothybrooks/instruct-pix2pix
- Dataset and example training script: https://huggingface.co/blog/instruction-tuning-sd
- For more information about the project: https://github.com/Aiden-Frost/Efficiently-teaching-counting-and-cartoonization-to-InstructPix2Pix.-