---
base_model: stable-diffusion-v1-5/stable-diffusion-v1-5
library_name: diffusers
license: creativeml-openrail-m
inference: true
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- diffusers-training
- lora
datasets:
- lambdalabs/naruto-blip-captions
---

<!-- This model card has been generated automatically according to the information the training script had access to. You should probably proofread and complete it, then remove this comment. -->

# LoRA text2image fine-tuning - Bhaskar009/SD_1.5_LoRA

These are LoRA adaptation weights for `stable-diffusion-v1-5/stable-diffusion-v1-5`. The weights were fine-tuned on the `lambdalabs/naruto-blip-captions` dataset. You can find some example images in the following.

## Intended uses & limitations

#### How to use

```python
import torch
import matplotlib.pyplot as plt
from diffusers import DiffusionPipeline

# Load the base Stable Diffusion v1.5 pipeline in half precision and move it to the GPU (CUDA)
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Load the fine-tuned LoRA weights on top of the base model
pipe.load_lora_weights("Bhaskar009/SD_1.5_LoRA")

# Define a Naruto-themed prompt
prompt = "A detailed anime-style portrait of Naruto Uzumaki, wearing his Hokage cloak, standing under a bright sunset, ultra-detailed, cinematic lighting, 8K"

# Generate the image (the pipeline returns a list of PIL images)
image = pipe(prompt).images[0]

# Display the image using matplotlib
plt.figure(figsize=(6, 6))
plt.imshow(image)
plt.axis("off")  # Hide axes for a clean view
plt.show()
```

#### Limitations and bias

[TODO: provide examples of latent issues and potential remediations]

## Training details - Stable Diffusion LoRA

#### Dataset

- The model was trained on the `lambdalabs/naruto-blip-captions` dataset.
- This dataset consists of Naruto character images paired with BLIP-generated captions.
- It provides a diverse set of characters, poses, and backgrounds, making it suitable for fine-tuning Stable Diffusion on anime-style images.

#### Model

- Base Model: Stable Diffusion v1.5 (`stable-diffusion-v1-5/stable-diffusion-v1-5`)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Purpose: specializing Stable Diffusion to generate Naruto-style anime characters.

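LoRA keeps the pretrained weight matrix W frozen and learns a low-rank update, so an adapted layer computes W x + B A x, where B has shape d_out × r, A has shape r × d_in, and the rank r is much smaller than either dimension. The pure-Python sketch below illustrates the idea on a single linear layer. It is not the diffusers/PEFT implementation: the `alpha / r` scaling, the initialization, the rank of 4, and the 320 × 320 layer size are illustrative assumptions, since this card does not record the LoRA rank or alpha used for training.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical minimal LoRA layer: y = W x + (alpha / r) * B (A x).

    W is frozen; only the low-rank factors A (r x d_in) and B (d_out x r)
    would be trained.
    """

    def __init__(self, weight, rank, alpha=None, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.rank = rank
        # alpha defaults to rank here, giving a scale of 1.0 (an assumption)
        self.scale = (rank if alpha is None else alpha) / rank
        # A starts small and random, B starts at zero, so B @ A = 0 initially
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Number of values in A and B: r * (d_in + d_out)."""
        return self.rank * (len(self.A[0]) + len(self.B))


def full_params(d_out, d_in):
    """Number of values needed to fine-tune the full d_out x d_in matrix."""
    return d_out * d_in


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0]: identical to W x because B is zero
big = LoRALinear([[0.0] * 320 for _ in range(320)], rank=4)
print(big.trainable_params(), full_params(320, 320))  # 2560 102400
```

Because B starts at zero, the adapted layer initially reproduces the base model exactly, and training only has to learn the small A and B matrices (2,560 values in this example instead of 102,400).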

#### Preprocessing

- Images were resized and center-cropped to 512x512 resolution; cropping (rather than stretching) preserves the aspect ratio of the image content.
- Random horizontal flipping was used as a data augmentation technique.

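The geometry of these steps can be sketched in pure Python, assuming the common recipe of scaling the shorter side to 512 pixels, taking a centered 512 × 512 crop, and mirroring images left-to-right with probability 0.5. The function names are hypothetical; a typical training script such as the diffusers text-to-image LoRA example uses torchvision transforms (`Resize`, `CenterCrop`, `RandomHorizontalFlip`) on real images instead, and pixel-exact agreement with those transforms is an assumption.

```python
import random


def resize_shorter_side(width, height, resolution=512):
    """Size after scaling the shorter side to `resolution`, keeping aspect ratio."""
    short_side, long_side = min(width, height), max(width, height)
    new_long = int(resolution * long_side / short_side)  # truncation assumed to match torchvision
    return (resolution, new_long) if width <= height else (new_long, resolution)


def center_crop_box(width, height, resolution=512):
    """(left, top, right, bottom) of a centered resolution x resolution crop."""
    left = int(round((width - resolution) / 2.0))
    top = int(round((height - resolution) / 2.0))
    return (left, top, left + resolution, top + resolution)


def maybe_hflip(rows, rng, p=0.5):
    """Mirror an image (list of pixel rows) left-to-right with probability p."""
    return [row[::-1] for row in rows] if rng.random() < p else rows


w, h = resize_shorter_side(640, 480)
print((w, h), center_crop_box(w, h))  # (682, 512) (85, 0, 597, 512)
```

For a 640x480 landscape image, the shorter side becomes 512 and the longer side 682, so the crop discards 85 pixels on the left and the remainder on the right without distorting the content.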

#### Training Configuration

- Batch Size: 1
- Gradient Accumulation Steps: 4 (gives an effective batch size of 1 × 4 = 4 per optimizer update)
- Gradient Checkpointing: Enabled (reduces memory consumption by recomputing activations during the backward pass)
- Max Training Steps: 800
- Learning Rate: 1e-5 (constant schedule, no warmup)
- Max Gradient Norm: 1 (gradient clipping to prevent exploding gradients)
- Memory Optimization: xFormers enabled for memory-efficient attention computation

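To make the interaction of these settings concrete, the sketch below records them in a hypothetical `TrainingConfig` dataclass and shows, on plain Python lists, what gradient accumulation and global-norm clipping do. It assumes a single GPU and that the step limit counts optimizer updates rather than micro-batches, as in the diffusers example scripts. The clipping follows the same idea as `torch.nn.utils.clip_grad_norm_`, which additionally adds a small epsilon to the norm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values from the list above; the field names are hypothetical."""
    train_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    max_train_steps: int = 800
    learning_rate: float = 1e-5  # constant schedule: the same LR at every step
    max_grad_norm: float = 1.0

    @property
    def effective_batch_size(self):
        # Images contributing to one optimizer update (single GPU assumed)
        return self.train_batch_size * self.gradient_accumulation_steps

    @property
    def total_images_seen(self):
        # Assumes max_train_steps counts optimizer updates, not micro-batches
        return self.effective_batch_size * self.max_train_steps


def accumulate(micro_batch_grads):
    """Average gradients from several micro-batches, as if from one larger batch."""
    n = len(micro_batch_grads)
    return [sum(column) / n for column in zip(*micro_batch_grads)]


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    return [g * (max_norm / total_norm) for g in grads], total_norm


cfg = TrainingConfig()
print(cfg.effective_batch_size, cfg.total_images_seen)  # 4 3200
grads = accumulate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
print(grads)  # [4.0, 5.0]
print(clip_by_global_norm(grads, cfg.max_grad_norm))
```

Under these assumptions, each optimizer update sees 4 images, so 800 steps amount to 3,200 image samples in total.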

#### Validation

- The validation prompt "A Naruto character" was used.
- Four validation images were generated during training.
- Model checkpoints were saved every 500 steps (with 800 training steps, this yields a single intermediate checkpoint at step 500).

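Assuming, as in the diffusers example scripts, that a checkpoint is written whenever the global step count is a multiple of the checkpointing interval, the schedule can be listed with a short hypothetical helper. The final LoRA weights are saved separately when training ends.

```python
def checkpoint_steps(max_train_steps, checkpointing_steps):
    """Optimizer steps at which an intermediate checkpoint would be saved."""
    return [step for step in range(1, max_train_steps + 1)
            if step % checkpointing_steps == 0]


print(checkpoint_steps(800, 500))  # [500]
```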

#### Model Output

- The fine-tuned LoRA weights were saved locally to the `sd-naruto-model` output directory.
- The model was pushed to the Hugging Face Hub:
  - Repository: `Bhaskar009/SD_1.5_LoRA`
