---
base_model: stable-diffusion-v1-5/stable-diffusion-v1-5
library_name: diffusers
license: creativeml-openrail-m
inference: true
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- diffusers-training
- lora
datasets:
- lambdalabs/naruto-blip-captions
---
# LoRA text2image fine-tuning - Bhaskar009/SD_1.5_LoRA
These are LoRA adaptation weights for stable-diffusion-v1-5/stable-diffusion-v1-5, fine-tuned on the lambdalabs/naruto-blip-captions dataset. Example usage is shown below.




## Intended uses & limitations
#### How to use
```python
import torch
import matplotlib.pyplot as plt
from diffusers import DiffusionPipeline
# Load the base model in half precision and move it to the GPU
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the fine-tuned LoRA weights on top of the base model
pipe.load_lora_weights("Bhaskar009/SD_1.5_LoRA")
# Define a Naruto-themed prompt
prompt = "A detailed anime-style portrait of Naruto Uzumaki, wearing his Hokage cloak, standing under a bright sunset, ultra-detailed, cinematic lighting, 8K"
# Generate the image
image = pipe(prompt).images[0]
# Display the image using matplotlib
plt.figure(figsize=(6, 6))
plt.imshow(image)
plt.axis("off") # Hide axes for a clean view
plt.show()
```
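At inference time, the LoRA influence and output determinism can be tuned per call. A minimal sketch, reusing the `pipe` and `prompt` from above (the scale, step count, and seed here are illustrative choices, not values recommended by the training run):
```python
import torch

# "scale" in cross_attention_kwargs dials the LoRA influence:
# 0.0 is the base model only, 1.0 is the full LoRA effect.
generator = torch.Generator(device="cuda").manual_seed(42)  # fixed seed for reproducibility
image = pipe(
    prompt,
    num_inference_steps=30,                 # illustrative sampling step count
    guidance_scale=7.5,                     # standard classifier-free guidance strength
    cross_attention_kwargs={"scale": 0.8},  # partial LoRA strength
    generator=generator,
).images[0]
image.save("naruto_lora.png")
```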
#### Limitations and bias
- The LoRA was fine-tuned exclusively on Naruto imagery, so outputs are strongly biased toward that art style and may not generalize to other subjects or styles.
- The model inherits the known limitations and biases of Stable Diffusion v1.5, including imperfect anatomy (notably hands) and biases present in its web-scale training data.
- Use is subject to the CreativeML OpenRAIL-M license restrictions of the base model.
## Training details - Stable Diffusion LoRA
### Dataset
- The model was trained on the `lambdalabs/naruto-blip-captions` dataset (see the loading sketch after this list).
- The dataset consists of Naruto character images paired with BLIP-generated captions.
- It covers a diverse set of characters, poses, and backgrounds, making it well suited for fine-tuning Stable Diffusion on anime-style imagery.
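For reference, a minimal sketch of loading and inspecting the dataset with the `datasets` library (the `image` and `text` column names are as published on the dataset card):
```python
from datasets import load_dataset

# Load the training dataset used for fine-tuning
dataset = load_dataset("lambdalabs/naruto-blip-captions", split="train")

# Each example pairs a character image with a BLIP-generated caption
example = dataset[0]
print(example["text"])   # caption string
example["image"].show()  # PIL image
```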
### Model
- Base model: Stable Diffusion v1.5 (`stable-diffusion-v1-5/stable-diffusion-v1-5`)
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Purpose: specializing Stable Diffusion to generate Naruto-style anime characters
### Preprocessing
- Images were resized to 512x512 resolution (an equivalent transform pipeline is sketched after this list).
- Center cropping was applied to maintain aspect ratio.
- Random horizontal flipping was used as a data augmentation technique.
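A minimal sketch of an equivalent transform pipeline with torchvision (an assumption about the exact implementation; the normalization to [-1, 1] matches what Stable Diffusion expects):
```python
from torchvision import transforms

# Resize, center-crop, randomly flip, and normalize to [-1, 1],
# mirroring the preprocessing described above
train_transforms = transforms.Compose([
    transforms.Resize(512, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.CenterCrop(512),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
```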
### Training Configuration
- Batch size: 1
- Gradient accumulation steps: 4 (simulates a larger effective batch size; see the sketch after this list)
- Gradient checkpointing: enabled (reduces memory consumption)
- Max training steps: 800
- Learning rate: 1e-5 (constant schedule, no warmup)
- Max gradient norm: 1.0 (prevents exploding gradients)
- Memory optimization: xFormers enabled for memory-efficient attention
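To illustrate how these settings interact, here is a self-contained sketch of gradient accumulation with gradient-norm clipping, using a dummy model rather than the actual training code. With a batch size of 1 and 4 accumulation steps, the effective batch size is 4.
```python
import torch

model = torch.nn.Linear(8, 1)  # dummy stand-in for the trainable LoRA parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # constant LR, no warmup
batches = [(torch.randn(1, 8), torch.randn(1, 1)) for _ in range(8)]  # batch size 1

accumulation_steps = 4
optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    # Scale the loss so accumulated gradients average over the micro-batches
    loss = torch.nn.functional.mse_loss(model(x), y) / accumulation_steps
    loss.backward()  # gradients accumulate until the optimizer step
    if (step + 1) % accumulation_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # max grad norm = 1
        optimizer.step()
        optimizer.zero_grad()
```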
### Validation
- The validation prompt "A Naruto character" was used (see the sketch after this list).
- 4 validation images were generated during training.
- Model checkpoints were saved every 500 steps.
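A sketch of reproducing the validation generations, assuming `pipe` is loaded with the LoRA weights as in the usage example above:
```python
# Generate the 4 validation images in a single call
validation_images = pipe(
    "A Naruto character",
    num_images_per_prompt=4,  # matches the 4 validation images used during training
).images
for i, img in enumerate(validation_images):
    img.save(f"validation_{i}.png")
```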
### Model Output
- The fine-tuned LoRA weights were saved locally to `sd-naruto-model`.
- The model was pushed to the Hugging Face Hub under the repository `Bhaskar009/SD_1.5_LoRA`.
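For completeness, a minimal sketch of how the saved weights folder could be pushed to the Hub with `huggingface_hub` (an assumption about the upload path; the weights may instead have been pushed by the training script directly):
```python
from huggingface_hub import upload_folder

# Upload the locally saved LoRA weights to the Hub repository
upload_folder(
    repo_id="Bhaskar009/SD_1.5_LoRA",
    folder_path="sd-naruto-model",
    commit_message="Upload fine-tuned LoRA weights",
)
```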