---
base_model: stable-diffusion-v1-5/stable-diffusion-v1-5
library_name: diffusers
license: creativeml-openrail-m
inference: true
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- diffusers-training
- lora
datasets:
- lambdalabs/naruto-blip-captions
---



# LoRA text2image fine-tuning - Bhaskar009/SD_1.5_LoRA
These are LoRA adapter weights for stable-diffusion-v1-5/stable-diffusion-v1-5, fine-tuned on the lambdalabs/naruto-blip-captions dataset. Example images are shown below.

![img_0](./image_0.png)
![img_1](./image_1.png)
![img_2](./image_2.png)
![img_3](./image_3.png)



## Intended uses & limitations

#### How to use

```python
import torch
import matplotlib.pyplot as plt
from diffusers import DiffusionPipeline

# Load the base model in half precision and move it to the GPU
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Load the fine-tuned LoRA weights on top of the base model
pipe.load_lora_weights("Bhaskar009/SD_1.5_LoRA")

# Define a Naruto-themed prompt
prompt = "A detailed anime-style portrait of Naruto Uzumaki, wearing his Hokage cloak, standing under a bright sunset, ultra-detailed, cinematic lighting, 8K"

# Generate the image
image = pipe(prompt).images[0]

# Display the image with matplotlib
plt.figure(figsize=(6, 6))
plt.imshow(image)
plt.axis("off")  # hide axes for a clean view
plt.show()
```

#### Limitations and bias

[TODO: provide examples of latent issues and potential remediations]

## Training details - Stable Diffusion LoRA

### Dataset

- The model was trained on the `lambdalabs/naruto-blip-captions` dataset.
- The dataset consists of Naruto character images paired with BLIP-generated captions.
- It covers a diverse set of characters, poses, and backgrounds, making it well suited for fine-tuning Stable Diffusion on anime-style images.

### Model

- Base model: Stable Diffusion v1.5 (`stable-diffusion-v1-5/stable-diffusion-v1-5`)
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Purpose: specializing Stable Diffusion to generate Naruto-style anime characters
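For intuition, LoRA keeps the pretrained weight matrix frozen and learns a low-rank update scaled by alpha/r, so an adapted layer computes x(W + (alpha/r)·BA)ᵀ. A minimal NumPy sketch with hypothetical dimensions (not the actual UNet layer shapes):

```python
import numpy as np

# Minimal LoRA sketch; d, r, and alpha are illustrative values only.
d, r, alpha = 8, 4, 4            # feature dim, LoRA rank, LoRA alpha
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))      # frozen base weight
A = rng.normal(size=(r, d))      # trainable down-projection
B = np.zeros((d, r))             # trainable up-projection, zero-initialized

x = rng.normal(size=(1, d))      # one input row

base_out = x @ W.T
lora_out = x @ (W + (alpha / r) * (B @ A)).T

# With B initialized to zero, the low-rank update is zero, so the adapted
# layer starts out identical to the pretrained layer.
```

Because only A and B are trained, the adapter is tiny compared to the base checkpoint, which is why LoRA weights can be loaded on top of Stable Diffusion at inference time.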

### Preprocessing

- Images were resized to 512x512 resolution.
- Center cropping was applied to maintain aspect ratio.
- Random horizontal flipping was used as a data augmentation technique.
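The steps above can be sketched with Pillow as a standalone illustration (the training script uses its own transform pipeline; the function name and defaults here are hypothetical):

```python
import random
from PIL import Image

def preprocess(img: Image.Image, size: int = 512, flip_p: float = 0.5) -> Image.Image:
    """Resize the shorter side to `size`, center-crop to size x size,
    and randomly flip horizontally as augmentation."""
    w, h = img.size
    scale = size / min(w, h)                      # keep aspect ratio while resizing
    img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2  # center crop
    img = img.crop((left, top, left + size, top + size))
    if random.random() < flip_p:                  # random horizontal flip
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img

# Example on a dummy image
sample = preprocess(Image.new("RGB", (640, 480), "orange"))
```

Resizing the shorter side first and then center-cropping avoids stretching the image, at the cost of discarding the edges of the longer side.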

### Training Configuration

- Batch size: 1
- Gradient accumulation steps: 4 (simulates a larger effective batch size)
- Gradient checkpointing: enabled (reduces memory consumption)
- Max training steps: 800
- Learning rate: 1e-5 (constant schedule, no warmup)
- Max gradient norm: 1 (prevents exploding gradients)
- Memory optimization: xFormers enabled for memory-efficient attention
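With batch size 1 and 4 accumulation steps, the effective batch size is 1 × 4 = 4. The accumulate-then-clip-then-step pattern can be sketched on a toy scalar model in NumPy (purely illustrative; the real run updates the UNet's LoRA parameters):

```python
import numpy as np

# Toy gradient accumulation + gradient clipping on a scalar model y ≈ w * x
# with squared loss. Values mirror the training config above.
rng = np.random.default_rng(0)
w = np.array([0.0])
accum_steps, lr, max_grad_norm = 4, 1e-5, 1.0

xs = rng.normal(size=accum_steps)   # four micro-batches of one sample each
ys = 2.0 * xs                       # targets from a "true" slope of 2

grad = np.zeros_like(w)
for x, y in zip(xs, ys):
    # d/dw of 0.5 * (w*x - y)^2, averaged over the accumulation steps
    grad += x * (w[0] * x - y) / accum_steps

norm = np.linalg.norm(grad)
if norm > max_grad_norm:            # clip the accumulated gradient to norm 1
    grad *= max_grad_norm / norm

w -= lr * grad                      # one optimizer step at constant lr 1e-5
```

Averaging the per-micro-batch gradients before a single optimizer step is what makes 4 accumulation steps at batch size 1 behave like one step at batch size 4.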

### Validation

- The validation prompt "A Naruto character" was used.
- 4 validation images were generated during training.
- Model checkpoints were saved every 500 steps.

### Model Output

- The fine-tuned LoRA weights were saved locally to `sd-naruto-model`.
- The model was pushed to the Hugging Face Hub repository `Bhaskar009/SD_1.5_LoRA`.