---
base_model: black-forest-labs/FLUX.1-dev
datasets: TIGER-Lab/OmniEdit-Filtered-1.2M
library_name: diffusers
license: other
inference: true
tags:
- flux
- flux-diffusers
- text-to-image
- diffusers
- control
- diffusers-training
widget:
- text: Give this the look of a traditional Japanese woodblock print.
output:
url: >-
https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_car.jpg
- text: transform the setting to a winter scene
output:
url: >-
https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_green_creature.jpg
- text: turn the color of mushroom to gray
output:
url: >-
https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_mushroom.jpg
- text: Change it to look like it's in the style of an impasto painting.
output:
url: >-
https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_norte_dam.jpg
---
# Flux Edit
<Gallery />
These are control weights fine-tuned from [black-forest-labs/FLUX.1-dev](https://hf.co/black-forest-labs/FLUX.1-dev)
on the [TIGER-Lab/OmniEdit-Filtered-1.2M](https://huggingface.co/datasets/TIGER-Lab/OmniEdit-Filtered-1.2M) dataset for image editing. We use the
[Flux Control framework](https://blackforestlabs.ai/flux-1-tools/) for fine-tuning.
## License
Please adhere to the licensing terms as described [here](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).
## Intended uses & limitations
### Inference
```py
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
import torch
path = "sayakpaul/FLUX.1-dev-edit-v0"
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")
url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg"
image = load_image(url) # resize as needed.
print(image.size)
prompt = "turn the color of mushroom to gray"
image = pipeline(
control_image=image,
prompt=prompt,
guidance_scale=30., # change this as needed.
num_inference_steps=50, # change this as needed.
max_sequence_length=512,
height=image.height,
width=image.width,
generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")
```
### Speeding inference with a turbo LoRA
We can speed up inference with a turbo LoRA such as [`ByteDance/Hyper-SD`](https://hf.co/ByteDance/Hyper-SD), which lets us reduce `num_inference_steps` while still producing a good image.
Make sure to install `peft` before running the code below: `pip install -U peft`.
<details>
<summary>Code</summary>
```py
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
import torch
path = "sayakpaul/FLUX.1-dev-edit-v0"
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")
# load the turbo LoRA
pipeline.load_lora_weights(
hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
)
pipeline.set_adapters(["hyper-sd"], adapter_weights=[0.125])
url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg"
image = load_image(url) # resize as needed.
print(image.size)
prompt = "turn the color of mushroom to gray"
image = pipeline(
control_image=image,
prompt=prompt,
guidance_scale=30., # change this as needed.
num_inference_steps=8, # change this as needed.
max_sequence_length=512,
height=image.height,
width=image.width,
generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")
```
</details>
<br>
<details>
<summary>Comparison</summary>
<table align="center">
<tr>
<th>50 steps</th>
<th>8 steps</th>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_car.jpg" alt="50 steps 1" width="150"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_car.jpg" alt="8 steps 1" width="150"></td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_norte_dam.jpg" alt="50 steps 2" width="150"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_norte_dam.jpg" alt="8 steps 2" width="150"></td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_mushroom.jpg" alt="50 steps 3" width="150"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_mushroom.jpg" alt="8 steps 3" width="150"></td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_green_creature.jpg" alt="50 steps 4" width="150"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_green_creature.jpg" alt="8 steps 4" width="150"></td>
</tr>
</table>
</details>
You can also quantize the model if your hardware cannot otherwise meet the memory requirements. Refer to the [Diffusers documentation](https://huggingface.co/docs/diffusers/main/en/quantization/overview) to learn more.
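As a rough sketch (assuming a recent `diffusers` release with `bitsandbytes` installed via `pip install -U bitsandbytes`; the exact config values are illustrative), loading the edit transformer in 4-bit could look like this:
```py
from diffusers import BitsAndBytesConfig, FluxControlPipeline, FluxTransformer2DModel
import torch

# NF4 4-bit quantization config; values are illustrative, tune for your hardware.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
edit_transformer = FluxTransformer2DModel.from_pretrained(
    "sayakpaul/FLUX.1-dev-edit-v0", quantization_config=quant_config, torch_dtype=torch.bfloat16
)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
)
# Offload sub-models to CPU when idle to further reduce peak VRAM usage.
pipeline.enable_model_cpu_offload()
```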
`guidance_scale` also impacts the results:
<table align="center">
<tr>
<th>Prompt</th>
<th>Collage (gs: 10)</th>
<th>Collage (gs: 20)</th>
<th>Collage (gs: 30)</th>
<th>Collage (gs: 40)</th>
</tr>
<tr>
<td align="center">
<em>Give this the look of a traditional Japanese woodblock print.</em>
</td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_0.png" alt="Edited Image gs 10"></td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_1.png" alt="Edited Image gs 20"></td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_2.png" alt="Edited Image gs 30"></td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_3.png" alt="Edited Image gs 40"></td>
</tr>
<tr>
<td align="center">
<em>transform the setting to a winter scene</em>
</td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_4.png" alt="Edited Image gs 10"></td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_5.png" alt="Edited Image gs 20"></td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_5.png" alt="Edited Image gs 30"></td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_6.png" alt="Edited Image gs 40"></td>
</tr>
<tr>
<td align="center">
<em>turn the color of mushroom to gray</em>
</td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_12.png" alt="Edited Image gs 10"></td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_13.png" alt="Edited Image gs 20"></td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_14.png" alt="Edited Image gs 30"></td>
<td align="center"><img src="https://huggingface.co/sayakpaul/FLUX.1-dev-edit-v0/resolve/main/images_15.png" alt="Edited Image gs 40"></td>
</tr>
</table>
### Limitations and bias
Expect the model to perform somewhat underwhelmingly, since the exact training details of Flux Control are not publicly known.
## Training details
The fine-tuning codebase is [here](https://github.com/sayakpaul/flux-image-editing). Training hyperparameters:
* Per GPU batch size: 4
* Gradient accumulation steps: 4
* Guidance scale: 30
* BF16 mixed-precision
* AdamW optimizer (8bit from `bitsandbytes`)
* Constant learning rate of 5e-5
* Weight decay of 1e-6
* 20000 training steps
Training was conducted using a node of 8xH100s.
We used a simplified flow mechanism to perform the linear interpolation. In pseudo-code, that looks like:
```py
# Sample a per-example interpolation factor uniformly in [0, 1).
sigmas = torch.rand(batch_size)
timesteps = (sigmas * noise_scheduler.config.num_train_timesteps).long()
...
# Linearly interpolate between the clean latents and Gaussian noise.
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise
```
where `pixel_latents` is computed from the source images and `noise` is drawn from a Gaussian distribution. For more details, [check out
the repository](https://github.com/sayakpaul/flux-image-editing/blob/b041f62df8f959dc3b2f324d2bfdcdf3a6388598/train.py#L403). |
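As a self-contained sketch of this noising step (assuming the standard rectified-flow velocity target `noise - pixel_latents`; `flow_match_inputs` is an illustrative helper, not part of the training script):
```py
import torch

def flow_match_inputs(pixel_latents: torch.Tensor, num_train_timesteps: int = 1000):
    """Sample sigmas, noise the latents by linear interpolation, and return the target velocity."""
    batch_size = pixel_latents.shape[0]
    # Uniform sigmas in [0, 1), mapped to integer timesteps for the model.
    sigmas = torch.rand(batch_size, device=pixel_latents.device)
    timesteps = (sigmas * num_train_timesteps).long()

    # Broadcast sigmas over the latent dimensions and interpolate towards Gaussian noise.
    noise = torch.randn_like(pixel_latents)
    sigmas_ = sigmas.view(-1, *([1] * (pixel_latents.ndim - 1)))
    noisy_model_input = (1.0 - sigmas_) * pixel_latents + sigmas_ * noise

    # Rectified-flow training asks the model to predict this data-to-noise velocity.
    target = noise - pixel_latents
    return noisy_model_input, timesteps, target

# Toy usage with random "latents" standing in for VAE-encoded source images.
latents = torch.randn(2, 16, 64, 64)
noisy, timesteps, target = flow_match_inputs(latents)
print(noisy.shape, timesteps.shape, target.shape)
```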