Update README.md
---
license: other
license_name: stabilityai-ai-community
license_link: >-
  https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- Text-to-Image
- IP-Adapter
- StableDiffusion3Pipeline
- image-generation
- Stable Diffusion
base_model:
- stabilityai/stable-diffusion-3.5-large
---

# SD3.5-Large-IP-Adapter
This repository contains the checkpoints for the diffusers implementation of [InstantX/SD3.5-Large-IP-Adapter](https://huggingface.co/InstantX/SD3.5-Large-IP-Adapter), an IP-Adapter for the SD3.5-Large model released by researchers from the [InstantX Team](https://huggingface.co/InstantX). The reference image works just like a text prompt, so it may not always be responsive and it may interfere with other text. We do hope you enjoy this model; have fun and share your creative works with us [on Twitter](https://x.com/instantx_ai).

# Model Card
This is a regular IP-Adapter, where the new layers are added into all 38 transformer blocks. We use [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) to encode the reference image for its superior performance, and adopt a TimeResampler to project the image embeddings. The number of image tokens is set to 64.
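
For intuition, the sketch below shows the general shape of a resampler-style projection: a fixed set of learnable latent tokens cross-attends to the image features and is returned as a fixed-length sequence of 64 tokens. This is an illustrative approximation only; the module name and dimensions are assumptions, not the actual TimeResampler shipped with this adapter.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a generic resampler that maps a variable-length
# sequence of image features to a fixed number of tokens (64 here).
class SimpleResampler(nn.Module):
    def __init__(self, image_dim=1152, hidden_dim=1024, num_tokens=64, num_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_tokens, hidden_dim) * 0.02)
        self.proj_in = nn.Linear(image_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.proj_out = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, image_features):  # (batch, seq_len, image_dim)
        x = self.proj_in(image_features)
        queries = self.latents.unsqueeze(0).expand(x.shape[0], -1, -1)
        tokens, _ = self.attn(queries, x, x)  # latents cross-attend to image features
        return self.proj_out(tokens)          # (batch, 64, hidden_dim)

features = torch.randn(1, 729, 1152)  # e.g. SigLIP patch features for a 384x384 image
print(SimpleResampler()(features).shape)  # torch.Size([1, 64, 1024])
```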

# Showcases
Using the prompt "a cat".
<div class="container">
    <img src="./teasers/0.png" width="1024"/>
    <br>
    <img src="./teasers/1.png" width="1024"/>
    <br>
    <img src="./teasers/2.png" width="1024"/>
</div>

# Inference

```python
import torch
from PIL import Image

from diffusers import StableDiffusion3Pipeline
from transformers import SiglipVisionModel, SiglipImageProcessor

model_path = "stabilityai/stable-diffusion-3.5-large"
image_encoder_path = "google/siglip-so400m-patch14-384"
ip_adapter_path = "guiyrt/InstantX-SD3.5-Large-IP-Adapter-diffusers"

feature_extractor = SiglipImageProcessor.from_pretrained(
    image_encoder_path, torch_dtype=torch.bfloat16
)

image_encoder = SiglipVisionModel.from_pretrained(
    image_encoder_path, torch_dtype=torch.bfloat16
)

pipe = StableDiffusion3Pipeline.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    feature_extractor=feature_extractor,
    image_encoder=image_encoder,
).to(torch.device("cuda"))
pipe.load_ip_adapter(ip_adapter_path)

ref_img = Image.open("image.jpg").convert('RGB')

# Note: SD3.5 Large is sensitive to high-resolution generation such as 1536x1536
image = pipe(
    width=1024,
    height=1024,
    prompt="a cat",
    negative_prompt="lowres, low quality, worst quality",
    num_inference_steps=24,
    guidance_scale=5.0,
    generator=torch.manual_seed(42),
    ip_adapter_image=ref_img,
).images[0]

image.save("result.jpg")
```
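
Depending on your installed diffusers version, you may also be able to control how strongly the reference image conditions the output before calling the pipeline. The snippet below assumes a `set_ip_adapter_scale()` method analogous to the one on the other IP-Adapter mixins in diffusers; check your version if the call is unavailable.

```python
# Assumed API: lower the IP-Adapter scale to weaken the influence of the
# reference image (1.0 is the usual default).
pipe.set_ip_adapter_scale(0.7)
```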

# GPU Memory Constraints

If you run out of GPU memory, you can use sequential CPU offloading (this should work even on 8GB GPUs, assuming enough system RAM). It comes at the cost of longer inference time, as parameters are copied to the GPU only when strictly required, but the output is exactly the same as on a larger GPU that fits the entire pipeline in memory. Refer to [Memory Optimisations for SD3](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_3#memory-optimisations-for-sd3) for additional ways to reduce GPU memory usage, such as removing the T5-XXL text encoder or using a quantized version of it.

To use sequential CPU offloading, instantiate the pipeline as follows instead:

```python
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    feature_extractor=feature_extractor,
    image_encoder=image_encoder,
)
pipe.load_ip_adapter(ip_adapter_path)

# Exclude the image encoder from sequential CPU offloading
pipe._exclude_from_cpu_offload.append("image_encoder")
pipe.enable_sequential_cpu_offload()
```
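
Another option mentioned in the linked docs is dropping the memory-heavy T5-XXL text encoder entirely, at some cost in prompt adherence. A minimal sketch, assuming you are happy generating with only the CLIP text encoders:

```python
# Sketch: load SD3.5 Large without the T5-XXL text encoder to reduce memory usage.
# Long or complex prompts may be followed less closely.
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_path,
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.bfloat16,
    feature_extractor=feature_extractor,
    image_encoder=image_encoder,
).to(torch.device("cuda"))
pipe.load_ip_adapter(ip_adapter_path)
```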

# Community ComfyUI Support
Please refer to [Slickytail/ComfyUI-InstantX-IPAdapter-SD3](https://github.com/Slickytail/ComfyUI-InstantX-IPAdapter-SD3).

# License
The model is released under the [stabilityai-ai-community](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md) license. All rights reserved.

# Acknowledgements
This project is sponsored by [HuggingFace](https://huggingface.co/) and [fal.ai](https://fal.ai/). Thanks to [Slickytail](https://github.com/Slickytail) for contributing the ComfyUI node.

# Citation
If you find this project useful in your research, please cite us via:
```
@misc{sd35-large-ipa,
  author = {InstantX Team},
  title = {InstantX SD3.5-Large IP-Adapter Page},
  year = {2024},
}
```