guiyrt committed b93e79a (1 parent: 9c8af3f)

Updated README

Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +109 -0
  3. teasers/0.png +3 -0
  4. teasers/1.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
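The added `*.png` rule is what turns the two teaser images into Git LFS objects (hence the pointer sizes and SHA256 digests listed for `teasers/0.png` and `teasers/1.png`). As a rough sketch of how a slash-free gitattributes pattern selects files, the snippet assumes that Python's `fnmatch.fnmatchcase` applied to the file's basename is a close enough stand-in for git's own matcher. `is_lfs_tracked` is a hypothetical helper, and patterns containing a slash (such as `saved_model/**/*`) are deliberately not handled.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Slash-free LFS patterns from the .gitattributes hunk, including the new *.png rule.
# Patterns that contain a slash (e.g. saved_model/**/*) are NOT handled by this sketch.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.png"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Hypothetical helper approximating gitattributes matching: a pattern
    without a slash is tested against the basename, at any directory depth."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for path in ["teasers/0.png", "teasers/1.png", "README.md"]:
    print(path, is_lfs_tracked(path))
```

Running it reports `True` for both teaser images and `False` for `README.md`, matching the files this commit stores through LFS.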
README.md CHANGED
@@ -3,4 +3,113 @@ license: other
  license_name: stabilityai-ai-community
  license_link: >-
  https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md
+ language:
+ - en
+ library_name: diffusers
+ pipeline_tag: text-to-image
+ tags:
+ - Text-to-Image
+ - IP-Adapter
+ - StableDiffusion3Pipeline
+ - image-generation
+ - Stable Diffusion
+ base_model:
+ - stabilityai/stable-diffusion-3.5-large
  ---
+
+ # SD3.5-Large-IP-Adapter
+ This repository contains the checkpoints for the diffusers implementation of [InstantX/SD3.5-Large-IP-Adapter](https://huggingface.co/InstantX/SD3.5-Large-IP-Adapter), an IP-Adapter for the SD3.5-Large model released by researchers from the [InstantX Team](https://huggingface.co/InstantX). The image prompt works much like text, so the model may not always respond to it, and it may interfere with the text prompt. We hope you enjoy this model; have fun and share your creative works with us [on Twitter](https://x.com/instantx_ai).
+
+ # Model Card
+ This is a regular IP-Adapter in which new layers are added to all 38 blocks. We use [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) to encode images for its superior performance, and adopt a TimeResampler to project the image features. The number of image tokens is set to 64.
+
+ # Showcases
+
+ <div class="container">
+ <img src="./teasers/0.png" width="1024"/>
+ <img src="./teasers/1.png" width="1024"/>
+ </div>
+
+ # Inference
+ The code has not been integrated into diffusers yet; please use our local files for now.
+ ```python
+ import torch
+ from PIL import Image
+
+ from diffusers import StableDiffusion3Pipeline
+ from transformers import SiglipVisionModel, SiglipImageProcessor
+
+ model_path = "stabilityai/stable-diffusion-3.5-large"
+ image_encoder_path = "google/siglip-so400m-patch14-384"
+ ip_adapter_path = "guiyrt/InstantX-SD3.5-Large-IP-Adapter-diffusers"
+
+ feature_extractor = SiglipImageProcessor.from_pretrained(
+     image_encoder_path  # preprocessing only, so no torch_dtype is needed
+ )
+
+ image_encoder = SiglipVisionModel.from_pretrained(
+     image_encoder_path, torch_dtype=torch.bfloat16
+ )
+
+ pipe = StableDiffusion3Pipeline.from_pretrained(
+     model_path,
+     torch_dtype=torch.bfloat16,
+     feature_extractor=feature_extractor,
+     image_encoder=image_encoder,
+ ).to(torch.device("cuda"))
+ pipe.load_ip_adapter(ip_adapter_path)  # load the IP-Adapter weights
+
+ ref_img = Image.open("image.jpg").convert("RGB")  # reference image used as the image prompt
+
+ # Note: SD3.5 Large is sensitive to high-resolution generation such as 1536x1536
+ image = pipe(
+     width=1024,
+     height=1024,
+     prompt="a cat",
+     negative_prompt="lowres, low quality, worst quality",
+     num_inference_steps=24,
+     guidance_scale=5.0,
+     generator=torch.manual_seed(42),  # fixed seed for reproducible results
+     ip_adapter_image=ref_img,
+ ).images[0]
+
+ image.save("result.jpg")
+ ```
+
+ # GPU Memory Constraints
+
+ If you run out of GPU memory, you can use sequential CPU offloading (this should work even with 8GB GPUs, assuming enough system RAM). It comes at the cost of longer inference time, because parameters are copied to the GPU only when they are needed, but the output is exactly the same as on a larger GPU that fits the entire pipeline in memory. Refer to [Memory Optimisations for SD3](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_3#memory-optimisations-for-sd3) for additional ways to reduce GPU memory usage, such as removing the T5-XXL text encoder or using a quantized version of it.
+
+ To use sequential CPU offloading, instantiate the pipeline as follows instead:
+
+ ```python
+ pipe = StableDiffusion3Pipeline.from_pretrained(
+     model_path,
+     torch_dtype=torch.bfloat16,
+     feature_extractor=feature_extractor,
+     image_encoder=image_encoder,
+ )  # no .to("cuda") here: offloading manages device placement
+ pipe.load_ip_adapter(ip_adapter_path)
+ pipe._exclude_from_cpu_offload.append("image_encoder")  # exclude the image encoder from offloading
+ pipe.enable_sequential_cpu_offload()  # move each submodule to the GPU only while it runs
+ ```
+
+ # Community ComfyUI Support
+ Please refer to [Slickytail/ComfyUI-InstantX-IPAdapter-SD3](https://github.com/Slickytail/ComfyUI-InstantX-IPAdapter-SD3).
+
+
+ # License
+ The model is released under the [stabilityai-ai-community](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md) license. All rights reserved.
+
+ # Acknowledgements
+ This project is sponsored by [HuggingFace](https://huggingface.co/) and [fal.ai](https://fal.ai/). Thanks to [Slickytail](https://github.com/Slickytail) for providing the ComfyUI node.
+
+ # Citation
+ If you find this project useful in your research, please cite us via:
+ ```
+ @misc{sd35-large-ipa,
+     author = {InstantX Team},
+     title = {InstantX SD3.5-Large IP-Adapter Page},
+     year = {2024},
+ }
+ ```
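The model card says new layers are added to all 38 blocks but does not spell out their form. For reference, the original IP-Adapter design (for UNet-based Stable Diffusion) uses decoupled cross-attention: the latent queries attend to the text tokens and, through separate key/value projections, to the image tokens, and the two results are summed with a weight on the image branch. The NumPy sketch below assumes the same idea carries over to these blocks; the shapes, random inputs and function names are hypothetical, not taken from this checkpoint.

```python
import numpy as np


def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def decoupled_attention(q, k_text, v_text, k_img, v_img, scale):
    """Text branch plus a separately projected image branch weighted by `scale`."""
    return attention(q, k_text, v_text) + scale * attention(q, k_img, v_img)


rng = np.random.default_rng(0)
d = 32                                            # hypothetical head width
q = rng.standard_normal((16, d))                  # 16 hypothetical latent tokens
k_text, v_text = rng.standard_normal((2, 10, d))  # 10 hypothetical text tokens
k_img, v_img = rng.standard_normal((2, 64, d))    # 64 image tokens, as in the model card

text_only = attention(q, k_text, v_text)
mixed = decoupled_attention(q, k_text, v_text, k_img, v_img, scale=0.6)
print(np.allclose(decoupled_attention(q, k_text, v_text, k_img, v_img, 0.0), text_only))
```

Setting the image weight to 0 reduces the layer to plain text attention, which is why such a weight acts as a dial between following the text prompt and following the reference image.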
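The model card names three pieces: a SigLIP image encoder, a TimeResampler projection and 64 image tokens. The NumPy sketch below shows only the core idea of a Perceiver-style resampler: a fixed set of 64 learned query vectors cross-attends over the encoder's patch tokens, so the adapter always receives 64 tokens regardless of how many patches the encoder produces. The 729 x 1152 input shape is an assumption about siglip-so400m-patch14-384 ((384 // 14) ** 2 = 729 patches of width 1152); `inner_dim`, the random weights and `resample` are hypothetical. The real TimeResampler stacks several such layers and also conditions on the diffusion timestep, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed SigLIP so400m-patch14-384 geometry: 27 x 27 patches, each of width 1152.
num_patches = (384 // 14) ** 2
vision_width = 1152
num_queries = 64  # "The number of image tokens is set to 64"
inner_dim = 256   # hypothetical projection width, kept small for the sketch


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def resample(image_tokens, queries, w_k, w_v):
    """One single-head cross-attention step: learned queries attend over image tokens."""
    keys = image_tokens @ w_k                                # (729, inner_dim)
    values = image_tokens @ w_v                              # (729, inner_dim)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])   # (64, 729)
    return softmax(scores) @ values                          # (64, inner_dim)


image_tokens = rng.standard_normal((num_patches, vision_width))  # stand-in encoder output
queries = rng.standard_normal((num_queries, inner_dim))          # stand-in learned queries
w_k = rng.standard_normal((vision_width, inner_dim)) / np.sqrt(vision_width)
w_v = rng.standard_normal((vision_width, inner_dim)) / np.sqrt(vision_width)

out = resample(image_tokens, queries, w_k, w_v)
print(num_patches, out.shape)
```

The output length is set by `num_queries`, not by the input, which is what lets a fixed budget of 64 image tokens summarize the 729 patch features.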
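The inference example warns that SD3.5 Large is sensitive to high-resolution generation such as 1536x1536. One way to see how much larger such a request is for the transformer is to count image tokens, assuming the SD3 VAE downsamples by 8 and the transformer groups latents into 2x2 patches, so that each token covers 16x16 pixels. The helpers `snap` and `image_tokens` are hypothetical, not part of diffusers, and rounding sizes to a multiple of 16 is an assumed precaution.

```python
VAE_SCALE_FACTOR = 8  # assumed: the SD3 VAE downsamples by 8
PATCH_SIZE = 2        # assumed: the transformer uses 2x2 latent patches
PIXELS_PER_TOKEN_SIDE = VAE_SCALE_FACTOR * PATCH_SIZE  # 16


def snap(size: int, multiple: int = PIXELS_PER_TOKEN_SIDE) -> int:
    """Round a side length down to a multiple of 16 (never below 16)."""
    return max(multiple, size // multiple * multiple)


def image_tokens(width: int, height: int) -> int:
    """Number of latent patches (tokens) the transformer processes at this resolution."""
    return (width // PIXELS_PER_TOKEN_SIDE) * (height // PIXELS_PER_TOKEN_SIDE)


for w, h in [(1024, 1024), (1536, 1536)]:
    print(f"{w}x{h}: {image_tokens(w, h)} image tokens")
```

Under these assumptions, 1024x1024 maps to 4096 tokens and 1536x1536 to 9216, i.e. 2.25 times the sequence length. This only illustrates the size of the request; it does not by itself explain the quality issues the comment mentions.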
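To see why sequential CPU offloading matters on an 8GB card, it helps to count the bytes the weights alone need: bfloat16 stores each parameter in 2 bytes. The parameter counts below are rough assumptions (about 8.1B for the SD3.5 Large transformer and about 4.7B for the T5-XXL encoder), `weight_gib` is a hypothetical helper, and activations, the VAE, the CLIP encoders and the SigLIP image encoder are ignored.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Memory for the weights alone (no activations), in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Rough, assumed parameter counts; check the respective model cards.
components = {
    "transformer (SD3.5 Large)": 8.1e9,
    "T5-XXL text encoder": 4.7e9,
}

total = 0.0
for name, num_params in components.items():
    size = weight_gib(num_params)
    total += size
    print(f"{name}: {size:.1f} GiB")
print(f"subtotal: {total:.1f} GiB, versus an 8 GiB GPU")
```

With these estimates each of the two components alone exceeds 8 GiB in bfloat16, so the whole pipeline cannot simply be moved onto such a GPU, whereas sequential offloading only keeps roughly one submodule's weights on the GPU at a time.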
teasers/0.png ADDED

Git LFS Details

  • SHA256: 6325e12735c57a61449fc94330d6e1e744977994bedff1fe6a2f37588d0a448e
  • Pointer size: 132 Bytes
  • Size of remote file: 5.2 MB
teasers/1.png ADDED

Git LFS Details

  • SHA256: 6bdca1eae51d34f587bea5cc218e861f1c88678c9f69d12ba1931fbcc567e9db
  • Pointer size: 132 Bytes
  • Size of remote file: 5.32 MB