<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Kandinsky

[[open-in-colab]]

The Kandinsky models are a series of multilingual text-to-image generation models. The Kandinsky 2.0 model uses two multilingual text encoders and concatenates their outputs for the UNet.

[Kandinsky 2.1](../api/pipelines/kandinsky) changes the architecture to include an image prior model ([`CLIP`](https://huggingface.co/docs/transformers/model_doc/clip)) that generates a mapping between text and image embeddings. This mapping provides better text-image alignment and is used together with the text embeddings during training, leading to higher-quality results. Finally, Kandinsky 2.1 decodes the latents into images with a [Modulating Quantized Vectors (MoVQ)](https://huggingface.co/papers/2209.09002) decoder, which adds a spatial conditional normalization layer to increase photorealism.

[Kandinsky 2.2](../api/pipelines/kandinsky_v22) improves on the previous model by replacing the image encoder of the image prior model with a larger CLIP-ViT-G model to improve quality. The image prior model was also retrained on images with different resolutions and aspect ratios, so it can generate higher-resolution images and different image sizes.

[Kandinsky 3](../api/pipelines/kandinsky3) simplifies the architecture and moves away from the two-stage generation process involving a prior model and a diffusion model. Instead, Kandinsky 3 uses [Flan-UL2](https://huggingface.co/google/flan-ul2) to encode text, a UNet with [BigGan-deep](https://hf.co/papers/1809.11096) blocks, and [Sber-MoVQGAN](https://github.com/ai-forever/MoVQGAN) to decode the latents into images. Its gains in text understanding and generated image quality come primarily from the larger text encoder and UNet.

This guide shows you how to use the Kandinsky models for text-to-image, image-to-image, inpainting, interpolation, and more.

Before you begin, make sure you have the following libraries installed:

```py
# uncomment to install the necessary libraries in Colab
#!pip install -q diffusers transformers accelerate
```

<Tip warning={true}>

Kandinsky 2.1 and 2.2 usage is very similar! The only difference is that Kandinsky 2.2 doesn't accept `prompt` as an input when decoding the latents. Instead, Kandinsky 2.2 only accepts `image_embeds` during decoding.

<br>

Kandinsky 3 has a more concise architecture and doesn't require a prior model. This means its usage is identical to other diffusion models like [Stable Diffusion XL](sdxl).

</Tip>
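
One way to keep the 2.1/2.2 difference straight is to build the decoder arguments in a single place. The sketch below is illustrative only: `decoder_kwargs` is a hypothetical helper (not part of 🤗 Diffusers), and plain lists stand in for the embedding tensors returned by a prior pipeline.

```python
def decoder_kwargs(version, prompt, image_embeds, negative_image_embeds):
    """Build keyword arguments for the decoding pipeline (hypothetical helper)."""
    kwargs = {
        "image_embeds": image_embeds,
        "negative_image_embeds": negative_image_embeds,
    }
    if version == "2.1":
        # Kandinsky 2.1 also takes the text prompt when decoding.
        kwargs["prompt"] = prompt
    elif version != "2.2":
        raise ValueError(f"unsupported Kandinsky version: {version!r}")
    # Kandinsky 2.2 decodes from the image embeddings alone.
    return kwargs


# Toy stand-ins for the tensors produced by the prior pipeline.
embeds, negative_embeds = [0.1, 0.2], [0.0, 0.0]
print(sorted(decoder_kwargs("2.1", "a cat", embeds, negative_embeds)))
print(sorted(decoder_kwargs("2.2", "a cat", embeds, negative_embeds)))
```

With real pipelines, the resulting dictionary would be unpacked into the decoder call, for example `pipeline(**kwargs, height=768, width=768)`.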

## Text-to-image

To use the Kandinsky models for any task, you always start by setting up the prior pipeline to encode the prompt and generate the image embeddings. The prior pipeline also generates `negative_image_embeds` that correspond to the negative prompt `""`. For better results, you can pass an actual `negative_prompt` to the prior pipeline, but this doubles the effective batch size of the prior pipeline.
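
To see why the batch size doubles, it can help to picture the negative prompts being embedded alongside the regular prompts. The following is a simplified pure-Python sketch under that assumption (this guide does not spell out the internals); `build_prior_batch` and `split_embeddings` are hypothetical names, and strings stand in for real embeddings.

```python
def build_prior_batch(prompts, negative_prompts=None):
    """Return the list of texts the prior would have to embed (illustrative)."""
    if negative_prompts is None:
        # Only the prompts are embedded; the negative embedding corresponds to "".
        return list(prompts)
    # The negative prompts are embedded as well, so they join the batch.
    return list(prompts) + list(negative_prompts)


def split_embeddings(embeddings, num_prompts):
    """Split batched outputs back into (image_embeds, negative_image_embeds)."""
    return embeddings[:num_prompts], embeddings[num_prompts:]


prompts = ["A alien cheeseburger creature eating itself"]
negatives = ["low quality, bad quality"]

print(len(build_prior_batch(prompts)))             # batch without a negative prompt
print(len(build_prior_batch(prompts, negatives)))  # batch with a negative prompt
```

If memory is tight, leaving out the `negative_prompt` keeps the prior batch at its original size.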

<hfoptions id="text-to-image">
<hfoption id="Kandinsky 2.1">

```py
from diffusers import KandinskyPriorPipeline, KandinskyPipeline
import torch

prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16).to("cuda")
pipeline = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16).to("cuda")

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality" # a negative prompt is optional, but results are usually better with one
image_embeds, negative_image_embeds = prior_pipeline(prompt, negative_prompt, guidance_scale=1.0).to_tuple()
```

Now pass all the prompts and embeddings to the [`KandinskyPipeline`] to generate an image:

```py
image = pipeline(prompt, image_embeds=image_embeds, negative_prompt=negative_prompt, negative_image_embeds=negative_image_embeds, height=768, width=768).images[0]
image
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-docs/cheeseburger.png"/>
</div>

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline
import torch

prior_pipeline = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16).to("cuda")
pipeline = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16).to("cuda")

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality" # a negative prompt is optional, but results are usually better with one
image_embeds, negative_image_embeds = prior_pipeline(prompt, negative_prompt=negative_prompt, guidance_scale=1.0).to_tuple()
```

Pass the `image_embeds` and `negative_image_embeds` to the [`KandinskyV22Pipeline`] to generate an image:

```py
image = pipeline(image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768).images[0]
image
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-text-to-image.png"/>
</div>

</hfoption>
<hfoption id="Kandinsky 3">

Kandinsky 3 doesn't require a prior model, so you can load the [`Kandinsky3Pipeline`] directly and pass it a prompt to generate an image:

```py
from diffusers import Kandinsky3Pipeline
import torch

pipeline = Kandinsky3Pipeline.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
image = pipeline(prompt).images[0]
image
```

</hfoption>
</hfoptions>

🤗 Diffusers also provides an end-to-end API with the [`KandinskyCombinedPipeline`] and [`KandinskyV22CombinedPipeline`], meaning you don't have to load the prior and text-to-image pipelines separately. The combined pipeline automatically loads both the prior model and the decoder. If you want, you can still set different values for the prior pipeline with the `prior_guidance_scale` and `prior_num_inference_steps` parameters.

Use the [`AutoPipelineForText2Image`] to automatically call the combined pipelines under the hood:

<hfoptions id="text-to-image">
<hfoption id="Kandinsky 2.1">

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

image = pipeline(prompt=prompt, negative_prompt=negative_prompt, prior_guidance_scale=1.0, guidance_scale=4.0, height=768, width=768).images[0]
image
```

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

image = pipeline(prompt=prompt, negative_prompt=negative_prompt, prior_guidance_scale=1.0, guidance_scale=4.0, height=768, width=768).images[0]
image
```

</hfoption>
</hfoptions>

## Image-to-image

For image-to-image, pass the initial image and a text prompt to the pipeline to condition the generated image. Start by loading the prior pipeline:

<hfoptions id="image-to-image">
<hfoption id="Kandinsky 2.1">

```py
import torch
from diffusers import KandinskyImg2ImgPipeline, KandinskyPriorPipeline

prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = KandinskyImg2ImgPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
```

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
import torch
from diffusers import KandinskyV22Img2ImgPipeline, KandinskyV22PriorPipeline

# use the 2.2 prior class to match the 2.2 prior checkpoint
prior_pipeline = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = KandinskyV22Img2ImgPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
```

</hfoption>
<hfoption id="Kandinsky 3">

Kandinsky 3 doesn't require a prior model, so you can load the image-to-image pipeline directly:

```py
from diffusers import Kandinsky3Img2ImgPipeline
from diffusers.utils import load_image
import torch

pipeline = Kandinsky3Img2ImgPipeline.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()
```

</hfoption>
</hfoptions>

Download an image to condition on:

```py
from diffusers.utils import load_image

# download image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
original_image = load_image(url)
original_image = original_image.resize((768, 512))
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"/>
</div>

Generate the `image_embeds` and `negative_image_embeds` with the prior pipeline (Kandinsky 3 has no prior pipeline, so it only needs the `prompt` and `negative_prompt` defined here):

```py
prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

image_embeds, negative_image_embeds = prior_pipeline(prompt, negative_prompt).to_tuple()
```

Now pass the original image, along with all the prompts and embeddings, to the pipeline to generate an image:

<hfoptions id="image-to-image">
<hfoption id="Kandinsky 2.1">

```py
from diffusers.utils import make_image_grid

image = pipeline(prompt, negative_prompt=negative_prompt, image=original_image, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768, strength=0.3).images[0]
make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-docs/img2img_fantasyland.png"/>
</div>

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
from diffusers.utils import make_image_grid

image = pipeline(image=original_image, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768, strength=0.3).images[0]
make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-image-to-image.png"/>
</div>

</hfoption>
<hfoption id="Kandinsky 3">

```py
# pass the downloaded image as the conditioning image
image = pipeline(prompt, negative_prompt=negative_prompt, image=original_image, strength=0.75, num_inference_steps=25).images[0]
image
```

</hfoption>
</hfoptions>
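
The `strength` argument used in these calls controls how far the result may drift from the initial image. As a rough sketch, assuming the common 🤗 Diffusers image-to-image convention in which roughly `int(num_inference_steps * strength)` denoising steps are actually executed (an assumption here; this guide does not state it and individual pipelines may differ), the hypothetical helper `steps_actually_run` below shows how the two parameters interact:

```python
def steps_actually_run(num_inference_steps, strength):
    """Estimate how many denoising steps run for a given strength (assumed convention)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # A low strength skips most of the schedule, so the output stays close to the input.
    return min(int(num_inference_steps * strength), num_inference_steps)


for strength in (0.3, 0.75, 1.0):
    print(strength, steps_actually_run(100, strength))
```

Under this assumption, a low `strength` such as `0.3` keeps the output close to the sketch, while values near `1.0` let the prompt dominate.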

🤗 Diffusers also provides an end-to-end API with the [`KandinskyImg2ImgCombinedPipeline`] and [`KandinskyV22Img2ImgCombinedPipeline`], meaning you don't have to load the prior and image-to-image pipelines separately. The combined pipeline automatically loads both the prior model and the decoder. If you want, you can still set different values for the prior pipeline with the `prior_guidance_scale` and `prior_num_inference_steps` parameters.

Use the [`AutoPipelineForImage2Image`] to automatically call the combined pipelines under the hood:

<hfoptions id="image-to-image">
<hfoption id="Kandinsky 2.1">

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
import torch

pipeline = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True)
pipeline.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
original_image = load_image(url)

original_image.thumbnail((768, 768))

image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=original_image, strength=0.3).images[0]
make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)
```

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
import torch

pipeline = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
original_image = load_image(url)

original_image.thumbnail((768, 768))

image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=original_image, strength=0.3).images[0]
make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)
```

</hfoption>
</hfoptions>

## Inpainting

<Tip warning={true}>

⚠️ The Kandinsky models now use ⬜️ **white pixels** instead of black pixels to represent the masked area. If you are using [`KandinskyInpaintPipeline`] in production, you need to change the mask to use white pixels:

```py
# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)

# For PyTorch and NumPy input
mask = 1 - mask
```

</Tip>
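
To make the convention concrete, here is a small pure-Python sketch of the `mask = 1 - mask` inversion applied to a toy 3x4 mask. Nested lists stand in for a NumPy array or tensor, and `invert_mask` is a hypothetical helper used only for illustration.

```python
def invert_mask(mask):
    """Flip a binary mask: 1 becomes 0 and 0 becomes 1."""
    return [[1 - value for value in row] for row in mask]


# Old convention: black (0) pixels marked the region to repaint.
old_mask = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

# New convention: white (1) pixels mark the region to repaint.
new_mask = invert_mask(old_mask)
for row in new_mask:
    print(row)
```

Inverting twice returns the original mask, so it is worth checking which convention an existing mask follows before converting it.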

For inpainting, you need the original image, a mask of the area to replace in the original image, and a text prompt describing what to inpaint. Load the prior pipeline:

<hfoptions id="inpaint">
<hfoption id="Kandinsky 2.1">

```py
from diffusers import KandinskyInpaintPipeline, KandinskyPriorPipeline
from diffusers.utils import load_image, make_image_grid
import torch
import numpy as np
from PIL import Image

prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = KandinskyInpaintPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-inpaint", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
```

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
from diffusers import KandinskyV22InpaintPipeline, KandinskyV22PriorPipeline
from diffusers.utils import load_image, make_image_grid
import torch
import numpy as np
from PIL import Image

prior_pipeline = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = KandinskyV22InpaintPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
```

</hfoption>
</hfoptions>

Load an initial image and create a mask:

```py
init_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
mask = np.zeros((768, 768), dtype=np.float32)
# mask area above cat's head
mask[:250, 250:-250] = 1
```

Generate the embeddings with the prior pipeline:

```py
prompt = "a hat"
prior_output = prior_pipeline(prompt)
```

Now pass the initial image, mask, prompt, and embeddings to the pipeline to generate an image:

<hfoptions id="inpaint">
<hfoption id="Kandinsky 2.1">

```py
output_image = pipeline(prompt, image=init_image, mask_image=mask, **prior_output, height=768, width=768, num_inference_steps=150).images[0]
mask = Image.fromarray((mask*255).astype('uint8'), 'L')
make_image_grid([init_image, mask, output_image], rows=1, cols=3)
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-docs/inpaint_cat_hat.png"/>
</div>

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
output_image = pipeline(image=init_image, mask_image=mask, **prior_output, height=768, width=768, num_inference_steps=150).images[0]
mask = Image.fromarray((mask*255).astype('uint8'), 'L')
make_image_grid([init_image, mask, output_image], rows=1, cols=3)
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinskyv22-inpaint.png"/>
</div>

</hfoption>
</hfoptions>

You can also use the end-to-end [`KandinskyInpaintCombinedPipeline`] and [`KandinskyV22InpaintCombinedPipeline`] to call the prior and decoder pipelines together under the hood. Use the [`AutoPipelineForInpainting`] for this:

<hfoptions id="inpaint">
<hfoption id="Kandinsky 2.1">

```py
import torch
import numpy as np
from PIL import Image
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipe = AutoPipelineForInpainting.from_pretrained("kandinsky-community/kandinsky-2-1-inpaint", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

init_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
mask = np.zeros((768, 768), dtype=np.float32)
# mask area above cat's head
mask[:250, 250:-250] = 1
prompt = "a hat"

output_image = pipe(prompt=prompt, image=init_image, mask_image=mask).images[0]
mask = Image.fromarray((mask*255).astype('uint8'), 'L')
make_image_grid([init_image, mask, output_image], rows=1, cols=3)
```

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
import torch
import numpy as np
from PIL import Image
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipe = AutoPipelineForInpainting.from_pretrained("kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

init_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
mask = np.zeros((768, 768), dtype=np.float32)
# mask area above cat's head
mask[:250, 250:-250] = 1
prompt = "a hat"

# inpaint the image loaded above
output_image = pipe(prompt=prompt, image=init_image, mask_image=mask).images[0]
mask = Image.fromarray((mask*255).astype('uint8'), 'L')
make_image_grid([init_image, mask, output_image], rows=1, cols=3)
```

</hfoption>
</hfoptions>

## Interpolation

Interpolation lets you explore the latent space between image and text embeddings, which is a cool way to see some of the prior model's intermediate outputs. Load the prior pipeline and the two images you'd like to interpolate:

<hfoptions id="interpolate">
<hfoption id="Kandinsky 2.1">

```py
from diffusers import KandinskyPriorPipeline, KandinskyPipeline
from diffusers.utils import load_image, make_image_grid
import torch

prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
img_1 = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
img_2 = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/starry_night.jpeg")
make_image_grid([img_1.resize((512,512)), img_2.resize((512,512))], rows=1, cols=2)
```

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline
from diffusers.utils import load_image, make_image_grid
import torch

prior_pipeline = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
img_1 = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
img_2 = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/starry_night.jpeg")
make_image_grid([img_1.resize((512,512)), img_2.resize((512,512))], rows=1, cols=2)
```

</hfoption>
</hfoptions>

<div class="flex gap-4">
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">a cat</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/starry_night.jpeg"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">Van Gogh's Starry Night painting</figcaption>
  </div>
</div>

Specify the texts or images to interpolate, and set a weight for each text or image. Experiment with the weights to see how they affect the interpolation!

```py
images_texts = ["a cat", img_1, img_2]
weights = [0.3, 0.3, 0.4]
```
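
Conceptually, the weights blend the individual embeddings into a single one. The sketch below shows a plain weighted sum on toy 3-dimensional vectors; it is an assumption about what the interpolation amounts to, not the pipeline's actual code, and `blend_embeddings` is a hypothetical helper.

```python
def blend_embeddings(embeddings, weights):
    """Return the weighted sum of equally sized embedding vectors."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    blended = [0.0] * dim
    for embedding, weight in zip(embeddings, weights):
        for i in range(dim):
            blended[i] += weight * embedding[i]
    return blended


# Toy stand-ins for the embeddings of "a cat", img_1 and img_2.
text_emb = [1.0, 0.0, 0.0]
cat_emb = [0.0, 1.0, 0.0]
painting_emb = [0.0, 0.0, 1.0]

print(blend_embeddings([text_emb, cat_emb, painting_emb], [0.3, 0.3, 0.4]))
```

Raising one weight pulls the blended embedding, and therefore the generated image, toward that input.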

Call the `interpolate` function to generate the embeddings, and then pass them to the pipeline to generate the image:

<hfoptions id="interpolate">
<hfoption id="Kandinsky 2.1">

```py
# prompt can be left empty
prompt = ""
prior_out = prior_pipeline.interpolate(images_texts, weights)

pipeline = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

image = pipeline(prompt, **prior_out, height=768, width=768).images[0]
image
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-docs/starry_cat.png"/>
</div>

</hfoption>
<hfoption id="Kandinsky 2.2">

```py
prior_out = prior_pipeline.interpolate(images_texts, weights)

pipeline = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

# Kandinsky 2.2 decodes from the embeddings only, so no prompt is passed
image = pipeline(**prior_out, height=768, width=768).images[0]
image
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinskyv22-interpolate.png"/>
</div>

</hfoption>
</hfoptions>

## ControlNet

<Tip warning={true}>

⚠️ ControlNet is only supported for Kandinsky 2.2!

</Tip>

ControlNet lets you condition large pretrained diffusion models on additional inputs such as a depth map or edge detection. For example, you can condition Kandinsky 2.2 on a depth map so the model understands and preserves the structure of the depth image.

Let's load an image and extract its depth map:

```py
from diffusers.utils import load_image

img = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/cat.png"
).resize((768, 768))
img
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/cat.png"/>
</div>

Then you can use the `depth-estimation` [`~transformers.Pipeline`] from 🤗 Transformers to process the image and retrieve the depth map:

```py
import torch
import numpy as np

from transformers import pipeline

def make_hint(image, depth_estimator):
    image = depth_estimator(image)["depth"]
    image = np.array(image)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    detected_map = torch.from_numpy(image).float() / 255.0
    hint = detected_map.permute(2, 0, 1)
    return hint

depth_estimator = pipeline("depth-estimation")
hint = make_hint(img, depth_estimator).unsqueeze(0).half().to("cuda")
```
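
To see what `make_hint` does to the data layout, the following pure-Python sketch repeats the same steps on a tiny 2x2 depth grid. Nested lists replace the NumPy and PyTorch arrays, and `make_hint_from_depth` is a hypothetical stand-in for illustration; the code above additionally adds a batch dimension with `unsqueeze(0)`.

```python
def make_hint_from_depth(depth):
    """Turn an H x W grid of 0..255 values into a 3 x H x W grid of floats in [0, 1]."""
    # Scale the single depth channel to the [0, 1] range.
    channel = [[value / 255.0 for value in row] for row in depth]
    # Repeat the channel three times, channels first (like permute(2, 0, 1)).
    return [[row[:] for row in channel] for _ in range(3)]


depth = [
    [0, 255],
    [51, 102],
]
hint = make_hint_from_depth(depth)
# Channels, height, width of the resulting hint.
print(len(hint), len(hint[0]), len(hint[0][0]))
```

All three channels carry the same depth values, which gives the single-channel depth map the three-channel layout built by `make_hint`.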

### Text-to-image [[controlnet-text-to-image]]

Load the prior pipeline and the [`KandinskyV22ControlnetPipeline`]:

```py
from diffusers import KandinskyV22PriorPipeline, KandinskyV22ControlnetPipeline

prior_pipeline = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

pipeline = KandinskyV22ControlnetPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-controlnet-depth", torch_dtype=torch.float16
).to("cuda")
```

Generate the image embeddings from a prompt and a negative prompt:

```py
prompt = "A robot, 4k photo"
negative_prior_prompt = "lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature"

generator = torch.Generator(device="cuda").manual_seed(43)

image_emb, zero_image_emb = prior_pipeline(
    prompt=prompt, negative_prompt=negative_prior_prompt, generator=generator
).to_tuple()
```

Finally, pass the image embeddings and the depth image to the [`KandinskyV22ControlnetPipeline`] to generate an image:

```py
image = pipeline(image_embeds=image_emb, negative_image_embeds=zero_image_emb, hint=hint, num_inference_steps=50, generator=generator, height=768, width=768).images[0]
image
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/robot_cat_text2img.png"/>
</div>

### Image-to-image [[controlnet-image-to-image]]

For image-to-image with ControlNet, you need to use:

- [`KandinskyV22PriorEmb2EmbPipeline`] to generate the image embeddings from a text prompt and an image
- [`KandinskyV22ControlnetImg2ImgPipeline`] to generate an image from the initial image and the image embeddings

Process the initial image of a cat and extract its depth map with the `depth-estimation` [`~transformers.Pipeline`] from 🤗 Transformers:

```py
import torch
import numpy as np

from diffusers import KandinskyV22PriorEmb2EmbPipeline, KandinskyV22ControlnetImg2ImgPipeline
from diffusers.utils import load_image, make_image_grid
from transformers import pipeline

img = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/cat.png"
).resize((768, 768))

def make_hint(image, depth_estimator):
    image = depth_estimator(image)["depth"]
    image = np.array(image)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    detected_map = torch.from_numpy(image).float() / 255.0
    hint = detected_map.permute(2, 0, 1)
    return hint

depth_estimator = pipeline("depth-estimation")
hint = make_hint(img, depth_estimator).unsqueeze(0).half().to("cuda")
```

Load the prior pipeline and the [`KandinskyV22ControlnetImg2ImgPipeline`]:

```py
prior_pipeline = KandinskyV22PriorEmb2EmbPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

pipeline = KandinskyV22ControlnetImg2ImgPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-controlnet-depth", torch_dtype=torch.float16
).to("cuda")
```

Pass a text prompt and the initial image to the prior pipeline to generate the image embeddings:

```py
prompt = "A robot, 4k photo"
negative_prior_prompt = "lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature"

generator = torch.Generator(device="cuda").manual_seed(43)

img_emb = prior_pipeline(prompt=prompt, image=img, strength=0.85, generator=generator)
negative_emb = prior_pipeline(prompt=negative_prior_prompt, image=img, strength=1, generator=generator)
```

Now you can run the [`KandinskyV22ControlnetImg2ImgPipeline`] to generate an image from the initial image and the image embeddings:

```py
image = pipeline(image=img, strength=0.5, image_embeds=img_emb.image_embeds, negative_image_embeds=negative_emb.image_embeds, hint=hint, num_inference_steps=50, generator=generator, height=768, width=768).images[0]
make_image_grid([img.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/robot_cat.png"/>
</div>

## Optimizations

Kandinsky is unique because it requires a prior pipeline to generate the mappings and a second pipeline to decode the latents into an image. Optimization efforts should focus on the second pipeline because that is where the bulk of the computation happens. Here are some tips to improve Kandinsky during inference.

1. Enable [xFormers](../optimization/xformers) if you're using PyTorch < 2.0:

```diff
  from diffusers import DiffusionPipeline
  import torch

  pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
+ pipe.enable_xformers_memory_efficient_attention()
```

2. Enable `torch.compile` if you're using PyTorch >= 2.0 to automatically use scaled dot-product attention (SDPA):

```diff
  pipe.unet.to(memory_format=torch.channels_last)
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

This is the same as explicitly setting the attention processor to use [`~models.attention_processor.AttnAddedKVProcessor2_0`]:

```py
from diffusers.models.attention_processor import AttnAddedKVProcessor2_0

pipe.unet.set_attn_processor(AttnAddedKVProcessor2_0())
```

3. Offload the model to the CPU with [`~KandinskyPriorPipeline.enable_model_cpu_offload`] to avoid out-of-memory errors:

```diff
  from diffusers import DiffusionPipeline
  import torch

  pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
+ pipe.enable_model_cpu_offload()
```

4. By default, the text-to-image pipeline uses the [`DDIMScheduler`], but you can replace it with another scheduler like [`DDPMScheduler`] to see how that affects the tradeoff between inference speed and image quality:

```py
import torch
from diffusers import DDPMScheduler
from diffusers import DiffusionPipeline

scheduler = DDPMScheduler.from_pretrained("kandinsky-community/kandinsky-2-1", subfolder="ddpm_scheduler")
pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", scheduler=scheduler, torch_dtype=torch.float16, use_safetensors=True).to("cuda")
```
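
The first two tips depend on the installed PyTorch version, so a script that must run in both environments could branch on it. The following is a minimal sketch of that decision only; `pick_attention_strategy` is a hypothetical helper, and the version strings are examples rather than values read from a real installation.

```python
def pick_attention_strategy(torch_version):
    """Return 'xformers' for PyTorch < 2.0 and 'sdpa' otherwise."""
    # Drop local build tags such as "+cu118" before reading the major version.
    major = int(torch_version.split("+")[0].split(".")[0])
    return "xformers" if major < 2 else "sdpa"


for version in ("1.13.1", "2.0.0", "2.1.0+cu118"):
    print(version, pick_attention_strategy(version))
```

In practice you would pass `torch.__version__` and then call either `pipe.enable_xformers_memory_efficient_attention()` or `torch.compile`, as shown in the tips above.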