<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Shap-E
[[open-in-colab]]
Shap-E is a conditional model for generating 3D assets that can be used in video game development, interior design, and architecture. It is trained on a large dataset of 3D assets, which is post-processed to render more views of each object and to produce 16K instead of 4K point clouds. The Shap-E model is trained in two steps:
1. An encoder accepts the point clouds and rendered views of a 3D asset and outputs the parameters of implicit functions that represent the asset.
2. A diffusion model is trained on the latents produced by the encoder to generate either neural radiance fields (NeRFs) or a textured 3D mesh, making it easier to render and use the 3D asset in downstream applications.
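To make "implicit function" concrete: instead of storing geometry explicitly, an implicit function is queried at 3D coordinates and returns a property of the shape at that location. The sketch below is a toy illustration only, a hand-written signed distance function for a sphere. It is not Shap-E's actual representation, whose implicit-function parameters are produced by the encoder.

```python
import math


def sphere_sdf(point, radius=1.0):
    """Signed distance from `point` to a sphere centred at the origin.

    Negative inside the sphere, zero on the surface, positive outside.
    """
    x, y, z = point
    return math.sqrt(x * x + y * y + z * z) - radius


# Query the implicit function at a few coordinates along the x-axis.
for p in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]:
    print(p, sphere_sdf(p))
```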
This guide shows you how to use Shap-E to generate your own 3D assets!
Before you begin, make sure you have the following libraries installed:
```py
# uncomment to install the necessary libraries in Colab
#!pip install -q diffusers transformers accelerate trimesh
```
## Text-to-3D
To generate a gif of a 3D object, pass a text prompt to the [`ShapEPipeline`]. For each prompt, the pipeline returns a list of rendered image frames of the 3D object.
```py
import torch
from diffusers import ShapEPipeline
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16")
pipe = pipe.to(device)
guidance_scale = 15.0
prompt = ["A firecracker", "A birthday cupcake"]
images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images
```
Now use the [`~utils.export_to_gif`] function to turn each list of image frames into a gif of the 3D object.
```py
from diffusers.utils import export_to_gif
export_to_gif(images[0], "firecracker_3d.gif")
export_to_gif(images[1], "cake_3d.gif")
```
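With several prompts, the pipeline output holds one list of frames per prompt, as the indexing above shows. A minimal sketch of exporting all of them in a loop follows. The helper `export_all` and the slug-based file naming are hypothetical and not part of Diffusers; `export_fn` stands in for `export_to_gif`.

```python
import re


def slugify(prompt):
    """Turn a prompt into a file-name-friendly slug (hypothetical naming scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return slug or "output"


def export_all(frame_lists, prompts, export_fn):
    """Call export_fn(frames, path) once per prompt and return the paths used."""
    paths = []
    for frames, prompt in zip(frame_lists, prompts):
        path = f"{slugify(prompt)}_3d.gif"
        export_fn(frames, path)
        paths.append(path)
    return paths
```

For the two prompts above, `export_all(images, prompt, export_to_gif)` would write `a_firecracker_3d.gif` and `a_birthday_cupcake_3d.gif`.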
<div class="flex gap-4">
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/firecracker_out.gif"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">prompt = "A firecracker"</figcaption>
</div>
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/cake_out.gif"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">prompt = "A birthday cupcake"</figcaption>
</div>
</div>
## Image-to-3D
To generate a 3D object from another image, use the [`ShapEImg2ImgPipeline`]. You can use an existing image or generate an entirely new one. Let's use the [Kandinsky 2.1](../api/pipelines/kandinsky) model to generate a new image.
```py
from diffusers import DiffusionPipeline
import torch
prior_pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
prompt = "A cheeseburger, white background"
image_embeds, negative_image_embeds = prior_pipeline(prompt, guidance_scale=1.0).to_tuple()
image = pipeline(
    prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
).images[0]
image.save("burger.png")
```
Pass the cheeseburger image to the [`ShapEImg2ImgPipeline`] to generate a 3D representation of it.
```py
import torch
from PIL import Image
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif
pipe = ShapEImg2ImgPipeline.from_pretrained("openai/shap-e-img2img", torch_dtype=torch.float16, variant="fp16").to("cuda")
guidance_scale = 3.0
image = Image.open("burger.png").resize((256, 256))
images = pipe(
    image,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images
gif_path = export_to_gif(images[0], "burger_3d.gif")
```
<div class="flex gap-4">
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">cheeseburger</figcaption>
</div>
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_out.gif"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">3D cheeseburger</figcaption>
</div>
</div>
## Generate mesh
Shap-E is a flexible model that can also generate textured mesh outputs to be rendered in downstream applications. In this example, you'll convert the output into a `glb` file, because the 🤗 Datasets library supports mesh visualization of `glb` files, which the [Dataset viewer](https://huggingface.co/docs/hub/datasets-viewer#dataset-preview) can render.
You can generate mesh outputs for both the [`ShapEPipeline`] and the [`ShapEImg2ImgPipeline`] by setting the `output_type` parameter to `"mesh"`:
```py
import torch
from diffusers import ShapEPipeline
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16")
pipe = pipe.to(device)
guidance_scale = 15.0
prompt = "A birthday cupcake"
images = pipe(prompt, guidance_scale=guidance_scale, num_inference_steps=64, frame_size=256, output_type="mesh").images
```
Use the [`~utils.export_to_ply`] function to save the mesh output as a `ply` file:
<Tip>
You can optionally save the mesh output as an `obj` file with the [`~utils.export_to_obj`] function. Being able to save the mesh output in a variety of formats makes it more flexible for downstream use!
</Tip>
```py
from diffusers.utils import export_to_ply
ply_path = export_to_ply(images[0], "3d_cake.ply")
print(f"Saved to: {ply_path}")
```
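Before converting the file, you may want to confirm that the export produced a non-empty mesh. A PLY file starts with a text header that declares each element and its count, so the counts can be read without any 3D library. The sketch below uses only the standard library; `read_ply_counts` is a hypothetical helper, not part of Diffusers or trimesh, and it assumes a standard PLY header.

```python
def read_ply_counts(path):
    """Return {element_name: count} parsed from a PLY file's text header."""
    counts = {}
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} does not start with a PLY header")
        for raw in f:
            parts = raw.strip().split()
            # The header ends here; anything after it is mesh data.
            if parts[:1] == [b"end_header"]:
                break
            # Header lines of the form: element <name> <count>
            if len(parts) == 3 and parts[0] == b"element":
                counts[parts[1].decode("ascii")] = int(parts[2])
    return counts
```

Calling `read_ply_counts("3d_cake.ply")` returns the element counts declared in the header, typically under the keys `vertex` and `face`.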
Then you can convert the `ply` file to a `glb` file with the trimesh library:
```py
import trimesh
mesh = trimesh.load("3d_cake.ply")
mesh_export = mesh.export("3d_cake.glb", file_type="glb")
```
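As a quick sanity check on the converted file, you can read the 12-byte header that a binary glTF (`glb`) file starts with: the ASCII magic `glTF`, a version number, and the total file length, all stored little-endian. This is a minimal sketch using only the standard library; `read_glb_header` is a hypothetical helper, not part of trimesh or Diffusers.

```python
import struct


def read_glb_header(path):
    """Return (version, total_length) from the 12-byte binary glTF header."""
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        raise ValueError(f"{path} is too short to be a GLB file")
    # 4-byte magic followed by two unsigned 32-bit little-endian integers.
    magic, version, length = struct.unpack("<4sII", header)
    if magic != b"glTF":
        raise ValueError(f"{path} is not a GLB file")
    return version, length
```

For a valid export, the length returned by `read_glb_header("3d_cake.glb")` should match the size of the file on disk.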
By default, the mesh output is viewed from the bottom, but you can change the default viewpoint by applying a rotation transform:
```py
import trimesh
import numpy as np
mesh = trimesh.load("3d_cake.ply")
rot = trimesh.transformations.rotation_matrix(-np.pi / 2, [1, 0, 0])
mesh = mesh.apply_transform(rot)
mesh_export = mesh.export("3d_cake.glb", file_type="glb")
```
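To see what this transform does to the geometry, the sketch below builds the same kind of rotation by hand: a right-handed rotation of -90 degrees about the x-axis, written as a plain 3x3 matrix and applied to a single point. The helper names are hypothetical and only the standard library is used. trimesh itself works with a 4x4 homogeneous matrix, so this is an illustration of the rotation part, not a replacement for the call above.

```python
import math


def rotation_about_x(angle):
    """3x3 matrix for a right-handed rotation of `angle` radians about the x-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        [1.0, 0.0, 0.0],
        [0.0, c, -s],
        [0.0, s, c],
    ]


def apply(matrix, point):
    """Multiply a 3x3 matrix by a 3D point."""
    return tuple(sum(m * p for m, p in zip(row, point)) for row in matrix)


rot = rotation_about_x(-math.pi / 2)
# Round away floating-point noise left over from cos(-pi/2).
print([round(v, 6) for v in apply(rot, (0.0, 0.0, 1.0))])
```

The rotation maps the +z direction onto +y, so a mesh whose up direction is +z ends up with +y pointing up after the transform.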
Upload the mesh file to your dataset repository to visualize it with the Dataset viewer!
<div class="flex justify-center">
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/3D-cake.gif"/>
</div>