<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Shap-E
[[open-in-colab]]
Shap-E is a conditional model for generating 3D assets that can be used in video game development, interior design, and architecture. It is trained on a large dataset of 3D assets that was post-processed to render more views of each object and to produce 16K instead of 4K point clouds. The Shap-E model is trained in two steps:

1. An encoder accepts the point clouds and rendered views of a 3D asset and outputs the parameters of implicit functions that represent the asset.
2. A diffusion model is trained on the latents produced by the encoder to generate either neural radiance fields (NeRFs) or a textured 3D mesh, which makes the 3D asset easier to render and use in downstream applications.

This guide shows you how to use Shap-E to generate your own 3D assets!

Before you begin, make sure you have the following libraries installed:
```py
# uncomment to install the necessary libraries in Colab
#!pip install -q diffusers transformers accelerate trimesh
```
## Text-to-3D
To generate a gif of a 3D object, pass a text prompt to the [`ShapEPipeline`]. The pipeline generates a list of image frames that are used to create the 3D object.
```py
import torch
from diffusers import ShapEPipeline
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16")
pipe = pipe.to(device)
guidance_scale = 15.0
prompt = ["A firecracker", "A birthday cupcake"]
images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images
```
Now use the [`~utils.export_to_gif`] function to turn the list of image frames into a gif of the 3D object.
```py
from diffusers.utils import export_to_gif
export_to_gif(images[0], "firecracker_3d.gif")
export_to_gif(images[1], "cake_3d.gif")
```
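For intuition about what happens to the frames, here is a minimal sketch that writes a list of PIL images to a looping GIF with Pillow directly. It is not the diffusers implementation: `frames_to_gif`, its `fps` parameter, and the solid-color stand-in frames are assumptions made for illustration only.

```py
import os
import tempfile

from PIL import Image


def frames_to_gif(frames, path, fps=10):
    """Save a list of PIL images as a looping GIF (hypothetical helper)."""
    first, *rest = frames
    first.save(
        path,
        save_all=True,              # write every frame, not only the first
        append_images=rest,         # remaining frames of the animation
        duration=int(1000 / fps),   # per-frame display time in milliseconds
        loop=0,                     # 0 means loop forever
    )
    return path


# Stand-in frames: solid colors that change from frame to frame.
demo_frames = [Image.new("RGB", (64, 64), (i * 40, 80, 200 - i * 40)) for i in range(5)]
gif_path = frames_to_gif(demo_frames, os.path.join(tempfile.gettempdir(), "demo_3d.gif"))
```

With real pipeline output, the frames for one prompt (for example `images[0]`) would take the place of `demo_frames`.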
<div class="flex gap-4">
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/firecracker_out.gif"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">prompt = "A firecracker"</figcaption>
</div>
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/cake_out.gif"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">prompt = "A birthday cupcake"</figcaption>
</div>
</div>
## Image-to-3D
To generate a 3D object from another image, use the [`ShapEImg2ImgPipeline`]. You can use an existing image or generate an entirely new one. Let's use the [Kandinsky 2.1](../api/pipelines/kandinsky) model to generate a new image.
```py
from diffusers import DiffusionPipeline
import torch
prior_pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
prompt = "A cheeseburger, white background"
image_embeds, negative_image_embeds = prior_pipeline(prompt, guidance_scale=1.0).to_tuple()
image = pipeline(
    prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
).images[0]
image.save("burger.png")
```
Pass the cheeseburger to the [`ShapEImg2ImgPipeline`] to generate a 3D representation of it.
```py
from PIL import Image
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif
pipe = ShapEImg2ImgPipeline.from_pretrained("openai/shap-e-img2img", torch_dtype=torch.float16, variant="fp16").to("cuda")
guidance_scale = 3.0
image = Image.open("burger.png").resize((256, 256))
images = pipe(
    image,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images
gif_path = export_to_gif(images[0], "burger_3d.gif")
```
<div class="flex gap-4">
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">cheeseburger</figcaption>
</div>
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_out.gif"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">3D cheeseburger</figcaption>
</div>
</div>
## Generate mesh
Shap-E is a flexible model that can also generate textured mesh outputs to be rendered in downstream applications. In this example, you'll convert the output into a `glb` file because the 🤗 Datasets library supports mesh visualization of `glb` files, which can be rendered by the [Dataset viewer](https://huggingface.co/docs/hub/datasets-viewer#dataset-preview).

You can generate mesh outputs for both the [`ShapEPipeline`] and the [`ShapEImg2ImgPipeline`] by setting the `output_type` parameter to `"mesh"`:
```py
import torch
from diffusers import ShapEPipeline
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16")
pipe = pipe.to(device)
guidance_scale = 15.0
prompt = "A birthday cupcake"
images = pipe(prompt, guidance_scale=guidance_scale, num_inference_steps=64, frame_size=256, output_type="mesh").images
```
Use the [`~utils.export_to_ply`] function to save the mesh output as a `ply` file:
<Tip>
You can optionally save the mesh output as an `obj` file with the [`~utils.export_to_obj`] function. Being able to save the mesh output in a variety of formats makes it more flexible for downstream usage!
</Tip>
```py
from diffusers.utils import export_to_ply
ply_path = export_to_ply(images[0], "3d_cake.ply")
print(f"Saved to folder: {ply_path}")
```
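A quick way to sanity-check a saved `ply` file is to read its header, which is plain text even when the body is binary. The sketch below uses only the standard library; `read_ply_counts` and the tiny one-triangle demo file are hypothetical and exist only for illustration, and the function is not part of diffusers.

```py
import os
import tempfile


def read_ply_counts(path):
    """Return {element_name: count} parsed from a PLY header (hypothetical helper)."""
    counts = {}
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in f:
            parts = raw.decode("ascii", errors="replace").split()
            if parts[:1] == ["end_header"]:
                break  # stop before the (possibly binary) body
            if len(parts) == 3 and parts[0] == "element":
                counts[parts[1]] = int(parts[2])
    return counts


# Demo file: a single triangle in ASCII PLY, standing in for a real export.
demo_ply = """ply
format ascii 1.0
element vertex 3
property float x
property float y
property float z
element face 1
property list uchar int vertex_index
end_header
0 0 0
1 0 0
0 1 0
3 0 1 2
"""
demo_ply_path = os.path.join(tempfile.gettempdir(), "demo_triangle.ply")
with open(demo_ply_path, "w", encoding="ascii") as f:
    f.write(demo_ply)

counts = read_ply_counts(demo_ply_path)
print(counts)
```

Pointing `read_ply_counts` at the file you exported would report how many vertices and faces the generated mesh contains.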
Then you can convert the `ply` file to a `glb` file with the trimesh library:
```py
import trimesh
mesh = trimesh.load("3d_cake.ply")
mesh_export = mesh.export("3d_cake.glb", file_type="glb")
```
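Before uploading a `glb` file, it can be worth confirming that the file really is binary glTF. A GLB file begins with a 12-byte header: the ASCII magic `glTF`, a little-endian version number, and the total file length. The sketch below reads that header with the standard library; `read_glb_header` and the header-only demo file are hypothetical and serve only as an illustration.

```py
import os
import struct
import tempfile


def read_glb_header(path):
    """Return (version, declared_length) from a GLB header (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        raise ValueError("file too short to be a GLB")
    magic, version, length = struct.unpack("<4sII", header)
    if magic != b"glTF":
        raise ValueError("missing glTF magic; not a GLB file")
    return version, length


# Demo file: a bare 12-byte header, standing in for a real export.
demo_glb_path = os.path.join(tempfile.gettempdir(), "demo_header_only.glb")
with open(demo_glb_path, "wb") as f:
    f.write(struct.pack("<4sII", b"glTF", 2, 12))

version, declared_length = read_glb_header(demo_glb_path)
print(version, declared_length)
```

For a well-formed file, the declared length should equal the size of the file on disk, which you can compare against `os.path.getsize`.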
By default, the mesh output is focused from the bottom viewpoint, but you can change the default viewpoint by applying a rotation transform:
```py
import trimesh
import numpy as np
mesh = trimesh.load("3d_cake.ply")
rot = trimesh.transformations.rotation_matrix(-np.pi / 2, [1, 0, 0])
mesh = mesh.apply_transform(rot)
mesh_export = mesh.export("3d_cake.glb", file_type="glb")
```
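To see what this transform does to the geometry, the following NumPy-only sketch builds the 4x4 homogeneous matrix for a rotation about the x-axis and applies it to two unit vectors. `rotation_about_x` and `apply_transform` are hypothetical helpers written for this illustration; the matrix follows the standard right-handed convention and should match what `trimesh.transformations.rotation_matrix` returns for the same angle and axis.

```py
import numpy as np


def rotation_about_x(angle):
    """4x4 homogeneous matrix for a right-handed rotation about the x-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array(
        [
            [1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0],
        ]
    )


def apply_transform(vertices, matrix):
    """Apply a 4x4 transform to an (N, 3) array of vertices."""
    homogeneous = np.hstack([vertices, np.ones((len(vertices), 1))])
    return (homogeneous @ matrix.T)[:, :3]


rot = rotation_about_x(-np.pi / 2)
axes = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # the +y and +z unit vectors
rotated = apply_transform(axes, rot)
print(np.round(rotated, 6))
```

A rotation of `-np.pi / 2` about the x-axis sends the +z direction to +y and the +y direction to -z, so whatever pointed along +z in the original mesh points along +y afterwards.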
Upload the mesh file to your dataset repository to visualize it with the Dataset viewer!
<div class="flex justify-center">
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/3D-cake.gif"/>
</div>