Add pipeline tag and correct license

#1
by nielsr - opened
Files changed (1)
  1. README.md +215 -1
README.md CHANGED
@@ -1,3 +1,217 @@
  ---
- license: openrail
+ license: mit
+ pipeline_tag: text-to-3d
+ library_name: diffusers, threestudio
  ---
+
+ # MVControl
+
+ [**ArXiv**](https://arxiv.org/abs/2403.09981) | [**Paper**](./assets/paper.pdf) | [**Project Page**](https://lizhiqi49.github.io/MVControl/)
+
+ Official implementation of *Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting*
+
+ [Zhiqi Li](https://github.com/lizhiqi49), [Yiming Chen](https://github.com/codejoker-c), [Lingzhe Zhao](https://github.com/LingzheZhao), [Peidong Liu](https://ethliup.github.io/)
+
+ **Abstract**: *While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and the score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.*
+
+ <p align="center">
+ <img src="assets/teaser.jpg">
+ </p>
+
+
+ ## Method Overview
+ <p align="center">
+ <img src="assets/3dpipeline.jpg">
+ </p>
+
+
+ ## News
+
+ - **[2024-05-30]** We have fixed the previous **noisy color issue** of the refined SuGaR and textured mesh by using a smaller learning rate for the Gaussians' color; the new demo results have been updated on our [**Project Page**](https://lizhiqi49.github.io/MVControl/)!
+
+
+ ## Installation
+
+ ### Install threestudio
+
+ **!!! The `requirements.txt` we use differs slightly from the original threestudio repository (the versions of diffusers and gradio). If errors occur with the original threestudio environment, please use our configuration file.**
+
+ See [installation.md](docs/installation.md) for additional information, including installation via Docker.
+
+ The following steps have been tested on Ubuntu 20.04.
+
+ - You must have an NVIDIA graphics card with at least 6GB VRAM and have [CUDA](https://developer.nvidia.com/cuda-downloads) installed.
+ - Install `Python >= 3.8`.
+ - (Optional, Recommended) Create a virtual environment:
+
+ ```sh
+ python3 -m virtualenv venv
+ . venv/bin/activate
+
+ # Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
+ # For instance, it caches the wheels of git packages to avoid unnecessarily rebuilding them later.
+ python3 -m pip install --upgrade pip
+ ```
+
+ - Install `PyTorch == 2.2.1`, since `xformers` requires the newest torch version. A quick sanity check after installation is shown below.
+
+ ```sh
+ # newest torch version under CUDA 11.8
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
+ ```
+
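+ A minimal, optional check that torch is installed and sees your GPU:
+
+ ```sh
+ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
+ ```
+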
+ - (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
+
+ ```sh
+ pip install ninja
+ ```
+
+ - Install dependencies:
+
+ ```sh
+ pip install -r requirements.txt
+ ```
+
+ - (Optional) `tiny-cuda-nn` installation might require downgrading pip to 23.0.1; a sketch of this workaround is shown below.
+
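+ A possible sequence, assuming you want the PyTorch bindings (the install path is the one documented in the tiny-cuda-nn README):
+
+ ```sh
+ pip install pip==23.0.1
+ pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
+ ```
+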
+ - (Optional, Recommended) The best-performing models in threestudio use the newly released T2I model [DeepFloyd IF](https://github.com/deep-floyd/IF), which currently requires signing a license agreement. If you would like to use these models, you need to [accept the license on the model card of DeepFloyd IF](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0) and log into the Hugging Face Hub in the terminal via `huggingface-cli login`, as shown below.
+
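+ ```sh
+ # Log in with a token from an account that has accepted the DeepFloyd IF license
+ huggingface-cli login
+ ```
+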
+ - For contributors, see [here](https://github.com/threestudio-project/threestudio#contributing-to-threestudio).
+
+ ### Install 3D Gaussian dependencies
+
+ ```sh
+ git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
+ git clone https://github.com/DSaurus/simple-knn.git
+ pip install ./diff-gaussian-rasterization
+ pip install ./simple-knn
+ ```
+
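+ If the CUDA extensions built correctly, both should import cleanly; a quick optional check, assuming the usual 3DGS module layout (`GaussianRasterizer`, `simple_knn._C.distCUDA2`):
+
+ ```sh
+ python -c "from diff_gaussian_rasterization import GaussianRasterizer; from simple_knn._C import distCUDA2"
+ ```
+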
+ ### Install SuGaR dependencies
+
+ ```sh
+ pip install open3d
+ # Install pytorch3d
+ pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
+ ```
+
+ ### Install LGM dependencies
+ ```sh
+ pip install -r requirements-lgm.txt
+ ```
+
+ ## Download pre-trained models
+
+ - For [LGM](https://github.com/3DTopia/LGM), follow the instructions in their official repository:
+ ```sh
+ mkdir pretrained && cd pretrained
+ wget https://huggingface.co/ashawkey/LGM/resolve/main/model_fp16.safetensors
+ cd ..
+ ```
+
+ - For [MVDream](https://github.com/bytedance/MVDream), we use our [diffusers implementation](https://github.com/lizhiqi49/mvdream-diffusers). The weights will be downloaded automatically via the Hugging Face Hub.
+
+ - Our pre-trained multi-view ControlNets have been uploaded to the Hugging Face Hub and will also be downloaded automatically.
+
+ - Alternatively, you can manually download the MVDream and MVControl checkpoints from [here](https://huggingface.co/lzq49).
+
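+ If you prefer to pre-fetch a ControlNet checkpoint into the local cache instead of relying on the automatic download, something like the following should work (the repo id is the one used in the commands below; replace `depth` with your condition type):
+
+ ```sh
+ huggingface-cli download lzq49/mvcontrol-4v-depth
+ ```
+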
+ ## Quickstart
+
+ ### Stage 1. Generate coarse 3D Gaussians via MVControl + LGM
+ The following command launches a GUI powered by gradio.
+ Fill in the `asset_name` box with the name of the current experiment; the results will be saved to the directory `workspace/mvcontrol_[condition_type]/[asset_name]`. The input image can be either a condition image or an RGB image. When an RGB image is given, the `image need preprocess` option at the top left of the UI should be checked, so that the condition image, mask, and RGBA image are saved to the output directory.
+ ```sh
+ condition_type=depth # canny/depth/normal/scribble
+ python app_stage1.py big --resume path/to/LGM/model_fp16.safetensors --condition_type $condition_type
+ # The generated coarse Gaussians will be saved to workspace/mvcontrol_{condition_type}/{asset_name}/coarse_gs.ply
+ ```
+
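+ For orientation, a hypothetical listing after one depth-conditioned run named `fatcat` (only `coarse_gs.ply` is a fixed name; the preprocessed files are whatever the GUI saves):
+
+ ```sh
+ ls workspace/mvcontrol_depth/fatcat
+ # coarse_gs.ply  (plus condition / mask / RGBA images when `image need preprocess` is enabled)
+ ```
+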
+ ### Stage 2. Gaussian Optimization
+ ```sh
+ ### Taking 'fatcat' as an example
+ asset_name=fatcat
+ exp_root_dir=workspace/mvcontrol_$condition_type/$asset_name
+ hint_path=load/conditions/fatcat_depth.png # path/to/condition.png
+ mask_path=load/conditions/fatcat_mask.png # path/to/mask.png
+ prompt="A fat cat, standing with hands in pants pockets" # prompt
+ coarse_gs_path=$exp_root_dir/coarse_gs.ply # path/to/saved/coarse_gs.ply
+
+ python launch.py --config custom/threestudio-3dgs/configs/mvcontrol-gaussian.yaml --train --gpu 0 \
+     system.stage=gaussian \
+     system.hint_image_path=$hint_path \
+     system.hint_mask_path=$mask_path \
+     system.control_condition_type=$condition_type \
+     system.geometry.geometry_convert_from=$coarse_gs_path \
+     system.prompt_processor.prompt="$prompt" \
+     system.guidance_control.pretrained_controlnet_name_or_path="lzq49/mvcontrol-4v-${condition_type}" \
+     name=$asset_name \
+     tag=gaussian_refine
+
+ # If you want to use only the coarse Gaussians' positions for initialization,
+ # add the following two options to the command:
+ # system.geometry.load_ply_only_vertex=true \
+ # system.geometry.load_vertex_only_position=true
+
+
+ ### Extract coarse SuGaR from refined Gaussians
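+ # Note: `@LAST` below stands for the most recent run directory; threestudio
+ # timestamps run folders, so substitute the actual gaussian_refine@<timestamp>
+ # folder name if `@LAST` is not resolved automatically.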
+ refined_gs_path=$exp_root_dir/gaussian_refine@LAST/save/exported_gs_step3000.ply
+ coarse_sugar_output_dir=$exp_root_dir/coarse_sugar
+
+ python extern/sugar/extract_mesh.py -s extern/sugar/load/scene \
+     -c $refined_gs_path -o $coarse_sugar_output_dir --use_vanilla_3dgs
+ ```
+
+ ### Stage 3. SuGaR refinement
+ ```sh
+ sugar_mesh_path=$coarse_sugar_output_dir/sugarmesh_vanilla3dgs_level0.3_decim200000_pd6.ply
+
+ python launch.py --config custom/threestudio-3dgs/configs/mvcontrol-sugar-vsd.yaml --train --gpu 0 \
+     system.stage=sugar \
+     system.hint_image_path=$hint_path \
+     system.hint_mask_path=$mask_path \
+     system.control_condition_type=$condition_type \
+     system.geometry.surface_mesh_to_bind_path=$sugar_mesh_path \
+     system.prompt_processor.prompt="$prompt" \
+     system.guidance_control.pretrained_controlnet_name_or_path="lzq49/mvcontrol-4v-${condition_type}" \
+     name=$asset_name \
+     tag=sugar_refine
+
+ ### Textured mesh extraction
+ sugar_out_dir=$exp_root_dir/sugar_refine@LAST
+ python launch.py --config $sugar_out_dir/configs/parsed.yaml --export --gpu 0 resume=$sugar_out_dir/ckpts/last.ckpt
+ ```
+
+ ### Easy way
+ We also provide a script that runs stage 2 and stage 3 automatically from the generated coarse Gaussians:
+ ```sh
+ python run_from_coarse_gs.py -n $asset_name -c $condition_type -p "$prompt" -cp $hint_path -mp $mask_path
+ ```
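+
+ For instance, with the `fatcat` example above, this expands to:
+
+ ```sh
+ python run_from_coarse_gs.py -n fatcat -c depth \
+     -p "A fat cat, standing with hands in pants pockets" \
+     -cp load/conditions/fatcat_depth.png -mp load/conditions/fatcat_mask.png
+ ```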
+
+ ## Tips
+ - Our method relies on a good coarse Gaussian initialization. Since coarse Gaussian generation is very fast (a few seconds), it is fine to try different random seeds in the first stage until you get a good LGM output.
+ - For better Gaussian optimization in stage 2, more optimization steps can be used; we use 3000 steps in our paper for efficiency. A sketch of overriding the step count follows this list.
+
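+ A minimal sketch of the second tip, assuming threestudio's standard `trainer.max_steps` CLI override applies to this config (verify against `mvcontrol-gaussian.yaml` first; the tag is illustrative):
+
+ ```sh
+ # Same stage-2 command as above, with a longer optimization schedule (6000 instead of 3000 steps)
+ python launch.py --config custom/threestudio-3dgs/configs/mvcontrol-gaussian.yaml --train --gpu 0 \
+     system.stage=gaussian \
+     system.hint_image_path=$hint_path \
+     system.hint_mask_path=$mask_path \
+     system.control_condition_type=$condition_type \
+     system.geometry.geometry_convert_from=$coarse_gs_path \
+     system.prompt_processor.prompt="$prompt" \
+     system.guidance_control.pretrained_controlnet_name_or_path="lzq49/mvcontrol-4v-${condition_type}" \
+     name=$asset_name \
+     tag=gaussian_refine_long \
+     trainer.max_steps=6000
+ ```
+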
+
+ ## Todo
+
+ - [x] Release the inference code.
+ - [ ] Reorganize the code.
+ - [ ] Improve the quality (texture & surface) of the SuGaR refinement stage.
+ - [ ] Provide more examples for testing.
+
+ ## Credits
+ This project is built upon the awesome project [threestudio](https://github.com/threestudio-project), and thanks to the open-sourcing of these works: [LGM](https://github.com/3DTopia/LGM), [MVDream](https://github.com/bytedance/MVDream), [ControlNet](https://github.com/lllyasviel/ControlNet) and [SuGaR](https://github.com/Anttwo/SuGaR).
+
+
+ ## BibTeX
+
+ ```bibtex
+ @misc{li2024controllable,
+     title={Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting},
+     author={Zhiqi Li and Yiming Chen and Lingzhe Zhao and Peidong Liu},
+     year={2024},
+     eprint={2403.09981},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV}
+ }
+ ```