## ___***DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos***___
<div align="center">
<img src='https://depthcrafter.github.io/img/logo.png' style="height:140px"></img>



 <a href='https://arxiv.org/abs/2409.02095'><img src='https://img.shields.io/badge/arXiv-2409.02095-b31b1b.svg'></a> &nbsp;
 <a href='https://depthcrafter.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;
 <a href='https://huggingface.co/spaces/tencent/DepthCrafter'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a> &nbsp;


_**[Wenbo Hu<sup>1* &dagger;</sup>](https://wbhu.github.io), 
[Xiangjun Gao<sup>2*</sup>](https://scholar.google.com/citations?user=qgdesEcAAAAJ&hl=en), 
[Xiaoyu Li<sup>1* &dagger;</sup>](https://xiaoyu258.github.io), 
[Sijie Zhao<sup>1</sup>](https://scholar.google.com/citations?user=tZ3dS3MAAAAJ&hl=en), 
[Xiaodong Cun<sup>1</sup>](https://vinthony.github.io/academic), <br>
[Yong Zhang<sup>1</sup>](https://yzhang2016.github.io), 
[Long Quan<sup>2</sup>](https://home.cse.ust.hk/~quan), 
[Ying Shan<sup>3, 1</sup>](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)**_
<br><br>
<sup>1</sup>Tencent AI Lab
<sup>2</sup>The Hong Kong University of Science and Technology
<sup>3</sup>ARC Lab, Tencent PCG

arXiv preprint, 2024

</div>

## 🔆 Introduction

- `[24-10-19]` 🤗🤗🤗 DepthCrafter has now been integrated into [ComfyUI](https://github.com/akatz-ai/ComfyUI-DepthCrafter-Nodes)!
- `[24-10-08]` 🤗🤗🤗 DepthCrafter has now been integrated into [Nuke](https://github.com/Theo-SAMINADIN-td/NukeDepthCrafter), have a try!
- `[24-09-28]` Add full-dataset inference and evaluation scripts for easier comparison. :-)
- `[24-09-25]` 🤗🤗🤗 Add Hugging Face online demo [DepthCrafter](https://huggingface.co/spaces/tencent/DepthCrafter).
- `[24-09-19]` Add scripts for preparing benchmark datasets.
- `[24-09-18]` Add point cloud sequence visualization.
- `[24-09-14]` 🔥🔥🔥 **DepthCrafter** is released now, have fun!


🔥 DepthCrafter can generate temporally consistent long depth sequences with fine-grained details for open-world videos,
without requiring additional information such as camera poses or optical flow.

🤗 If you find DepthCrafter useful, **please help ⭐ this repo**; stars matter a lot to open-source projects. Thanks!

## 🎥 Visualization
We provide demos of unprojected point cloud sequences, with reference RGB and estimated depth videos. 
Please refer to our [project page](https://depthcrafter.github.io) for more details.


https://github.com/user-attachments/assets/62141cc8-04d0-458f-9558-fe50bc04cc21




## 🚀 Quick Start

### 🤖 Gradio Demo
- Online demo: [DepthCrafter](https://huggingface.co/spaces/tencent/DepthCrafter) 
- Local demo:
    ```bash
    gradio app.py
    ``` 

### 🌟 Community Support
- [NukeDepthCrafter](https://github.com/Theo-SAMINADIN-td/NukeDepthCrafter):
    a plugin that lets you generate temporally consistent depth sequences inside Nuke,
    which is widely used in the VFX industry.
- [ComfyUI-Nodes](https://github.com/akatz-ai/ComfyUI-DepthCrafter-Nodes): nodes for creating consistent depth maps for your videos with DepthCrafter in ComfyUI.


### πŸ› οΈ Installation
1. Clone this repo:
```bash
git clone https://github.com/Tencent/DepthCrafter.git
```
2. Install dependencies (please refer to [requirements.txt](requirements.txt)):
```bash
pip install -r requirements.txt
```
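After installing, you can sanity-check that the key dependencies resolved correctly. A small stdlib-only sketch (the package list is an assumption about what `requirements.txt` pulls in, e.g. `torch`, `diffusers`, and `transformers`; adjust it to what the file actually lists):

```python
import importlib.util

def check_deps(pkgs=("torch", "diffusers", "transformers")):
    """Return whether each expected dependency is importable (assumed package names)."""
    return {p: importlib.util.find_spec(p) is not None for p in pkgs}

for pkg, ok in check_deps().items():
    print(f"{pkg}: {'OK' if ok else 'missing - try pip install -r requirements.txt'}")
```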



### 🤗 Model Zoo
[DepthCrafter](https://huggingface.co/tencent/DepthCrafter) is available in the Hugging Face Model Hub.

### πŸƒβ€β™‚οΈ Inference
#### 1. High-resolution inference, requires a GPU with ~26GB memory for 1024x576 resolution:
- Full inference (~0.6 fps on A100, recommended for high-quality results):

    ```bash
    python run.py  --video-path examples/example_01.mp4
    ```


- Fast inference through 4-step denoising without classifier-free guidance (~2.3 fps on A100):

    ```bash
    python run.py  --video-path examples/example_01.mp4 --num-inference-steps 4 --guidance-scale 1.0
    ```


#### 2. Low-resolution inference (requires a GPU with ~9GB memory for 512x256 resolution):

- Full inference (~2.3 fps on A100):

    ```bash
    python run.py  --video-path examples/example_01.mp4 --max-res 512
    ```

- Fast inference through 4-step denoising without classifier-free guidance (~9.4 fps on A100):
    ```bash
    python run.py  --video-path examples/example_01.mp4  --max-res 512 --num-inference-steps 4 --guidance-scale 1.0
    ```
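The commands above process one clip at a time; they can be wrapped in a simple batch loop over a whole directory. A sketch using the fast low-resolution settings (the `examples/` path is illustrative; adjust paths and flags to your setup):

```shell
# Run fast low-res inference on every mp4 under examples/ (illustrative path)
for v in examples/*.mp4; do
  [ -e "$v" ] || continue  # glob matched nothing; skip
  python run.py --video-path "$v" --max-res 512 --num-inference-steps 4 --guidance-scale 1.0
done
```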

## 🚀 Dataset Evaluation
Please check the `benchmark` folder.
- To create the datasets used in the paper, run `dataset_extract/dataset_extract_${dataset_name}.py`.
- This produces `csv` files that record the relative paths of the extracted RGB videos and depth `npz` files. We also provide these `csv` files.
- To run inference on all datasets:
  ```bash
  bash benchmark/infer/infer.sh
  ```
  (Remember to replace the `input_rgb_root` and `saved_root` with your own path.)
- To run evaluation on all datasets:
  ```bash
  bash benchmark/eval/eval.sh
  ```
   (Remember to replace the `pred_disp_root` and `gt_disp_root` with your own path.)
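Models like DepthCrafter predict relative (affine-invariant) depth, so evaluation protocols for them typically align predictions to the ground truth with a per-sequence scale and shift before computing metrics. A minimal NumPy sketch of that idea (illustrative only, with synthetic data; see `benchmark/eval` for the exact protocol and metrics used here):

```python
import numpy as np

def align_scale_shift(pred, gt, mask):
    """Least-squares fit of scale s and shift t so that s*pred + t best matches gt on valid pixels."""
    p, g = pred[mask], gt[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred + t

def abs_rel(pred, gt, mask):
    """Mean absolute relative error over valid pixels."""
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

# Toy example: the "prediction" differs from ground truth only by an affine transform
gt = np.random.rand(64, 64) + 0.5   # synthetic positive ground-truth disparity
pred = 2.0 * gt + 1.0
mask = gt > 0
print(f"AbsRel after alignment: {abs_rel(align_scale_shift(pred, gt, mask), gt, mask):.6f}")  # approx. 0
```

Because the toy prediction is an exact affine transform of the ground truth, the aligned AbsRel is essentially zero; real predictions would leave a residual error.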

## 🤝 Contributing
- Issues and pull requests are welcome.
- Contributions that optimize inference speed and memory usage are especially welcome, e.g., model quantization, distillation, or other acceleration techniques.

## 📜 Citation
If you find this work helpful, please consider citing:
```bibtex
@article{hu2024-DepthCrafter,
  author  = {Hu, Wenbo and Gao, Xiangjun and Li, Xiaoyu and Zhao, Sijie and Cun, Xiaodong and Zhang, Yong and Quan, Long and Shan, Ying},
  title   = {DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos},
  journal = {arXiv preprint arXiv:2409.02095},
  year    = {2024}
}
```