## ___***DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos***___
<div align="center">
<img src='https://depthcrafter.github.io/img/logo.png' style="height:140px"></img>
<a href='https://arxiv.org/abs/2409.02095'><img src='https://img.shields.io/badge/arXiv-2409.02095-b31b1b.svg'></a>
<a href='https://depthcrafter.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://huggingface.co/spaces/tencent/DepthCrafter'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a>
_**[Wenbo Hu<sup>1* †</sup>](https://wbhu.github.io),
[Xiangjun Gao<sup>2*</sup>](https://scholar.google.com/citations?user=qgdesEcAAAAJ&hl=en),
[Xiaoyu Li<sup>1* †</sup>](https://xiaoyu258.github.io),
[Sijie Zhao<sup>1</sup>](https://scholar.google.com/citations?user=tZ3dS3MAAAAJ&hl=en),
[Xiaodong Cun<sup>1</sup>](https://vinthony.github.io/academic), <br>
[Yong Zhang<sup>1</sup>](https://yzhang2016.github.io),
[Long Quan<sup>2</sup>](https://home.cse.ust.hk/~quan),
[Ying Shan<sup>3, 1</sup>](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)**_
<br><br>
<sup>1</sup>Tencent AI Lab
<sup>2</sup>The Hong Kong University of Science and Technology
<sup>3</sup>ARC Lab, Tencent PCG
arXiv preprint, 2024
</div>
## 🔆 Introduction
- `[24-10-19]` 🤗🤗🤗 DepthCrafter has now been integrated into [ComfyUI](https://github.com/akatz-ai/ComfyUI-DepthCrafter-Nodes)!
- `[24-10-08]` 🤗🤗🤗 DepthCrafter has now been integrated into [Nuke](https://github.com/Theo-SAMINADIN-td/NukeDepthCrafter), give it a try!
- `[24-09-28]` Added full dataset inference and evaluation scripts for easier comparison. :-)
- `[24-09-25]` 🤗🤗🤗 Added the Hugging Face online demo [DepthCrafter](https://huggingface.co/spaces/tencent/DepthCrafter).
- `[24-09-19]` Added scripts for preparing benchmark datasets.
- `[24-09-18]` Added point cloud sequence visualization.
- `[24-09-14]` 🔥🔥🔥 **DepthCrafter** is released now, have fun!
🔥 DepthCrafter can generate temporally consistent long depth sequences with fine-grained details for open-world videos,
without requiring additional information such as camera poses or optical flow.

🤗 If you find DepthCrafter useful, **please help ⭐ this repo**; it means a lot to open-source projects. Thanks!
## 🎥 Visualization
We provide demos of unprojected point cloud sequences, along with the reference RGB and estimated depth videos.
Please refer to our [project page](https://depthcrafter.github.io) for more details.
https://github.com/user-attachments/assets/62141cc8-04d0-458f-9558-fe50bc04cc21
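For reference, unprojecting a depth map into a point cloud follows the standard pinhole camera model. A minimal sketch, where the intrinsics `fx`, `fy`, `cx`, `cy` are placeholder values; since DepthCrafter predicts relative depth, the resulting cloud is only defined up to an affine ambiguity:

```python
import numpy as np

def unproject(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Unproject an (H, W) depth map into an (H*W, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel column/row coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```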
## 🚀 Quick Start
### 🤗 Gradio Demo
- Online demo: [DepthCrafter](https://huggingface.co/spaces/tencent/DepthCrafter)
- Local demo:
```bash
gradio app.py
```
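This launches a local Gradio server, by default at http://127.0.0.1:7860.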
### 🌟 Community Support
- [NukeDepthCrafter](https://github.com/Theo-SAMINADIN-td/NukeDepthCrafter):
  a plugin that lets you generate temporally consistent depth sequences inside Nuke,
  which is widely used in the VFX industry.
- [ComfyUI-Nodes](https://github.com/akatz-ai/ComfyUI-DepthCrafter-Nodes): creating consistent depth maps for your videos using DepthCrafter in ComfyUI.
### 🛠️ Installation
1. Clone this repo and enter the directory:
```bash
git clone https://github.com/Tencent/DepthCrafter.git
cd DepthCrafter
```
2. Install dependencies (please refer to [requirements.txt](requirements.txt)):
```bash
pip install -r requirements.txt
```
### 🤗 Model Zoo
[DepthCrafter](https://huggingface.co/tencent/DepthCrafter) is available on the Hugging Face Model Hub.
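If you prefer to pre-fetch the weights instead of relying on the automatic download on first run, one option is the `huggingface_hub` client (the `local_dir` below is an arbitrary choice):

```python
from huggingface_hub import snapshot_download

# Download the DepthCrafter checkpoint to a local folder of your choosing.
snapshot_download(repo_id="tencent/DepthCrafter", local_dir="checkpoints/DepthCrafter")
```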
### 🏃‍♂️ Inference
#### 1. High-resolution inference (requires a GPU with ~26 GB of memory at 1024×576 resolution):
- Full inference (~0.6 fps on A100, recommended for high-quality results):
```bash
python run.py --video-path examples/example_01.mp4
```
- Fast inference with 4-step denoising and without classifier-free guidance (~2.3 fps on A100):
```bash
python run.py --video-path examples/example_01.mp4 --num-inference-steps 4 --guidance-scale 1.0
```
#### 2. Low-resolution inference (requires a GPU with ~9 GB of memory at 512×256 resolution):
- Full inference (~2.3 fps on A100):
```bash
python run.py --video-path examples/example_01.mp4 --max-res 512
```
- Fast inference with 4-step denoising and without classifier-free guidance (~9.4 fps on A100):
```bash
python run.py --video-path examples/example_01.mp4 --max-res 512 --num-inference-steps 4 --guidance-scale 1.0
```
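To process many clips with the same settings, a simple batch driver can wrap `run.py`. A minimal sketch, assuming your videos live in `examples/` and reusing the fast low-resolution flags from above:

```python
import subprocess
from pathlib import Path

# Run fast low-resolution inference over every example clip.
for video in sorted(Path("examples").glob("*.mp4")):
    subprocess.run(
        ["python", "run.py",
         "--video-path", str(video),
         "--max-res", "512",
         "--num-inference-steps", "4",
         "--guidance-scale", "1.0"],
        check=True,
    )
```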
## 📊 Dataset Evaluation
Please check the `benchmark` folder.
- To build the datasets used in the paper, run `dataset_extract/dataset_extract_${dataset_name}.py`.
- This produces CSV files that record the relative paths of the extracted RGB videos and depth `.npz` files. We also provide these CSV files.
- To run inference on all datasets:
```bash
bash benchmark/infer/infer.sh
```
(Remember to replace `input_rgb_root` and `saved_root` with your own paths.)
- To run evaluation on all datasets:
```bash
bash benchmark/eval/eval.sh
```
(Remember to replace `pred_disp_root` and `gt_disp_root` with your own paths.)
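For intuition, estimators like DepthCrafter that predict relative (affine-invariant) depth are commonly scored with AbsRel and δ < 1.25 after a least-squares scale/shift alignment. A minimal sketch of that protocol, assuming 1-D arrays of valid, positive ground-truth disparities; the repo's `eval.sh` may differ in details:

```python
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Least-squares fit of scale s and shift t so that s * pred + t matches gt."""
    A = np.stack([pred, np.ones_like(pred)], axis=-1)  # (N, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return s * pred + t

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative error."""
    return float(np.mean(np.abs(pred - gt) / gt))

def delta1(pred: np.ndarray, gt: np.ndarray) -> float:
    """Fraction of pixels whose prediction is within a factor of 1.25 of ground truth."""
    pred = np.clip(pred, 1e-6, None)  # guard against non-positive values after shift alignment
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < 1.25))
```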
## 🤝 Contributing
- Issues and pull requests are welcome.
- Contributions that optimize inference speed and memory usage, e.g., through model quantization, distillation, or other acceleration techniques, are especially welcome.
## 📜 Citation
If you find this work helpful, please consider citing:
```bibtex
@article{hu2024-DepthCrafter,
author = {Hu, Wenbo and Gao, Xiangjun and Li, Xiaoyu and Zhao, Sijie and Cun, Xiaodong and Zhang, Yong and Quan, Long and Shan, Ying},
title = {DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos},
journal = {arXiv preprint arXiv:2409.02095},
year = {2024}
}
```