Depth Anything V2 for Metric Depth Estimation

Pre-trained Models

We provide six metric depth models of three scales for indoor and outdoor scenes, respectively.

Base Model Params Indoor (Hypersim) Outdoor (Virtual KITTI 2)
Depth-Anything-V2-Small 24.8M Download Download
Depth-Anything-V2-Base 97.5M Download Download
Depth-Anything-V2-Large 335.3M Download Download

We recommend to first try our larger models (if computational cost is affordable) and the indoor version.

Usage

Prepraration

git clone https://github.com/DepthAnything/Depth-Anything-V2
cd Depth-Anything-V2/metric_depth
pip install -r requirements.txt

Download the checkpoints listed here and put them under the checkpoints directory.

Use our models

import cv2
import torch

from depth_anything_v2.dpt import DepthAnythingV2

model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]}
}

encoder = 'vitl' # or 'vits', 'vitb'
dataset = 'hypersim' # 'hypersim' for indoor model, 'vkitti' for outdoor model
max_depth = 20 # 20 for indoor model, 80 for outdoor model

model = DepthAnythingV2(**{**model_configs[encoder], 'max_depth': max_depth})
model.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_metric_{dataset}_{encoder}.pth', map_location='cpu'))
model.eval()

raw_img = cv2.imread('your/image/path')
depth = model.infer_image(raw_img) # HxW depth map in meters in numpy

Running script on images

Here, we take the vitl encoder as an example. You can also use vitb or vits encoders.

# indoor scenes
python run.py \
  --encoder vitl \
  --load-from checkpoints/depth_anything_v2_metric_hypersim_vitl.pth \
  --max-depth 20 \
  --img-path <path> --outdir <outdir> [--input-size <size>] [--save-numpy]

# outdoor scenes
python run.py \
  --encoder vitl \
  --load-from checkpoints/depth_anything_v2_metric_vkitti_vitl.pth \
  --max-depth 80 \
  --img-path <path> --outdir <outdir> [--input-size <size>] [--save-numpy]

Project 2D images to point clouds:

python depth_to_pointcloud.py \
  --encoder vitl \
  --load-from checkpoints/depth_anything_v2_metric_hypersim_vitl.pth \
  --max-depth 20 \
  --img-path <path> --outdir <outdir>

Reproduce training

Please first prepare the Hypersim and Virtual KITTI 2 datasets. Then:

bash dist_train.sh

Citation

If you find this project useful, please consider citing:

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.