Midas-v2: Depth Estimation

Midas is a deep learning model for monocular depth estimation: it predicts scene depth from a single RGB image, with no stereo camera or depth sensor required. It combines a hybrid CNN-Transformer architecture with pretraining on diverse datasets (e.g., MegaDepth, KITTI), which gives it strong cross-scene generalization across complex lighting, occlusions, and both indoor and outdoor environments. The model accepts dynamic input resolutions (down to 256x256 pixels) while preserving fine detail, and it is optimized for real-time inference and lightweight deployment on mobile and edge devices. Typical applications include autonomous driving (obstacle detection), AR/VR (3D reconstruction), and robotic navigation, where it can reduce hardware costs by replacing dedicated depth sensors. Later releases (e.g., Midas-v3) improve small-object recognition and edge accuracy.
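
As far as the upstream MiDaS project documents it, the predicted map is relative (inverse) depth rather than metric distance, so it is usually rescaled before being displayed or saved. The following is a minimal sketch of that rescaling, assuming NumPy is available and the prediction is already a 2-D array; `depth_to_uint8` is a hypothetical helper name, not part of any Model Farm or MiDaS API.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a 2-D depth map into the 0..255 range for display.

    Hypothetical helper for illustration only; the scaling choice is an
    assumption, not something the model card prescribes.
    """
    depth = depth.astype(np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-12:
        # A constant map carries no contrast; return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # now in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)


# Tiny stand-in for a real prediction.
demo = np.array([[0.0, 1.0], [2.0, 4.0]], dtype=np.float32)
gray = depth_to_uint8(demo)
flat = depth_to_uint8(np.full((2, 2), 3.0, dtype=np.float32))
```

Because the values are relative, the resulting grayscale image shows only the ordering of near and far regions within one frame; comparing brightness across frames would need a shared scale.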

Source model

Input shape: 1x3x256x256
Number of parameters: 20.33M
Model size: 82.17 MB
Output shape: 1x1x256x256
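
The shapes above imply a batch-first, channel-first (NCHW) input and a single-channel output. A minimal sketch of the corresponding array handling follows, assuming NumPy, an RGB image already resized to 256x256, and simple [0, 1] scaling; the actual normalization expected by the deployable model is not stated here, and `to_model_input` and `to_depth_map` are hypothetical helper names.

```python
import numpy as np

# Shapes taken from the model card; everything else is an illustrative assumption.
INPUT_SHAPE = (1, 3, 256, 256)
OUTPUT_SHAPE = (1, 1, 256, 256)


def to_model_input(image_hwc: np.ndarray) -> np.ndarray:
    """Convert a 256x256x3 uint8 RGB image into a 1x3x256x256 float32 array."""
    if image_hwc.shape != (256, 256, 3):
        raise ValueError(f"expected (256, 256, 3), got {image_hwc.shape}")
    scaled = image_hwc.astype(np.float32) / 255.0  # assumption: [0, 1] scaling
    chw = np.transpose(scaled, (2, 0, 1))          # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis, ...])  # prepend batch axis


def to_depth_map(output: np.ndarray) -> np.ndarray:
    """Drop the batch and channel axes of a 1x1x256x256 output."""
    if output.shape != OUTPUT_SHAPE:
        raise ValueError(f"expected {OUTPUT_SHAPE}, got {output.shape}")
    return output[0, 0]


rgb = np.zeros((256, 256, 3), dtype=np.uint8)           # placeholder image
tensor = to_model_input(rgb)
fake_output = np.zeros(OUTPUT_SHAPE, dtype=np.float32)  # stand-in for an inference result
depth = to_depth_map(fake_output)
print(tensor.shape, depth.shape)
```

The inference call itself is deliberately left out, since it depends on the runtime provided through Model Farm.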

The source model can be found here

Performance Reference

Please search for this model by name in Model Farm.

Inference & Model Conversion

Please search for this model by name in Model Farm.

License

Source Model: MIT
Deployable Model: APLUX-MODEL-FARM-LICENSE