Metric3Dv2 is a versatile zero-shot metric depth and surface normal estimator
We present Metric3Dv2, a versatile geometric foundation model for zero-shot metric depth and surface normal estimation. We address the metric ambiguity of depth and the scarcity of normal labels by
(1) transforming all images and depth labels into a canonical camera space, which removes the ambiguity that varying camera focal lengths introduce for the network, and (2) introducing a joint depth-normal optimization module that distills knowledge from abundant and diverse depth labels into normal learning.
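As a rough illustration of (1): the canonical transform can be realized by rescaling either the input image or the depth label so that the camera behaves as if it had a fixed canonical focal length. The sketch below shows the label-scaling variant; the function name and the canonical focal value are illustrative assumptions, not taken from the released code.

import numpy as np

def to_canonical_space(depth, focal, f_canonical=1000.0):
    """Scale a metric depth label into the canonical camera space.

    Keeping the image fixed and treating the camera as if it had the
    canonical focal length f_canonical rescales scene depth by
    f_canonical / focal. (The alternative variant resizes the image
    by the same ratio and leaves the depth values untouched.)
    """
    return np.asarray(depth) * (f_canonical / focal)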
The sliders below compare Metric3Dv2 with the previous state-of-the-art methods ZoeDepth and Omnidata (v2) on in-the-wild images.
We first transform the input image I into the canonical camera space and feed the transformed image Ic into a depth-normal estimation model, which predicts the metric depth Dc in the canonical space together with the surface normal N.
During training, Dc is supervised by the GT depth D*c, i.e., the ground-truth depth transformed into the canonical space. At inference, after producing the metric depth Dc in the canonical space, we apply a de-canonical transformation to convert it back to the space of the original input I.
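Under the same label-scaling convention as the sketch above, the de-canonical transformation is simply the inverse scaling; a minimal sketch, again with an assumed canonical focal length:

def from_canonical_space(depth_c, focal, f_canonical=1000.0):
    """Convert a canonical-space depth prediction back to the metric
    depth of the original camera by inverting the canonical scaling."""
    return depth_c * (focal / f_canonical)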
The predicted normal N is supervised via depth-normal consistency with the recovered metric depth D, as well as by the GT normal N* when available.
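For such a consistency term, pseudo normals can be derived from a metric depth map by back-projecting it into a point cloud and crossing the local tangent vectors. The sketch below is one generic recipe for this, assuming pinhole intrinsics (fx, fy, cx, cy); it is not necessarily the paper's exact formulation.

import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate per-pixel surface normals from a metric depth map.

    Back-projects every pixel to 3D with the pinhole model, then
    crosses the tangent vectors along the image axes; the sign of
    the result depends on the chosen coordinate convention.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)   # (h, w, 3) point cloud
    du = np.gradient(points, axis=1)            # tangent along image x
    dv = np.gradient(points, axis=0)            # tangent along image y
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n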
We present our depth and normal predictions, as well as their applications to 3D reconstruction and SLAM, here.
@article{hu2024metric3d,
author={Hu, Mu and Yin, Wei and Zhang, Chi and Cai, Zhipeng and Long, Xiaoxiao and Chen, Hao and Wang, Kaixuan and Yu, Gang and Shen, Chunhua and Shen, Shaojie},
title={Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation},
journal={arXiv preprint arXiv:2404.15506},
year={2024},
}
@inproceedings{yin2023metric3d,
title={Metric3D: Towards zero-shot metric 3d prediction from a single image},
author={Yin, Wei and Zhang, Chi and Chen, Hao and Cai, Zhipeng and Yu, Gang and Wang, Kaixuan and Chen, Xiaozhi and Shen, Chunhua},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={9043--9053},
year={2023},
}