Metric3Dv2 is a versatile zero-shot metric depth and surface normal estimator
We present Metric3Dv2, a versatile geometric foundation model for zero-shot metric depth and surface normal estimation. We address the metric ambiguity of depth and the scarcity of normal labels by
(1) transforming all images and depth labels into a canonical camera space, which removes the ambiguity that varying camera focal lengths introduce for the network, and (2) introducing a joint depth-normal optimization module that distills knowledge from abundant and diverse depth labels into normal learning.
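As a rough illustration of (1): the canonical transform can be realized by rescaling either the input image or the depth label so that the camera behaves as if it had a fixed canonical focal length. The sketch below shows the label-scaling variant; the function name and the canonical focal value are illustrative assumptions, not taken from the released code.

import numpy as np

def to_canonical_space(depth, focal, f_canonical=1000.0):
    """Scale a metric depth label into the canonical camera space.

    Keeping the image fixed and treating the camera as if it had the
    canonical focal length f_canonical rescales scene depth by
    f_canonical / focal. (The alternative variant resizes the image
    by the same ratio and leaves the depth values untouched.)
    """
    return np.asarray(depth) * (f_canonical / focal)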
The sliders below compare Metric3Dv2 with the previous state-of-the-art methods ZoeDepth and Omnidata (v2) on in-the-wild images.
We first transform the input image I into the canonical camera space and feed the transformed image Ic into a depth-normal estimation model, which predicts the metric depth Dc in the canonical space together with the surface normal N.
During training, Dc is supervised by the GT depth D*c, i.e., the ground-truth depth transformed into the canonical space. At inference, after producing the metric depth Dc in the canonical space, we apply a de-canonical transformation to convert it back to the space of the original input I.
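Under the same label-scaling convention as the sketch above, the de-canonical transformation is simply the inverse scaling; a minimal sketch, again with an assumed canonical focal length:

def from_canonical_space(depth_c, focal, f_canonical=1000.0):
    """Convert a canonical-space depth prediction back to the metric
    depth of the original camera by inverting the canonical scaling."""
    return depth_c * (focal / f_canonical)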
The predicted normal N is supervised via depth-normal consistency with the recovered metric depth D, as well as by the GT normal N* when available.
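For such a consistency term, pseudo normals can be derived from a metric depth map by back-projecting it into a point cloud and crossing the local tangent vectors. The sketch below is one generic recipe for this, assuming pinhole intrinsics (fx, fy, cx, cy); it is not necessarily the paper's exact formulation.

import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate per-pixel surface normals from a metric depth map.

    Back-projects every pixel to 3D with the pinhole model, then
    crosses the tangent vectors along the image axes; the sign of
    the result depends on the chosen coordinate convention.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)   # (h, w, 3) point cloud
    du = np.gradient(points, axis=1)            # tangent along image x
    dv = np.gradient(points, axis=0)            # tangent along image y
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n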
We present our depth and normal predictions, as well as their applications to 3D reconstruction and SLAM, here.
@article{hu2024metric3d,
author={Hu, Mu and Yin, Wei and Zhang, Chi and Cai, Zhipeng and Long, Xiaoxiao and Chen, Hao and Wang, Kaixuan and Yu, Gang and Shen, Chunhua and Shen, Shaojie},
title={Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation},
journal={arXiv preprint arXiv:2404.15506},
year={2024},
}
@inproceedings{yin2023metric3d,
title={Metric3D: Towards zero-shot metric 3d prediction from a single image},
author={Yin, Wei and Zhang, Chi and Chen, Hao and Cai, Zhipeng and Yu, Gang and Wang, Kaixuan and Chen, Xiaozhi and Shen, Chunhua},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={9043--9053},
year={2023},
}