📏Metric3D v2

A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

Mu Hu*     Wei Yin*     Chi Zhang      Zhipeng Cai      Xiaoxiao Long      Hao Chen
Kaixuan Wang      Gang Yu      Chunhua Shen      Shaojie Shen     
*Equal contribution    

Metric3Dv2 is a versatile zero-shot metric depth and surface normal estimator

Overview

We present Metric3Dv2, a versatile geometric foundation model for zero-shot metric depth and surface normal estimation. We address the metric ambiguity of depth and the label deficiency of normals with two components: (1) transforming all images and depth labels into a canonical camera space, so that the ambiguity from varying camera focal lengths no longer confuses the network, and (2) a joint depth-normal optimization module that distills knowledge from abundant and diverse depth labels into normal learning.

The comparisons below show Metric3Dv2 against the previous state-of-the-art methods ZoeDepth and Omnidata (v2) on in-the-wild images.

Results

Zero-shot monocular metric depth estimation

[Comparison panels: RGB input, our depth, ZoeDepth depth]

Zero-shot monocular surface normal estimation

[Comparison panels: RGB input, our normal, Omnidata (v2) normal]


How it works


We first transform the input image I into the canonical space c, then feed the transformed image Ic into a depth-normal estimation model that predicts the metric depth Dc in the canonical space and the surface normal N.

During training, Dc is supervised by the GT depth D*c, which has likewise been transformed into the canonical space. At inference, after producing the metric depth Dc in the canonical space, we apply a de-canonical transformation to convert it back to the space of the original input I.
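The canonical and de-canonical transformations can be illustrated with a minimal sketch. It assumes the simplest variant, in which depth is rescaled by the ratio between a canonical focal length and the camera's actual focal length; the page itself does not spell out the formula. The function names and the canonical focal length of 1000 px are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical canonical focal length in pixels (an assumption for this sketch).
CANONICAL_FOCAL = 1000.0


def to_canonical_depth(depth, focal, canonical_focal=CANONICAL_FOCAL):
    """Scale a metric depth map into the canonical camera space."""
    return depth * (canonical_focal / focal)


def from_canonical_depth(depth_c, focal, canonical_focal=CANONICAL_FOCAL):
    """De-canonical transform: map canonical-space depth back to the real camera."""
    return depth_c * (focal / canonical_focal)


# Training side: a GT depth map from a camera with a 500 px focal length.
gt_depth = np.full((4, 4), 2.0)  # every pixel is 2 m away
gt_depth_c = to_canonical_depth(gt_depth, focal=500.0)

# Inference side: pretend the network predicted the canonical depth perfectly,
# then undo the scaling to recover metric depth for the original camera.
pred_depth = from_canonical_depth(gt_depth_c, focal=500.0)
```

Under this assumption the two transforms are exact inverses, so a network that only ever sees canonical-space depth can still be evaluated in the metric space of each input camera.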

The predicted normal N is supervised by depth-normal consistency with the recovered metric depth D, as well as by the GT normal N*, if available.
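To make the idea of depth-normal consistency concrete, the sketch below derives surface normals from a metric depth map under a pinhole camera model and compares them with predicted normals. This is a common finite-difference construction and not necessarily the formulation used in Metric3Dv2; `normals_from_depth`, `consistency_loss`, and the cosine-based loss are hypothetical names and choices made for illustration.

```python
import numpy as np


def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate per-pixel unit normals from a metric depth map (pinhole camera)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point in camera coordinates.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)
    # Tangent vectors along image columns (u) and image rows (v).
    du = np.gradient(points, axis=1)
    dv = np.gradient(points, axis=0)
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12
    # Orient all normals towards the camera (negative z).
    n[n[..., 2] > 0] *= -1
    return n


def consistency_loss(pred_normals, depth, fx, fy, cx, cy):
    """Mean (1 - cosine similarity) between predicted and depth-derived normals."""
    ref = normals_from_depth(depth, fx, fy, cx, cy)
    cos = np.sum(pred_normals * ref, axis=-1)
    return float(np.mean(1.0 - cos))


# A fronto-parallel plane 3 m from the camera: normals should face the camera.
plane = np.full((5, 5), 3.0)
plane_normals = normals_from_depth(plane, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
```

Because the reference normals are computed from depth alone, a term of this kind can supervise normal prediction on images that have depth labels but no normal labels.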

One Minute Video Demo

We present our depth and normal predictions, as well as their applications to 3D reconstruction and SLAM, here.

BibTeX

Metric3D V2

@article{hu2024metric3dv2,
  author={Hu, Mu and Yin, Wei and Zhang, Chi and Cai, Zhipeng and Long, Xiaoxiao and Chen, Hao and Wang, Kaixuan and Yu, Gang and Shen, Chunhua and Shen, Shaojie},
  title={A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation},
  journal={arXiv},
  eprint={2404.15506},
  year={2024},
}

Metric3D V1

@inproceedings{yin2023metric3d,
  title={Metric3D: Towards zero-shot metric 3d prediction from a single image},
  author={Yin, Wei and Zhang, Chi and Chen, Hao and Cai, Zhipeng and Yu, Gang and Wang, Kaixuan and Chen, Xiaozhi and Shen, Chunhua},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={9043--9053},
  year={2023},
}
