📏Metric3D v2

A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

Mu Hu*, Wei Yin*, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen,
Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen
*Equal contribution

Metric3Dv2 is a versatile zero-shot metric depth and surface normal estimator


We present Metric3Dv2, a versatile geometric foundation model for zero-shot metric depth and surface normal estimation. We address the metric ambiguity of depth and the scarcity of normal labels in two ways: (1) we transform all images and depth labels into a canonical camera space, so that the network is not confused by the ambiguity that varying camera focal lengths introduce; and (2) we introduce a joint depth-normal optimization module that distills knowledge from abundant and diverse depth labels into normal learning.
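The focal-length ambiguity can be made concrete with the pinhole model, in which an object of height H at depth d projects to f·H/d pixels. The sketch below is illustrative only: the object size, depths, focal lengths, the canonical focal length of 1000 px, and the helper names are all assumed for the example. It shows two captures that look identical in the image yet have different metric depths, and that rescaling each depth label by f_c/f maps both to the same canonical depth.

```python
# Illustrative sketch of the focal-length ambiguity under a pinhole camera.
# All numbers and helper names are hypothetical, chosen for the example.

CANONICAL_FOCAL_PX = 1000.0  # assumed canonical focal length f_c


def projected_height_px(object_height_m: float, depth_m: float, focal_px: float) -> float:
    """Pinhole projection: image height in pixels = f * H / d."""
    return focal_px * object_height_m / depth_m


def canonical_depth(depth_m: float, focal_px: float,
                    canonical_focal_px: float = CANONICAL_FOCAL_PX) -> float:
    """Depth the same image would imply if it had been taken with focal length f_c."""
    return depth_m * canonical_focal_px / focal_px


# A 1.8 m tall object seen by a short-focal camera at 2 m
# and by a long-focal camera at 4 m.
near = projected_height_px(1.8, depth_m=2.0, focal_px=500.0)
far = projected_height_px(1.8, depth_m=4.0, focal_px=1000.0)

# Same pixel size, different metric depth: the image alone is ambiguous.
print(near, far)

# After the canonical transform, both captures share one depth label.
print(canonical_depth(2.0, 500.0), canonical_depth(4.0, 1000.0))
```

Once labels live in the canonical space, images that look the same are paired with the same target, which is the confusion the canonical transformation is meant to remove.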

The comparisons below show Metric3Dv2 against the previous state-of-the-art methods, ZoeDepth for metric depth and OmniData (v2) for surface normals, on in-the-wild images.


Zero-shot monocular metric depth estimation

Panels: RGB input | Ours (depth) | ZoeDepth (depth)

Zero-shot monocular surface normal estimation

Panels: RGB input | Ours (normal) | OmniData (v2) (normal)

How it works

We first transform the input image I into the canonical camera space c, and then feed the transformed image Ic into a depth-normal estimation model that predicts the metric depth Dc in the canonical space and the surface normal N.

During training, Dc is supervised by the GT depth D*c, i.e., the ground-truth depth transformed into the canonical space. At inference, after predicting the metric depth Dc in the canonical space, we apply a de-canonical transformation to convert it back to the camera space of the original input I.
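A minimal sketch of the canonical and de-canonical depth transformations, assuming the canonical space is defined by a single canonical focal length f_c (1000 px here, an assumed value) and that the transformation rescales depth by f_c/f. The function names are hypothetical: to_canonical maps a GT depth map D* to D*c for training, and from_canonical maps a predicted Dc back to the original camera at inference.

```python
import numpy as np

# Assumed canonical focal length f_c in pixels (illustrative value).
CANONICAL_FOCAL_PX = 1000.0


def to_canonical(depth: np.ndarray, focal_px: float,
                 canonical_focal_px: float = CANONICAL_FOCAL_PX) -> np.ndarray:
    """Training: rescale a metric GT depth map D* into the canonical space (D*c)."""
    return depth * (canonical_focal_px / focal_px)


def from_canonical(depth_c: np.ndarray, focal_px: float,
                   canonical_focal_px: float = CANONICAL_FOCAL_PX) -> np.ndarray:
    """Inference: de-canonical transform of a predicted Dc back to the input camera."""
    return depth_c * (focal_px / canonical_focal_px)


# Hypothetical GT depth (metres) from a camera with a 500 px focal length.
gt_depth = np.array([[2.0, 4.0], [8.0, 1.5]])
focal = 500.0

gt_canonical = to_canonical(gt_depth, focal)      # training target D*c
recovered = from_canonical(gt_canonical, focal)  # what inference would return
print(gt_canonical)
print(np.allclose(recovered, gt_depth))
```

For a given focal length the two functions are exact inverses, so a perfect prediction of Dc yields the metric depth D exactly. If the image is resized before it enters the network, focal_px should be the focal length of the resized image, because under a pinhole model the focal length in pixels scales with the image.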

The predicted normal N is supervised through depth-normal consistency with the recovered metric depth D, as well as by the ground-truth normal N* when it is available.
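Depth-normal consistency relies on the fact that metric depth, together with the camera intrinsics, determines surface normals. The sketch below is a simplified stand-in, not the paper's joint depth-normal optimization module: it back-projects a depth map with assumed pinhole intrinsics (fx, fy, cx, cy), estimates normals by finite differences and a cross product, and scores a predicted normal map with a mean (1 - cosine) loss. All function names and the orientation convention are assumptions for illustration.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to (H, W, 3) camera-space points with a pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate unit normals from finite differences of the back-projected points."""
    points = backproject(depth, fx, fy, cx, cy)
    d_u = np.gradient(points, axis=1)  # change along image columns
    d_v = np.gradient(points, axis=0)  # change along image rows
    normals = np.cross(d_u, d_v)
    normals /= np.clip(np.linalg.norm(normals, axis=-1, keepdims=True), 1e-12, None)
    # Assumed convention: camera looks along +z, so flip normals to face it (z <= 0).
    normals[normals[..., 2] > 0] *= -1.0
    return normals


def depth_normal_consistency_loss(pred_normals, depth, fx, fy, cx, cy):
    """Mean (1 - cosine) between predicted normals and normals implied by depth."""
    depth_normals = normals_from_depth(depth, fx, fy, cx, cy)
    norm = np.linalg.norm(pred_normals, axis=-1, keepdims=True)
    pred = pred_normals / np.clip(norm, 1e-12, None)
    cosine = np.sum(pred * depth_normals, axis=-1)
    return float(np.mean(1.0 - cosine))


# Hypothetical fronto-parallel plane 2 m from a camera with 500 px focal length.
depth = np.full((4, 5), 2.0)
facing_camera = np.tile(np.array([0.0, 0.0, -1.0]), (4, 5, 1))
print(depth_normal_consistency_loss(facing_camera, depth, 500.0, 500.0, 2.0, 1.5))
```

In a training setup a term of this kind would be computed on the recovered metric depth D, which is how depth labels can indirectly supervise normals. Normal orientation conventions differ between datasets, so the camera-facing choice above is an assumption.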

One-Minute Video Demo

This video presents our depth and normal predictions, as well as their applications to 3D reconstruction and SLAM.


