Consistent Video Depth Estimation (SIGGRAPH 2020)

Authors: Xuan Luo, Jia-Bin Huang, Richard Szeliski, Kevin Matzen, Johannes Kopf
Affiliations: University of Washington, Virginia Tech, Facebook

The main idea of this paper is to use self-supervised learning based on 3D geometric constraints to fine-tune a depth network on each video at test time.
This addresses two limitations of earlier approaches: traditional reconstruction methods such as COLMAP, which rely only on 3D geometric constraints, cannot estimate dense depth, while neural methods that rely only on self-supervised training do not generalize well to new videos.
Unlike previous self-supervised methods, the authors fine-tune a pre-trained depth estimator and incorporate optical flow into the training.
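
To make the idea of a flow-based geometric constraint concrete, here is a minimal NumPy sketch of how depth, camera pose, and optical flow could be checked against each other for a pair of frames. It is an illustration under stated assumptions, not the authors' implementation: the function names, the pinhole intrinsics `K`, the relative pose `(R_ij, t_ij)`, and the choice of an image-space term plus an inverse-depth term are all assumptions made for this example.

```python
import numpy as np


def backproject(pixels, depth, K):
    """Lift pixels (N, 2) with depths (N,) to camera-space 3D points (N, 3)."""
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones])      # homogeneous pixel coordinates
    rays = homog @ np.linalg.inv(K).T      # rays with z = 1 (assumes pinhole K)
    return rays * depth[:, None]


def geometric_losses(pixels_i, depth_i, flow_ij, depth_j_at_flow, K, R_ij, t_ij):
    """Compare depth-based reprojection with optical flow for frames i -> j.

    pixels_i:        (N, 2) pixel coordinates in frame i
    depth_i:         (N,)   predicted depth at those pixels
    flow_ij:         (N, 2) optical flow from frame i to frame j
    depth_j_at_flow: (N,)   predicted depth in frame j at the flow-displaced pixels
    R_ij, t_ij:      assumed relative pose mapping frame-i points into frame j
    """
    pts_i = backproject(pixels_i, depth_i, K)
    pts_j = pts_i @ R_ij.T + t_ij          # same 3D points in frame j's camera

    proj = pts_j @ K.T
    reprojected = proj[:, :2] / proj[:, 2:3]   # where depth + pose say the pixel lands
    flowed = pixels_i + flow_ij                # where optical flow says it lands

    # Image-space disagreement between the two correspondences (in pixels).
    spatial = np.linalg.norm(reprojected - flowed, axis=1)
    # Inverse-depth disagreement between the transformed point and frame j's depth.
    disparity = np.abs(1.0 / pts_j[:, 2] - 1.0 / depth_j_at_flow)
    return spatial.mean(), disparity.mean()
```

In a test-time fine-tuning setup, terms like these would be minimized with respect to the depth network's weights using an automatic-differentiation framework; the NumPy version above only evaluates them for given depths.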

Method

Evaluation