Consistent Video Depth Estimation

Consistent Video Depth Estimation (SIGGRAPH 2020)

Authors: Xuan Luo, Jia-Bin Huang, Richard Szeliski, Kevin Matzen, Johannes Kopf
Affiliations: University of Washington, Virginia Tech, Facebook

The main idea behind this paper is to use self-supervised learning based on 3D geometric constraints to fine-tune a depth network on each video at test time.
This addresses limitations of both families of prior work. Traditional methods such as COLMAP, which rely only on 3D geometric constraints, cannot estimate dense depth. Neural methods that rely only on self-supervised training do not generalize well to new videos.
Unlike previous self-supervised methods, the authors fine-tune a pre-trained depth estimator and incorporate optical flow into their training.
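The following minimal per-pixel sketch makes concrete how optical flow can supply a geometric constraint on predicted depth. A pixel in frame i is lifted to 3D with its predicted depth and moved into frame j with the relative camera pose. The result should land where the optical flow says the pixel goes, at a depth that agrees with frame j's own prediction.

The sketch rests on several assumptions:
- Both frames share pinhole intrinsics.
- The relative pose (R, t) from camera i to camera j is known in advance.
- The function names, the `disparity_weight` parameter, and the exact form of the two terms are hypothetical.

This illustrates the general idea and is not the authors' implementation. A real fine-tuning loop would presumably average such a loss over many pixels and frame pairs and back-propagate it into the depth network, which is omitted here.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics, assumed (for this sketch) to be shared by both frames."""
    fx: float
    fy: float
    cx: float
    cy: float


def unproject(u: float, v: float, depth: float, K: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with a predicted depth to a 3D point in camera coordinates."""
    return ((u - K.cx) / K.fx * depth, (v - K.cy) / K.fy * depth, depth)


def transform(X: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Apply an assumed relative pose X' = R X + t mapping camera i to camera j."""
    return tuple(sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3))


def project(X: Vec3, K: Intrinsics) -> tuple[float, float, float]:
    """Project a camera-space point to pixel coordinates and also return its depth."""
    x, y, z = X
    return (K.fx * x / z + K.cx, K.fy * y / z + K.cy, z)


def pair_consistency_loss(u, v, depth_i, flow, depth_j_at_target, K, R, t,
                          disparity_weight=1.0):
    """Hypothetical geometric loss for one pixel of frame i against frame j.

    flow: optical-flow vector (du, dv) from frame i to frame j at (u, v).
    depth_j_at_target: frame j's predicted depth sampled at (u + du, v + dv).
    disparity_weight: hypothetical weight balancing the two terms.
    """
    pu, pv, z_proj = project(transform(unproject(u, v, depth_i, K), R, t), K)
    target_u, target_v = u + flow[0], v + flow[1]
    # Spatial term: where depth + pose place the pixel vs. where flow places it.
    spatial = math.hypot(pu - target_u, pv - target_v)
    # Disparity term: the reprojected depth should agree with frame j's prediction.
    disparity = abs(1.0 / z_proj - 1.0 / depth_j_at_target)
    return spatial + disparity_weight * disparity


if __name__ == "__main__":
    K = Intrinsics(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    shift = (0.2, 0.0, 0.0)  # the center pixel at depth 2 reprojects 10 px to the right
    loss_ok = pair_consistency_loss(50.0, 50.0, 2.0, (10.0, 0.0), 2.0, K, identity, shift)
    loss_bad = pair_consistency_loss(50.0, 50.0, 2.0, (7.0, 0.0), 2.0, K, identity, shift)
    print(round(loss_ok, 6), round(loss_bad, 6))  # 0.0 3.0
```

When depth, pose, and flow agree, the loss is zero. A 3-pixel disagreement between the flow and the reprojection shows up directly in the spatial term.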

Method

Evaluation