\(
\newcommand{\P}[]{\unicode{xB6}}
\newcommand{\AA}[]{\unicode{x212B}}
\newcommand{\empty}[]{\emptyset}
\newcommand{\O}[]{\emptyset}
\newcommand{\Alpha}[]{Α}
\newcommand{\Beta}[]{Β}
\newcommand{\Epsilon}[]{Ε}
\newcommand{\Iota}[]{Ι}
\newcommand{\Kappa}[]{Κ}
\newcommand{\Rho}[]{Ρ}
\newcommand{\Tau}[]{Τ}
\newcommand{\Zeta}[]{Ζ}
\newcommand{\Mu}[]{\unicode{x039C}}
\newcommand{\Chi}[]{Χ}
\newcommand{\Eta}[]{\unicode{x0397}}
\newcommand{\Nu}[]{\unicode{x039D}}
\newcommand{\Omicron}[]{\unicode{x039F}}
\DeclareMathOperator{\sgn}{sgn}
\def\oiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x222F}\,}{\unicode{x222F}}{\unicode{x222F}}{\unicode{x222F}}}\,}\nolimits}
\def\oiiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x2230}\,}{\unicode{x2230}}{\unicode{x2230}}{\unicode{x2230}}}\,}\nolimits}
\)
Digging Into Self-Supervised Monocular Depth Estimation (ICCV 2019)
Monodepth2
Authors: Clement Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow
Affiliations: UCL, Caltech, Niantic
Method
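They train the depth network with self-supervision: the predicted depth is used for view synthesis, warping neighboring (source) images into the target view, and the synthesized image is compared photometrically against the real target image.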
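Given a source view \(I_{t'}\) and a target view \(I_t\), define:
- \(T_{t \to t'}\): the relative pose from view \(t\) to view \(t'\)
- \(D_t\): the depth map of view \(t\)
- \(L_p = \sum_{t'} pe(I_t, I_{t' \to t})\): the photometric reprojection error summed over the source views \(t'\), where \(pe\) is a per-pixel photometric error
- \(I_{t' \to t} = I_{t'}\langle proj(D_t, T_{t \to t'}, K)\rangle\): the source image warped into the target view, where \(K\) is the camera intrinsics matrix, \(proj(\cdot)\) gives the 2D coordinates of the projected depths \(D_t\) in \(I_{t'}\), and \(\langle \cdot \rangle\) is the sampling operator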
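The warp \(I_{t' \to t}\) can be sketched in NumPy: back-project each target pixel with \(K^{-1}\) and its depth, move the 3D point with \(T_{t \to t'}\) (assumed here to be a 4×4 matrix mapping camera-\(t\) coordinates to camera-\(t'\) coordinates), project it with \(K\), and bilinearly sample the source image there. This is a minimal sketch under simplifying assumptions (one grayscale image, no batching, out-of-range coordinates clamped to the border); the helper names are hypothetical and not taken from the authors' code.

```python
import numpy as np

# Hypothetical helper names; single grayscale image, no batching (assumptions).


def backproject(depth, K_inv):
    """Lift every pixel (u, v) of an H x W depth map D_t to a 3D point in camera t."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)  # 3 x HW homogeneous
    return (K_inv @ pix) * depth.reshape(1, -1)  # scale each viewing ray by its depth


def project(points, K, T):
    """Move 3D points with the 4x4 pose T (t -> t') and project them with K into view t'."""
    ones = np.ones((1, points.shape[1]))
    cam = (T @ np.vstack([points, ones]))[:3]  # points expressed in camera t'
    pix = K @ cam
    return pix[:2] / np.clip(pix[2:3], 1e-7, None)  # perspective divide -> 2 x HW (x, y)


def bilinear_sample(img, coords, shape):
    """Sample img at continuous (x, y) coords; coordinates outside the image are clamped."""
    h, w = img.shape
    x = np.clip(coords[0], 0, w - 1)
    y = np.clip(coords[1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).reshape(shape)


def warp(src, depth_t, T_t_to_src, K):
    """I_{t'->t} = I_{t'}<proj(D_t, T_{t->t'}, K)>: resample the source into the target view."""
    points = backproject(depth_t, np.linalg.inv(K))
    coords = project(points, K, T_t_to_src)
    return bilinear_sample(src, coords, depth_t.shape)
```

For example, with focal length \(f_x = 2\), a constant depth of 2 and a translation of 1 unit along \(x\), every target pixel samples the source one column to the right (\(f_x t_x / Z = 1\) pixel); with an identity pose the warp returns the source unchanged.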
Building on this formulation, they make the following contributions:
Per-Pixel Minimum Reprojection Loss
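In the paper, \(pe\) combines SSIM and L1, \(pe(I_a, I_b) = \frac{\alpha}{2}(1 - \mathrm{SSIM}(I_a, I_b)) + (1 - \alpha)\lVert I_a - I_b \rVert_1\) with \(\alpha = 0.85\), and the per-pixel minimum replaces the sum over source views with \(L_p = \min_{t'} pe(I_t, I_{t' \to t})\), so a pixel that is occluded or out of view in one source frame is scored only against the source that explains it best. The sketch below compares the two reductions; the 3×3 SSIM window with reflection padding, the SSIM constants \(0.01^2\) and \(0.03^2\), and the grayscale input are assumptions, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; SSIM window, padding and constants are assumptions.
ALPHA = 0.85  # SSIM/L1 weighting given in the paper


def box3(x):
    """3x3 mean filter with reflection padding (local window for SSIM)."""
    h, w = x.shape
    p = np.pad(x, 1, mode="reflect")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def photometric_error(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel pe(a, b) = alpha/2 * (1 - SSIM) + (1 - alpha) * |a - b| (grayscale)."""
    mu_a, mu_b = box3(a), box3(b)
    var_a = box3(a * a) - mu_a ** 2
    var_b = box3(b * b) - mu_b ** 2
    cov = box3(a * b) - mu_a * mu_b
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
    ssim_term = np.clip((1 - ssim) / 2, 0, 1)
    return ALPHA * ssim_term + (1 - ALPHA) * np.abs(a - b)


def reprojection_losses(target, warped_sources):
    """Return (average-over-sources loss, per-pixel-minimum loss) for a list of I_{t'->t}."""
    errors = np.stack([photometric_error(target, warped) for warped in warped_sources])  # S x H x W
    return errors.mean(axis=0).mean(), errors.min(axis=0).mean()
```

If one warped source matches the target perfectly and another does not, the minimum-based loss is zero while the average still penalizes the mismatched source; this is the behavior the per-pixel minimum is meant to provide at occlusions.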
Auto-Masking Stationary Pixels
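In the paper, the auto-mask is a binary per-pixel term \(\mu = [\min_{t'} pe(I_t, I_{t' \to t}) < \min_{t'} pe(I_t, I_{t'})]\): a pixel is kept only if the warped source explains the target better than the unwarped source. This removes pixels that look the same across frames, such as those seen by a static camera or on objects moving at the camera's speed. The sketch below takes precomputed per-pixel error stacks so it does not depend on how \(pe\) is implemented; multiplying the minimum loss by \(\mu\) and averaging over all pixels is an assumption about normalization, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; averaging over all pixels is an assumed normalization.


def auto_mask(reproj_errors, identity_errors):
    """mu = [min_t' pe(I_t, I_{t'->t}) < min_t' pe(I_t, I_t')] as a 0/1 float map (H x W).

    reproj_errors:   S x H x W, error against each warped source I_{t'->t}
    identity_errors: S x H x W, error against each unwarped source I_{t'}
    """
    return (reproj_errors.min(axis=0) < identity_errors.min(axis=0)).astype(float)


def masked_min_reprojection_loss(reproj_errors, identity_errors):
    """Per-pixel minimum reprojection loss; auto-masked pixels contribute zero."""
    mu = auto_mask(reproj_errors, identity_errors)
    return (mu * reproj_errors.min(axis=0)).mean()
```

The mask has no parameters and is recomputed from the current errors at every step. Because the comparison is strict, a pixel whose warped and unwarped errors tie is masked out.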
Multi-scale Estimation
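In the paper, the lower-resolution depth maps from the decoder's intermediate scales are first upsampled to the input resolution, and reprojection, resampling and \(pe\) are then computed at that full resolution instead of on downsampled images; the final loss is averaged over pixels, scales and the batch. The sketch below shows only the upsample-then-score step. The half-pixel-center bilinear upsampling is an assumption, and `full_res_loss` is a hypothetical caller-supplied callable standing in for the warp-and-\(pe\) pipeline.

```python
import numpy as np

# Hypothetical helper names; the bilinear resize convention (half-pixel centers) is an assumption.


def upsample_bilinear(x, out_h, out_w):
    """Bilinearly resize an H x W map to out_h x out_w using half-pixel centers."""
    h, w = x.shape
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def multiscale_loss(depths, full_res_loss, out_h, out_w):
    """Upsample each scale's depth to the input size, score it there, and average over scales."""
    losses = [full_res_loss(upsample_bilinear(d, out_h, out_w)) for d in depths]
    return sum(losses) / len(losses)
```

Scoring every scale at the input resolution means all scales are supervised by the same full-resolution images rather than by increasingly coarse downsampled copies.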
Architecture
Evaluation
Resources
References