Digging Into Self-Supervised Monocular Depth Estimation

Digging Into Self-Supervised Monocular Depth Estimation (ICCV 2019)
Monodepth2

Authors: Clement Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow
Affiliations: UCL, Caltech, Niantic

Method

Training is self-supervised: the predicted depth is used to synthesize the target view from neighboring source images, and the synthesized view is compared against the actual target image.

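Given a source view \(I_{t'}\) and a target view \(I_t\), define the following:

  • \(T_{t \to t'}\): the relative pose of the source view \(t'\) with respect to the target view \(t\)
  • \(D_t\): the depth map of the target view \(t\)
  • \(K\): the camera intrinsics
  • \(I_{t' \to t} = I_{t'}\langle proj(D_t, T_{t \to t'}, K)\rangle\): the source view reprojected into the target view, where \(\langle\cdot\rangle\) is the sampling operator
  • \(L_p = \sum_{t'} pe(I_t, I_{t' \to t})\): the cumulative photometric reprojection error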
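
The \(proj\) step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `project_to_source`, the pinhole camera model, and the 4x4 homogeneous pose layout are assumptions made for the example. It only computes where each target pixel lands in the source view; the sampling step \(\langle\cdot\rangle\) (for example bilinear interpolation of \(I_{t'}\) at those coordinates) is left out.

```python
import numpy as np

def project_to_source(depth, T, K):
    """Return (H, W, 2) source-view pixel coordinates (u, v) for each target pixel.

    depth: (H, W) depth map D_t of the target view
    T:     (4, 4) relative pose T_{t -> t'} (assumed: target camera to source camera)
    K:     (3, 3) pinhole camera intrinsics
    """
    h, w = depth.shape
    # Homogeneous pixel grid of the target view, shape (3, H*W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)]).astype(np.float64)
    # Back-project every pixel to a 3D point in the target camera frame.
    points = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Move the points into the source camera frame.
    points = T[:3, :3] @ points + T[:3, 3:4]
    # Perspective projection into the source image plane.
    proj = K @ points
    uv = proj[:2] / proj[2:3]
    return uv.T.reshape(h, w, 2)

# Hypothetical toy camera: focal length 100 px, principal point (2, 1.5).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 2.0)

# Identity pose: every pixel maps back to itself.
coords_identity = project_to_source(depth, np.eye(4), K)

# Translation of 0.1 along x at depth 2: horizontal shift of fx * tx / depth pixels.
T_shift = np.eye(4)
T_shift[0, 3] = 0.1
coords_shift = project_to_source(depth, T_shift, K)
```

With the identity pose the coordinates reproduce the pixel grid, and the pure x-translation shifts every pixel horizontally by the same amount because the toy depth map is constant.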

Building on this setup, the paper makes the following contributions:

Per-Pixel Minimum Reprojection Loss

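Suppose you have three consecutive frames in a sequence: frame1, frame2, frame3.
The middle frame, frame2, is the target. Each neighboring frame, warped into the view of frame2, gives a per-pixel loss: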

loss1 = abs(frame2 - warp(frame1))
loss2 = abs(frame2 - warp(frame3))
# Take the minimum of the two losses at each pixel, then average over all pixels
loss = mean(min(loss1, loss2))
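
A runnable NumPy version of the pseudocode above, as a rough sketch. It assumes the source frames have already been warped into the target view and uses a plain absolute difference as the per-pixel error, as the pseudocode does; the function names and the toy images are hypothetical. An averaging variant is included only to show what the minimum changes.

```python
import numpy as np

def min_reprojection_loss(target, warped_sources):
    # Per-pixel absolute error for each warped source, shape (N, H, W).
    errors = np.stack([np.abs(target - w) for w in warped_sources])
    # Minimum across sources at each pixel, then mean over all pixels.
    return errors.min(axis=0).mean()

def avg_reprojection_loss(target, warped_sources):
    # Same errors, but averaged across sources instead of taking the minimum.
    errors = np.stack([np.abs(target - w) for w in warped_sources])
    return errors.mean(axis=0).mean()

# Toy 2x2 grayscale images: each warped neighbor matches the target
# everywhere except at one pixel (for example, because of an occlusion).
frame2 = np.zeros((2, 2))
warp_frame1 = np.array([[0.0, 0.0], [0.0, 1.0]])
warp_frame3 = np.array([[1.0, 0.0], [0.0, 0.0]])

loss_min = min_reprojection_loss(frame2, [warp_frame1, warp_frame3])  # 0.0
loss_avg = avg_reprojection_loss(frame2, [warp_frame1, warp_frame3])  # 0.25
```

In this toy case every pixel is matched well by at least one neighbor, so the minimum loss is zero while the averaged loss still penalizes the pixels that are badly matched in only one of the two source frames.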

Auto-Masking Stationary Pixels

Multi-scale Estimation

Architecture

Evaluation

Resources

References