Neural RGB→D Sensing: Depth and Uncertainty from a Video Camera

Neural RGB→D Sensing: Depth and Uncertainty from a Video Camera (CVPR 2019)

Authors: Chao Liu, Jinwei Gu, Kihwan Kim, Srinivasa G. Narasimhan, Jan Kautz
Affiliations: NVIDIA, Carnegie Mellon University, SenseTime

The main ideas here are:

  • Estimate a "depth probability distribution" rather than a single value
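    • For each image, the network outputs a "Depth Probability Volume" (DPV): a per-pixel probability distribution over candidate depths, from which both a maximum-likelihood depth estimate (MLE) and an uncertainty measure can be computed.
  • Accumulate DPV estimates over time, i.e. across video frames.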
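
As a rough illustration of how a DPV yields both a depth and an uncertainty, here is a minimal NumPy sketch. It assumes the DPV is stored as a (D, H, W) array of per-pixel probabilities over D candidate depths, takes the depth estimate as the expectation over those candidates, and reads the confidence as the probability of the candidate nearest to that estimate. The function name, array layout, and nearest-bin lookup are illustrative assumptions, not the authors' code.

```python
import numpy as np

def depth_and_confidence(dpv, depth_candidates):
    """Hypothetical helper: per-pixel depth and confidence from a DPV.

    dpv:              (D, H, W) array, non-negative, sums to 1 along axis 0.
    depth_candidates: (D,) array of the depth value each slice stands for.
    """
    # Depth estimate: expectation of depth under the per-pixel distribution.
    depth = np.tensordot(depth_candidates, dpv, axes=(0, 0))  # (H, W)

    # Confidence (assumed definition): probability mass of the candidate
    # bin closest to the estimated depth.
    distance = np.abs(depth_candidates[:, None, None] - depth[None])
    nearest = distance.argmin(axis=0)  # (H, W) bin indices
    confidence = np.take_along_axis(dpv, nearest[None], axis=0)[0]
    return depth, confidence
```

A sharply peaked distribution gives a confidence near 1, while a flat distribution over D candidates gives roughly 1/D, which is what makes the DPV usable as an uncertainty measure.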

Method

  • They first estimate the DPV using a network called the D-Net.
  • Next they calculate the residual, i.e. the difference between the DPV predicted for the current frame and the previous DPV.
    This residual is passed through the K-Net to produce an update.
    The update is added to the previous DPV to give the updated DPV.
  • An R-Net refines the updated DPV using input image features from the D-Net.
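
The update step above can be sketched in a few lines of NumPy. This is only a sketch under stated assumptions: the DPVs are held as log-probabilities of shape (D, H, W), the previous DPV is already aligned with the current view, and the learned K-Net is replaced by a hypothetical stand-in `gain_fn` (here a constant scalar gain). The R-Net refinement is omitted and none of the names come from the authors' code.

```python
import numpy as np

def log_softmax(x, axis=0):
    """Renormalize log-probabilities along `axis` in a numerically stable way."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def update_dpv(log_prev, log_measured, gain_fn):
    """One temporal update of a DPV (all arrays are (D, H, W) log-probabilities).

    log_prev:     previous DPV, assumed already aligned to the current frame.
    log_measured: DPV predicted for the current frame (the D-Net output).
    gain_fn:      hypothetical stand-in for the K-Net; maps residual to update.
    """
    residual = log_measured - log_prev   # current prediction minus previous DPV
    update = gain_fn(residual)           # the K-Net would produce this update
    return log_softmax(log_prev + update, axis=0)

def constant_gain(residual, gain=0.5):
    """Toy replacement for the K-Net: scale the residual by a fixed gain."""
    return gain * residual
```

With this stand-in, a gain of 0 keeps the previous DPV unchanged and a gain of 1 replaces it with the current prediction. A learned K-Net can instead choose how far to trust the new measurement, depending on the residual.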

Architecture

See their supplementary material for details.

D-Net

The D-Net consists of 28 convolutional blocks followed by 4 branches of spatial pyramid layers and a fusion layer.

K-Net

R-Net

Evaluation

References