Depth Estimation

From David's Wiki

Goal: Generate a depth map (an image of per-pixel depths) from one or two images.


Background

Depth vs. Disparity

For stereo methods, people usually estimate pixel disparity rather than depth.
That is, they determine how far a pixel shifts along the epipolar line between the two images.

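Usually, this involves first rectifying the image pair, e.g. from epipolar geometry estimated with RANSAC or similar, so that epipolar lines become horizontal scanlines. Then a cost volume is built, holding a matching cost for every pixel and every candidate disparity. Finally, argmin is applied along the disparity axis of the cost volume to find the best disparity estimate for each pixel.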
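
The cost volume and argmin steps can be sketched as follows. This is a minimal illustration, not a production method: it assumes an already rectified grayscale pair, uses a per-pixel absolute difference as the matching cost, and the function name disparity_from_cost_volume is made up for this example. Practical methods usually aggregate costs over a window or learn the cost with a network.

```python
import numpy as np

def disparity_from_cost_volume(left, right, max_disp):
    """Winner-takes-all disparity for a rectified grayscale pair.

    left, right: float arrays of shape (H, W).
    Returns an integer array of shape (H, W) with values in [0, max_disp).
    """
    h, w = left.shape
    # cost[d, y, x] = cost of matching left pixel (y, x) to right pixel (y, x - d).
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        # Columns x < d have no counterpart in the right image; they keep cost inf.
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    # argmin over the disparity axis selects the cheapest candidate per pixel.
    return np.argmin(cost, axis=0)

# Synthetic check: the left image is the right image shifted by 3 columns.
rng = np.random.default_rng(0)
right = rng.random((4, 16))
left = np.roll(right, 3, axis=1)
disp = disparity_from_cost_volume(left, right, max_disp=8)
```

In the synthetic pair, every column with a valid match (x >= 3) recovers the shift of 3 pixels; the first three columns contain wrapped-around values and are not meaningful.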

Disparity is related to depth by the following formula: \[disparity = \frac{baseline \cdot focal}{depth}\]

  • \(focal\) is the focal length in pixels. This is the distance from the camera center to the image plane, measured in pixels.
    This can be calculated as (height/2) * cot(fov_h/2), where height is the image height in pixels and fov_h is the field of view along that same axis.
    Do not think of the focal length as a physical depth: if the image plane is moved closer, the pixels become proportionally smaller, so the distance to the image plane measured in pixels stays the same.
    In the formula, this term acts as a correction factor for the resolution of the disparity.
  • \(baseline\) is the distance between the camera positions. This should be in the same units as your depth.
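
As a worked example of the formula and the two terms above, the following sketch computes a focal length in pixels from a field of view and converts a disparity to depth. The function names and the numbers (a 1080-pixel-high image, a 90 degree field of view along the height, a 0.1 m baseline) are assumptions chosen for illustration only.

```python
import math

def focal_length_pixels(image_size_px, fov_rad):
    """Focal length in pixels: (size/2) * cot(fov/2), with fov along the same axis as size."""
    return (image_size_px / 2.0) / math.tan(fov_rad / 2.0)

def depth_from_disparity(disparity_px, baseline, focal_px):
    """Invert disparity = baseline * focal / depth. Result is in the units of baseline."""
    if disparity_px <= 0:
        # Assumption for this sketch: non-positive disparity is treated as infinitely far.
        return math.inf
    return baseline * focal_px / disparity_px

# cot(45 degrees) = 1, so the focal length is about 540 px.
focal = focal_length_pixels(1080, math.radians(90))
# 0.1 m * 540 px / 27 px is about 2 m.
depth = depth_from_disparity(27.0, baseline=0.1, focal_px=focal)
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated depth, which is why small disparity errors matter most for distant points.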

Stereo Depth

Typically, people use a cost volume to estimate depth from a stereo camera setup.


Depth from Motion

Depth is estimated in real time from the motion of the camera.

Depth from Focus