Deep Blending for Free-Viewpoint Image-Based-Rendering


Deep Blending for Free-Viewpoint Image-Based-Rendering (Siggraph Asia 2018)

Method

Below is the pipeline for their system. Note that they train the CNN separately from the rest of their proposed pipeline.

Off-line Scene Processing

In this step, they perform the following:

  • Structure-from-Motion registration[1] to calibrate the cameras (i.e. recover the extrinsics/pose)
  • Multi-view stereo (MVS) reconstruction to generate per-view depth maps and per-view meshes using two methods:
    • COLMAP, which provides fine details but a sparser reconstruction
    • Delaunay tetrahedralization (RealityCapture, 2016), which provides more completeness and a smoother estimate
  • Geometry Refinement
  • Mesh Simplification
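The camera calibration recovered by SfM is what every later stage builds on. As a minimal sketch (with illustrative intrinsics and pose values, not numbers from the paper), projecting a world point into a calibrated view looks like:

```python
import numpy as np

def project(K, R, t, X):
    """Project world point X into pixel coordinates for a camera with
    intrinsics K and pose [R | t] (world-to-camera convention)."""
    x_cam = R @ X + t            # world -> camera coordinates
    x_img = K @ x_cam            # camera -> homogeneous image coordinates
    return x_img[:2] / x_img[2]  # perspective divide

# Illustrative camera: 500 px focal length, principal point at (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)

# A point on the optical axis projects to the principal point.
print(project(K, R, t, np.array([0.0, 0.0, 2.0])))  # -> [320. 240.]
```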

Off-line CNN Training

The goal of the CNN is to generate a sharp, temporally consistent image by blending multiple renderings of the novel view.

The CNN is trained via hold-out: an input photo is withheld and used as ground truth, and the network learns to reconstruct it by blending renders produced from the remaining views.
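A minimal sketch of the hold-out split, assuming a simple leave-one-out scheme over the input photos (the helper name is hypothetical):

```python
def holdout_pairs(views):
    """Yield (held_out, remaining) pairs: the held-out photo serves as
    ground truth that must be reconstructed from the remaining views."""
    for i in range(len(views)):
        yield views[i], views[:i] + views[i + 1:]

views = [f"photo_{i}" for i in range(4)]
pairs = list(holdout_pairs(views))
# Each photo is held out exactly once, e.g. ("photo_0", ["photo_1", "photo_2", "photo_3"]).
```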

On-line Pipeline

  • Given a novel viewpoint, create a voxel grid where each voxel contains indices of per-view mesh triangles.
  • Generate a global mesh render from the novel viewpoint.
  • Use InsideOut-style per-view mesh rendering to create four mosaics of warped input views.
  • Input the global mesh render and mosaics into the deep blending CNN.
  • Blend the mosaics and the global mesh render.
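The final step can be sketched as a per-pixel weighted sum over the render layers (the four mosaics plus the global mesh render). The softmax normalization below is an illustrative assumption to keep weights positive and summing to one, not necessarily the paper's exact formulation:

```python
import numpy as np

def blend(layers, logits):
    """Composite N render layers with per-pixel weights.

    layers: (N, H, W, 3) stack of candidate renders.
    logits: (N, H, W) unnormalized per-pixel weight maps (e.g. CNN output).
    """
    w = np.exp(logits - logits.max(axis=0, keepdims=True))  # stable softmax
    w /= w.sum(axis=0, keepdims=True)
    return (w[..., None] * layers).sum(axis=0)              # (H, W, 3)

rng = np.random.default_rng(0)
layers = rng.random((5, 4, 4, 3))   # 4 mosaics + 1 global render, 4x4 RGB
logits = np.zeros((5, 4, 4))        # equal logits -> uniform average
out = blend(layers, logits)
```

With equal logits the composite reduces to a plain average of the layers; in practice the CNN's predicted weights would favor whichever layer is most reliable at each pixel.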

Architecture

The architecture they use is a U-Net with a fixed set of inputs.
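The encoder/decoder structure of a U-Net can be traced as a simple shape calculation: each encoder level halves the resolution, and each decoder level upsamples back to the matching encoder resolution where a skip connection concatenates features. The depth and input size below are illustrative assumptions, not the paper's exact configuration:

```python
def unet_shapes(h, w, depth=4):
    """Trace feature-map sizes through a U-Net encoder and decoder."""
    enc = [(h, w)]
    for _ in range(depth):
        h, w = h // 2, w // 2     # downsample (e.g. stride-2 conv or pooling)
        enc.append((h, w))
    dec = []
    for skip in reversed(enc[:-1]):
        dec.append(skip)          # upsample to encoder size; skip concatenated here
    return enc, dec

enc, dec = unet_shapes(256, 256, depth=4)
print(enc)  # [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
print(dec)  # [(32, 32), (64, 64), (128, 128), (256, 256)]
```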

Evaluation

Resources

References

  1. Johannes L. Schönberger and Jan-Michael Frahm, Structure-from-Motion Revisited (CVPR 2016). DOI: 10.1109/CVPR.2016.445