Deep Blending for Free-Viewpoint Image-Based Rendering
Deep Blending for Free-Viewpoint Image-Based Rendering (SIGGRAPH Asia 2018)
Method
Below is the pipeline for their system. Note that the CNN is trained separately from the rest of the pipeline.
Off-line Scene Processing
In this step, they perform the following:
- Structure-from-Motion[1] registration to calibrate the cameras (i.e., recover the extrinsics/poses)
- Multiview stereo (MVS) reconstruction to generate per-view depth maps and per-view meshes using two methods:
  - COLMAP, which provides fine details but a sparser reconstruction
  - Delaunay tetrahedralization (RealityCapture 2016), which provides more completeness and a smoother estimate
- Geometry Refinement
- Mesh Simplification
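The per-view depth maps produced by MVS can be lifted into 3D geometry by back-projecting each pixel through the calibrated camera. A minimal numpy sketch of that lifting step (the function name and the depth/pose conventions are illustrative assumptions, not the paper's code):

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift a per-view depth map to world-space 3D points.

    depth:        (H, W) depth along the camera z-axis
    K:            (3, 3) intrinsics for this view
    cam_to_world: (4, 4) camera pose from the SfM registration step
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T               # camera-space rays at z = 1
    pts_cam = rays * depth.reshape(-1, 1)         # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]        # world-space points
```

Triangulating the resulting points over the pixel grid then gives the per-view meshes that the later refinement and simplification stages operate on.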
Off-line CNN Training
The goal of the CNN is to generate a sharp, temporally consistent image by blending multiple estimates.
The CNN is trained with a hold-out strategy: one input photo is excluded from the network's inputs and used as the ground-truth target for the blended output.
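A minimal sketch of the hold-out idea, assuming the CNN outputs per-pixel logits that are softmax-normalized into blend weights, and using a plain L1 loss as a stand-in for the paper's full training objective:

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def holdout_l1_loss(weight_logits, layers, held_out):
    """L1 loss between the blended image and a held-out input photo.

    weight_logits: (N, H, W) raw per-pixel scores from the CNN
    layers:        (N, H, W, 3) candidate images to blend
    held_out:      (H, W, 3) real photo excluded from the inputs
    """
    w = softmax(weight_logits, axis=0)[..., None]   # normalized blend weights
    blended = (w * layers).sum(axis=0)              # per-pixel weighted sum
    return np.abs(blended - held_out).mean()
```

Because the held-out image is a real photograph of the scene, minimizing this loss pushes the blended result toward photorealism without needing any synthetic ground truth.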
On-line Pipeline
- Given a novel viewpoint, create a voxel grid where each voxel contains indices of per-view mesh triangles.
- Generate a global mesh render from the novel viewpoint.
- Use InsideOut to create 4 mosaics of warped input views.
- Feed the global mesh render and the mosaics into the deep blending CNN, which predicts blend weights.
- Blend the mosaics and the global mesh render using the predicted weights to produce the final image.
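The voxel-grid lookup in the first step above can be sketched as a conservative triangle-to-voxel binning. This version uses each triangle's axis-aligned bounding box as the overlap test, which is an illustrative assumption rather than the paper's exact acceleration structure:

```python
import numpy as np
from collections import defaultdict

def build_triangle_grid(vertices, triangles, origin, voxel_size):
    """Map each voxel to the per-view mesh triangles overlapping it.

    vertices:  (V, 3) mesh vertex positions
    triangles: (T, 3) vertex indices per triangle
    Returns a dict: (i, j, k) voxel coords -> list of triangle indices.
    """
    grid = defaultdict(list)
    for t_idx, tri in enumerate(triangles):
        pts = vertices[tri]                                       # (3, 3) corners
        lo = np.floor((pts.min(axis=0) - origin) / voxel_size).astype(int)
        hi = np.floor((pts.max(axis=0) - origin) / voxel_size).astype(int)
        # Register the triangle in every voxel its bounding box touches.
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    grid[(i, j, k)].append(t_idx)
    return grid
```

At render time, only the triangles stored in voxels near the novel viewpoint need to be fetched and warped, which keeps the per-frame cost bounded.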
Architecture
The architecture they use is a U-Net that takes a fixed set of inputs (the global mesh render and the four mosaics).
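Assuming the fixed inputs are the RGB global mesh render plus the four RGB mosaics, the network input can be sketched as a channel-wise concatenation (the 15-channel layout here is an assumption, not the paper's documented tensor format):

```python
import numpy as np

def stack_network_inputs(global_render, mosaics):
    """Concatenate the fixed inputs along the channel axis.

    global_render: (H, W, 3) render of the global mesh
    mosaics:       list of 4 (H, W, 3) warped-view mosaics
    Returns an (H, W, 15) array fed to the U-Net.
    """
    assert len(mosaics) == 4, "the architecture expects a fixed input count"
    return np.concatenate([global_render] + mosaics, axis=-1)
```

Fixing the input count lets the U-Net's first convolution have a static channel dimension, at the cost of always selecting exactly four mosaics per frame.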
Evaluation
Resources
References
- ↑ Johannes L. Schönberger; Jan-Michael Frahm, Structure-from-Motion Revisited (CVPR 2016). DOI: 10.1109/CVPR.2016.445