Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering

Authors: Vincent Sitzmann, Semon Rezchikov, William T. Freeman, Joshua B. Tenenbaum, Fredo Durand
Affiliations: MIT
Links: https://arxiv.org/abs/2106.02634

Method

Background

See NeRF and SIREN.

Light Field Networks

The idea here is to use light field rendering instead of volume rendering or SDF ray marching.
The input to the network is an entire ray rather than a single 3D point, and the output is the color observed along that ray.
Rendering a pixel therefore takes a single network evaluation, instead of sampling many points along the ray and compositing them.
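The sketch below illustrates the "single evaluation" idea: each pixel's ray is converted to a 6-D vector and passed through the network exactly once. It assumes a simple pinhole camera looking down -z, Plücker-coordinate ray inputs, and a small untrained ReLU MLP with a sigmoid output; the function names (`plucker`, `pinhole_rays`, `init_mlp`, `mlp`, `render_lfn`) and the architecture are hypothetical stand-ins, not the paper's code.

```python
import numpy as np

def plucker(origins, dirs):
    # Plücker coordinates (d, m) with unit direction d and moment m = o x d.
    d = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    return np.concatenate([d, np.cross(origins, d)], axis=-1)

def pinhole_rays(H, W, focal, cam_pos):
    # Hypothetical pinhole camera at cam_pos looking down -z (no rotation).
    j, i = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    dirs = np.stack([(i - W / 2) / focal,
                     -(j - H / 2) / focal,
                     -np.ones((H, W))], axis=-1)
    origins = np.broadcast_to(cam_pos, dirs.shape)
    return origins.reshape(-1, 3), dirs.reshape(-1, 3)

def init_mlp(sizes, rng):
    # Random (untrained) weights; a real LFN would be fit to posted images.
    return [(rng.normal(0.0, 1.0 / np.sqrt(a), (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for k, (W, b) in enumerate(params):
        x = x @ W + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers
    return 1.0 / (1.0 + np.exp(-x))  # sigmoid maps output to RGB in (0, 1)

def render_lfn(params, H, W, focal, cam_pos):
    origins, dirs = pinhole_rays(H, W, focal, cam_pos)
    rays = plucker(origins, dirs)  # shape (H*W, 6)
    rgb = mlp(params, rays)        # exactly one network evaluation per ray
    return rgb.reshape(H, W, 3)

rng = np.random.default_rng(0)
params = init_mlp([6, 64, 64, 3], rng)
image = render_lfn(params, H=8, W=10, focal=10.0, cam_pos=np.array([0.0, 0.0, 2.0]))
```

A volume renderer with N samples per ray would need H·W·N network calls plus a compositing step for the same image; here the count is H·W.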

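Plücker coordinates

They use Plücker coordinates to encode rays instead of feeding in a (point, direction) pair directly or using a two-plane parameterization.
Plücker coordinates do not depend on which point on the ray is chosen, and they can represent every ray in 3D; a two-plane parameterization cannot represent rays parallel to its planes, so it cannot cover a full 360-degree scene.
A ray through a point \(\mathbf{p}\) with direction \(\mathbf{d}\) is encoded as
\(\displaystyle \mathbf{r} = (\mathbf{d},\mathbf{m}) \in \mathbb{R}^6\) where \(\displaystyle \mathbf{m}=\mathbf{p} \times \mathbf{d}\)
Replacing \(\mathbf{p}\) by any other point on the same ray, \(\mathbf{p} + t\mathbf{d}\), gives the same moment, since \(\displaystyle (\mathbf{p} + t\mathbf{d}) \times \mathbf{d} = \mathbf{p} \times \mathbf{d} + t\,(\mathbf{d} \times \mathbf{d}) = \mathbf{p} \times \mathbf{d}\).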
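A small numerical check of the Plücker encoding \(\mathbf{r} = (\mathbf{d}, \mathbf{m})\), \(\mathbf{m} = \mathbf{p} \times \mathbf{d}\): two different points on the same ray give the same 6-D vector, and the point on the ray closest to the origin can be read back as \(\mathbf{d} \times \mathbf{m}\). This is a sketch assuming \(\mathbf{d}\) is normalized to unit length; the helper names are hypothetical.

```python
import numpy as np

def plucker_from_point_dir(p, d):
    # r = (d, m) with d normalized to unit length and moment m = p x d.
    d = d / np.linalg.norm(d)
    return np.concatenate([d, np.cross(p, d)])

def closest_point_to_origin(r):
    # For unit d: d x (p x d) = p - (d . p) d, the point on the ray nearest the origin.
    d, m = r[:3], r[3:]
    return np.cross(d, m)

p = np.array([1.0, 2.0, 3.0])
d = np.array([0.0, 0.0, 2.0])
r1 = plucker_from_point_dir(p, d)
r2 = plucker_from_point_dir(p + 5.0 * d, d)  # a different point on the same ray
same_ray = np.allclose(r1, r2)               # True, since (p + t d) x d = p x d
nearest = closest_point_to_origin(r1)        # (1, 2, 0)
```

Because \(\mathbf{d} \cdot \mathbf{m} = 0\) and \(\mathbf{d}\) has unit length, the six numbers only span a 4-D family, matching the four degrees of freedom of a line in 3D.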

Geometry

(NOT FILLED IN)
There is some interesting discussion in the paper about the point-line isomorphism, epipolar plane images, and how to extract depth.

Metalearning

They use a hypernetwork that maps a per-scene latent code to the weights of a light field network, so that each latent code represents one scene.
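The sketch below shows the general hypernetwork pattern: a latent code is mapped to one flat parameter vector, which is cut into the weight matrices and biases of a small light field network, and that network is then evaluated on Plücker rays. The paper's hypernetwork is a learned network; here a single random linear map (`LinearHyperNetwork`) stands in for it, and `LAYER_SIZES`, `num_params`, and `lfn` are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical LFN architecture: 6-D Plücker ray in, RGB out.
LAYER_SIZES = [6, 32, 32, 3]

def num_params(sizes):
    # Weights plus biases of every fully connected layer.
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

class LinearHyperNetwork:
    """Stand-in hypernetwork: one random linear map from latent code to LFN weights."""

    def __init__(self, latent_dim, sizes, rng):
        self.sizes = sizes
        total = num_params(sizes)
        self.A = rng.normal(0.0, 0.05, (latent_dim, total))
        self.c = rng.normal(0.0, 0.05, total)

    def __call__(self, z):
        flat = z @ self.A + self.c  # flat vector holding all LFN parameters
        params, k = [], 0
        for a, b in zip(self.sizes[:-1], self.sizes[1:]):
            W = flat[k:k + a * b].reshape(a, b)
            k += a * b
            bias = flat[k:k + b]
            k += b
            params.append((W, bias))
        return params

def lfn(params, rays):
    # One forward pass per ray: ReLU hidden layers, sigmoid RGB output.
    x = rays
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
hyper = LinearHyperNetwork(latent_dim=16, sizes=LAYER_SIZES, rng=rng)
d = rng.normal(size=(5, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
rays = np.concatenate([d, np.cross(rng.normal(size=(5, 3)), d)], axis=1)
z_a, z_b = rng.normal(size=(2, 16))  # two per-scene latent codes
colors_a = lfn(hyper(z_a), rays)     # shape (5, 3)
colors_b = lfn(hyper(z_b), rays)     # same rays, different scene
```

Only the hypernetwork and the latent codes carry trainable state in this pattern; the light field network's weights are always produced on the fly from a code.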

Evaluation