NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (ECCV 2020)

  • Authors: Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng
  • Affiliations: UC Berkeley, Google Research, UC San Diego

The main idea is to perform volume rendering on a scene represented by an MLP.
Because volume rendering is differentiable, the MLP can learn color and density at each position from a loss on the rendered pixels alone.
Training is supervised with multiple posed images of a static scene (e.g. 25 images). Camera poses can be estimated using COLMAP.

Successful execution relies on two tricks:

  • Positional encoding
  • Hierarchical volume sampling

Method

Volume Rendering
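
The paper approximates the volume rendering integral along each ray with a quadrature rule: \( \hat{C}(r) = \sum_i T_i (1 - e^{-\sigma_i \delta_i}) c_i \) with \( T_i = \exp(-\sum_{j<i} \sigma_j \delta_j) \), where \( \delta_i \) is the distance between adjacent samples. Below is a minimal pure-Python sketch of that compositing step for a single ray. The function name `composite_ray` and the very large final interval are assumptions made for illustration; see the paper and code for the actual batched implementation.

```python
import math


def composite_ray(sigmas, colors, t_vals):
    """Composite per-sample densities and colors along one ray.

    sigmas: non-negative densities, one per sample
    colors: (r, g, b) tuples, one per sample
    t_vals: increasing sample distances along the ray
    Returns the rendered (r, g, b) list and the per-sample weights.
    """
    rgb = [0.0, 0.0, 0.0]
    weights = []
    transmittance = 1.0  # T_i: fraction of light that reaches sample i
    for i, (sigma, color) in enumerate(zip(sigmas, colors)):
        if i + 1 < len(t_vals):
            delta = t_vals[i + 1] - t_vals[i]
        else:
            delta = 1e10  # assumption: treat the last interval as unbounded
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for c in range(3):
            rgb[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return rgb, weights
```

Every operation here is differentiable with respect to the densities and colors, which is what allows a loss on rendered pixels to train the MLP. The weights are also what the hierarchical sampling step reuses.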

Positional Encoding

This is further explored in a followup paper[1].
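
In the paper, each input coordinate \( p \) is mapped to \( \gamma(p) = (\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)) \), with \( L = 10 \) for position and \( L = 4 \) for viewing direction. The sketch below is a plain-Python illustration of this mapping; the function name `positional_encoding` and the feature ordering are assumptions, and the ordering does not matter to the MLP.

```python
import math


def positional_encoding(p, num_freqs):
    """Encode each coordinate of p with sin/cos at num_freqs octaves.

    p: sequence of floats (e.g. a 3D position)
    Returns a flat list of 2 * num_freqs * len(p) features.
    """
    features = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi  # frequency doubles at every octave
        features.extend(math.sin(freq * x) for x in p)
        features.extend(math.cos(freq * x) for x in p)
    return features


encoded = positional_encoding([0.1, 0.2, 0.3], num_freqs=10)
print(len(encoded))  # 60
```

Some implementations also concatenate the raw input to the encoded features, which would give 63 values here instead of 60.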

Hierarchical volume sampling

The idea is to use a coarse network to guide where the fine network samples along each ray.
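
Concretely, the compositing weights from the coarse pass are normalized into a piecewise-constant PDF along the ray, and additional sample locations are drawn from it by inverse transform sampling. The paper uses 64 coarse and 128 fine samples per ray. The following is a minimal sketch of that sampling step for one ray; the function name `sample_fine`, the `eps` constant, and the `rng` argument are assumptions for illustration.

```python
import bisect
import random


def sample_fine(bin_edges, weights, num_samples, rng=random):
    """Draw depths from the piecewise-constant PDF given by coarse weights.

    bin_edges: len(weights) + 1 increasing depths bounding each bin
    weights: non-negative coarse weights, one per bin
    rng: any object with a random() method returning floats in [0, 1)
    """
    eps = 1e-5  # keeps every bin non-empty so the CDF is strictly increasing
    total = sum(w + eps for w in weights)
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + (w + eps) / total)

    samples = []
    for _ in range(num_samples):
        u = rng.random()
        # Locate the bin whose CDF interval contains u.
        i = min(bisect.bisect_right(cdf, u) - 1, len(weights) - 1)
        # Interpolate linearly inside that bin.
        frac = min((u - cdf[i]) / (cdf[i + 1] - cdf[i]), 1.0)
        samples.append(bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)
```

With weights concentrated in one bin, nearly all fine samples land in that bin, so the fine network spends its evaluations where the coarse network expects visible content.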

Architecture

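They use a standard MLP with extra inputs: the encoded position is fed in again through a skip connection, and the encoded viewing direction is injected just before the color output.
Their MLP has 9 layers with 256-dim features and one layer with 128-dim features (roughly 600k parameters).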
See their paper and code for more details.
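
The parameter estimate can be checked with a quick count. The sketch below assumes input sizes of 63 (position) and 27 (direction), i.e. the raw 3D input concatenated with its positional encoding at L = 10 and L = 4, one skip connection that re-injects the encoded position, and separate density and RGB output heads. These sizes and the function names are assumptions based on a common reading of the released code, not figures taken from these notes.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def nerf_mlp_params(pos_dim=63, dir_dim=27, width=256, depth=8):
    """Count parameters of one NeRF-style MLP under the stated assumptions."""
    total = linear_params(pos_dim, width)                # first layer
    total += (depth - 2) * linear_params(width, width)   # plain hidden layers
    total += linear_params(width + pos_dim, width)       # layer fed by the skip connection
    total += linear_params(width, 1)                     # density output
    total += linear_params(width, width)                 # ninth 256-wide (feature) layer
    total += linear_params(width + dir_dim, width // 2)  # 128-wide view-dependent layer
    total += linear_params(width // 2, 3)                # RGB output
    return total


print(nerf_mlp_params())  # 595844
```

Under these assumptions a single network has just under 600k parameters. Since hierarchical sampling uses both a coarse and a fine network, the full model would hold about twice that.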

Experiments

They compare against their earlier work, Local Light Field Fusion (LLFF), as well as a few other view synthesis methods.
They also publish a dataset, available on their Google Drive.

Followup Work

See yenchenlin/awesome-NeRF

Positional Encoding

Videos

References

  1. Tancik, M., Srinivasan, P. P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N., Singhal, U., ... & Ng, R. (2020). Fourier features let networks learn high frequency functions in low dimensional domains. arXiv.