Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

Project Webpage

Allows training a NeRF or other neural representation in seconds by optimizing features stored in multiresolution hash tables alongside a small MLP.
This is faster than an octree since it doesn't require pointer chasing, while using less memory than a dense voxel grid.
Furthermore, the hash table size \(T\) gives direct control over the trade-off between detail and memory.

Method

Multi-resolution Hash Encoding

  1. For each level/resolution:
    1. Find which voxel the query point falls in and get the integer coordinates of its corners.
    2. Hash each corner and retrieve its feature vector (\(\displaystyle \in \mathbb{R}^{F}\)) from that level's hash table.
    3. Do trilinear interpolation over the corner features to get a single feature for the level (see the sketch after this list).
  2. Concatenate features from all levels (\(\in \mathbb{R}^{LF}\)) along with auxiliary inputs (\(\in \mathbb{R}^{E}\), e.g. view direction) to produce a feature vector \(\mathbf{y} \in \mathbb{R}^{LF+E}\).
  3. Pass \(\displaystyle \mathbf{y}\) through your small feature-decoding neural network.
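
Below is a minimal NumPy sketch of one level of this encoding, using the spatial hash from the Hashing Function section further down; the names (`encode_level`, `hash_corner`) and the example setup are illustrative, not from the paper's CUDA implementation.

```python
import numpy as np

# Per-dimension primes from the paper (pi_1 := 1); see Hashing Function below.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_corner(coords, T):
    """Spatially hash integer grid coordinates (shape (3,)) into [0, T)."""
    c = coords.astype(np.uint64)
    h = c[0] * PRIMES[0]
    for i in range(1, len(c)):
        h ^= c[i] * PRIMES[i]  # XOR of coordinate-prime products
    return int(h % np.uint64(T))

def encode_level(x, table, N):
    """Trilinearly interpolated feature for a point x in [0, 1)^3 at resolution N.

    table: (T, F) array of learned features for this level.
    """
    pos = x * N
    base = np.floor(pos).astype(np.int64)  # integer coords of the voxel's min corner
    frac = pos - base                      # fractional position inside the voxel
    feat = np.zeros(table.shape[1])
    for corner in range(8):                # 2^3 corners of the enclosing voxel
        offset = np.array([(corner >> d) & 1 for d in range(3)])
        idx = hash_corner(base + offset, table.shape[0])
        # Trilinear weight: per-axis frac (offset 1) or 1 - frac (offset 0).
        w = np.prod(np.where(offset == 1, frac, 1.0 - frac))
        feat += w * table[idx]
    return feat

# Example: T = 2^14 entries, F = 2 features, small uniform init as in the paper.
rng = np.random.default_rng(0)
table = rng.uniform(-1e-4, 1e-4, size=(2**14, 2))
print(encode_level(np.array([0.3, 0.7, 0.1]), table, N=64))  # a 2-vector (F = 2)
```

The full encoding runs this once per level (each level has its own table and resolution \(N_l\)) and concatenates the results with the auxiliary inputs. For coarse levels where the grid has fewer than \(T\) corners, the paper indexes the table directly instead of hashing.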

Levels

Grid resolutions are spaced geometrically between the coarsest and finest resolutions \(\displaystyle N_{min}\) and \(\displaystyle N_{max}\) (a numeric example follows the list):

  • Resolution at level \(l\): \(\displaystyle N_{l} = \lfloor N_{min} \cdot b^{l} \rfloor\)
  • Growth factor: \(\displaystyle b = \exp \left( \frac{\log N_{max} - \log N_{min} }{L-1} \right)\)
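
For concreteness, a short computation of the per-level resolutions; the hyperparameters here (\(L = 16\), \(N_{min} = 16\), \(N_{max} = 512\)) are example values from the paper, where \(N_{max}\) is chosen per scene.

```python
import math

L, N_min, N_max = 16, 16, 512
b = math.exp((math.log(N_max) - math.log(N_min)) / (L - 1))
resolutions = [math.floor(N_min * b**l) for l in range(L)]
print(round(b, 4))  # ~1.2599, i.e. roughly a 26% resolution jump per level
print(resolutions)  # [16, 20, 25, 32, ...] up to ~512
                    # (floor on floats can be off by one at exact powers)
```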

Hashing Function

  • Hash collisions are not explicitly handled; gradients from colliding training points simply average, so the points with larger gradients (e.g. near surfaces) dominate the stored feature.
  • The hash function used is: \(\displaystyle h(\mathbf{x}) = \left( \bigoplus_{i=1}^{d} x_i \pi_i \right) \mod T\) (transcribed in code after this list)
    • \(\oplus\) is bitwise XOR.
    • \(\pi_i\) are unique large primes; the paper uses \(\pi_1 := 1\), \(\pi_2 = 2654435761\), and \(\pi_3 = 805459861\).
    • \(T\) is the size of the hash table.
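
As a concrete reference, here is a direct Python transcription of the hash for \(d = 3\). Note that a GPU implementation would typically work in 32-bit unsigned arithmetic, so products wrap modulo \(2^{32}\); this sketch follows the formula as written, using Python's arbitrary-precision ints.

```python
# Primes from the paper: pi_1 := 1, pi_2 = 2654435761, pi_3 = 805459861.
def spatial_hash(x, y, z, T):
    """XOR the coordinate-prime products, then reduce modulo the table size T."""
    return (x * 1 ^ y * 2654435761 ^ z * 805459861) % T

T = 2**14
print(spatial_hash(10, 20, 30, T), spatial_hash(11, 20, 30, T))
# Neighboring grid corners land in effectively unrelated slots.
```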

Experiments

Setup

  • RTX 3090
  • MLP with two hidden layers, each 64 neurons wide (sketched below).
  • Hash table entries are stored at half precision (fp16).
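
A minimal PyTorch sketch of that decoder, assuming ReLU activations as in the paper; the input width \(LF + E\) and the output size shown here are illustrative and depend on the task.

```python
import torch
import torch.nn as nn

L, F, E = 16, 2, 0  # levels, features per level, auxiliary dims (example values)
out_dim = 1         # e.g. a single density value; task-dependent

decoder = nn.Sequential(
    nn.Linear(L * F + E, 64),  # y in R^{LF+E} -> first hidden layer
    nn.ReLU(),
    nn.Linear(64, 64),         # second hidden layer, width 64
    nn.ReLU(),
    nn.Linear(64, out_dim),
)

y = torch.randn(1, L * F + E)  # stand-in for the hash-encoded feature vector
print(decoder(y).shape)        # torch.Size([1, 1])
```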
