Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

[https://nvlabs.github.io/instant-ngp/ Project Webpage]

Allows training a NeRF or other neural representation in seconds by optimizing features stored in a hash table.<br>
This is faster than an octree since it doesn't require pointer chasing, while using less space than enumerating voxels.<br>
Furthermore, the size of the hash table allows control over the amount of detail.

==Method==
===Multi-resolution Hash Encoding===
# For each level/resolution
## Find which voxel we are in and get indices to the corners.
## Hash each corner and retrieve features (<math>\in \mathbb{R}^{F}</math>) from the hash map.
## Do trilinear interpolation of the corner features to get a single feature for this level.
# Concatenate features from all levels (<math display="inline">\in \mathbb{R}^{LF}</math>) along with auxiliary inputs (<math display="inline">\in \mathbb{R}^{E}</math>, e.g. view direction) to produce a feature vector <math display="inline">\mathbf{y} \in \mathbb{R}^{LF+E}</math>.
# Pass <math>\mathbf{y}</math> through your small feature decoding neural network (see the sketch below).
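A minimal NumPy sketch of this encoding for a 3D point. The table size, initialization, and all names here are illustrative assumptions, and a plain flattened-index lookup stands in for the spatial hash defined in the Hashing Function section below.

<syntaxhighlight lang="python">
import numpy as np

L, F, T = 16, 2, 2**14   # levels, features per entry, table size (example values)
rng = np.random.default_rng(0)
# In practice these features are trainable parameters; random init here.
tables = rng.uniform(-1e-4, 1e-4, size=(L, T, F)).astype(np.float32)

def lookup(level, corner):
    """Placeholder lookup: a flattened index mod T stands in for the spatial hash."""
    idx = int(corner[0] + 10**3 * corner[1] + 10**6 * corner[2]) % T
    return tables[level, idx]

def encode(x, resolutions):
    """Map x in [0,1)^3 to concatenated per-level features, shape (L*F,)."""
    feats = []
    for level, N in enumerate(resolutions):
        pos = x * N                           # position in this level's grid
        lo = np.floor(pos).astype(np.int64)   # lower corner of the enclosing voxel
        w = pos - lo                          # fractional offset inside the voxel
        f = np.zeros(F, dtype=np.float32)
        for c in range(8):                    # trilinear interpolation over 8 corners
            offs = np.array([(c >> i) & 1 for i in range(3)])
            weight = np.prod(np.where(offs == 1, w, 1.0 - w))
            f += weight * lookup(level, lo + offs)
        feats.append(f)
    return np.concatenate(feats)

resolutions = [int(16 * 32 ** (l / (L - 1))) for l in range(L)]  # see Levels below
y = encode(np.array([0.3, 0.5, 0.7]), resolutions)               # shape (32,) = L*F
</syntaxhighlight>

Auxiliary inputs such as the encoded view direction would be appended to <math display="inline">\mathbf{y}</math> before the decoder network.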
===Levels===
The <math display="inline">L</math> levels have resolutions spaced geometrically between <math display="inline">N_{min}</math> and <math display="inline">N_{max}</math> (see the worked example below):
* Resolution at level <math display="inline">l</math>: <math>N_{l} = \lfloor N_{min} \cdot b^{l} \rfloor</math>
* Growth factor: <math>b = \exp \left( \frac{\log N_{max} - \log N_{min} }{L-1} \right)</math>
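As a worked example, using <math display="inline">N_{min}=16</math>, <math display="inline">N_{max}=512</math>, and <math display="inline">L=16</math> (example values in the range the paper reports):

<syntaxhighlight lang="python">
import math

N_min, N_max, L = 16, 512, 16   # example values; the paper varies N_max per scene
b = math.exp((math.log(N_max) - math.log(N_min)) / (L - 1))
print(b)                                             # ~1.26, i.e. 32**(1/15)
print([math.floor(N_min * b**l) for l in range(L)])  # roughly 16, 20, 25, 32, ..., 512
</syntaxhighlight>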
===Hashing Function===
* Hash collisions are not explicitly handled; gradients of colliding points simply average out during training, and points that collide at one level generally do not collide at others.
* The hash function used is (sketched in code below): <math>h(\mathbf{x}) = \left( \bigoplus_{i=1}^{d} x_i \pi_i \right) \mod T</math>
** <math display="inline">\bigoplus</math> is bitwise XOR.
** <math display="inline">\pi_i</math> are unique large primes (the paper sets <math display="inline">\pi_1 = 1</math>).
** <math display="inline">T</math> is the size of the hash table.
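A sketch of this hash in Python. The constants <math display="inline">\pi_2 = 2654435761</math> and <math display="inline">\pi_3 = 805459861</math> (with <math display="inline">\pi_1 = 1</math>) are the ones reported in the paper; the test coordinates are arbitrary.

<syntaxhighlight lang="python">
PRIMES = (1, 2654435761, 805459861)   # pi_1 = 1 keeps the first axis cache-coherent

def spatial_hash(corner, T):
    """XOR the per-dimension products x_i * pi_i, then take the result mod T."""
    h = 0
    for x, p in zip(corner, PRIMES):
        h ^= int(x) * p
    return h % T

# Adjacent corners scatter across the table; any collisions share one entry.
print(spatial_hash((3, 7, 1), 2**14), spatial_hash((4, 7, 1), 2**14))
</syntaxhighlight>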


==Experiments==
===Setup===
* RTX 3090
* MLP with two hidden layers of 64 neurons each (illustrative sketch below).
* Hash tables are stored using half-precision.
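For concreteness, a PyTorch sketch of a decoder of this size. The 32-dimensional input assumes the <math display="inline">L=16</math>, <math display="inline">F=2</math> encoding above, and the 4 outputs (RGB + density) are an assumption for the NeRF case; the paper's actual implementation is a fused CUDA MLP.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Two hidden layers of width 64, as in the stated setup; the input width
# (32 = L*F) and output width (4 = RGB + density) are illustrative assumptions.
decoder = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),
)

y = torch.randn(8, 32)        # batch of encoded feature vectors
print(decoder(y).shape)       # torch.Size([8, 4])
</syntaxhighlight>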
==References==
[[Category: Papers]]
