Neural Fields: Difference between revisions
Latest revision as of 15:30, 30 March 2023
Neural Fields refers to using neural networks or neural methods to represent scenes or other signals in computer vision and graphics.
Techniques
Forward Maps
Forward maps are the differentiable functions which convert the representation to an observed signal.
Shapes
- Occupancy Grids or Voxel Grids
- Signed Distance Functions
- Primary-ray (PRIF)
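As an illustration of a forward map for a signed distance function, the sketch below uses sphere tracing to turn an SDF into a ray hit distance. The analytic sphere stands in for a neural network, and the names `sphere_sdf` and `sphere_trace` are assumptions made for this example only.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-5, far=10.0):
    """March along a unit-length ray, stepping by the SDF value each time.

    Returns the distance to the first surface hit, or None if the ray misses.
    """
    t = 0.0
    for _ in range(max_steps):
        p = [o + t * d for o, d in zip(origin, direction)]
        dist = sdf(p)
        if dist < eps:
            return t          # close enough to the surface
        t += dist             # safe step: nothing is closer than dist
        if t > far:
            return None
    return None

hit = sphere_trace([0.0, 0.0, -3.0], [0.0, 0.0, 1.0], sphere_sdf)
miss = sphere_trace([0.0, 2.0, -3.0], [0.0, 0.0, 1.0], sphere_sdf)
```

Here `hit` is the distance at which the ray reaches the sphere, while `miss` is `None` because the second ray never comes within `eps` of the surface.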
3D Scenes
- Radiance Fields (NeRF)
- Light Fields
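For radiance fields, the forward map is usually volume rendering. The following is a minimal sketch of the standard alpha-compositing quadrature along a single ray; the function name `render_ray` and the plain-list inputs are assumptions for illustration, and the densities and colours would normally come from the network.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: density sigma_i at each sample
    colors:    (r, g, b) at each sample
    deltas:    distance between adjacent samples
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Empty space followed by a dense red sample.
pixel, remaining = render_ray(
    densities=[0.0, 1000.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    deltas=[0.5, 0.5],
)
```

Every operation is differentiable with respect to the densities and colours, which is what allows the representation to be optimized from observed pixels.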
Identity
- Images
Architectures
Neural Networks
- MLP
  - SIREN
    - Proposes using sine activations, instead of positional encoding, to remove the spectral bias.
  - SAPE
    - Progressively exposes additional frequencies during training based on time and space.
- CNN + MLP
- Progressive Architectures
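The sine-activation idea behind SIREN can be sketched as a single layer computing sin(w0 * (W x + b)). This is a minimal sketch, assuming the initialization commonly used in public SIREN implementations (w0 = 30, with a different weight range for the first layer); the helper names and the zero biases are simplifications for this example.

```python
import math
import random

def make_siren_layer(n_in, n_out, w0=30.0, is_first=False, rng=random):
    """Create weights for one sine layer.

    First layer: uniform in [-1/n_in, 1/n_in].
    Later layers: uniform in [-sqrt(6/n_in)/w0, sqrt(6/n_in)/w0].
    """
    bound = 1.0 / n_in if is_first else math.sqrt(6.0 / n_in) / w0
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out   # simplification: zero biases
    return weights, biases

def siren_forward(x, weights, biases, w0=30.0):
    """Apply y = sin(w0 * (W x + b)) to a single input vector x."""
    return [math.sin(w0 * (sum(w * xi for w, xi in zip(row, x)) + b))
            for row, b in zip(weights, biases)]

random.seed(0)
weights, biases = make_siren_layer(2, 4, is_first=True)
features = siren_forward([0.5, -0.25], weights, biases)
```

Because the activation itself is periodic, high frequencies are available to the network without a separate positional encoding of the input.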
Hybrid Representations
- Voxel Grids
  These typically combine an octree or voxel grid with an MLP.
  Some of these are effectively feature grids.
  - Neural Sparse Voxel Fields
  - KiloNeRF
    - Uses thousands of small voxels, each modelled by its own tiny NeRF. Optimized by distillation from a teacher network.
- Point Clouds
- Mesh
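The voxel-based hybrids above need a way to find which cell, and therefore which small network, a query point belongs to. The following is a minimal sketch of that lookup on a uniform grid; `cell_index`, the scene bounds, and the 16 x 16 x 8 resolution are assumptions for illustration rather than any paper's actual configuration.

```python
def cell_index(point, bounds_min, bounds_max, resolution):
    """Map a 3D point to the flat index of the grid cell that contains it.

    Returns None when the point lies outside the scene bounds.
    """
    idx = []
    for p, lo, hi, n in zip(point, bounds_min, bounds_max, resolution):
        if not lo <= p <= hi:
            return None
        i = int((p - lo) / (hi - lo) * n)
        idx.append(min(i, n - 1))      # clamp points on the far face
    ix, iy, iz = idx
    return (ix * resolution[1] + iy) * resolution[2] + iz

resolution = (16, 16, 8)               # 2048 cells in total
lo, hi = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)
centre = cell_index((0.0, 0.0, 0.0), lo, hi, resolution)
corner = cell_index((1.0, 1.0, 1.0), lo, hi, resolution)
outside = cell_index((2.0, 0.0, 0.0), lo, hi, resolution)
```

The returned index would select one entry from a list of per-cell networks, so each query evaluates only a very small model.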
Feature Grids
- PlenOctrees
  - Convert a NeRF into an octree of spherical harmonics for fast rendering.
- Plenoxels
  - Directly use a voxel grid of spherical harmonics for fast optimization and rendering.
- Hash (Instant-NGP)
  - Use a hash function to map voxels to features in a codebook. Decouples grid resolution from codebook size.
- Variable Bitrate Neural Fields
  - Use vector quantization to compress feature grids. However, a grid of indices still needs to be stored.
- Factorized Feature Grids
  - TensoRF
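The hash-based lookup can be sketched as follows: integer grid coordinates are multiplied by per-dimension primes, XOR-ed together, and reduced modulo the table size. The primes are the ones given in the Instant-NGP paper; the table layout, the function names, and the restriction to non-negative coordinates are assumptions of this sketch.

```python
PRIMES = (1, 2654435761, 805459861)   # per-dimension multipliers

def hash_index(ix, iy, iz, table_size):
    """Hash non-negative integer grid coordinates into a table slot."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size

def lookup(table, ix, iy, iz):
    """Fetch the feature vector stored for one grid vertex."""
    return table[hash_index(ix, iy, iz, len(table))]

table_size = 2 ** 14
table = [[0.0, 0.0] for _ in range(table_size)]   # 2 features per slot

coarse = hash_index(3, 5, 7, table_size)          # vertex of a coarse grid
fine = hash_index(300, 500, 700, table_size)      # vertex of a much finer grid
```

Both the coarse and the fine vertex land inside the same fixed-size table, which is why the grid resolution can grow without the codebook growing with it. Collisions are possible and are left for training to resolve.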
Dynamic Content
Deformation
The idea here is to have an MLP which models the deformation of a canonical frame to the target frame.
- Nerfies: Deformable Neural Radiance Fields
  - Windowed positional encoding
- D-NeRF (CVPR 2021)
- HyperNeRF (SIGGRAPH Asia 2021)
  - Allows the deformation network to output 2 additional feature values which slice the canonical NeRF.
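The windowed positional encoding mentioned for Nerfies can be sketched as a standard sin/cos encoding whose frequency bands are faded in by a cosine-eased weight. The weight follows the form described in the Nerfies paper; the function names and the scalar input are assumptions made to keep the example short.

```python
import math

def window_weight(alpha, j):
    """Cosine-eased weight for frequency band j given window parameter alpha."""
    x = min(max(alpha - j, 0.0), 1.0)
    return 0.5 * (1.0 - math.cos(math.pi * x))

def windowed_encoding(x, num_bands, alpha):
    """Encode scalar x with sin/cos bands; higher bands fade in as alpha grows."""
    features = []
    for j in range(num_bands):
        w = window_weight(alpha, j)
        freq = (2.0 ** j) * math.pi
        features.append(w * math.sin(freq * x))
        features.append(w * math.cos(freq * x))
    return features

early = windowed_encoding(0.3, num_bands=4, alpha=0.0)   # all bands masked
late = windowed_encoding(0.3, num_bands=4, alpha=4.0)    # all bands active
```

Increasing `alpha` from 0 to `num_bands` over training lets the deformation field fit coarse motion first and fine detail later.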
 
Latent code
- Neural 3D Video Synthesis (CVPR 2022)
  - Use a latent code for each time step.
- NeRFPlayer
  - Use a time-dependent sliding window along the feature channels in a feature grid.
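The per-time-step latent code idea can be sketched as a lookup table with one vector per frame, appended to the usual network input. This is a minimal sketch under assumed names (`make_latent_codes`, `mlp_input`) and an assumed code size; in practice the codes would be optimized jointly with the network weights.

```python
import random

def make_latent_codes(num_frames, code_dim, seed=0):
    """Create one latent vector per time step, randomly initialised."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(code_dim)]
            for _ in range(num_frames)]

def mlp_input(position, view_dir, frame, latent_codes):
    """Build the network input by appending the latent code of one frame."""
    return list(position) + list(view_dir) + latent_codes[frame]

codes = make_latent_codes(num_frames=30, code_dim=8)
x0 = mlp_input((0.1, 0.2, 0.3), (0.0, 0.0, 1.0), frame=0, latent_codes=codes)
x1 = mlp_input((0.1, 0.2, 0.3), (0.0, 0.0, 1.0), frame=1, latent_codes=codes)
```

The same spatial query produces different inputs for different frames, so a single network can represent the whole sequence.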
 
Time-axis
- Space-time Neural Irradiance Fields for Free-Viewpoint Video (Video-NeRF)
  - Adds several regularization terms which allow time to be input directly to the MLP.
- Fourier PlenOctrees
  - Apply DFT to spherical harmonics in PlenOctrees.
- NeuVV
  - Hyperspherical Harmonics
- Text-To-4D Dynamic Scene Generation (2023)
  - Extends the tri-plane feature grid to a six-plane feature grid ({x, y, z, t} choose 2).
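The "{x, y, z, t} choose 2" construction can be made concrete by enumerating the axis pairs. The sketch below lists the six planes and projects a 4D sample onto each of them; `plane_coordinates` is a hypothetical helper, and the feature lookup on each plane is omitted.

```python
from itertools import combinations

AXES = ("x", "y", "z", "t")
PLANES = list(combinations(AXES, 2))   # every unordered pair of axes

def plane_coordinates(point):
    """Project a 4D sample (x, y, z, t) onto each of the six 2D planes."""
    named = dict(zip(AXES, point))
    return {a + b: (named[a], named[b]) for a, b in PLANES}

coords = plane_coordinates((0.1, 0.2, 0.3, 0.9))
```

Three of the planes (xy, xz, yz) are the purely spatial tri-plane, and the other three (xt, yt, zt) each pair one spatial axis with time.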
 
Segmentation
Segment static background and objects from dynamic background and objects.
- Editable Free-Viewpoint Video using a Layered Neural Representation
  - Create a scene by compositing one NeRF per actor.
- NeRFlets (2023)
- SUDS: Scalable Urban Dynamic Scenes (2023)
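One common way to composite several radiance fields at a sample point is to sum their densities and take the density-weighted average of their colours. The sketch below shows that rule; it is an assumption chosen for illustration and is not necessarily the exact compositing used by the works listed above.

```python
def composite_sample(densities, colors):
    """Merge per-actor predictions at a single 3D sample point.

    Densities are summed; the colour is the density-weighted average.
    """
    total = sum(densities)
    if total == 0.0:
        return 0.0, (0.0, 0.0, 0.0)    # empty space: no colour contribution
    color = tuple(
        sum(s * c[k] for s, c in zip(densities, colors)) / total
        for k in range(3)
    )
    return total, color

# A faint red actor and a denser blue actor overlapping at one point.
sigma, rgb = composite_sample(
    densities=[1.0, 3.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
)
```

The merged density and colour can then be passed to the usual volume rendering step, and dropping an actor from the lists removes it from the scene.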
Generalization
Generalization mainly focuses on learning a prior over the distribution, similar to what existing image generation networks do.
This enables more advanced vision tasks such as view synthesis from a single image, shape completion, inpainting, object generation, and segmentation.
- CNN
  - pixelNeRF
- Latent Codes
- Hyper Networks
  - Light Field Networks
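For the CNN-conditioned approach, the core step is to project a 3D query point into the input image and fetch the image feature at that pixel. The sketch below assumes a pinhole camera, a camera-space query point, and nearest-neighbour sampling (a simplification; interpolated sampling is the more usual choice); all function names are hypothetical.

```python
def project(point_cam, focal, cx, cy):
    """Pinhole projection of a camera-space point (z > 0) to pixel coordinates."""
    x, y, z = point_cam
    return focal * x / z + cx, focal * y / z + cy

def sample_feature(feature_map, u, v):
    """Nearest-neighbour lookup in a feature map stored as rows[v][u]."""
    height, width = len(feature_map), len(feature_map[0])
    col = min(max(int(round(u)), 0), width - 1)
    row = min(max(int(round(v)), 0), height - 1)
    return feature_map[row][col]

def conditioned_input(point_cam, feature_map, focal, cx, cy):
    """Concatenate the query point with the image feature it projects onto."""
    u, v = project(point_cam, focal, cx, cy)
    return list(point_cam) + list(sample_feature(feature_map, u, v))

# Toy 4 x 4 "CNN output" whose feature at each pixel is [row, col].
feature_map = [[[r, c] for c in range(4)] for r in range(4)]
x = conditioned_input((1.0, 0.0, 2.0), feature_map, focal=2.0, cx=2.0, cy=2.0)
```

Because the conditioning comes from image features rather than per-scene weights, the same network can be applied to a new scene from a single input view.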
Applications
3D Generation
- EG3D - Adapting StyleGAN2, NeRF, and a super-resolution network for generating 3D scenes
- Dream Fields - CLIP-guided NeRF generation
- DreamFusion - Adapting text-to-image diffusion models to generate NeRFs
