Monocular Neural Image Based Rendering with Continuous View Control



The encoder converts images into latent points.
It consists of 8 convolutional blocks, each of which downsamples the feature map. (Note that the paper says 7, but their code uses 8.)
Each block is: Conv-BatchNorm-LeakyReLU.
The output of the convolutional blocks is put through a fully connected layer and reshaped into a \(200 \times 3\) matrix.
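A minimal PyTorch-style sketch of this encoder, assuming \(256 \times 256\) RGB inputs and illustrative channel widths (neither is specified here); only the block structure and the \(200 \times 3\) output follow the description above.

<pre>
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Encodes an image into a 200 x 3 matrix of latent points.

    The 256x256 input size and the channel widths are assumptions for
    illustration, not values taken from the paper.
    """
    def __init__(self, in_channels=3, num_points=200):
        super().__init__()
        channels = [in_channels, 32, 64, 128, 256, 256, 256, 256, 256]  # assumed widths
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),  # halves H and W
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(0.2, inplace=True),
            ]
        self.conv = nn.Sequential(*blocks)               # 8 downsampling blocks: 256 -> 1
        self.fc = nn.Linear(channels[-1], num_points * 3)
        self.num_points = num_points

    def forward(self, x):                                # x: (B, 3, 256, 256)
        h = self.conv(x).flatten(1)                      # (B, 256)
        return self.fc(h).view(-1, self.num_points, 3)   # (B, 200, 3)
</pre>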


The decoder renders the latent points into a depth map from the target view.
It consists of 8 blocks of: Upsample-Conv-BatchNorm-LeakyReLU.
They use bilinear upsampling.
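A corresponding sketch of the decoder, again with assumed channel widths. The initial fully connected lift from the point matrix to a \(1 \times 1\) feature map and the final 1-channel depth convolution are illustrative guesses; only the 8 Upsample-Conv-BatchNorm-LeakyReLU blocks with bilinear upsampling follow the description above.

<pre>
import torch
import torch.nn as nn

class DepthDecoder(nn.Module):
    """Renders transformed latent points into a depth map at the target view.

    The FC lift to a 1x1 map, the channel widths, and the final 1-channel
    conv are assumptions; the section only specifies the 8 blocks of
    Upsample-Conv-BatchNorm-LeakyReLU with bilinear upsampling.
    """
    def __init__(self, num_points=200):
        super().__init__()
        channels = [256, 256, 256, 256, 256, 128, 64, 32, 16]  # assumed widths
        self.fc = nn.Linear(num_points * 3, channels[0])        # lift points to a 1x1 map
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(0.2, inplace=True),
            ]
        self.up = nn.Sequential(*blocks)                         # 8 blocks: 1 -> 256
        self.to_depth = nn.Conv2d(channels[-1], 1, kernel_size=3, padding=1)

    def forward(self, points):                                   # points: (B, 200, 3)
        h = self.fc(points.flatten(1)).view(-1, 256, 1, 1)
        return self.to_depth(self.up(h))                         # (B, 1, 256, 256)
</pre>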
* Optimizer: Adam (see the sketch after this list)
** learning_rate=0.00006, beta_1=0.5, beta_2=0.999
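A short example of wiring the two hypothetical sketches above into an optimizer with the listed hyperparameters; the joint optimization of encoder and decoder parameters is an assumption.

<pre>
import itertools
import torch

# PointEncoder and DepthDecoder are the illustrative sketches above.
encoder, decoder = PointEncoder(), DepthDecoder()
optimizer = torch.optim.Adam(
    itertools.chain(encoder.parameters(), decoder.parameters()),
    lr=0.00006,
    betas=(0.5, 0.999),
)
</pre>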


==Evaluation==