Monocular Neural Image Based Rendering with Continuous View Control: Difference between revisions

181 bytes added , 14 June 2020

@@ Line 53: / Line 53: @@
 The encoder converts images into latent points.
 It consists of 8 convolutional blocks, each of which downsamples the feature map. (Note that the supplementary material says 7, but their code actually uses 8.)
-Each block is: Conv-BatchNorm-LeakyReLU.
+Each block is: conv-BatchNorm-LeakyReLU.
+Each convolutional layer uses a \(4 \times 4\) kernel with stride 2 and padding 1, which halves the resolution: \((x - 4 + 2)/2 + 1 = x/2\).
 The final output of the convolution blocks has size \((1, 1, 2^8)\).
 The output of the convolutional blocks is passed through a fully connected layer and reshaped into a \(200 \times 3\) matrix.
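The halving arithmetic for the encoder can be checked with a short sketch. It only tracks spatial sizes using the standard convolution output-size formula. The function names are hypothetical, and the input resolution of 256 is an assumption inferred from eight halvings ending at a \(1 \times 1\) map; it is not stated in the text above.

```python
def conv_out_size(size, kernel=4, stride=2, padding=1):
    """Spatial size after one convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size=256, num_blocks=8):
    """Feature-map width/height before and after each encoder block.

    input_size=256 is an assumed value, not taken from the paper or code.
    """
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(conv_out_size(sizes[-1]))
    return sizes


if __name__ == "__main__":
    print(encoder_sizes())
```

Under these assumptions, each block halves the resolution and the eighth block reaches a \(1 \times 1\) map, which matches the final size given above.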
 The decoder renders the latent points into a depth map from the target view.
-It consists of 8 blocks of: Upsample-Conv-BatchNorm-LeakyReLU.
+It consists of 8 blocks of: Upsample-ReflectionPad-Conv-BatchNorm-LeakyReLU.
-They use bilinear upsampling.
+The upsample layer doubles the width and height using bilinear interpolation.
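The size bookkeeping for the decoder blocks can be sketched the same way. The text above gives neither the kernel size nor the padding width, so the \(3 \times 3\) kernel, the reflection padding of 1, the stride of 1, and the starting size of 1 are all assumptions chosen so that the padded convolution preserves resolution. The function names are hypothetical.

```python
def upsample_size(size, scale=2):
    """Bilinear upsampling by a factor of 2 doubles width and height."""
    return size * scale


def reflect_pad_conv_size(size, pad=1, kernel=3, stride=1):
    """Size after ReflectionPad(pad) followed by an unpadded convolution.

    pad=1, kernel=3, stride=1 are assumed values (not stated in the source);
    with them the convolution leaves the resolution unchanged.
    """
    return (size + 2 * pad - kernel) // stride + 1


def decoder_sizes(start_size=1, num_blocks=8):
    """Feature-map width/height before and after each decoder block.

    start_size=1 is an assumption used only for illustration.
    """
    sizes = [start_size]
    for _ in range(num_blocks):
        size = upsample_size(sizes[-1])
        size = reflect_pad_conv_size(size)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(decoder_sizes())
```

With these assumed settings, only the upsample layer changes the resolution, so eight blocks take a \(1 \times 1\) map to \(256 \times 256\).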
 * Optimizer: Adam