Line 53: | Line 53: | ||
The encoder converts images into latent points. | The encoder converts images into latent points. | ||
It consists of 8 convolutional blocks which each downsample the feature map. (Note that the supplementary material says 7 but their code actually uses 8). | It consists of 8 convolutional blocks which each downsample the feature map. (Note that the supplementary material says 7 but their code actually uses 8). | ||
Each block is: | Each block is: conv-BatchNorm-LeakyReLU. | ||
Each convolutional layer uses a 4x4 kernel with stride 2 and padding 1, which halves the resolution: ((x−4+2)/2+1)=x/2. | |||
The final output of the convolution blocks has size \((1, 1, 2^8)\). | The final output of the convolution blocks has size \((1, 1, 2^8)\). | ||
The output of the convolutional blocks is passed through a fully connected layer and reshaped into a \(200 \times 3\) matrix. | The output of the convolutional blocks is passed through a fully connected layer and reshaped into a \(200 \times 3\) matrix. | ||
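The encoder's downsampling arithmetic can be checked with a short sketch. It applies the standard convolution output-size formula with the 4x4 kernel, stride 2, and padding 1 described above. The 256-pixel input resolution is an assumption (it is the size that 8 halvings reduce to 1x1), and the function names are hypothetical, not taken from the authors' code.

```python
def conv_out_size(x, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (x - kernel + 2 * padding) // stride + 1


def encoder_sizes(input_size=256, num_blocks=8):
    """Trace the feature-map resolution through the encoder blocks.

    input_size=256 is an assumption, chosen so that 8 halvings reach 1x1.
    """
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(conv_out_size(sizes[-1]))
    return sizes


print(encoder_sizes())  # [256, 128, 64, 32, 16, 8, 4, 2, 1]
```

Under that assumption, each block halves the width and height, and the eighth block yields the 1x1 feature map that feeds the fully connected layer.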
The decoder renders the latent points into a depth map from the target view. | The decoder renders the latent points into a depth map from the target view. | ||
It consists of 8 blocks of: Upsample-Conv-BatchNorm-LeakyReLU. | It consists of 8 blocks of: Upsample-ReflectionPad-Conv-BatchNorm-LeakyReLU. | ||
The upsample layer doubles the width and height using bilinear interpolation. | |||
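The decoder's resolution can be traced the same way. This is only a sketch: the text states that the upsample layer doubles the width and height, but it does not give the convolution's kernel size or the amount of reflection padding. The values below (padding 1, 3x3 kernel, stride 1, which leave the resolution unchanged) and the function names are assumptions for illustration.

```python
def decoder_block_size(x, pad=1, kernel=3):
    """Spatial size after one Upsample-ReflectionPad-Conv block.

    pad=1 and kernel=3 (stride 1) are assumed values, not from the source.
    """
    upsampled = 2 * x              # bilinear upsampling doubles the size
    padded = upsampled + 2 * pad   # reflection padding on both sides
    return padded - kernel + 1     # stride-1 convolution


def decoder_sizes(start=1, num_blocks=8):
    """Trace the resolution through the decoder blocks, starting from 1x1."""
    sizes = [start]
    for _ in range(num_blocks):
        sizes.append(decoder_block_size(sizes[-1]))
    return sizes


print(decoder_sizes())  # [1, 2, 4, 8, 16, 32, 64, 128, 256]
```

With a size-preserving padded convolution, only the upsample layer changes the resolution, so 8 blocks would take a 1x1 input to a 256x256 depth map.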
* Optimizer: Adam | * Optimizer: Adam |