360SD-Net: 360° Stereo Depth Estimation with Learnable Cost Volume

From David's Wiki
Revision as of 19:39, 15 June 2020



Method

Architecture

[Figure: architecture diagram, Fig. 2 of their paper]

Their input is a pair of equirectangular images, one captured above the other.
They also input the polar angle of each image row:

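import numpy as np

# Y angle: one polar angle in degrees per row of a 512-row image
angle_y = np.array([(i-0.5)/512*180 for i in range(256, -256, -1)])
# Tile across 1024 columns to get a [512, 1024, 1] array
angle_ys = np.tile(angle_y[:, np.newaxis, np.newaxis], (1,1024, 1))
equi_info = angle_ys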

The angles are equivalent to np.linspace(90, -90, height+1)[:-1] - 0.5*(180/height) with height = 512, and they are broadcast into a \([H, W, 1]\) image tensor.
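The equivalence above can be checked numerically. The following is a minimal sketch, assuming the 512 x 1024 resolution hard-coded in the snippet; the names height, width, half, and angle_alt are introduced here only for illustration.

```python
import numpy as np

# Assumed resolution, taken from the constants in the snippet above.
height, width = 512, 1024
half = height // 2

# Original construction: one polar angle (in degrees) per image row.
angle_y = np.array([(i - 0.5) / height * 180 for i in range(half, -half, -1)])

# Closed form: evenly spaced angles from 90 down to -90, shifted by half a row.
angle_alt = np.linspace(90, -90, height + 1)[:-1] - 0.5 * (180 / height)

# Broadcast to an [H, W, 1] image: every column of a row holds the same angle.
equi_info = np.tile(angle_y[:, np.newaxis, np.newaxis], (1, width, 1))

print(np.allclose(angle_y, angle_alt))
print(equi_info.shape)
```

Both forms place each angle half a row below the corresponding linspace sample, so the values run from just under 90 to just above -90 degrees.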

Feature Extraction

They first feed each of the two input images individually into a CNN with one Resblock.
They also feed the polar angle image into a separate CNN.
The polar angle features are then concatenated onto the features of each input image separately.
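As a rough illustration of the concatenation step, here is a sketch that uses NumPy arrays as stand-ins for the feature tensors. The channel counts, spatial sizes, and the helper name fuse are all hypothetical and are not taken from the paper or its code; the only point is that the same polar angle features are appended to each image's features along the channel axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical NCHW shapes: batch 1, 32 channels, 8 x 16 spatial grid.
feat_top = rng.standard_normal((1, 32, 8, 16))     # features of the top image
feat_bottom = rng.standard_normal((1, 32, 8, 16))  # features of the bottom image
feat_angle = rng.standard_normal((1, 32, 8, 16))   # features of the polar angle image


def fuse(image_feat, angle_feat):
    """Concatenate along the channel axis (axis 1 in NCHW layout)."""
    return np.concatenate([image_feat, angle_feat], axis=1)


# The same angle features are appended to each image's features separately.
fused_top = fuse(feat_top, feat_angle)
fused_bottom = fuse(feat_bottom, feat_angle)

print(fused_top.shape, fused_bottom.shape)
```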

My Questions
  • What is their Resblock? Is it just a conv+batchnorm+relu with a residual connection?
  • The polar angle image is the same for all images.
    • How is this different from just concatenating a random variable and optimizing that variable?


ASPP Module

Atrous Spatial Pyramid Pooling (ASPP)

This idea comes from Chen et al.[1].

The idea here is to perform convolution over multiple scales of the input image or feature tensor.
This is done with multiple parallel convolutions of the same input, each with a different dilation rate.

#... make model
# Five parallel branches on the same 160-channel input, each producing 32 channels:
# one 1x1 convolution and four 3x3 convolutions.
# convbn is not shown here; the last argument is assumed to be the dilation (1, 6, 12, 18, 24).
self.aspp1 = nn.Sequential(convbn(160, 32, 1, 1, 0, 1), nn.ReLU(inplace=True))
self.aspp2 = nn.Sequential(convbn(160, 32, 3, 1, 1, 6), nn.ReLU(inplace=True))
self.aspp3 = nn.Sequential(convbn(160, 32, 3, 1, 1, 12), nn.ReLU(inplace=True))
self.aspp4 = nn.Sequential(convbn(160, 32, 3, 1, 1, 18), nn.ReLU(inplace=True))
self.aspp5 = nn.Sequential(convbn(160, 32, 3, 1, 1, 24), nn.ReLU(inplace=True))

#... call
ASPP1 = self.aspp1(output_skip_c)
ASPP2 = self.aspp2(output_skip_c)
ASPP3 = self.aspp3(output_skip_c)
ASPP4 = self.aspp4(output_skip_c)
ASPP5 = self.aspp5(output_skip_c)
# Concatenate the raw features and all five branch outputs along the channel dimension
output_feature = torch.cat((output_raw, ASPP1,ASPP2,ASPP3,ASPP4,ASPP5), 1)
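The effect of the dilation on each branch can be worked out with the standard convolution size arithmetic. The sketch below is plain Python and does not reproduce their code; the helper names effective_kernel and conv_out_size and the feature-map size of 64 are assumptions made for illustration. It shows that a 3x3 kernel with dilation d covers a (2d+1)-wide window, and that the spatial size is preserved at stride 1 only when the padding equals the dilation. Since the branch outputs are concatenated, they must share a spatial size, which suggests (as an assumption, because convbn is not shown) that convbn replaces the pad argument of 1 with the dilation when the dilation is greater than 1.

```python
def effective_kernel(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def conv_out_size(size, kernel_size, stride, padding, dilation):
    """Output length along one axis, using the usual convolution size formula."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


size = 64  # hypothetical feature-map height or width

for d in (6, 12, 18, 24):
    extent = effective_kernel(3, d)
    same = conv_out_size(size, 3, 1, d, d)    # padding equal to the dilation
    shrunk = conv_out_size(size, 3, 1, 1, d)  # padding left at 1
    print(f"dilation={d}: extent={extent}, padding=d -> {same}, padding=1 -> {shrunk}")
```

With padding equal to the dilation, every branch returns a map of the input size, so the outputs can be concatenated channel-wise.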

Learnable Cost Volume

Dataset

They construct a dataset from the Matterport3D and Stanford 3D datasets. The constructed dataset is available upon request.

Evaluation

  1. Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. https://arxiv.org/abs/1412.7062