360SD-Net: 360° Stereo Depth Estimation with Learnable Cost Volume

Each slice \(i\) is computed by sliding image \(I_2\) down by \(i\) pixels (replicating its top row as padding) and subtracting the result from \(I_1\) to yield
<pre>
# This is not their actual code. Their actual code is slightly more complicated.
# Shift I_2 down by i pixels: pad the top with i copies of the first row and drop the bottom i rows.
cost_volume[:, :, i] = I_1 - concat(tile(I_2[:1, :], [i, 1]), I_2[:-i, :], axis=0)
</pre>
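This construction can be sketched as a self-contained NumPy function. This is an illustrative reimplementation of the formula above, not the authors' code; the function name and the single-channel 2D-image simplification are assumptions for clarity.

```python
import numpy as np

def vertical_cost_volume(I1, I2, max_disp):
    """Sketch of a vertical-shift difference cost volume of shape (H, W, max_disp).

    Slice i is I1 minus I2 shifted down by i pixels, with the top
    padded by replicating I2's first row (hypothetical helper, not
    the authors' implementation).
    """
    H, W = I1.shape
    cost = np.zeros((H, W, max_disp), dtype=I1.dtype)
    for i in range(max_disp):
        if i == 0:
            shifted = I2
        else:
            pad = np.tile(I2[:1, :], (i, 1))                  # i copies of the top row
            shifted = np.concatenate([pad, I2[:-i, :]], axis=0)  # shift content down by i
        cost[:, :, i] = I1 - shifted
    return cost
```

Each slice keeps the full image resolution, so row \(y\) of slice \(i\) compares \(I_1[y]\) against \(I_2[y-i]\).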


They learn the optimal step size by applying a learnable 7×1 Conv2D filter (their shifting filter) to the shifted feature map at each step, rather than using a fixed one-pixel shift.
{{ hidden | Their cost volume code |
<syntaxhighlight lang=python>
<syntaxhighlight lang=python>
# Note: torch.autograd.Variable is deprecated in modern PyTorch; plain tensors suffice.
cost = Variable(torch.FloatTensor(refimg_fea.size()[0], refimg_fea.size()[1]*2,
                self.maxdisp // 4 * 3, refimg_fea.size()[2], refimg_fea.size()[3]).zero_()).cuda()
for i in range(self.maxdisp // 4 * 3):   # // : integer division (the original / is Python 2)
    if i > 0:
        cost[:, :refimg_fea.size()[1], i, :, :] = refimg_fea[:, :, :, :]
        cost[:, refimg_fea.size()[1]:, i, :, :] = shift_down[:, :, :, :]
        shift_down = self.forF(shift_down)       # learnable shifting filter: one more step
    else:
        cost[:, :refimg_fea.size()[1], i, :, :] = refimg_fea
        cost[:, refimg_fea.size()[1]:, i, :, :] = targetimg_fea
        shift_down = self.forF(targetimg_fea)
cost = cost.contiguous()
</syntaxhighlight>
}}
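The role of `self.forF` can be illustrated with a plain NumPy sketch of a 7×1 vertical filter: a one-hot kernel one tap above center reproduces a fixed one-pixel downward shift, while learned weights could realize fractional or adaptive step sizes. The function name and edge-replication padding here are assumptions; the authors' filter is a learnable PyTorch convolution.

```python
import numpy as np

def shift_filter_apply(feat, kernel):
    """Apply a 7x1 vertical filter down each column of a 2D map.

    Hypothetical sketch of a shifting filter: a correlation along the
    height dimension with replicate (edge) padding so output keeps
    the input's shape.
    """
    H, W = feat.shape
    k = len(kernel)                      # kernel length, e.g. 7
    r = k // 2                           # padding radius, e.g. 3
    padded = np.pad(feat, ((r, r), (0, 0)), mode="edge")
    out = np.empty_like(feat)
    for y in range(H):
        window = padded[y:y + k, :]      # (k, W) window centered on row y
        out[y, :] = kernel @ window      # weighted sum along height
    return out

# One-hot tap one position above center: output row y = input row y-1,
# i.e. a fixed shift-down-by-one, the natural initialization for a
# learnable step size.
shift_down_by_1 = np.array([0, 0, 1, 0, 0, 0, 0], dtype=float)
```

Iterating this filter, as the loop above does with `shift_down = self.forF(shift_down)`, accumulates one learned step per cost-volume slice.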


===3D Encoder-Decoder===