SfM-Net: Learning of Structure and Motion from Video



==Method==
They first build two networks that estimate the following at time \(t\):
* Image depth \(d_t \in [0,\infty)^{w \times h} \).
* Camera rotation and translation: \(\{R_t^c, t_t^c \}\)
* K motion masks: \(m_t^k\) for \(k=1,...,K\)
* K object motions: \(\{R_t^k, t_t^k \}\)
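These estimates are then combined as follows:
* Use the depth to generate a point cloud
* Transform the point cloud based on the object transformations
* Transform the point cloud based on the camera transformation
* Project the transformed points back to the image to compute optical flow, then warp the image with that flow
* Repeat with the frames swapped (\(I_{t+1}\) as the reference and \(I_{t}\) as the target) for forward-backward consistency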
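The steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the pinhole intrinsics <code>fx, fy, cx, cy</code>, all function names, and the choice to rotate about the origin (no pivot point) are assumptions made for this example.

```python
def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows) to a grid of 3D points (assumed pinhole camera)."""
    return [[((x - cx) * d / fx, (y - cy) * d / fy, d)
             for x, d in enumerate(row)]
            for y, row in enumerate(depth)]


def rigid(R, t, p):
    """Apply the rigid motion X -> R X + t to a single 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def move_objects(points, masks, motions):
    """Add the mask-weighted displacement of each of the K object motions."""
    moved_points = []
    for y, row in enumerate(points):
        new_row = []
        for x, p in enumerate(row):
            q = list(p)
            for mask, (R, t) in zip(masks, motions):
                moved = rigid(R, t, p)
                for i in range(3):
                    q[i] += mask[y][x] * (moved[i] - p[i])
            new_row.append(tuple(q))
        moved_points.append(new_row)
    return moved_points


def project(p, fx, fy, cx, cy):
    """Project a 3D point back to pixel coordinates."""
    X, Y, Z = p
    return (fx * X / Z + cx, fy * Y / Z + cy)


def flow_from_scene(depth, masks, motions, camera, intrinsics):
    """Depth -> point cloud -> object motion -> camera motion -> optical flow."""
    fx, fy, cx, cy = intrinsics
    points = move_objects(backproject(depth, fx, fy, cx, cy), masks, motions)
    R_c, t_c = camera
    flow = []
    for y, row in enumerate(points):
        flow_row = []
        for x, p in enumerate(row):
            u, v = project(rigid(R_c, t_c, p), fx, fy, cx, cy)
            flow_row.append((u - x, v - y))  # displacement of pixel (x, y)
        flow.append(flow_row)
    return flow
```

With an identity rotation, a camera translation \(t_x\) along the x-axis, and constant depth \(d\), this sketch yields a uniform horizontal flow of \(f_x t_x / d\) pixels, which is a quick sanity check on the geometry.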
===Supervision/Loss Function===
They apply several forms of supervision:
* Self-supervision: Minimize distance between the reference and the warped image
* Spatial smoothness priors: Penalize the \(\ell_1\) norm of the gradients of the optical flow field, depth, and motion maps
* Forward-backward consistency constraints: Do a run backwards in time and make sure the depth \(d_{t+1}\) is consistent with \(d_t\)
* Supervising depth: Minimize the difference between the estimated and ground-truth depth
* Supervising camera motion: Minimize the difference between the estimated and ground-truth camera motion
* Supervising optical flow and object motion: Minimize the difference between the estimated and ground-truth optical flow and object motion on synthetic datasets
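As a rough illustration of the first two terms, the sketch below computes a photometric error and a smoothness penalty on small grayscale grids. The function names, the per-pixel \(\ell_1\) photometric distance, the first-order forward differences, and the weight <code>lam</code> are all assumptions for this example; the exact distances and weights are not specified in the text above.

```python
def photometric_l1(reference, warped):
    """Mean absolute difference between the reference and the warped image."""
    total, count = 0.0, 0
    for ref_row, warp_row in zip(reference, warped):
        for r, w in zip(ref_row, warp_row):
            total += abs(r - w)
            count += 1
    return total / count


def smoothness_l1(field):
    """L1 norm of forward differences in x and y of a 2D field (depth, flow component, or mask)."""
    height, width = len(field), len(field[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                total += abs(field[y][x + 1] - field[y][x])
            if y + 1 < height:
                total += abs(field[y + 1][x] - field[y][x])
    return total


def self_supervised_loss(reference, warped, depth, lam=0.1):
    """Photometric term plus a weighted smoothness prior on depth (lam is a hypothetical weight)."""
    return photometric_l1(reference, warped) + lam * smoothness_l1(depth)
```

A perfectly warped image over a constant depth map gives a loss of zero under this sketch; any mismatch or depth discontinuity increases it.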


==Architecture==
SfM-Net consists of two neural networks: 
* The motion network estimates camera motion, object motion, and object masks.
* The structure network estimates depth, which can be used to build a point cloud.
Both networks follow a Conv-Deconv (U-Net) structure with skip connections. 
See the figure in the paper for more details.
===Motion Network===
The input to the motion network is a pair of video frames \(I_t\) and \(I_{t+1}\), concatenated along the channel axis into a tensor of shape \(380 \times 128 \times 6\).
From this, the motion network predicts the following:
* Camera rotation and translation: \(\{R_t^c, t_t^c \}\)
* K motion masks: \(m_t^k\) for \(k=1,...,K\)
* K object motions: \(\{R_t^k, t_t^k \}\)
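The input stacking and the form of the outputs can be sketched as follows. This is only an illustration under stated assumptions: rotations are parameterized by Euler angles composed as \(R_x(\alpha) R_y(\beta) R_z(\gamma)\), masks are squashed with a sigmoid, and all function names are hypothetical. None of these choices is specified in the text above.

```python
import math


def stack_frames(frame_t, frame_t1):
    """Concatenate two H x W x 3 frames along the channel axis into H x W x 6."""
    return [[p0 + p1 for p0, p1 in zip(row0, row1)]
            for row0, row1 in zip(frame_t, frame_t1)]


def matmul3(A, B):
    """Multiply two 3 x 3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_matrix(alpha, beta, gamma):
    """Build R = Rx(alpha) Ry(beta) Rz(gamma); the composition order is an assumption."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    Rx = [[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]]
    Ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    Rz = [[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(Rx, matmul3(Ry, Rz))


def sigmoid(x):
    """Squash a raw mask value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def masks_from_logits(logits):
    """Turn K raw H x W maps into K soft motion masks (assumed sigmoid, so masks may overlap)."""
    return [[[sigmoid(v) for v in row] for row in mask] for mask in logits]
```

For example, <code>euler_to_matrix(0.0, 0.0, math.pi / 2)</code> rotates the x-axis onto the y-axis, and all-zero angles give the identity.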
===Structure Network===
The goal of the structure network is to estimate depth \(d_t \in [0,\infty)^{w \times h} \). 
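One simple way to keep a predicted depth map inside \([0,\infty)\) is to clamp the raw network output at zero, as a ReLU would. The sketch below assumes this choice; the function name is hypothetical and the text above does not say which output activation is used.

```python
def depth_from_raw(raw):
    """Clamp raw per-pixel outputs at zero so every depth lies in [0, inf)."""
    return [[max(0.0, value) for value in row] for row in raw]
```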


==References==