Learning Independent Object Motion from Unlabelled Stereoscopic Videos



==Method==
;Key Contributions
* Learning with limited supervision
* Factoring the scene into independent moving objects (main idea of the paper)
* Designing a network architecture using plane sweep volumes
;Inputs:
* Image pairs \(\{(I_1^l, I_1^r),..., (I_n^l, I_n^r)\}\) from unlabelled stereo videos
* Object bounding boxes \(B = \{B^1,..., B^j\}\) on the left image \(I_t^l\) from off-the-shelf object detectors
;Goal/Outputs:
* Dense depth map \(D\)
* 3D flow fields \(F = \{F^1,..., F^j\}\)
* Instance masks \(M=\{M^1,..., M^j\}\)
;Approach:
* For each region of interest (RoI), predict a per-object flow map using an R-CNN
** Also predict an object mask for each RoI
* Construct a full 3D scene flow map from the per-object flow maps
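The composition step can be illustrated with a minimal sketch. It assumes each per-object flow map and mask has already been resized to its bounding box, that boxes are integer <code>(x0, y0, x1, y1)</code> pixel coordinates, and that pixels outside every box get zero flow. The function name <code>compose_scene_flow</code> and all of these conventions are hypothetical and chosen for illustration; the paper's actual composition may differ.

```python
import numpy as np

def compose_scene_flow(height, width, boxes, roi_flows, roi_masks):
    """Paste per-object 3D flow maps into one full-image flow map.

    boxes:     list of (x0, y0, x1, y1) integer pixel boxes (assumed format)
    roi_flows: list of arrays shaped (y1 - y0, x1 - x0, 3)
    roi_masks: list of soft masks shaped (y1 - y0, x1 - x0), values in [0, 1]
    """
    # Assumption: pixels not covered by any object start with zero flow.
    full = np.zeros((height, width, 3), dtype=np.float32)
    for (x0, y0, x1, y1), flow, mask in zip(boxes, roi_flows, roi_masks):
        region = full[y0:y1, x0:x1]   # view into the full map
        weight = mask[..., None]      # broadcast the mask over the 3 flow channels
        # Blend: inside the mask use the object's flow, outside keep what was there.
        region[...] = weight * flow + (1.0 - weight) * region
    return full

# Toy example: one 2x2 object with constant flow (2, 2, 2) in a 4x4 image.
boxes = [(1, 1, 3, 3)]
roi_flows = [np.full((2, 2, 3), 2.0)]
roi_masks = [np.ones((2, 2))]
scene_flow = compose_scene_flow(4, 4, boxes, roi_flows, roi_masks)
```

With overlapping boxes, later objects overwrite earlier ones in this sketch; that ordering is a simplification, not something stated in these notes.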
===Self Supervision and Loss Functions===
* View synthesis
* Geometric consistency: the depth values of the warped image and the reference image should match
* Left-right consistency \(L^{lr}\)
* RoI Loss \(L^{roi}\)
* Full image based loss \(L^{t}\)
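As a rough illustration of the view-synthesis idea behind these losses, the sketch below warps the right image into the left view using a disparity map and measures an L1 photometric error. It assumes rectified grayscale images, the common convention that left pixel \(x\) corresponds to right pixel \(x - d\), and a plain mean absolute difference. The function names are hypothetical, and the paper's exact loss terms and weights are not reproduced here.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Synthesize the left view by sampling the right image at x - d(y, x).

    right, disparity: arrays shaped (H, W); linear interpolation along rows.
    """
    h, w = disparity.shape
    # Source x-coordinate in the right image for every left pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0, w - 1)        # clamp samples that fall outside the image
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    a = xs - x0                       # interpolation weight between x0 and x1
    rows = np.arange(h)[:, None]
    return (1.0 - a) * right[rows, x0] + a * right[rows, x1]

def photometric_loss(left, right, disparity):
    """Mean absolute difference between the left image and the warped right image."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))

# Toy check: build a left image that is the right image shifted by 2 pixels.
h, w = 4, 8
right = np.tile(np.arange(w, dtype=np.float64), (h, 1))
true_disp = np.full((h, w), 2.0)
left = warp_right_to_left(right, true_disp)
loss_correct = photometric_loss(left, right, true_disp)
loss_wrong = photometric_loss(left, right, np.zeros((h, w)))
```

The correct disparity reproduces the left image, so its loss is lower than that of the zero-disparity guess; this is the signal that lets depth be learned without labels.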


==Architecture==