Learning Independent Object Motion from Unlabelled Stereoscopic Videos

Learning Independent Object Motion from Unlabelled Stereoscopic Videos (CVPR 2019)

Authors: Zhe Cao, Abhishek Kar, Christian Haene, Jitendra Malik
Affiliations: UC Berkeley, Fyusion Inc, Google

Website
CVF ArXiv

Method

Key Contributions

Learning with limited supervision
Factoring the scene into independent moving objects (main idea of the paper)
Designing a network architecture using plane sweep volumes
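
As a rough illustration of what a plane sweep volume is, the sketch below builds one for a rectified grayscale stereo pair, where sweeping fronto-parallel planes reduces to shifting the right image by a set of candidate integer disparities. The function names and the argmin matching step are assumptions for illustration only; in the paper the volume is fed to a learned network rather than matched directly.

```python
import numpy as np

def plane_sweep_volume(right, disparities):
    """Stack copies of `right` shifted by each candidate disparity.

    Assumes a rectified pair and non-negative integer disparities, so a
    left pixel (x, y) matches the right pixel (x - d, y).
    """
    h, w = right.shape
    volume = np.zeros((len(disparities), h, w), dtype=right.dtype)
    for i, d in enumerate(disparities):
        if d < w:
            # Columns x < d have no counterpart in the right image; leave zeros.
            volume[i, :, d:] = right[:, :w - d]
    return volume

def best_disparity(left, right, disparities):
    """Pick, per pixel, the candidate whose shifted image matches `left` best."""
    cost = np.abs(left[None] - plane_sweep_volume(right, disparities))
    return np.asarray(disparities)[cost.argmin(axis=0)]
```

Each slice of the volume corresponds to one depth hypothesis, so the slice that agrees with the left image at a pixel indicates that pixel's depth.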

Inputs

Image pairs \(\{(I_1^l, I_1^r),..., (I_n^l, I_n^r)\}\) from unlabelled stereo videos
Object bounding boxes \(B = \{B^1,..., B^j\}\) on the left image \(I_t^l\) from off-the-shelf object detectors

Goal/Outputs

Dense depth map \(D\)
3D flow fields \(F = \{F^1,..., F^j\}\)
Instance masks \(M=\{M^1,..., M^j\}\)

For each region of interest (RoI), predict a per-object flow map using an RCNN
- Also predict an object mask for each RoI
Construct a full 3D scene flow map from the per-object flow maps.
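
The composition step can be sketched as pasting each masked per-object flow map back into its bounding box. This is a minimal sketch, not the paper's implementation: `compose_scene_flow` is a hypothetical helper, the per-RoI outputs are assumed to be already resized to their box dimensions, and pixels outside every mask are assumed to have zero object flow.

```python
import numpy as np

def compose_scene_flow(height, width, boxes, roi_flows, roi_masks):
    """Paste per-RoI 3D flow into a full-image flow map.

    boxes:     list of integer (x0, y0, x1, y1), end-exclusive.
    roi_flows: list of arrays shaped (y1 - y0, x1 - x0, 3).
    roi_masks: list of arrays shaped (y1 - y0, x1 - x0), values in [0, 1].
    """
    full = np.zeros((height, width, 3), dtype=np.float32)
    for (x0, y0, x1, y1), flow, mask in zip(boxes, roi_flows, roi_masks):
        region = full[y0:y1, x0:x1]      # view into the full map
        m = mask[..., None]              # broadcast mask over the 3 flow channels
        # Soft blend: masked pixels take the object's flow, the rest keep
        # whatever earlier objects (or the zero background) wrote.
        region[...] = m * flow + (1.0 - m) * region
    return full
```

With overlapping boxes, later objects overwrite earlier ones where their masks are on; a real system would need some ordering or depth-based rule for that case.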

Self Supervision and Loss Functions

View Synthesis
Geometric consistency: The depth values of the warped image and the reference image should match
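
To make the view synthesis idea concrete, here is a minimal sketch for the stereo case only: the right image is warped into the left view using a disparity map, and the loss is the mean absolute photometric difference. It assumes rectified grayscale images and horizontal linear interpolation; the function names are hypothetical, and the paper's actual warping (which also involves motion over time) is not reproduced here.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Sample `right` at (x - d, y) for every left-view pixel (x, y)."""
    h, w = disparity.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0, w - 1)           # clamp samples that fall off the image
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    a = xs - x0                          # fractional part for linear interpolation
    rows = np.arange(h)[:, None]
    return (1.0 - a) * right[rows, x0] + a * right[rows, x1]

def photometric_l1(left, right, disparity):
    """Mean absolute difference between `left` and the synthesized left view."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))
```

A correct disparity map makes the synthesized view agree with the real left image, so minimizing this loss supervises depth without ground-truth labels. The geometric consistency term follows the same pattern with depth values in place of intensities.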

Left-right consistency \(L^{lr}\)
RoI Loss \(L^{roi}\)
Full image based loss \(L^{t}\)
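
A common way to combine such terms is a weighted sum. The form below is an assumption for illustration: the weights \(\lambda_{lr}, \lambda_{roi}, \lambda_{t}\) are hypothetical hyperparameters, not values taken from the paper.

\(L = \lambda_{lr} L^{lr} + \lambda_{roi} L^{roi} + \lambda_{t} L^{t}\)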

Architecture

Evaluation
