Visual Learning and Recognition



==3D Scene Understanding==
What can you get from knowing pairwise pixel distances (i.e., given two pixels, which one is closer in 3D space)?
You can recover the horizon: as points on the ground plane get farther away, they converge toward the horizon line in the image.
;Single Image Reconstruction
By finding vanishing points and lines, you can do 3D reconstruction.
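A minimal sketch of the first step: estimating a vanishing point as the common intersection of image line segments that are parallel in 3D. It uses homogeneous coordinates (the line through two points is their cross product) and a least-squares intersection via SVD. The segment coordinates and function names here are illustrative, not from the lecture.
<syntaxhighlight lang="python">
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(segments):
    """Least-squares vanishing point for segments assumed parallel in 3D.

    Stacks one homogeneous line per segment and takes the smallest right
    singular vector, i.e. the point closest to lying on all the lines.
    """
    L = np.array([line_through(p, q) for p, q in segments], dtype=float)
    L /= np.linalg.norm(L, axis=1, keepdims=True)   # balance the rows
    _, _, vt = np.linalg.svd(L)
    v = vt[-1]
    return v[:2] / v[2] if abs(v[2]) > 1e-9 else v[:2]  # finite point if possible

# Toy example: three segments whose supporting lines all meet at (100, 50).
segs = [((0, 0), (50, 25)), ((0, 100), (50, 75)), ((0, 40), (60, 46))]
print(vanishing_point(segs))   # approximately [100. 50.]
</syntaxhighlight>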
;Taxonomy
How: from bottom-up classifiers to explicit constraints and reasoning.
What: from qualitative to explicit/quantitative descriptions.
From qualitative to quantitative:
* Surface labels
* Boundaries + objects
* Stronger geometric constraints
* Reasoning on aspects & poses
* 3D point clouds
Depth ordering, surface labels, and occlusion cues can give us a planar reconstruction.
Benefits of volumes:
* Finite volumes
* Spatial exclusion (no intersections)
* Mechanical relationships and physical stability (one volume resting atop another); see the sketch after this list.
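A toy sketch of these volumetric constraints using axis-aligned 3D boxes with z pointing up. The class, names, and tolerance are illustrative; real systems fit oriented cuboids rather than axis-aligned boxes.
<syntaxhighlight lang="python">
from dataclasses import dataclass

@dataclass
class Box:
    xmin: float; ymin: float; zmin: float
    xmax: float; ymax: float; zmax: float

def intersects(a: Box, b: Box) -> bool:
    """Spatial exclusion: two solid volumes may not interpenetrate."""
    return (a.xmin < b.xmax and b.xmin < a.xmax and
            a.ymin < b.ymax and b.ymin < a.ymax and
            a.zmin < b.zmax and b.zmin < a.zmax)

def rests_on(top: Box, bottom: Box, tol: float = 1e-3) -> bool:
    """Support: `top` is stable if its base touches the top face of
    `bottom` and their footprints overlap."""
    touching = abs(top.zmin - bottom.zmax) < tol
    overlap = (top.xmin < bottom.xmax and bottom.xmin < top.xmax and
               top.ymin < bottom.ymax and bottom.ymin < top.ymax)
    return touching and overlap

table = Box(0, 0, 0, 2, 1, 0.8)
mug = Box(0.5, 0.3, 0.8, 0.6, 0.4, 0.9)
print(intersects(table, mug), rests_on(mug, table))   # False True
</syntaxhighlight>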
Room layout estimation:
* Estimate walls and floor from vanishing points.
* Three principal directions
* Every room is a box
* A camera inside a box room sees between 1 and 5 of its 6 walls; facing one wall head-on, it sees 5.
* Use geometric context labels and optimize over candidate box layouts; a toy version of this scoring is sketched after this list.
* Given segmentation masks, you can estimate clutter vs free space.
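A heavily simplified sketch of the layout scoring mentioned above, under the "every room is a box" assumption: with the camera facing one wall, a candidate layout reduces to a ceiling/wall boundary row and a wall/floor boundary row, scored by agreement with per-pixel geometric-context labels. Real systems instead sweep candidate boxes through the three vanishing points; the label encoding (0 = ceiling, 1 = wall, 2 = floor) and the names here are assumptions for illustration.
<syntaxhighlight lang="python">
import numpy as np

def best_box_layout(labels):
    """Exhaustively search boundary rows that best match the label map."""
    h, _ = labels.shape
    best, best_score = None, -1
    for top in range(1, h - 1):           # ceiling/wall boundary row
        for bottom in range(top + 1, h):  # wall/floor boundary row
            score = (np.sum(labels[:top] == 0)
                     + np.sum(labels[top:bottom] == 1)
                     + np.sum(labels[bottom:] == 2))
            if score > best_score:
                best, best_score = (top, bottom), score
    return best

# Noisy label map with true boundaries at rows 3 and 7.
rng = np.random.default_rng(0)
labels = np.full((10, 12), 1)
labels[:3] = 0
labels[7:] = 2
mask = rng.random(labels.shape) < 0.1           # corrupt ~10% of the pixels
labels[mask] = rng.integers(0, 3, size=mask.sum())
print(best_box_layout(labels))   # typically recovers (3, 7)
</syntaxhighlight>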
Functional constraints:
* People sit on chairs, open drawers, work at laptops, ...
;Primitives
* Depth
* Surface normals (the sketch below derives normals from a depth map)
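A small sketch relating the two primitives: per-pixel surface normals computed from a depth map by back-projecting pixels into camera coordinates and taking the cross product of finite-difference tangent vectors. The intrinsics fx, fy and the centered principal point are assumptions for illustration.
<syntaxhighlight lang="python">
import numpy as np

def normals_from_depth(depth, fx, fy):
    """Unit surface normals from a depth map (pinhole camera, principal
    point at the image center)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project each pixel to a 3D point in camera coordinates.
    X = (u - w / 2.0) * depth / fx
    Y = (v - h / 2.0) * depth / fy
    P = np.stack([X, Y, depth], axis=-1)
    # Tangent vectors along image columns / rows, then their cross product.
    du = np.gradient(P, axis=1)
    dv = np.gradient(P, axis=0)
    n = np.cross(du, dv)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-9)

# Toy example: a fronto-parallel plane at depth 2 gives normals (0, 0, 1)
# under this sign convention.
print(normals_from_depth(np.full((4, 4), 2.0), fx=500.0, fy=500.0)[2, 2])
</syntaxhighlight>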
==Objects + 3D==


==Will be on the exam==