Visual Learning and Recognition: Difference between revisions



'''Exam Question'''
Benefits:
* Removes the separate SVM stage so that the model can be trained end-to-end.
* Each region is cropped from a feature map of the image rather than from the raw image.
** One downside is that the feature map has a lower resolution, so small objects may shrink to <math>1\times1</math> features.
** '''R-CNN is better for smaller objects.'''
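The small-object downside can be illustrated with a minimal sketch that projects an image-space box onto a downsampled feature map. The function name <code>roi_on_feature_map</code> and the stride of 16 are assumptions chosen for illustration; the actual downsampling factor depends on the feature extractor.

```python
import math

def roi_on_feature_map(box, stride=16):
    """Project an image-space box (x1, y1, x2, y2) onto a feature map.

    Returns (width, height) measured in feature-map cells. `stride` is the
    assumed total downsampling factor of the feature extractor.
    """
    x1, y1, x2, y2 = box
    fx1, fy1 = math.floor(x1 / stride), math.floor(y1 / stride)
    fx2, fy2 = math.ceil(x2 / stride), math.ceil(y2 / stride)
    # Every box covers at least one feature-map cell.
    return max(fx2 - fx1, 1), max(fy2 - fy1, 1)

small = roi_on_feature_map((96, 96, 112, 112))  # 16x16 pixel object
large = roi_on_feature_map((0, 0, 320, 160))    # 320x160 pixel object
print(small, large)
```

Under these assumptions the 16x16 pixel object collapses to a single feature-map cell, while the larger object still spans many cells.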
===Faster R-CNN===
Faster R-CNN focuses on the region proposals: it replaces selective search with a region proposal network (RPN), which computes region proposals on the fly.
It contains the following components:
# Feature extractor
# Region Proposal Network (RPN)
# RoI Classification & Regression Network
;How region proposal works
# Given an image, pass it through convolutional filters to get feature maps.
# Map each pixel of the feature map to <math>k</math> anchor boxes.
# One layer then outputs a foreground-background classification and another outputs a bounding box regression for each of the <math>k</math> anchor boxes at each pixel.
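The anchor step above can be sketched as follows. This is only an illustration: the function name <code>make_anchors</code>, the stride of 16, and the particular scales and aspect ratios (giving <math>k = 9</math>) are assumptions, not values taken from these notes.

```python
def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors in image coordinates.

    Each feature-map location gets k = len(scales) * len(ratios) anchors.
    All default values are assumptions made for this sketch.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays scale ** 2.
                    w = scale / ratio ** 0.5
                    h = scale * ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(feat_h=2, feat_w=3)
k = 9
# Assuming one foreground-background score and four box offsets per anchor.
num_scores = len(anchors)
num_box_offsets = 4 * len(anchors)
print(len(anchors), num_scores, num_box_offsets)
```

In this sketch the two output layers would produce one score and four offsets for every anchor, so the number of predictions grows with both the feature-map size and <math>k</math>.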
Generally, two-stage models such as this one are more accurate than single-stage models. Everything is trained end-to-end.


==Will be on the exam==