Read: pdollar, A seismic shift in object detection
==Object Detection (Part 2)==
===R-CNN===
;R-CNN at test time
# Extract roughly 2,000 region proposals from the input image.
#* Each region proposal is a box that likely contains an object.
# For each proposed bounding box:
#* Dilate the proposal.
#* Crop it out and scale it to <math>227 \times 227</math>.
#* Convert the crop to a <math>4096</math>-dim feature vector with the CNN and classify it using an SVM.
# Refine the proposals to predict the final object bounding box.
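The dilate, crop, and scale steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the R-CNN authors' code: the helper names <code>dilate_box</code> and <code>crop_and_warp</code>, the 10% padding fraction, and the nearest-neighbour warp are all assumptions made for the example.

```python
import numpy as np

def dilate_box(box, pad_frac, img_w, img_h):
    """Grow an (x1, y1, x2, y2) box by pad_frac of its size per side, clipped to the image."""
    x1, y1, x2, y2 = box
    pad_x = pad_frac * (x2 - x1)
    pad_y = pad_frac * (y2 - y1)
    return (
        max(0, int(round(x1 - pad_x))),
        max(0, int(round(y1 - pad_y))),
        min(img_w, int(round(x2 + pad_x))),
        min(img_h, int(round(y2 + pad_y))),
    )

def crop_and_warp(image, box, out_size=227):
    """Crop the box from an H x W x C image and warp it to out_size x out_size."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # Nearest-neighbour sampling (an assumption): ignores the aspect ratio, like a plain warp.
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return crop[rows][:, cols]

# Hypothetical image and proposal, only to show the shapes involved.
image = np.zeros((480, 640, 3), dtype=np.uint8)
box = dilate_box((100, 120, 300, 360), pad_frac=0.1, img_w=640, img_h=480)
patch = crop_and_warp(image, box)  # fixed-size input for the CNN
```

Every proposal ends up as a <math>227 \times 227</math> patch regardless of its original shape, which is why each one needs its own forward pass through the CNN.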
;Training R-CNN
# First, train a CNN for 1000-way ImageNet image classification.
# Fine-tune the CNN for detection on PASCAL VOC.
# Train the detection SVMs.
Both training and inference are very slow.
Extracting RoIs takes a lot of time.
You then need a separate forward pass for each of the ~2k regions to get its features.
Inference on a single image takes almost 1 minute.
===SPP-net===
SPP-net speeds up R-CNN using a spatial pyramid pooling (SPP) layer.
# Run a frozen CNN over the whole image once to get a feature map.
# Map the boxes from the region proposals generated by selective search onto the feature map.
# For each region, resize the features to <math>7 \times 7 \times 256</math>, apply SPP, and pass the result to a fully connected (FC) network to predict the bounding box and class.
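As a rough illustration of the mapping and pooling steps above, the sketch below projects an image-space box onto a feature map and pools the region into a fixed-length vector. The feature stride of 16, the pyramid levels (1, 2, 4), and the helper names are assumptions for this example and are not taken from the lecture or the SPP-net paper.

```python
import numpy as np

def project_box(box, stride=16):
    """Map an image-space (x1, y1, x2, y2) box to feature-map cells (stride is assumed)."""
    x1, y1, x2, y2 = box
    fx1, fy1 = x1 // stride, y1 // stride
    # Round the far edge up and keep at least one cell in each direction.
    fx2 = max(fx1 + 1, -(-x2 // stride))
    fy2 = max(fy1 + 1, -(-y2 // stride))
    return fx1, fy1, fx2, fy2

def pool_to_grid(region, n):
    """Max-pool a C x h x w region into a C x n x n grid of roughly equal bins."""
    c, h, w = region.shape
    ys = np.linspace(0, h, n + 1)
    xs = np.linspace(0, w, n + 1)
    out = np.empty((c, n, n), dtype=region.dtype)
    for i in range(n):
        y0 = int(np.floor(ys[i]))
        y1 = max(y0 + 1, int(np.ceil(ys[i + 1])))
        for j in range(n):
            x0 = int(np.floor(xs[j]))
            x1 = max(x0 + 1, int(np.ceil(xs[j + 1])))
            out[:, i, j] = region[:, y0:y1, x0:x1].max(axis=(1, 2))
    return out

def spp(region, levels=(1, 2, 4)):
    """Concatenate the pooled grids from every pyramid level into one vector."""
    return np.concatenate([pool_to_grid(region, n).ravel() for n in levels])

# Hypothetical 256-channel feature map for a 640 x 480 image at stride 16.
rng = np.random.default_rng(0)
feature_map = rng.random((256, 30, 40))
fx1, fy1, fx2, fy2 = project_box((80, 96, 320, 384))
region = feature_map[:, fy1:fy2, fx1:fx2]
vec = spp(region)  # same length for any box size
```

Because the output length depends only on the channel count and the pyramid levels, the CNN runs once per image and only this cheap pooling runs once per region.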
Hard mining:
For each of the 2000 boxes, you have IoU_foreground and IoU_background.
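The IoU (intersection over union) values mentioned above can be computed as in the following sketch. The function names and the example boxes are hypothetical. The notes do not state which IoU thresholds separate foreground from background, so none are assumed here.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def best_iou(proposal, gt_boxes):
    """Highest IoU between one proposal and any ground-truth box (0.0 if there are none)."""
    return max((iou(proposal, gt) for gt in gt_boxes), default=0.0)

# Hypothetical proposal and ground-truth boxes.
gt_boxes = [(5, 5, 15, 15), (100, 100, 120, 120)]
score = best_iou((0, 0, 10, 10), gt_boxes)  # overlap 25, union 175
```

A proposal's best IoU against the ground truth is the quantity a training procedure would compare against its chosen thresholds when labeling boxes.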
===Fast R-CNN===
Makes the whole network trainable.
'''Exam Question'''
==Will be on the exam==