Visual Learning and Recognition



===Single Shot MultiBox Detector (SSD)===
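Liu ''et al.'' (2016) propose SSD, the Single Shot MultiBox Detector. 
The idea is to train a single CNN that detects objects over the entire image in one forward pass, predicting box locations and class scores directly, without a separate region-proposal stage. 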
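The CNN makes predictions from several feature maps of different resolutions. 
Each location of a feature map is associated with a fixed set of ''default boxes'' of different aspect ratios, each representing a default bounding box centred on that location. 
For every default box, a small convolutional filter predicts a confidence score for each category and four offsets that refine the box. 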
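To make the layout of default boxes concrete, here is a minimal Python sketch for one square feature map. The function name `default_boxes`, the normalized `(cx, cy, w, h)` box convention, and the example scale and aspect ratios are illustrative assumptions rather than details taken from the SSD code. Since each default box gets one score per category plus 4 offsets, a location with ''k'' default boxes and ''c'' categories yields (''c'' + 4)''k'' outputs.

```python
import math


def default_boxes(fmap_size, scale, aspect_ratios):
    """Return default boxes (cx, cy, w, h) in normalized [0, 1] image coordinates
    for a square fmap_size x fmap_size feature map (hypothetical helper).

    Every feature-map location gets one box per aspect ratio, centred on that cell.
    """
    boxes = []
    for i in range(fmap_size):          # row index of the feature-map location
        for j in range(fmap_size):      # column index of the feature-map location
            cx = (j + 0.5) / fmap_size  # centre of the cell, normalized
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                # w * h == scale**2, so every box has the same area; only the shape changes.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


boxes = default_boxes(fmap_size=3, scale=0.5, aspect_ratios=[1.0, 2.0, 0.5])
print(len(boxes))  # 3 * 3 locations * 3 aspect ratios = 27
```

Keeping the area fixed across aspect ratios means a single per-layer scale controls box size, while the aspect ratios only change the shape.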
The candidate detections from all feature maps are combined and filtered with per-class non-maximum suppression to give the final output. 
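The filtering step can be sketched as greedy non-maximum suppression. Because suppression is normally done separately for each category, the sketch handles the detections of one class at a time. Boxes are assumed to be `(x1, y1, x2, y2)` corner tuples, the 0.45 IoU threshold is an illustrative assumption, and `iou` and `nms` are hypothetical helper names.

```python
def iou(a, b):
    """Jaccard overlap (intersection over union) of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS for one class: return indices of kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
```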
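Different scales are handled by predicting from feature maps taken at several depths of the network: higher-resolution maps from earlier layers cover small objects, and coarser maps from deeper layers cover large ones. 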
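One way to assign a default-box scale to each prediction layer is linear spacing between a smallest and a largest scale, similar to the rule described in the SSD paper. The sketch below assumes that rule, with `s_min = 0.2` and `s_max = 0.9` as assumed defaults (scales are fractions of the image size); `layer_scales` is a hypothetical name.

```python
def layer_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced default-box scales, one per feature map (shallowest first)."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    # Map k = 0 (earliest, highest-resolution) gets s_min; the last map gets s_max.
    return [s_min + step * k for k in range(num_maps)]


print([round(s, 2) for s in layer_scales(6)])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

The smallest boxes thus sit on the high-resolution early maps, matching the idea that earlier layers handle small objects.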
The default boxes are not required to correspond to the actual receptive field of the feature-map location they are attached to; their scales and aspect ratios are chosen by design. 
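During training, each ground-truth box is first matched to the ''default box'' with the highest Jaccard overlap (intersection over union), and then every remaining default box with Jaccard overlap > 0.5 with some ground truth is also matched. Matched default boxes are positives; all others are negatives. 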
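The matching rule can be sketched as follows. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and `match_default_boxes` is a hypothetical helper that returns, for each default box, the index of its matched ground truth or -1 for a negative.

```python
def iou(a, b):
    """Jaccard overlap (intersection over union) of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(defaults, truths, threshold=0.5):
    """Return a ground-truth index per default box, or -1 if it is a negative."""
    matches = [-1] * len(defaults)
    if not defaults or not truths:
        return matches
    overlaps = [[iou(d, t) for t in truths] for d in defaults]
    # Threshold rule: a default box matches its best ground truth if IoU > threshold.
    for d, row in enumerate(overlaps):
        best_t = max(range(len(truths)), key=lambda t: row[t])
        if row[best_t] > threshold:
            matches[d] = best_t
    # Best-match rule: every ground truth claims its highest-overlap default box.
    for t in range(len(truths)):
        best_d = max(range(len(defaults)), key=lambda d: overlaps[d][t])
        matches[best_d] = t
    return matches


defaults = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
truths = [(0, 0, 10, 10), (22, 22, 26, 26)]
print(match_default_boxes(defaults, truths))  # [0, -1, 1]
```

In the example, the small second ground truth has IoU of only 0.16 with its best default box, so it is matched purely by the best-match rule; without it, that object would have no positive at all.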
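Because most default boxes are negatives, they apply hard negative mining: negatives are sorted by their confidence loss and only the highest-loss ones are kept, so that the ratio of negatives to positives is at most 3:1. They also use data augmentation, such as randomly sampled image patches and horizontal flips.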
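Hard negative mining can be sketched as a selection over per-box confidence losses. The inputs (a list of confidence losses and a positive/negative flag per default box) and the name `hard_negative_mining` are illustrative assumptions; the 3:1 ratio is the one stated above.

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Return sorted indices of default boxes that contribute to the confidence loss.

    All positives are kept. Negatives are ranked by confidence loss (highest =
    hardest) and only the top neg_pos_ratio * num_positives of them are kept.
    """
    positives = [i for i, p in enumerate(is_positive) if p]
    negatives = [i for i, p in enumerate(is_positive) if not p]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    num_neg = min(len(negatives), neg_pos_ratio * len(positives))
    return sorted(positives + negatives[:num_neg])


losses = [0.1, 2.0, 0.3, 1.5, 0.05, 0.9, 0.7]
positive = [True, False, False, False, False, False, False]
print(hard_negative_mining(losses, positive))  # [0, 1, 3, 5]
```

Capping the negatives this way keeps the many easy background boxes from dominating the loss.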
===YOLO===