Visual Learning and Recognition: Difference between revisions

835 bytes added, 8 October 2020

@@ Line 437: / Line 437: @@
 ===Single Shot MultiBox Detector (SSD)===
+Liu ''et al.'' (2016) propose SSD: Single Shot MultiBox Detector.
+The idea is to train a single CNN that performs object detection over the entire image in one forward pass, with no separate region-proposal stage.
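+At each location of several feature maps, the CNN predicts per-category confidence scores and box offsets for a fixed set of boxes with different aspect ratios and scales.
+These boxes are called ''default boxes'': each feature-map cell is associated with a set of default bounding boxes centered on that cell.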
+Each feature map yields candidate detections, which are pooled and filtered using non-maximum suppression.
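+The filtering step can be illustrated with a minimal sketch of greedy non-maximum suppression. The function names, the corner-format boxes <code>(x1, y1, x2, y2)</code> and the default threshold of 0.45 are assumptions made for illustration, not the authors' implementation.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return indices of the kept boxes, highest score first.

    The threshold value is an illustrative assumption.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

+In a multi-class detector this would typically be run once per category on that category's scores.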
+Different scales are achieved by extracting feature maps from intermediate layers of the network.
+The aspect ratio and scale of each default box need not correspond to the receptive field of the feature-map cell it is attached to.
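+As a rough sketch of how default boxes at several scales and aspect ratios might be laid out, the code below spaces the scales linearly between a minimum and a maximum and centers the boxes on each feature-map cell. The helper names, the scale range of 0.2 to 0.9 and the aspect ratios are assumptions chosen for illustration; coordinates are normalized to the image size.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced scales, one per feature map (assumed range)."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h) for a square fmap_size x fmap_size map."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Box centers sit in the middle of each feature-map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


if __name__ == "__main__":
    for size, s in zip((4, 2, 1), default_box_scales(3)):
        print(size, round(s, 2), len(default_boxes(size, s)))
```

+Note that the box sizes here are set by the chosen scale alone, independently of the receptive field of the layer.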
+During training, every ''default box'' whose Jaccard overlap (intersection over union) with a ground-truth box exceeds 0.5 is matched to that box.
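+The matching rule can be sketched as follows. This is a simplified illustration that applies only the 0.5 overlap threshold; the function names, the corner-format boxes and the tie-breaking rule (a default box overlapping several ground-truth boxes takes the one with the highest overlap) are assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(default_boxes, gt_boxes, threshold=0.5):
    """For each default box, return the index of its matched ground-truth
    box, or -1 if no ground-truth box overlaps it by more than threshold."""
    matches = []
    for dbox in default_boxes:
        best_overlap, best_gt = threshold, -1
        for g, gbox in enumerate(gt_boxes):
            overlap = jaccard(dbox, gbox)
            if overlap > best_overlap:
                best_overlap, best_gt = overlap, g
        matches.append(best_gt)
    return matches


if __name__ == "__main__":
    defaults = [(0, 0, 10, 10), (0, 0, 4, 4), (20, 20, 30, 30)]
    ground_truth = [(1, 1, 10, 10), (50, 50, 60, 60)]
    print(match_default_boxes(defaults, ground_truth))
```

+Default boxes marked -1 are treated as negatives (background) for the classification loss.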
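+Because most default boxes are negatives after matching, they apply hard negative mining, keeping only the highest-loss negatives so that the ratio of negatives to positives is at most 3:1, and they also use data augmentation.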
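+A minimal sketch of hard negative mining is given below: negatives are ranked by their confidence loss and only the hardest ones are kept, up to a fixed multiple of the number of positives. The function name, the argument layout and the default ratio of 3 are assumptions for illustration.

```python
def hard_negative_mining(conf_loss, is_positive, neg_pos_ratio=3):
    """Return the sorted indices of the negatives kept for training.

    conf_loss[i]   -- confidence loss of default box i
    is_positive[i] -- True if default box i was matched to a ground truth
    """
    num_pos = sum(1 for pos in is_positive if pos)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    # Hardest negatives first: those the classifier currently gets most wrong.
    negatives.sort(key=lambda i: conf_loss[i], reverse=True)
    return sorted(negatives[: neg_pos_ratio * num_pos])


if __name__ == "__main__":
    losses = [0.1, 2.0, 0.5, 0.05, 1.5, 0.3, 0.9]
    positive = [True, False, False, False, False, False, False]
    print(hard_negative_mining(losses, positive))
```

+Only the positives and the selected negatives would then contribute to the confidence loss.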
 ===YOLO===