5,337
edits
Line 440: | Line 440: | ||
The idea is that they train a CNN to do object detection over the entire image. | The idea is that they train a CNN to do object detection over the entire image. | ||
The CNN outputs multiple feature maps for each of the categories, each with different aspect ratios and scales. | The CNN outputs multiple feature maps for each of the categories, each with different aspect ratios and scales. | ||
Pixels of the feature maps are ''default boxes'' | Pixels of the feature maps are scores for ''default boxes''; each pixel is associated with a default bounding box. | ||
The candidate results from the feature maps are filtered using non-maximum suppression. | |||
Different scales are achieved by extracting feature maps from intermediate layers of the network. | Different scales are achieved by extracting feature maps from intermediate layers of the network. | ||
The aspect ratio of each default box does not actually correspond to the receptive field associated with the feature pixel. | The aspect ratio of each default box does not actually correspond to the receptive field associated with the feature pixel. |