Visual Learning and Recognition: Difference between revisions

Line 437: Line 437:


===Single Shot MultiBox Detector (SSD)===
Liu ''et al.'' (2016)<ref name="liu2016ssd"></ref> propose SSD: Single Shot MultiBox Detector.
The idea is to train a single CNN that performs object detection over the entire image in one forward pass.
The CNN produces feature maps at several scales; at each feature-map location it predicts, for a set of default boxes with different aspect ratios and scales, a score for each category and offsets to the box shape.
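As a rough illustration of boxes with different aspect ratios and scales, the sketch below generates box shapes in the way the SSD paper describes: scales are spaced linearly between a minimum and a maximum across the feature maps, and each aspect ratio <math>a</math> gives a box of width <math>s\sqrt{a}</math> and height <math>s/\sqrt{a}</math>. The function names are hypothetical, and the defaults (0.2, 0.9, and the three aspect ratios) are only example values, not a complete reproduction of the paper's configuration.

```python
import math

def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per feature map (relative to image size)."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]

def default_box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) pairs for one scale; every box has area scale**2."""
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]

# Hypothetical setup: six feature maps, three aspect ratios per location.
scales = default_box_scales(6)
shapes = default_box_shapes(scales[0])
```

Each aspect ratio changes the shape of the box but not its area, so the scale alone controls how large an object a given feature map is matched to.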
Line 449: Line 449:


===YOLO===
Redmon ''et al.'' (2016)<ref name="redmon2016yolo"></ref> propose YOLO (You Only Look Once: Unified, Real-Time Object Detection).
This is similar to the SSD paper. Each image is divided into an <math>S \times S</math> grid. The difference is that rather than each grid cell corresponding to a ''default box'', each grid cell is responsible for predicting the bounding boxes of objects whose centers fall inside that cell. Each cell predicts <math>B</math> boxes, each as (x, y, w, h, confidence), where (x, y) is the center of the bounding box relative to the bounds of the grid cell, as well as <math>C</math> class probabilities. The output of the network is an <math>S \times S \times (B \cdot 5 + C)</math> tensor.
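To make the grid encoding concrete, here is a minimal sketch, assuming box centers are given in normalized image coordinates in [0, 1). The helper names <code>output_size</code> and <code>encode_center</code> are hypothetical; the values S = 7, B = 2, C = 20 are the PASCAL VOC settings reported in the YOLO paper.

```python
def output_size(S, B, C):
    """Number of values in the S x S x (B*5 + C) output tensor."""
    return S * S * (B * 5 + C)

def encode_center(cx, cy, S):
    """Map a normalized box center to its grid cell and in-cell offsets.

    Returns (row, col, x_offset, y_offset), with both offsets in [0, 1).
    """
    col = min(int(cx * S), S - 1)  # clamp so cx == 1.0 stays inside the grid
    row = min(int(cy * S), S - 1)
    return row, col, cx * S - col, cy * S - row

# PASCAL VOC settings from the paper: a 7 x 7 x 30 output tensor.
n_values = output_size(S=7, B=2, C=20)

# A box centered at (0.5, 0.25) is assigned to the cell in row 1, column 3.
cell = encode_center(0.5, 0.25, S=7)
```

Only the cell containing the center is treated as responsible for the object, which is why the (x, y) targets are expressed as offsets within that cell.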
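Some training tricks: they use leaky ReLU activations; they predict the square root of the box height and width, so that a given absolute error counts for more in small boxes than in large ones; and in the loss function they weight the coordinate error of boxes containing objects by <math>\lambda_{\text{coord}} = 5</math> and the confidence error of empty boxes by <math>\lambda_{\text{noobj}} = 0.5</math>, a 10x difference.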
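The three tricks can be sketched in a few lines. This is an illustrative fragment rather than the paper's full loss: the function names are hypothetical, the leaky ReLU slope of 0.1 and the weights 5 and 0.5 are the values reported in the YOLO paper, and only the size and confidence terms are shown.

```python
import math

LAMBDA_COORD = 5.0  # weight on coordinate error for boxes that contain an object
LAMBDA_NOOBJ = 0.5  # weight on confidence error for boxes that contain no object

def leaky_relu(x, slope=0.1):
    """Leaky ReLU: identity for positive inputs, a small slope otherwise."""
    return x if x > 0 else slope * x

def size_error(pred, true):
    """Weighted squared error on the square root of a box width or height."""
    return LAMBDA_COORD * (math.sqrt(pred) - math.sqrt(true)) ** 2

def confidence_error(pred_conf, target_conf, has_object):
    """Squared confidence error, down-weighted for empty boxes."""
    weight = 1.0 if has_object else LAMBDA_NOOBJ
    return weight * (pred_conf - target_conf) ** 2

# The same absolute error of 0.05 costs more for a small box than a large one.
small_box_error = size_error(0.15, 0.10)
large_box_error = size_error(0.95, 0.90)
```

Because most grid cells contain no object, down-weighting their confidence error keeps those cells from dominating the gradient.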


==Semantic Segmentation==
Line 502: Line 506:
Neural Networks (NIPS 2012) [https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]</ref>
Neural Networks (NIPS 2012) [https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]</ref>
<ref name="felzenszwalb2009dpm">Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester and Deva Ramanan (2009) Object Detection with Discriminatively Trained Part Based Models  [http://cs.brown.edu/people/pfelzens/papers/lsvm-pami.pdf http://cs.brown.edu/people/pfelzens/papers/lsvm-pami.pdf]</ref>
<ref name="liu2016ssd">Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg (2016) SSD: Single Shot MultiBox Detector [https://arxiv.org/abs/1512.02325 https://arxiv.org/abs/1512.02325]</ref>
<ref name="redmon2016yolo">Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi (2016) You Only Look Once: Unified, Real-Time Object Detection [https://pjreddie.com/media/files/papers/yolo.pdf https://pjreddie.com/media/files/papers/yolo.pdf]</ref>
}}
}}