Line 396: | Line 396: | ||
Then we can compute the max for p2 with respect to p1, p3 wrt p1, and then p1. | Then we can compute the max for p2 with respect to p1, p3 wrt p1, and then p1. | ||
====Mixture Models==== | |||
;Is One Model Enough? | ;Is One Model Enough? | ||
In general, no, because objects have multiple views. | In general, no, because objects have multiple views. | ||
Line 408: | Line 409: | ||
;Analyzing Mixture Models | ;Analyzing Mixture Models | ||
<math>L(\beta) = \frac{1}{2} \Vert \beta \Vert^2 + C\sum_{i=1}^{n} \max(0, 1-y_i \cdot \operatorname{score}(\mathbf{z}_i))</math> | <math>L(\beta) = \frac{1}{2} \Vert \beta \Vert^2 + C\sum_{i=1}^{n} \max(0, 1-y_i \cdot \operatorname{score}(\mathbf{z}_i))</math> | ||
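As a rough numerical illustration of the objective above, the following sketch evaluates the regularizer plus the hinge-loss term. It assumes the per-example scores have already been computed (for a mixture model, for instance after maximizing over components); the function name <code>svm_objective</code> and the toy numbers are hypothetical and not part of the notes.

```python
import numpy as np

def svm_objective(beta, scores, labels, C=1.0):
    """Evaluate 0.5 * ||beta||^2 + C * sum_i max(0, 1 - y_i * score_i).

    `scores` are assumed to be precomputed per-example scores.
    """
    beta = np.asarray(beta, dtype=float)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Hinge loss is zero for examples classified with margin >= 1.
    hinge = np.maximum(0.0, 1.0 - labels * scores)
    return 0.5 * beta @ beta + C * hinge.sum()

# Toy values chosen only for illustration.
beta = np.array([1.0, -2.0])
scores = np.array([2.0, 0.5, -0.3])
labels = np.array([1, 1, -1])
loss = svm_objective(beta, scores, labels, C=1.0)
```

Only the second and third examples contribute to the hinge term here, since the first already has a margin greater than 1.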
===Region-based Approaches=== | |||
One-stage: | |||
* Overfeat | |||
* SSD | |||
* YOLO | |||
Two-stage: | |||
* RCNN | |||
* Fast RCNN | |||
* Mask RCNN | |||
Instance-based: | |||
* SDS | |||
* RFCN | |||
* Mask RCNN | |||
===Overfeat=== | |||
Winner of the ILSVRC 2013 localization challenge. | |||
The architecture first passes the image through several convolution and pooling layers. | |||
Then a sequence of fully connected (FC) layers produces the output. | |||
;Sliding Window: | |||
Suppose the network takes a 3x221x221 input and the image is 3x257x257. | |||
Run the network over the image with a sliding window, one 221x221 crop at a time, then greedily merge the predicted boxes. | |||
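To make the window count concrete, here is a minimal sketch that enumerates the crops for the 257x257 image and 221x221 network input mentioned above. The stride of 12 pixels is an assumption chosen for illustration, and <code>window_offsets</code> is a hypothetical helper name.

```python
def window_offsets(image_size, window_size, stride):
    """Top-left offsets at which a full window still fits in the image."""
    return list(range(0, image_size - window_size + 1, stride))

IMAGE, WINDOW = 257, 221
offsets = window_offsets(IMAGE, WINDOW, stride=12)  # stride is an assumption

# Each window is (top, left, bottom, right); the network would be run once per crop.
windows = [(y, x, y + WINDOW, x + WINDOW) for y in offsets for x in offsets]
```

With this stride there are 4 offsets per axis, so the naive approach needs 16 separate forward passes over heavily overlapping crops.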
;Efficient sliding window | |||
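Use a fully convolutional network: rewriting the FC layers as convolutions lets the network process the whole image in one pass, so computation is shared between overlapping windows. | |||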
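The equivalence between an FC layer and a convolution can be checked numerically. The sketch below is an illustration under assumed toy sizes (3 channels, a 5x5 FC input, 4 outputs), not the actual Overfeat configuration; it reshapes FC weights into a convolution kernel and slides it over a larger feature map.

```python
import numpy as np

rng = np.random.default_rng(0)
C, K, OUT = 3, 5, 4  # toy sizes, assumed for illustration
W_fc = rng.standard_normal((OUT, C * K * K))

def fc(feature_crop):
    """Ordinary FC layer applied to one flattened C x K x K crop."""
    return W_fc @ feature_crop.reshape(-1)

def fc_as_conv(feature_map):
    """Apply the same weights as a K x K convolution over a larger map."""
    kernel = W_fc.reshape(OUT, C, K, K)
    _, H, W = feature_map.shape
    out = np.empty((OUT, H - K + 1, W - K + 1))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = feature_map[:, i:i + K, j:j + K]
            # Contract the kernel's (C, K, K) axes against the patch.
            out[:, i, j] = np.tensordot(kernel, patch, axes=3)
    return out

feature_map = rng.standard_normal((C, 7, 7))
dense = fc_as_conv(feature_map)  # one output vector per window position
```

Each spatial position of <code>dense</code> matches what the FC layer would output on the corresponding crop, so one pass over the larger map yields every window's prediction.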
==Will be on the exam== | ==Will be on the exam== |