Visual Learning and Recognition

Then, for each placement of p1, we can compute the best p2 and the best p3 independently, and finally take the max over p1.
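The maximization above can be sketched as max-sum dynamic programming over a star-shaped model with the root part p1 at the center. This is a minimal illustration under assumptions not in the original: locations are 1-D indices, `unary[k][p]` is a hypothetical appearance score for part k at location p, and the deformation cost is an assumed quadratic penalty around a hypothetical anchor offset `anchors[k]`.

```python
def best_config(unary, anchors, w=1.0):
    """Max-sum inference for a star-shaped part model rooted at part 0 (p1).

    unary[k][p] -- hypothetical appearance score of part k at 1-D location p
    anchors[k]  -- hypothetical ideal offset of part k from the root (anchors[0] unused)
    w           -- weight of the assumed quadratic deformation cost
    """
    locs = range(len(unary[0]))

    def part_score(k, pk, p1):
        # appearance score minus the deformation penalty relative to the root
        return unary[k][pk] - w * (pk - p1 - anchors[k]) ** 2

    best_total, best_cfg = float("-inf"), None
    for p1 in locs:
        total, cfg = unary[0][p1], [p1]
        for k in range(1, len(unary)):
            # best placement of part k given the root at p1; other parts do not matter
            pk = max(locs, key=lambda p: part_score(k, p, p1))
            total += part_score(k, pk, p1)
            cfg.append(pk)
        if total > best_total:
            best_total, best_cfg = total, cfg
    return best_total, best_cfg


# Usage with made-up scores for a root and two parts over 4 locations
unary = [[0.0, 5.0, 1.0, 0.0], [2.0, 0.0, 0.0, 3.0], [1.0, 4.0, 0.0, 0.0]]
anchors = [0, 1, -1]
print(best_config(unary, anchors))  # (10.0, [1, 3, 1])
```

Because each part's best placement is found separately for every root location, this takes roughly K·L² score evaluations for K parts and L locations, instead of L^K for enumerating every joint configuration.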


====Mixture Models====
;Is One Model Enough?
In general, no, because objects are seen from multiple views.
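One common way to combine several view-specific models, used for example in deformable part models, is to score a window with every component and keep the best one, which also tells you which view fired. A minimal sketch, assuming each component is a hypothetical linear template `(weights, bias)` over a fixed feature vector:

```python
def mixture_score(features, components):
    """Score a window under every mixture component and keep the best one.

    components -- hypothetical list of (weights, bias) linear templates, one per view
    Returns (best score, index of the winning component).
    """
    scores = [sum(w * f for w, f in zip(weights, features)) + bias
              for weights, bias in components]
    best = max(range(len(scores)), key=lambda c: scores[c])
    return scores[best], best


# Two made-up views; the second template matches these features better
components = [([1.0, 0.0], 0.0), ([0.0, 1.0], -0.5)]
print(mixture_score([0.2, 2.0], components))  # (1.5, 1)
```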
;Analyzing Mixture Models
<math>L(\beta) = \frac{1}{2} \Vert \beta \Vert^2 + C\sum_{i=1}^{n} \max\left(0,\, 1-y_i \cdot \operatorname{score}(\mathbf{z}_i)\right)</math>
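The objective can be evaluated directly. The sketch below assumes, as in latent SVMs, that each example i comes with a list of candidate feature vectors, one per latent choice, and that score(z_i) is the best linear score of β over those candidates; the names `score` and `objective` and this data layout are hypothetical.

```python
def score(beta, candidates):
    """Latent score: best linear score beta . phi over the candidate latent choices."""
    return max(sum(b * f for b, f in zip(beta, phi)) for phi in candidates)


def objective(beta, data, C=1.0):
    """L(beta) = 0.5 * ||beta||^2 + C * sum_i max(0, 1 - y_i * score_i).

    data -- list of (candidates, y) pairs with y in {+1, -1}; each candidate is
            the feature vector for one hypothetical latent choice z.
    """
    reg = 0.5 * sum(b * b for b in beta)
    hinge = sum(max(0.0, 1.0 - y * score(beta, cands)) for cands, y in data)
    return reg + C * hinge


beta = [1.0, 0.0]
data = [([[2.0, 0.0], [0.5, 1.0]], +1),  # positive: best latent choice scores 2.0, no loss
        ([[0.5, 0.0]], -1)]             # negative with a positive score, so it is penalized
print(objective(beta, data))  # 2.0
```

Because of the max over latent choices, the hinge term is convex for negative examples (y_i = -1) but not for positive ones, so training typically alternates between fixing the latent choices of the positives and solving the resulting convex problem.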
===Region-based Approaches===
One-stage:
* Overfeat
* SSD
* YOLO
Two-stage:
* RCNN
* Fast RCNN
* Mask RCNN
Instance-based:
* SDS
* RFCN
* Mask RCNN
===Overfeat===
Winner of the ILSVRC 2013 localization challenge.
The architecture first passes the image through a stack of convolution and pooling layers.
Then a sequence of fully connected (FC) layers produces the output.
;Sliding window
Suppose the network takes a 3x221x221 input and the image is 3x257x257.
Run the network at every window position across the image, then greedily merge the predicted boxes.
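The sliding-window step can be sketched in a few lines. The stride of 36 pixels is an assumption chosen so that 221-pixel windows tile a 257-pixel image as a 2x2 grid, and the merge rule below (average the most-overlapping pair of boxes while their intersection-over-union stays above a threshold) is a simplified stand-in, not OverFeat's exact merging criterion.

```python
def window_offsets(image_size, window, stride):
    """Offsets of every window position along one axis."""
    return list(range(0, image_size - window + 1, stride))


def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_merge(boxes, thresh=0.5):
    """Repeatedly average the most-overlapping pair until no pair has IoU >= thresh."""
    boxes = [tuple(b) for b in boxes]
    while len(boxes) > 1:
        pairs = [(i, j) for i in range(len(boxes)) for j in range(i + 1, len(boxes))]
        i, j = max(pairs, key=lambda p: iou(boxes[p[0]], boxes[p[1]]))
        if iou(boxes[i], boxes[j]) < thresh:
            break
        merged = tuple((u + v) / 2 for u, v in zip(boxes[i], boxes[j]))
        boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [merged]
    return boxes


# 221-pixel windows over a 257-pixel image with an assumed stride of 36
print(window_offsets(257, 221, 36))  # [0, 36] -> 2 x 2 = 4 windows
print(greedy_merge([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]))
# [(50, 50, 60, 60), (0.5, 0.5, 10.5, 10.5)]
```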
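;Efficient sliding window
Use a fully convolutional network: implement the FC layers as convolutions so the whole image is processed in one forward pass.
Computation shared by overlapping windows is done only once, and the output becomes a spatial map with one prediction per window position.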
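The equivalence behind the efficient version can be checked on a toy 1-D network (conv, ReLU, then one FC unit). This is a sketch with hypothetical weights and sizes: the naive version crops every window and reruns the whole net, recomputing overlapping conv outputs, while the fully convolutional version convolves the full input once and then applies the FC weights as a convolution over the feature map. Both produce one score per window position.

```python
def conv1d(x, k):
    """Valid 1-D convolution (cross-correlation), stride 1."""
    n = len(k)
    return [sum(a * b for a, b in zip(x[i:i + n], k)) for i in range(len(x) - n + 1)]


def relu(v):
    return [max(0.0, a) for a in v]


def classify_window(window, conv_k, fc_w):
    """Hypothetical tiny net: conv -> ReLU -> FC (one output) on a fixed-size input."""
    feats = relu(conv1d(window, conv_k))
    assert len(feats) == len(fc_w), "FC layer expects a fixed input size"
    return sum(a * b for a, b in zip(feats, fc_w))


def sliding_window(image, conv_k, fc_w):
    """Naive: crop every window and rerun the whole net on it."""
    size = len(conv_k) + len(fc_w) - 1  # input size the net was built for
    return [classify_window(image[i:i + size], conv_k, fc_w)
            for i in range(len(image) - size + 1)]


def fully_convolutional(image, conv_k, fc_w):
    """Efficient: conv once over the full input, then apply the FC weights as a conv."""
    feats = relu(conv1d(image, conv_k))
    return conv1d(feats, fc_w)


image = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 4.0, -3.0]
conv_k = [0.5, -1.0, 0.25]
fc_w = [1.0, -0.5, 2.0, 0.75]
naive = sliding_window(image, conv_k, fc_w)
fcn = fully_convolutional(image, conv_k, fc_w)
print(len(naive), all(abs(a - b) < 1e-12 for a, b in zip(naive, fcn)))  # 3 True
```

With stride-1 layers the output positions are one pixel apart; in a real network with pooling, adjacent outputs correspond to windows offset by the network's total subsampling factor.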


==Will be on the exam==