Visual Learning and Recognition: Difference between revisions

587 bytes added, 8 October 2020


@@ Line 396: / Line 396: @@
 Then we can compute the max for p2 with respect to p1, then for p3 with respect to p1, and finally for p1.
+====Mixture Models====
 ;Is One Model Enough?
 In general, no, because objects have multiple views.
@@ Line 408: / Line 409: @@
 ;Analyzing Mixture Models
 <math>L(\beta) = \frac{1}{2} \Vert \beta \Vert^2 + C\sum_{i=1}^{n} \max(0, 1 - y_i \cdot \operatorname{score}(\mathbf{z}))</math>
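The objective above can be sketched numerically. This is a minimal illustration, not the course's implementation: it assumes the score for each example has already been computed (in a mixture model it would typically come from the best-scoring component), and the names `mixture_objective`, `scores`, and `labels` are hypothetical.

```python
def mixture_objective(beta, scores, labels, C=1.0):
    """Regularized hinge loss: 0.5 * ||beta||^2 + C * sum_i max(0, 1 - y_i * score_i)."""
    regularizer = 0.5 * sum(b * b for b in beta)
    hinge = sum(max(0.0, 1.0 - y * s) for y, s in zip(labels, scores))
    return regularizer + C * hinge


# Hypothetical values: one weight vector, three examples.
beta = [0.5, -0.5]
scores = [2.0, 0.5, -0.3]   # assumed precomputed score per example
labels = [1, 1, -1]         # y_i in {+1, -1}
value = mixture_objective(beta, scores, labels, C=1.0)
```

Only examples inside the margin (here the second and third) contribute to the hinge term; the first is classified with margin greater than 1 and adds nothing.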
+===Region-based Approaches===
+Single stage:
+* Overfeat
+* SSD
+* YOLO
+Two stage:
+* RCNN
+* Fast RCNN
+* Mask RCNN
+Instance based:
+* SDS
+* RFCN
+* Mask RCNN
+===Overfeat===
+Winner of the ILSVRC 2013 localization challenge.
+The architecture first passes the image through several convolution and pooling layers.
+Then a sequence of FC layers produces the output.
+;Sliding window
+Suppose the network takes a 3x221x221 input and the image is 3x257x257.
+Run the network over the image with a sliding window, then greedily merge the resulting boxes.
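The sliding-window procedure can be sketched as follows. This is a rough illustration under stated assumptions: the stride of 36 pixels is assumed rather than taken from the notes, the predicted boxes are made up, and the merge rule (average the pair with the highest IoU until no pair exceeds a threshold) is a simplified stand-in, not OverFeat's exact matching criterion. All function names are hypothetical.

```python
def window_origins(image_size, window_size, stride):
    """Top-left offsets of a sliding window along one axis."""
    return list(range(0, image_size - window_size + 1, stride))


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_merge(boxes, threshold=0.5):
    """Repeatedly average the most-overlapping pair until no pair exceeds threshold."""
    boxes = [tuple(float(v) for v in b) for b in boxes]
    while len(boxes) > 1:
        best, pair = threshold, None
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                overlap = iou(boxes[i], boxes[j])
                if overlap > best:
                    best, pair = overlap, (i, j)
        if pair is None:
            break  # no remaining pair overlaps enough to merge
        i, j = pair
        merged = tuple((p + q) / 2 for p, q in zip(boxes[i], boxes[j]))
        boxes = [b for k, b in enumerate(boxes) if k not in pair] + [merged]
    return boxes


# A 221-pixel window on a 257-pixel axis with an assumed stride of 36.
origins = window_origins(257, 221, 36)
# Hypothetical box predictions collected from the different windows.
predictions = [(50, 50, 150, 150), (54, 52, 156, 150), (200, 200, 250, 250)]
merged = greedy_merge(predictions)
```

With these assumed numbers there are two window positions per axis, so the network runs four times on heavily overlapping crops, which is the redundancy the efficient variant avoids.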
+;Efficient sliding window
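+Use a fully convolutional network: replacing the FC layers with equivalent convolutions lets a single forward pass score every window at once, sharing computation across overlapping windows.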
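A toy 1-D sketch of why this works, assuming a made-up network with one convolution layer (with ReLU) followed by one FC layer, stride 1, and no pooling. Treating the FC weights as a convolution kernel over the feature map gives the same outputs as running the fixed-size network on every window, while the convolution layer is computed only once. The weights and signal are arbitrary illustrative values.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation with stride 1."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, signal[i:i + k]))
            for i in range(len(signal) - k + 1)]


def relu(values):
    return [max(0, v) for v in values]


def network(window, conv_kernel, fc_weights):
    """Toy fixed-size network: conv + ReLU, then one FC layer over all features."""
    features = relu(conv1d(window, conv_kernel))
    assert len(features) == len(fc_weights)
    return sum(w * f for w, f in zip(fc_weights, features))


def sliding_window(signal, window_size, conv_kernel, fc_weights):
    """Naive approach: rerun the whole network on every window."""
    return [network(signal[i:i + window_size], conv_kernel, fc_weights)
            for i in range(len(signal) - window_size + 1)]


def fully_convolutional(signal, conv_kernel, fc_weights):
    """Efficient approach: one conv pass, then the FC layer applied as a convolution."""
    features = relu(conv1d(signal, conv_kernel))
    return conv1d(features, fc_weights)


signal = [1, 2, 0, -1, 3, 2, 1, 0]   # plays the role of the larger image
conv_kernel = [1, 0, -1]
fc_weights = [2, 1, 0, -1]           # FC layer sized for a window of length 6
naive = sliding_window(signal, 6, conv_kernel, fc_weights)
dense = fully_convolutional(signal, conv_kernel, fc_weights)
```

Both lists hold one score per window position; the fully convolutional version produces them as a small output map instead of as separate forward passes.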
 ==Will be on the exam==