Visual Learning and Recognition: Difference between revisions

How many octaves? As many as are needed to reduce the image size to the template size, plus 1 for 2x2 upscaling.   
How many levels? People generally try 10 levels.
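
The octave count above can be sketched as a small calculation. This is only an illustration under stated assumptions: the image and template are each described by a single side length in pixels, each octave halves the image, and <code>num_octaves</code> is a hypothetical helper name that does not come from the notes.

```python
import math


def num_octaves(image_size, template_size):
    """Octaves needed to shrink the image down to the template size,
    plus one extra octave for the upscaled image (assumed convention)."""
    if image_size <= 0 or template_size <= 0:
        raise ValueError("sizes must be positive")
    # Each octave halves the image, so count how many halvings fit.
    downscale_octaves = max(0, math.floor(math.log2(image_size / template_size)))
    # One more octave accounts for the upscaled version of the image.
    return downscale_octaves + 1
```

For example, a 640-pixel image with an 80-pixel template allows three halvings, so this sketch returns 4.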

===Precision and Recall===
The area under the precision vs. recall curve is called the average precision (AP).
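
A minimal sketch of computing AP as the area under the precision vs. recall curve follows. It assumes each detection has already been labeled as a true or false positive, and it uses a simple rectangle rule. Benchmarks often use interpolated variants instead, so treat this as one possible convention. The function name and arguments are hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve using a rectangle rule.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth object.
    num_ground_truth: total number of ground-truth objects.
    """
    if num_ground_truth <= 0:
        return 0.0
    # Sweep the threshold by visiting detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    fp = 0
    ap = 0.0
    prev_recall = 0.0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        # Recall only moves when a true positive is added.
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```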


===Non-max Suppression===
The non-max suppression (NMS) heuristic is used to reduce the number of bounding boxes per object to 1.

Initially, you have a set of overlapping bounding boxes <math>B</math>. 
Create an empty final set <math>D</math>. 
* While <math>B</math> is not empty
** Remove the highest confidence/score box <math>b_i</math> from <math>B</math> and add it to <math>D</math>.
** For every remaining box <math>b_j</math> in <math>B</math>,
*** If <math>IOU(b_i, b_j) > \lambda</math> (i.e. they bound the same object), discard <math>b_j</math> from <math>B</math>.
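
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes boxes are given as <code>(x1, y1, x2, y2)</code> corner tuples, and the names <code>iou</code>, <code>non_max_suppression</code> and <code>iou_threshold</code> (playing the role of <math>\lambda</math>) are chosen here for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes (the set D), best score first."""
    # B: indices of candidate boxes, sorted by decreasing score.
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while remaining:
        # Move the highest-scoring box from B to D.
        best = remaining.pop(0)
        kept.append(best)
        # Discard every remaining box that overlaps it too much.
        remaining = [j for j in remaining
                     if iou(boxes[best], boxes[j]) <= iou_threshold]
    return kept
```

With boxes <code>[(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]</code> and scores <code>[0.9, 0.8, 0.7]</code>, the second box overlaps the first with an IOU of about 0.68 and is discarded, so the kept indices are <code>[0, 2]</code>.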
 
===Hard mining===
During training, run the classifier on all images.   
Identify the instances that the classifier classifies incorrectly.
Then retrain only on those negative instances.
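
The selection step can be sketched as below. This is an illustration under assumptions: the classifier is represented by a hypothetical <code>score_fn</code> that returns a real-valued score, and a negative instance counts as misclassified when its score exceeds <code>threshold</code>. Neither name comes from the notes.

```python
def mine_hard_negatives(score_fn, negatives, threshold=0.0):
    """Return the negative instances the current classifier gets wrong.

    score_fn: hypothetical scoring function of the current classifier.
    negatives: instances known to be negative.
    threshold: scores above this are treated as a positive prediction.
    """
    # A negative scored above the threshold is a false positive, i.e. "hard".
    return [x for x in negatives if score_fn(x) > threshold]
```

The returned instances would then form the negative set for the next round of training.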


===Current HOG===
Current HOG uses 31 dimensions:
* 9 contrast-insensitive gradients
* 18 contrast-sensitive gradients
* 4 texture-related


===Discriminatively Trained Part Based Models (DPM)===