Visual Learning and Recognition: Difference between revisions

No edit summary
Line 633: Line 633:
* FPN: Feature Pyramid Network
* FPN: Feature Pyramid Network


===ION: Inside Out Network===
===ION: Inside-Outside Network===
Bell ''et al.'' <ref name="bell2016ion"></ref>   
Bell ''et al.'' <ref name="bell2016ion"></ref>   
The key idea is that we want a feature vector which uses features from multiple scales.
The key idea is that we want a feature vector which includes multi-scale and contextual information.


'''Potential Exam Question'''
'''Potential Exam Question'''
Line 641: Line 641:
We want to use features from multiple levels. The RoI is fixed.   
We want to use features from multiple levels. The RoI is fixed.   
The resolution, number of channels, and magnitude of features can be different.   
The resolution, number of channels, and magnitude of features can be different.   
* Do L2 normalization of the features at different layers
 
* Concatenate features
Architecture
* Rescale them.
# There are 5 conv blocks, followed by two 4-direction IRNN blocks that extract context features.
* Do 1x1 convolution and give it to the FC layer.
# The whole image passes through this entire network.
 
* For each RoI identified using object proposals:
* L2-normalize the features from the different layers (conv3, conv4, conv5, and the context features).
* Concatenate the features into a single feature map.
* Rescale them and apply a 1x1 convolution to get a <math>512 \times 7 \times 7</math> feature descriptor.
* Pass the descriptor through two FC layers.
* Finally, one FC layer predicts the class via softmax and another predicts the bounding box.
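
The per-RoI steps above (normalize, concatenate, rescale, 1x1 convolution) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the helper names (<code>l2_normalize</code>, <code>concat_channels</code>, <code>rescale</code>, <code>conv1x1</code>, <code>random_map</code>), the tiny channel counts, the scale value, and the random weights are all hypothetical, and the normalization is assumed to be taken over the channel vector at each spatial location. Feature maps are nested lists indexed as <code>[channel][row][column]</code>, and each input is assumed to be already pooled to <math>7 \times 7</math> for the RoI.

```python
import math
import random


def l2_normalize(fmap, eps=1e-12):
    """L2-normalize the channel vector at every spatial location (assumed variant)."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for y in range(height):
        for x in range(width):
            norm = math.sqrt(sum(fmap[c][y][x] ** 2 for c in range(channels))) + eps
            for c in range(channels):
                out[c][y][x] = fmap[c][y][x] / norm
    return out


def concat_channels(fmaps):
    """Stack several [C_i][H][W] maps along the channel axis."""
    return [channel for fmap in fmaps for channel in fmap]


def rescale(fmap, scale):
    """Multiply every entry by a scalar (a learned parameter in a real network)."""
    return [[[scale * v for v in row] for row in channel] for channel in fmap]


def conv1x1(fmap, weights):
    """1x1 convolution: the same [C_out][C_in] linear map applied at each location."""
    c_in, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[sum(w_row[c] * fmap[c][y][x] for c in range(c_in))
              for x in range(width)] for y in range(height)]
            for w_row in weights]


def random_map(rng, channels, size=7):
    """Hypothetical stand-in for an RoI-pooled feature map of shape [channels][7][7]."""
    return [[[rng.uniform(0.1, 1.0) for _ in range(size)] for _ in range(size)]
            for _ in range(channels)]


rng = random.Random(0)
# Toy channel counts standing in for conv3, conv4, conv5, and the context features.
pooled = [random_map(rng, c) for c in (3, 4, 4, 2)]
normalized = [l2_normalize(f) for f in pooled]
stacked = rescale(concat_channels(normalized), scale=10.0)
# Toy 1x1 conv with 5 output channels (the notes describe 512 in the real network).
weights = [[rng.uniform(-1.0, 1.0) for _ in range(len(stacked))] for _ in range(5)]
descriptor = conv1x1(stacked, weights)
print(len(descriptor), len(descriptor[0]), len(descriptor[0][0]))  # 5 7 7
```

Normalizing before concatenation puts every layer's features on a comparable magnitude, so no single layer dominates the stacked descriptor. The rescaling then restores a magnitude that the following layers can work with.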


===Analysis and Diagnosis===
===Analysis and Diagnosis===