Visual Learning and Recognition: Difference between revisions

446 bytes added, 28 October 2020

@@ Line 633: / Line 633: @@
 * FPN: Feature Pyramid Network
-===ION: Inside Out Network===
+===ION: Inside-Outside Network===
 Bell ''et al.'' <ref name="bell2016ion"></ref>
-The key idea is that we want a feature vector which uses features from multiple scales.
+The key idea is that we want a feature vector which includes multi-scale and contextual information.
 '''Potential Exam Question'''
@@ Line 641: / Line 641: @@
 We want to use features from multiple levels. The RoI is fixed.
 The resolution, number of channels, and magnitude of features can be different.
-* Do L2 normalization of the features at different layers
-* Concatenate features
+Architecture
-* Rescale them.
+# There are 5 conv blocks, followed by two 4-dir IRNN blocks which extract context features.
-* Do 1x1 convolution and give it to the FC layer.
+# The whole image passes through this entire network.
+* For each RoI identified using object proposals:
+* Do L2 normalization of the features at different layers (conv3, conv4, conv5, and context features)
+* Concatenate features to a single feature image.
+* Rescale them and do 1x1 convolution to get a <math>512 \times 7 \times 7</math> feature descriptor.
+* Pass through two FC layers.
+* Finally, one FC layer predicts the class via softmax and another predicts the bounding box.
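The per-RoI steps above (L2 normalization, concatenation, rescaling, 1x1 convolution) can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the channel counts (256, 512, 512, 512), the random inputs, the constant <code>scale</code>, and the names <code>l2_normalize</code> and <code>ion_descriptor</code> are all assumptions made for the example. The features are assumed to be already RoI-pooled to <math>7 \times 7</math>.

```python
import numpy as np

def l2_normalize(feat, eps=1e-12):
    """L2-normalize the channel vector at every spatial location of a (C, H, W) map."""
    norm = np.sqrt((feat ** 2).sum(axis=0, keepdims=True))
    return feat / (norm + eps)

def ion_descriptor(roi_feats, scale, weight, bias):
    """Combine RoI-pooled maps from several layers into one descriptor.

    roi_feats: list of (C_i, H, W) arrays (e.g. conv3, conv4, conv5, context)
    scale:     scalar rescaling factor (hypothetical constant here)
    weight:    (C_out, sum(C_i)) matrix acting as a 1x1 convolution
    bias:      (C_out,) vector
    """
    normalized = [l2_normalize(f) for f in roi_feats]  # equalize magnitudes
    stacked = np.concatenate(normalized, axis=0)       # concatenate along channels
    rescaled = scale * stacked                         # rescale after normalization
    # A 1x1 convolution is a linear map over channels at each pixel.
    return np.einsum("oc,chw->ohw", weight, rescaled) + bias[:, None, None]

rng = np.random.default_rng(0)
channels = [256, 512, 512, 512]  # assumed channel counts, for illustration only
# Give each layer a different magnitude to mimic mismatched feature scales.
roi_feats = [rng.standard_normal((c, 7, 7)) * 10.0 ** i
             for i, c in enumerate(channels)]
weight = rng.standard_normal((512, sum(channels))) * 0.01
bias = np.zeros(512)

descriptor = ion_descriptor(roi_feats, scale=1.0, weight=weight, bias=bias)
print(descriptor.shape)
```

Normalizing before concatenation keeps layers with large activations from dominating the combined descriptor, which is why the magnitudes are deliberately mismatched in the example inputs.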
 ===Analysis and Diagnosis===