A single <math>7\times 7</math> conv layer with C-dim input and C-dim output would need <math>49 \times C^2</math> weights.
Three stacked <math>3\times 3</math> conv layers cover the same <math>7\times 7</math> effective receptive field but only need <math>3 \times 9 \times C^2 = 27 \times C^2</math> weights.
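A quick PyTorch check of these counts, as a minimal sketch (the channel count <math>C = 256</math> is an arbitrary choice for illustration; biases omitted):
<syntaxhighlight lang="python">
import torch.nn as nn

C = 256  # illustrative channel count

# One 7x7 convolution: 49 * C^2 weights.
single = nn.Conv2d(C, C, kernel_size=7, bias=False)

# Three stacked 3x3 convolutions: 27 * C^2 weights,
# with the same 7x7 effective receptive field.
stacked = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, bias=False),
    nn.Conv2d(C, C, kernel_size=3, bias=False),
    nn.Conv2d(C, C, kernel_size=3, bias=False),
)

print(sum(p.numel() for p in single.parameters()))   # 3211264 = 49 * 256^2
print(sum(p.numel() for p in stacked.parameters()))  # 1769472 = 27 * 256^2
</syntaxhighlight>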
===Network in network===
Use a small multilayer perceptron as the convolution kernel: each local patch is fed into the perceptron, and its output is used instead of the cross-correlation with a standard fixed kernel.
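A sketch of one such block in PyTorch, under the common interpretation that sliding a shared perceptron over patches is equivalent to a convolution followed by <math>1\times 1</math> convolutions (the layer sizes are illustrative):
<syntaxhighlight lang="python">
import torch.nn as nn

def nin_block(in_ch, out_ch, kernel_size, stride, padding):
    # A standard convolution followed by two 1x1 convolutions,
    # which act as a per-pixel MLP over the channel dimension.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=1),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=1),
        nn.ReLU(),
    )
</syntaxhighlight>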
===GoogLeNet===
Hebbian principle: neurons that fire together are typically wired together.
Implemented using an ''Inception Module''.
The key idea is to use a heterogeneous set of convolutions in parallel.
Naive idea: do a 1x1 convolution, 3x3 convolution, and 5x5 convolution on the same input and then concatenate the outputs together along the channel dimension.
The intuition is that each captures a different receptive field.
In practice, they need to add 1x1 convolutions before the 3x3 and 5x5 convolutions to make it computationally feasible. These are used for dimension reduction by controlling the number of channels, as in the sketch below.
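A minimal sketch of such a module in PyTorch (the branch widths are hypothetical; the real GoogLeNet also adds a pooling branch and picks different widths at each stage):
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5):
        super().__init__()
        # Parallel branches with different receptive fields.
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        # 1x1 convolutions reduce the channel count before the
        # expensive 3x3 and 5x5 convolutions.
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, kernel_size=1),
            nn.Conv2d(c3_reduce, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, kernel_size=1),
            nn.Conv2d(c5_reduce, c5, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # Concatenate branch outputs along the channel dimension.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x)], dim=1
        )
</syntaxhighlight>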
Another idea is to add auxiliary classifiers at intermediate layers of the network; during training their losses inject gradient signal into the earlier layers, and at test time they are discarded. A sketch of the combined loss follows.
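One way the training loss might combine the classifiers (a sketch: the 0.3 auxiliary weight follows the GoogLeNet paper, and the logits arguments are assumed to come from the network's main and auxiliary heads):
<syntaxhighlight lang="python">
import torch.nn.functional as F

def googlenet_loss(main_logits, aux_logits_list, target, aux_weight=0.3):
    # Total loss = main loss + discounted auxiliary losses.
    loss = F.cross_entropy(main_logits, target)
    for aux_logits in aux_logits_list:
        loss = loss + aux_weight * F.cross_entropy(aux_logits, target)
    return loss
</syntaxhighlight>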
Inception v2 and v3 refine the architecture.
V2 adds batch normalization to reduce dependence on auxiliary classifiers.
V3 adds factorized convolutions (i.e. nx1 and 1xn convolutions in place of nxn ones).
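For example, factoring a 7x7 convolution into a 7x1 followed by a 1x7 covers the same receptive field with <math>14 \times C^2</math> instead of <math>49 \times C^2</math> weights. A quick PyTorch check (the channel count is arbitrary; biases omitted):
<syntaxhighlight lang="python">
import torch.nn as nn

C = 192  # illustrative channel count
factored = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=(7, 1), padding=(3, 0), bias=False),
    nn.Conv2d(C, C, kernel_size=(1, 7), padding=(0, 3), bias=False),
)
print(sum(p.numel() for p in factored.parameters()))  # 516096 = 14 * 192^2
</syntaxhighlight>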
===ResNet===
The main idea is to introduce skip or shortcut connections.
These existed in the literature before ResNet.
This means a block returns <math>F(x)+x</math> instead of just <math>F(x)</math>.
This allows smoother gradient flow, since the identity path means intermediate layers cannot block the gradient.
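A minimal sketch of a residual block in PyTorch (the real ResNet also inserts batch normalization after each convolution, omitted here):
<syntaxhighlight lang="python">
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # F(x) + x: the identity shortcut lets the gradient flow
        # past the convolutions unimpeded.
        return self.relu(self.conv2(self.relu(self.conv1(x))) + x)
</syntaxhighlight>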
They also replace 3x3 convolutions on 256 channels with a bottleneck: a 1x1 convolution down to 64 channels, a 3x3 convolution on the 64 channels, then a 1x1 convolution back up to 256 channels.
This reduces the parameter count from approximately 600k to approximately 70k.
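A quick PyTorch check of those counts (biases omitted):
<syntaxhighlight lang="python">
import torch.nn as nn

# Direct 3x3 convolution on 256 channels.
direct = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)

# Bottleneck: 1x1 down to 64, 3x3 at 64, 1x1 back up to 256.
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=False),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(64, 256, kernel_size=1, bias=False),
)

print(sum(p.numel() for p in direct.parameters()))      # 589824 (~600k)
print(sum(p.numel() for p in bottleneck.parameters()))  # 69632  (~70k)
</syntaxhighlight>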
===Accuracy vs efficiency===
First we had AlexNet. Then we had VGG, which had far more parameters and better accuracy.
Then we had GoogLeNet, which is much smaller than both AlexNet and VGG with roughly the same accuracy.
Next, ResNet and the later Inception versions increased the parameter count slightly and attained better performance.
==Will be on the exam==