Convolutional neural network: Difference between revisions
Line 69: | Line 69: | ||
===Padding=== | ===Padding=== | ||
Convolutional layers yield an output smaller than the input size. | Convolutional layers typically yield an output smaller than the input size. | ||
We can use padding to increase the input size. | We can use padding to increase the input size. | ||
See [https://www.machinecurve.com/index.php/2020/02/10/using-constant-padding-reflection-padding-and-replication-padding-with-keras/ Machinecurve: Using Constant Padding, Reflection Padding and Replication Padding with Keras] | See [https://www.machinecurve.com/index.php/2020/02/10/using-constant-padding-reflection-padding-and-replication-padding-with-keras/ Machinecurve: Using Constant Padding, Reflection Padding and Replication Padding with Keras] | ||
Revision as of 04:18, 14 June 2020
Convolutional Neural Network
Primarily used for image tasks such as computer vision or image generation,
though they can be used anywhere you have a rectangular grid with spatial relationship among your data.
Motivation
Convolutional neural networks leverage the following properties of images:
- Stationarity or shift-invariance - objects in an image should be recognized regardless of their position
- Locality or local-connectivity - nearby pixels are more relevant than distant pixels
- Compositionality - objects in images have a multi-resolution structure.
Convolutions
Pytorch Convolution Layers
Types of convolutions animations
Here, we will explain 2d convolutions.
Suppose we have the following input image:
1 2 3 4 5 6 7 8 2 3 2 5 9 9 5 4 8 8 2 1
and the following 3x3 kernel:
1 1 1 2 1 2 3 2 1
For each possible position of the 3x3 kernel over the input image,
we perform an element-wise multiplication (\(\displaystyle \odot\)) and sum over all entries to get a single value.
Placing the kernel in the first position would yield:
\(\displaystyle
\begin{bmatrix}
1 & 2 & 3\\
6 & 7 & 8\\
2 & 5 & 9
\end{bmatrix}
\odot
\begin{bmatrix}
1 & 1 & 1\\
2 & 1 & 2\\
3 & 2 & 1
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 & 3\\
12 & 7 & 16\\
6 & 10 & 9
\end{bmatrix}
\)
Summing up all the elements gives us \(\displaystyle 66\) which would go in the first index of the output.
The formula for the output resolution of a convolution is: \(\displaystyle \frac{x-k+2p}{s}+1 \) where:
- \(x\) is the input resolution
- \(k\) is the kernel size (e.g. 3)
- \(p\) is the padding on each side
- \(s\) is the stride
Typically a \(3 \times 3\) conv layer will have a padding of 1 and stride of 1 to maintain the same size. A stride of \(2\) would halve the resolution.
Stride
How much the kernel moves along. Typically 1 or 2.
Padding
Convolutional layers typically yield an output smaller than the input size.
We can use padding to increase the input size.
See Machinecurve: Using Constant Padding, Reflection Padding and Replication Padding with Keras
- Types of padding
- Zero/Constant
- Mirror/Reflection
- Replication
With convolution layers in tensorflow and other libraries you often see these two types of padding:
VALID
- Do not do any paddingSAME
- Apply zero padding such that the output will have resolution \(\lfloor x/stride \rfloor\).- \(p=\frac{k-1}{2}\)
- Exact behavior varies between ML frameworks. Only apply this unless you know you will have symmetric padding on both sides.
Dilation
Space between pixels in the kernel
A dilation of 1 will apply a 3x3 kernel over a 5x5 region. This would be equivalent to a 5x5 kernel with odd index weights (\(\displaystyle i \% 2 == 1\)) set to 0.
Groups
Other Types of Convolutions
Transpose Convolution
See Medium Post
Instead of your 3x3 kernel taking 9 values as input and returning 1 value (</math>\sum_i \sum_j w_{ij} * i_{i+x,j+y}</math>), the kernel now takes 1 value and returns 9 (</math>w_{ij} * i_{x,y}</math>).
Gated Convolution
See Gated Convolution (ICCV 2019)
Given an image, we have two convolution layers \(\displaystyle k_{feature}\) and \(\displaystyle k_{gate}\).
The output is \(\displaystyle O = \phi(k_{feature}(I)) \odot \sigma(k_{gate}(I))\)
Pooling
Unpooling
![](/img_auth.php/thumb/7/76/Unpooling_deeppainter_2016.png/500px-Unpooling_deeppainter_2016.png)
During max pooling, remember the indices where you pulled from in "switch variables".
Then when unpooling, save the max value into those indices. Other indices get values of 0.