Convolutional neural network
though they can be used anywhere you have a rectangular grid with spatial relationships among your data.
Typically convolutional layers are used in blocks consisting of the following:
* Conv2D layer.
** Usually stride 2 for encoders, stride 1 for decoders.
Upsampling blocks also have a transposed convolution or a bilinear upsample at the beginning.
The last layer is typically just a <math>1 \times 1</math> or <math>3 \times 3</math> Conv2D with a possible sigmoid to control the range of the outputs.
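As a rough illustration, here is a minimal PyTorch sketch of these block patterns, assuming each block pairs the Conv2D with a BatchNorm and ReLU (the exact composition varies by architecture):
<pre>
import torch.nn as nn

def down_block(in_ch, out_ch):
    # Encoder block: a stride-2 Conv2D halves the spatial resolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def up_block(in_ch, out_ch):
    # Decoder block: bilinear upsample (or a transposed conv) then a stride-1 Conv2D.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Last layer: a 1x1 Conv2D with a sigmoid to keep outputs in [0, 1].
last_layer = nn.Sequential(nn.Conv2d(16, 3, kernel_size=1), nn.Sigmoid())
</pre>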
==Motivation==
[https://pytorch.org/docs/stable/nn.html#convolution-layers Pytorch Convolution Layers]<br>
[https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d Types of convolutions animations]<br>
Here, we will explain 2D convolutions, which as implemented in deep learning libraries are actually cross-correlations (the kernel is not flipped).<br>
Suppose we have the following input image:<br>
<pre>
\end{bmatrix}
</math><br>
Summing up all the elements gives us <math>66</math> which would go in the first index of the output.
Shifting the kernel over all positions of the image gives us the whole output, another 2D image.
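For example, the following sketch (the input and kernel here are made up for illustration, not the ones above) checks one output entry by hand against <code>torch.nn.functional.conv2d</code>, which slides the kernel in exactly this way:
<pre>
import torch
import torch.nn.functional as F

image = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)  # hypothetical 5x5 input
kernel = torch.ones(1, 1, 3, 3)                                    # hypothetical 3x3 kernel

# Output entry (0, 0): elementwise product of the kernel with the
# top-left 3x3 patch of the image, then a sum.
first = (image[0, 0, :3, :3] * kernel[0, 0]).sum()

# conv2d slides the kernel over every position (cross-correlation, no kernel flip).
out = F.conv2d(image, kernel, stride=1, padding=0)
assert torch.isclose(first, out[0, 0, 0, 0])
print(out.shape)  # torch.Size([1, 1, 3, 3]): a 5x5 input and a 3x3 kernel give a 3x3 output
</pre>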
The formula for the output resolution of a convolution is:
===Kernel===
These days people typically use small kernels, e.g. \(3 \times 3\) or \(4 \times 4\), with many conv layers.
However, historically people used larger kernels (e.g. <math>7 \times 7</math>). These lead to more parameters which need to be trained, and thus the networks cannot be as deep.
Note that in practice, people use multi-channel inputs so the actual kernel will be 3D.
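For instance, a PyTorch sketch (not from the article) showing that a conv layer over a 3-channel input stores one 3D kernel per output channel:
<pre>
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
print(conv.weight.shape)  # torch.Size([16, 3, 3, 3]): out_channels x in_channels x 3 x 3
</pre>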
===Stride===
How much the kernel moves at each step. Typically 1 or 2.
Moving by 2 will yield half the resolution of the input.
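A quick sketch (with made-up shapes) of the effect of stride on output resolution:
<pre>
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)
print(nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1)(x).shape)  # [1, 8, 32, 32]
print(nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)(x).shape)  # [1, 8, 16, 16]
</pre>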
===Padding===
;Common types of padding
* Zero or constant padding
* Mirror/reflection padding
* Replication padding
With convolution layers in deep learning libraries, you often see these two types of padding, which can be specified on the conv layer directly:
* <code>VALID</code> - Do not do any padding
* <code>SAME</code> - Apply zero padding such that the output will have resolution \(\lceil x/stride \rceil\).
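These names come from TensorFlow-style APIs. As a sketch, the resulting output sizes can be computed as follows (the helper <code>out_size</code> is made up for illustration):
<pre>
import math

def out_size(x, k, stride, mode):
    if mode == "VALID":   # no padding
        return (x - k) // stride + 1
    if mode == "SAME":    # pad with zeros so that the output is ceil(x / stride)
        return math.ceil(x / stride)

print(out_size(32, 3, 1, "VALID"))  # 30
print(out_size(32, 3, 1, "SAME"))   # 32
print(out_size(32, 3, 2, "SAME"))   # 16
</pre>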
Pooling is one method of reducing or increasing the resolution of your feature maps.
You can also use bilinear upsampling or downsampling.
Typically the stride of pooling is equal to the filter size, so a <math>2 \times 2</math> pooling will have a stride of <math>2</math> and result in an image with half the width and height.
===Avg Pooling===
Take the average over a region.
For a <math>2 \times 2</math> pooling with stride 2, this is equivalent to bilinear downsampling by a factor of 2.
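A small sketch (assuming non-overlapping <math>2 \times 2</math> windows) checking this equivalence in PyTorch:
<pre>
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)
pooled = F.avg_pool2d(x, kernel_size=2, stride=2)
area = F.interpolate(x, scale_factor=0.5, mode="area")
bilinear = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)

# Both match 2x2 average pooling (up to floating point) for a factor-of-2 downsample.
print(torch.allclose(pooled, area), torch.allclose(pooled, bilinear))
</pre>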
===Max Pooling===
===Unpooling===
During max pooling, remember the indices you pulled the max values from in ''switch variables''.<br>
Then, when unpooling, place the max values back into those indices. All other indices get a value of 0.
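In PyTorch this corresponds to <code>MaxPool2d</code> with <code>return_indices=True</code> paired with <code>MaxUnpool2d</code> (a small sketch):
<pre>
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 4, 4)
pooled, indices = pool(x)           # indices act as the switch variables
restored = unpool(pooled, indices)  # max values return to their indices; the rest are 0
print(restored.shape)               # torch.Size([1, 1, 4, 4])
</pre>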
==Spherical Images==
There are many ways to adapt convolutional layers to spherical images.
* [http://papers.nips.cc/paper/6656-learning-spherical-convolution-for-fast-features-from-360-imagery Learning Spherical Convolution for Fast Features from 360 Imagery (NIPS 2017)] proposes using different kernels with different weights and sizes for different altitudes \(\phi\).
* [https://www.tu-chemnitz.de/etit/proaut/publications/schubert19_IV.pdf Circular Convolutional Neural Networks (IV 2019)] proposes padding the left and right sides of each input and feature map using pixels such that the input wraps around. This works since equirectangular images wrap around on the x-axis.
* [https://arxiv.org/abs/1811.08196 SpherePHD (CVPR 2019)] proposes using faces of an icosahedron as pixels. They propose a kernel which considers the 9 neighboring triangles of each triangle. They also develop methods for pooling.
* [https://arxiv.org/abs/1807.03247 CoordConv (NeurIPS 2018)] adds additional channels to each 2D convolution layer which feed positional information (UV coordinates) to the convolutional kernel. This allows the kernel to account for distortions. Note that the positional information is merely UV coordinates and is not learned like in NLP; a minimal sketch of the idea follows this list.
* [https://arxiv.org/pdf/1901.02039.pdf Jiang et al. (ICLR 2019)] perform convolutions on meshes using a linear combination of first-order derivatives and the Laplacian second-order derivative. These derivatives are estimated based on the values and positions of neighboring vertices and faces. Experiments are performed on a sphere mesh.
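As referenced above, a rough sketch of the CoordConv idea (an assumed implementation, not the authors' code): the coordinate channels are simply concatenated to the input before a standard Conv2D.
<pre>
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, **kwargs):
        super().__init__()
        # Two extra input channels hold the u and v coordinates.
        self.conv = nn.Conv2d(in_ch + 2, out_ch, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        v = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        u = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, u, v], dim=1))

print(CoordConv2d(3, 8, kernel_size=3, padding=1)(torch.randn(1, 3, 16, 16)).shape)  # [1, 8, 16, 16]
</pre>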