Latest revision as of 00:02, 13 August 2020
Batch norm normalizes the activations of each mini-batch so that every feature has zero mean and unit standard deviation.
The goal is to speed up the training process. See the arXiv paper: https://arxiv.org/abs/1502.03167
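For reference, the transform from the original batch normalization paper (Ioffe and Szegedy, 2015) can be written as below. Here m is the mini-batch size, x_i is one activation of a given feature, epsilon is a small constant for numerical stability, and gamma and beta are the trainable scale and shift.

```latex
\begin{aligned}
\mu_B &= \frac{1}{m} \sum_{i=1}^{m} x_i \\
\sigma_B^2 &= \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2 \\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \\
y_i &= \gamma \hat{x}_i + \beta
\end{aligned}
```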
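Batch norm adds two trainable parameters to your network:
- A scale (gamma).
- A shift (beta).
It also tracks two non-trainable statistics, updated as moving averages during training:
- A running mean.
- A running std dev (commonly stored as a variance).
For CNNs each of these is a vector the size of the number of channels.
During training, the mean and std dev used for normalization are computed from the batch. During evaluation, the running values are used instead.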
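The difference between training and evaluation can be sketched for a single feature as follows. This is a minimal, dependency-free illustration rather than the API of any library: the class name `SimpleBatchNorm`, the `momentum` and `eps` defaults, and the exponential-moving-average update are assumptions made for the example. The scale `gamma` and shift `beta` are the parameters an optimizer would train; the running statistics are only tracked, not trained.

```python
import math


class SimpleBatchNorm:
    """Toy batch norm over a single feature (hypothetical helper, not a library API)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.gamma = 1.0          # trainable scale (an optimizer would update this)
        self.beta = 0.0           # trainable shift (an optimizer would update this)
        self.running_mean = 0.0   # non-trainable running estimate of the mean
        self.running_var = 1.0    # non-trainable running estimate of the variance
        self.momentum = momentum  # assumed moving-average factor
        self.eps = eps            # small constant to avoid division by zero

    def forward(self, batch, training):
        if training:
            # Training: normalize with the statistics of the current mini-batch.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # Update the running estimates that evaluation will use later.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation: normalize with the stored running estimates.
            mean, var = self.running_mean, self.running_var
        scale = self.gamma / math.sqrt(var + self.eps)
        return [scale * (x - mean) + self.beta for x in batch]


bn = SimpleBatchNorm()
train_out = bn.forward([1.0, 2.0, 3.0, 4.0], training=True)
eval_out = bn.forward([1.0, 2.0, 3.0, 4.0], training=False)
```

With the same input, `train_out` is centered on zero, while `eval_out` is not, because after a single update the running estimates are still far from the batch statistics.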
Batch Norm in CNNs
See Batch norm in CNN: https://stackoverflow.com/questions/38553927/batch-normalization-in-convolutional-neural-network
While batch norm is very common in CNNs, it can lead to unexpected side effects such as brightness changes.
Avoid batch norm if you need to generate a video frame by frame, since such changes may differ from one frame to the next.
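One possible mechanism is that, when batch statistics are used, the output for a frame depends on the other frames in the same batch. The sketch below is a simplified single-channel illustration: the helper `normalize_batch` and the pixel values are hypothetical, and it only models normalization with per-batch statistics (as in training mode), not a full network or the running statistics used at evaluation.

```python
import math


def normalize_batch(frames, eps=1e-5):
    """Normalize single-channel frames with statistics pooled over the whole batch."""
    pixels = [p for frame in frames for p in frame]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [[(p - mean) / math.sqrt(var + eps) for p in frame] for frame in frames]


frame = [0.4, 0.5, 0.6]            # the frame of interest (made-up pixel intensities)
dark_neighbor = [0.0, 0.1, 0.2]    # a darker frame sharing the batch
bright_neighbor = [0.8, 0.9, 1.0]  # a brighter frame sharing the batch

with_dark = normalize_batch([frame, dark_neighbor])[0]
with_bright = normalize_batch([frame, bright_neighbor])[0]

# The same input frame is normalized differently depending on its batch.
print(with_dark)
print(with_bright)
```

Here the identical frame ends up above the batch mean next to the dark frame and below it next to the bright one, which would show up as a brightness shift.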
In a CNN, the mean and standard deviation are calculated across the batch, width, and height of the features.
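<pre>
import numpy as np

t = np.random.rand(8, 16, 16, 3)  # example incoming tensor of shape (B, H, W, C)
C = t.shape[-1]
# mean and stddev are computed along axes (0, 1, 2) and have shape (C,)
t_mean = np.mean(t, axis=(0, 1, 2))
t_stddev = np.std(t, axis=(0, 1, 2))
# reshape to (1, 1, 1, C) so the statistics broadcast over B, H, and W
out = (t - t_mean.reshape(1, 1, 1, C)) / t_stddev.reshape(1, 1, 1, C)
</pre>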