Batch norm normalizes the mean and standard deviation of each mini-batch.
The goal is to speed up and stabilize the training process.
* [https://arxiv.org/abs/1502.03167 Arxiv Paper]
Batch norm adds two trainable parameters per feature to your network:
* A scale (gamma).
* A shift (beta).
It also tracks two non-trainable statistics: a running mean and a running standard deviation.
During training, the mean and standard deviation are computed from the current batch, and the running statistics are updated.
During evaluation, the running statistics are used to do the normalization.
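The training/evaluation split above can be sketched in NumPy. This is a minimal illustration, not a framework implementation; the <code>momentum</code> and <code>eps</code> values are typical defaults and the function name is invented for this example.

```python
import numpy as np

def batch_norm(x, gamma, beta, running_mean, running_var,
               training, momentum=0.1, eps=1e-5):
    """Batch-normalize a (batch, features) array.

    gamma and beta are the two trainable parameters; running_mean and
    running_var are non-trainable statistics, updated during training
    and used as-is during evaluation.
    """
    if training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Update the running estimates in place (exponential moving average).
        running_mean *= (1 - momentum)
        running_mean += momentum * mean
        running_var *= (1 - momentum)
        running_var += momentum * var
    else:
        mean, var = running_mean, running_var
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

In training mode the output has roughly zero mean and unit standard deviation per feature, regardless of the input's statistics; in evaluation mode the result depends only on the stored running statistics, so a single sample is normalized deterministically.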
==Batch Norm in CNNs==
See [https://stackoverflow.com/questions/38553927/batch-normalization-in-convolutional-neural-network Batch norm in CNN].
While batch norm is very common in CNNs, it can lead to unexpected side effects such as frame-to-frame brightness changes.
You should avoid batch norm if you need to generate a video frame-by-frame.
In a CNN, the mean and standard deviation are calculated across the batch, width, and height of the features.
<pre>
# t is the incoming tensor of shape (B, H, W, C)
# mean and stddev are computed along the (0, 1, 2) axes and have shape (C,)
t_mean = t.mean(axis=(0, 1, 2))
t_stddev = t.std(axis=(0, 1, 2))
out = (t - t_mean.reshape(1, 1, 1, C)) / t_stddev.reshape(1, 1, 1, C)
</pre>
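The per-channel reduction can be checked with a small NumPy script. The shapes and values below are made up for illustration; the point is that reducing over the (batch, height, width) axes leaves one mean and one standard deviation per channel, which then broadcast back against the full tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
B, H, W, C = 8, 5, 5, 3
# A fake NHWC feature map with non-zero mean and non-unit spread.
t = rng.normal(2.0, 4.0, size=(B, H, W, C))

# Reduce over batch, height, and width: one statistic per channel.
t_mean = t.mean(axis=(0, 1, 2))      # shape (C,)
t_stddev = t.std(axis=(0, 1, 2))     # shape (C,)

# Reshape to (1, 1, 1, C) so the statistics broadcast over (B, H, W, C).
out = (t - t_mean.reshape(1, 1, 1, C)) / t_stddev.reshape(1, 1, 1, C)
```

After this step, each channel of <code>out</code> has zero mean and unit standard deviation when measured over the batch, height, and width axes together.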