Batch normalization
Batch norm normalizes the mean and standard deviation of the activations within each mini-batch.
The goal is to speed up and stabilize the training process.
* [https://arxiv.org/abs/1502.03167 Arxiv Paper]
Batch norm adds two trainable parameters per feature (a scale and a shift) and also tracks two non-trainable running statistics:
* A running average mean.
* A running average std dev.
For CNNs, each of these is a vector with one entry per channel.
During training, the mean and std dev are computed from the current batch and the running averages are updated.
During evaluation, the running averages are used to do the normalization instead.
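The train/eval split described above can be sketched in plain NumPy. This is a simplified illustration, not a framework implementation; the class name <code>BatchNorm1d</code> and the <code>momentum</code>/<code>eps</code> parameter names are assumptions borrowed from common deep-learning library conventions.

```python
import numpy as np

class BatchNorm1d:
    """Minimal batch norm sketch for inputs of shape (batch, features)."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)         # trainable scale
        self.beta = np.zeros(num_features)         # trainable shift
        self.running_mean = np.zeros(num_features) # non-trainable running stats
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Statistics come from the current mini-batch,
            # and the running averages are updated.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            # At evaluation time the stored running averages are used instead.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```

With default <code>gamma=1</code> and <code>beta=0</code>, a training-mode call returns the batch normalized to roughly zero mean and unit variance per feature.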
==Batch Norm in CNNs==
See [https://stackoverflow.com/questions/38553927/batch-normalization-in-convolutional-neural-network Batch norm in CNN].
While batch norm is very common in CNNs, it can lead to unexpected side effects such as frame-to-frame brightness changes, because each frame is normalized with different batch statistics.
You should avoid using batch norm if you need to process or generate video frame by frame.
In a CNN, the mean and standard deviation are calculated across the batch, width, and height of the features.
<pre>
# t is the incoming tensor of shape (B, H, W, C)
# mean and stddev are computed over the (0, 1, 2) axes and have shape (C,)
t_mean = mean(t, axis=(0, 1, 2))
t_stddev = stddev(t, axis=(0, 1, 2))
# reshape to (1, 1, 1, C) so the per-channel statistics broadcast over B, H, W;
# eps avoids division by zero
out = (t - t_mean.view(1, 1, 1, C)) / (t_stddev.view(1, 1, 1, C) + eps)
</pre>
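A runnable NumPy version of the pseudocode above, assuming a channels-last (B, H, W, C) layout; the function name and <code>eps</code> default are illustrative choices, not part of any library API.

```python
import numpy as np

def batch_norm_nhwc(t, eps=1e-5):
    """Normalize each channel over the batch, height, and width axes."""
    t_mean = t.mean(axis=(0, 1, 2))   # shape (C,)
    t_stddev = t.std(axis=(0, 1, 2))  # shape (C,)
    # NumPy broadcasting aligns trailing axes, so the (C,) statistics
    # broadcast over (B, H, W, C) without an explicit reshape.
    return (t - t_mean) / (t_stddev + eps)

x = np.random.default_rng(0).normal(5.0, 2.0, size=(2, 4, 4, 3))
y = batch_norm_nhwc(x)
# Each channel of y now has approximately zero mean and unit std dev.
```

Note that because the statistics are reduced over axes (0, 1, 2), every spatial position in every image of the batch shares the same per-channel normalization.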
==Resources==
* [[Wikipedia: Batch normalization]]
* [https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c Batch normalization in Neural Networks (Towards Data Science)]