Batch normalization

Batch norm normalizes each mini-batch so that every feature has zero mean and unit standard deviation.
The goal is to speed up and stabilize the training process.
* [https://arxiv.org/abs/1502.03167 Arxiv Paper]
Batch norm adds two learnable parameters per feature to your network:
* A scale (gamma).
* A shift (beta).
It also tracks a running mean and a running standard deviation, which are updated during training but not learned by gradient descent.
During training, normalization uses the mean and standard deviation computed from the current batch.
During evaluation, it uses the running statistics instead (see the sketch below).
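A minimal NumPy sketch of this train/eval behavior for inputs of shape (B, C); the function name, momentum update, and epsilon value here are illustrative conventions, not mandated by the paper:
<pre>
import numpy as np

def batch_norm(x, gamma, beta, running_mean, running_var,
               training, momentum=0.1, eps=1e-5):
    # x has shape (B, C); gamma and beta are the learnable
    # scale and shift, each of shape (C,).
    if training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Update the running statistics with an exponential moving average.
        running_mean *= 1.0 - momentum
        running_mean += momentum * mean
        running_var *= 1.0 - momentum
        running_var += momentum * var
    else:
        # Evaluation: use the stored running statistics.
        mean, var = running_mean, running_var
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
</pre>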


==Batch Norm in CNNs==
See [https://stackoverflow.com/questions/38553927/batch-normalization-in-convolutional-neural-network Batch norm in CNN].
While batch norm is very common in CNNs, it can cause unexpected side effects such as frame-to-frame brightness changes, because each output depends on the statistics of the whole batch.
You should avoid using batch norm if you need to generate a video frame by frame.
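The flicker comes from each frame being normalized by its own batch statistics; in evaluation mode, frameworks such as PyTorch switch batch norm to its running statistics, so the same frame always produces the same output. A minimal PyTorch sketch with an illustrative one-layer model:
<pre>
import torch
import torch.nn as nn

# Illustrative model: a conv layer followed by batch norm.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8))

model.eval()  # BatchNorm2d now uses its running mean/variance
with torch.no_grad():
    frame = torch.rand(1, 3, 64, 64)  # one video frame
    out = model(frame)                # deterministic for a given frame
</pre>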


In a CNN, the mean and standard deviation are calculated across the batch, width, and height of the features.
<pre>
# t is the incoming tensor of shape (B, H, W, C)
# but mean and stddev are computed along the (0, 1, 2) axes and have just shape (C,)
t_mean = mean(t, axis=(0, 1, 2))
t_stddev = stddev(t, axis=(0, 1, 2))
out = (t - t_mean.view(1, 1, 1, C)) / t_stddev.view(1, 1, 1, C)
</pre>
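A runnable NumPy version of the pseudocode above; variable names mirror the snippet, and the epsilon inside the square root follows the paper's formulation:
<pre>
import numpy as np

B, H, W, C = 4, 8, 8, 16
t = np.random.randn(B, H, W, C).astype(np.float32)

# One mean and one variance per channel, computed over batch, height, and width.
t_mean = t.mean(axis=(0, 1, 2))  # shape (C,)
t_var = t.var(axis=(0, 1, 2))    # shape (C,)

# Broadcast the per-channel statistics back over (B, H, W, C).
out = (t - t_mean.reshape(1, 1, 1, C)) / np.sqrt(t_var.reshape(1, 1, 1, C) + 1e-5)
</pre>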