Deep Learning

===Lecture 9 (Sept 17)===
From the previous lecture (Zhang ''et al.''<ref name="zhang2017understanding"></ref>), we saw that optimizing a neural network on random labels is not much harder, in terms of convergence rate, than training on the true labels. Since the network class can fit random labels, complexity measures such as Rademacher complexity and VC dimension cannot by themselves explain generalization.
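To see why, recall the empirical Rademacher complexity of a function class <math>\mathcal{F}</math> with outputs in <math>[-1,1]</math> on a sample <math>S = \{x_1,\dots,x_n\}</math> (standard notation, not taken verbatim from the lecture):
:<math>\hat{\mathcal{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\right], \qquad \sigma_i \sim \mathrm{Unif}\{\pm 1\}.</math>
If the class can interpolate arbitrary <math>\pm 1</math> labels on <math>S</math>, then <math>\hat{\mathcal{R}}_S(\mathcal{F}) \approx 1</math>, so uniform-convergence bounds based on it (and likewise VC-dimension bounds) become vacuous for these models.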
Examples of explicit regularization (a code sketch follows the list):
* Data augmentation (e.g. random crop)
* Weight decay (L2 regularization on parameters)
* Dropout
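For concreteness, a minimal PyTorch-style sketch of where each of these enters a typical training setup (not from the lecture; the architecture, crop size, and hyperparameter values are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random crop (with padding) plus a horizontal flip,
# applied to CIFAR-10-sized 32x32 images at load time.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Dropout: randomly zeroes hidden activations during training.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, 10),
)

# Weight decay: an L2 penalty on the parameters, applied by the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
</syntaxhighlight>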
These forms of explicit regularization improve generalization, but models still generalize well without them.
One possible explanation is ''implicit regularization'' by SGD.
Belkin ''et al.''


==Misc==