===Lecture 9 (Sept 17)===
From the previous lecture (Zhang ''et al.''<ref name="zhang2017understanding"></ref>), we saw that NN optimization is not much more difficult when training on random labels, in terms of convergence rate. Thus, Rademacher complexity and VC dimension cannot by themselves explain generalization.
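The randomization test behind this observation is simple to set up; here is a minimal PyTorch sketch (the dataset and class count are illustrative, not specified in the notes):

<syntaxhighlight lang="python">
import torch
from torchvision import datasets, transforms

# Randomization test of Zhang et al.: overwrite the true labels with
# uniformly random ones, then train a standard architecture as usual.
# Training error still reaches ~0 at a similar convergence rate.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
train_set.targets = torch.randint(0, 10, (len(train_set.targets),)).tolist()
# ...train any standard model on train_set exactly as on the real labels...
</syntaxhighlight>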
Examples of explicit regularization:
* Data augmentation (e.g. random crop)
* Weight decay (L2 regularization on parameters)
* Dropout
These forms of explicit regularization improve generalization, but models still generalize well without them (all three are sketched in code below).
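A minimal PyTorch sketch of the three techniques; the architecture and hyperparameter values are illustrative, not taken from the lecture:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random crops (with padding) and flips at load time.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Dropout: randomly zeroes activations during training only.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, 10),
)

# Weight decay: an L2 penalty on the parameters, folded into the update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
</syntaxhighlight>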
One candidate explanation is ''implicit regularization'' by SGD.
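One concrete, provable instance of this effect (a NumPy sketch; the problem sizes are arbitrary): on an underdetermined least-squares problem, gradient descent started from zero converges to the minimum-L2-norm interpolating solution, even though infinitely many solutions fit the data.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                       # more parameters than data points
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on the squared loss, initialized at zero.  The iterates
# stay in the row space of X, so GD converges to the interpolant of
# minimum L2 norm rather than to an arbitrary one.
w = np.zeros(d)
lr = 1.0 / np.linalg.norm(X, 2) ** 2     # safe step size (< 2 / lambda_max)
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)

w_min_norm = np.linalg.pinv(X) @ y       # min-norm solution in closed form
print(np.max(np.abs(w - w_min_norm)))    # ~0: GD found the min-norm solution
</syntaxhighlight>

Characterizing which minimum SGD selects in the nonconvex neural-network setting is the harder question this line of work addresses.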
Belkin ''et al.'' | |||
==Misc==