===Lecture 9 (Sept 17)===
From the previous lecture (Zhang ''et al.''<ref name="zhang2017understanding"></ref>), we saw that NN optimization is not much more difficult when training on random labels, in terms of convergence rate. Thus, Rademacher complexity and VC dimension cannot by themselves explain generalization.
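The randomization test behind this observation is simple to set up; here is a minimal PyTorch sketch (the dataset and class count are illustrative, not specified in the notes):

<syntaxhighlight lang="python">
import torch
from torchvision import datasets, transforms

# Randomization test of Zhang et al.: overwrite the true labels with
# uniformly random ones, then train a standard architecture as usual.
# Training error still reaches ~0 at a similar convergence rate.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
train_set.targets = torch.randint(0, 10, (len(train_set.targets),)).tolist()
# ...train any standard model on train_set exactly as on the real labels...
</syntaxhighlight>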
Examples of explicit regularization:
* Data augmentation (e.g. random crop)
* Weight decay (L2 regularization on parameters)
* Dropout
These forms of explicit regularization improve generalization, but models still generalize well without them (all three are sketched in code below).
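A minimal PyTorch sketch of the three techniques; the architecture and hyperparameter values are illustrative, not taken from the lecture:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random crops (with padding) and flips at load time.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Dropout: randomly zeroes activations during training only.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, 10),
)

# Weight decay: an L2 penalty on the parameters, folded into the update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
</syntaxhighlight>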
One candidate explanation is ''implicit regularization'' by SGD.
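One concrete, provable instance of this effect (a NumPy sketch; the problem sizes are arbitrary): on an underdetermined least-squares problem, gradient descent started from zero converges to the minimum-L2-norm interpolating solution, even though infinitely many solutions fit the data.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                       # more parameters than data points
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on the squared loss, initialized at zero.  The iterates
# stay in the row space of X, so GD converges to the interpolant of
# minimum L2 norm rather than to an arbitrary one.
w = np.zeros(d)
lr = 1.0 / np.linalg.norm(X, 2) ** 2     # safe step size (< 2 / lambda_max)
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)

w_min_norm = np.linalg.pinv(X) @ y       # min-norm solution in closed form
print(np.max(np.abs(w - w_min_norm)))    # ~0: GD found the min-norm solution
</syntaxhighlight>

Characterizing which minimum SGD selects in the nonconvex neural-network setting is the harder question this line of work addresses.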
Belkin ''et al.'' | |||
==Misc==