Deep Learning: Difference between revisions
No edit summary |
|||
Line 437: | Line 437: | ||
* Dropout | * Dropout | ||
[[File:Belkin2019reconciling fig1.png.png|500px|thumb|Figure 1 from Belkin et al. In the over-parameterized interpolation regime, more parameters leads to lower test errors. This is called ''double descent''.]] | |||
These types of explicit regularization improves generalization, but models still generalize well without them. | These types of explicit regularization improves generalization, but models still generalize well without them. | ||
One reason would be ''implicit regularization'' by SGD. | One reason would be ''implicit regularization'' by SGD. | ||
Belkin ''et al.'' | Belkin ''et al.''<ref name="belkin2019reconciling"></ref> observe that as models get more over-parameterized in the interpolation regime, test error will begin decreasing with the number of parameters. This is called ''double descent''. | ||
==Misc== | ==Misc== | ||
Line 452: | Line 453: | ||
<ref name="soudry2018implicit">Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro (2018) The Implicit Bias of Gradient Descent on Separable Data ''The Journal of Machine Learning Research'' 2018 [https://arxiv.org/abs/1710.10345 https://arxiv.org/abs/1710.10345]</ref> | <ref name="soudry2018implicit">Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro (2018) The Implicit Bias of Gradient Descent on Separable Data ''The Journal of Machine Learning Research'' 2018 [https://arxiv.org/abs/1710.10345 https://arxiv.org/abs/1710.10345]</ref> | ||
<ref name="zhang2017understanding">Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (2017) Understanding deep learning requires rethinking generalization (ICLR 2017) [https://arxiv.org/abs/1611.03530 https://arxiv.org/abs/1611.03530]</ref> | <ref name="zhang2017understanding">Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (2017) Understanding deep learning requires rethinking generalization (ICLR 2017) [https://arxiv.org/abs/1611.03530 https://arxiv.org/abs/1611.03530]</ref> | ||
<ref name="belkin2019reconciling">Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal (2019) Reconciling modern machine learning practice and the bias-variance trade-off (PNAS 2019) [https://arxiv.org/abs/1812.11118 https://arxiv.org/abs/1812.11118]</ref> | |||
}} | }} |