Deep Learning

* Dropout

[[File:Belkin2019reconciling fig1.png.png|500px|thumb|Figure 1 from Belkin et al. In the over-parameterized interpolation regime, more parameters lead to lower test error. This is called ''double descent''.]]
These types of explicit regularization improve generalization, but models still generalize well without them.
One proposed explanation is ''implicit regularization'' by SGD.
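One form of implicit regularization shown by Soudry et al. is that plain gradient descent on the logistic loss over linearly separable data drives the weight ''direction'' toward the max-margin (hard-margin SVM) separator, even though nothing in the loss asks for it. A minimal numerical sketch of this (the toy dataset and step size are illustrative assumptions, not from the paper):

```python
import numpy as np

# Hypothetical 2D linearly separable dataset, no bias term.
# By construction the max-margin direction through the origin is (1, 0):
# the support points (+1, 0) and (-1, 0) limit the margin to 1 along it.
X = np.array([[1.0, 0.0], [3.0, 3.0], [-1.0, 0.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain gradient descent on the logistic loss sum_i log(1 + exp(-y_i x_i.w)).
w = np.zeros(2)
lr = 0.1
for _ in range(50_000):
    margins = y * (X @ w)
    grad = -(X.T * y) @ sigmoid(-margins)   # gradient of the logistic loss
    w -= lr * grad

w_max_margin = np.array([1.0, 0.0])         # known max-margin direction
cos = w @ w_max_margin / np.linalg.norm(w)
print(cos)  # approaches 1: GD's direction aligns with the max-margin separator
```

The norm of ''w'' diverges (the loss has no finite minimizer on separable data), but its direction converges to the max-margin one, which is the sense in which GD is implicitly regularized here.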


Belkin ''et al.''<ref name="belkin2019reconciling"></ref> observe that as models become increasingly over-parameterized, past the interpolation threshold the test error begins ''decreasing'' again with the number of parameters. This is called ''double descent''.


==Misc==
<ref name="soudry2018implicit">Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro (2018) The Implicit Bias of Gradient Descent on Separable Data (''The Journal of Machine Learning Research'' 2018) [https://arxiv.org/abs/1710.10345 https://arxiv.org/abs/1710.10345]</ref>
<ref name="zhang2017understanding">Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (2017) Understanding deep learning requires rethinking generalization (ICLR 2017) [https://arxiv.org/abs/1611.03530 https://arxiv.org/abs/1611.03530]</ref>
<ref name="belkin2019reconciling">Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal (2019) Reconciling modern machine learning practice and the bias-variance trade-off (PNAS 2019) [https://arxiv.org/abs/1812.11118 https://arxiv.org/abs/1812.11118]</ref>
}}