Deep Learning: Difference between revisions

SGD more easily finds ''simple'' solutions (e.g. functions with small norms), and such solutions tend to generalize better.


===Can we analyze the double descent curve for some simple distributions or models? (Belkin ''et al.''<ref name="belkin2019reconciling"></ref>)===
Setup:   
Our features are <math>x = (x_1, \ldots, x_d)</math>, where each <math>x_i</math> is drawn from a standard normal distribution.
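
As a minimal sketch of this feature model, the snippet below draws feature vectors with standard normal coordinates. The function name <code>sample_features</code>, the sample count <code>n</code>, the seed, and the assumption that the coordinates are drawn independently are illustrative choices and are not part of the setup stated above.

```python
import random


def sample_features(n, d, seed=0):
    """Draw n feature vectors x = (x_1, ..., x_d), each x_i ~ N(0, 1).

    Assumption (hypothetical, for illustration): coordinates are
    sampled independently of one another.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


# Example: 1000 samples in d = 5 dimensions.
X = sample_features(n=1000, d=5)

# Empirical mean and variance over all coordinates should be close to 0 and 1.
flat = [v for row in X for v in row]
mean = sum(flat) / len(flat)
var = sum((v - mean) ** 2 for v in flat) / len(flat)
```

With a few thousand draws, the empirical mean and variance should land near 0 and 1 respectively, which is a quick sanity check that the sampler matches the stated distribution.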