Deep Learning: Difference between revisions

60 bytes removed, 17 September 2020

@@ Line 446: / Line 446: @@
 For SGD, it is easier to find ''simple'' solutions (e.g. functions with small norms). This leads to better generalization.
-===Can we analyze the double descent curve for some simple distributions or models? (Belkin ''et al.''<ref name="belkin2019reconciling"></ref>)===
+===Can we analyze the double descent curve for some simple distributions or models?===
 Setup:
 Our features are <math>x = (x_1,..., x_d)</math> where <math>x_i</math> are from standard normal.