No edit summary |
|||
Line 223: | Line 223: | ||
GD converges even though our model does not go to a linear model. | GD converges even though our model does not go to a linear model. | ||
==Take-away== | ===Take-away=== | ||
Over-parameterization does not lead to linearization. | Over-parameterization does not lead to linearization. | ||
Over-parameterization leads to good conditioning which leads to PL and convergence of GD/SGD. | Over-parameterization leads to good conditioning which leads to PL and convergence of GD/SGD. | ||
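For reference, the PL (Polyak-Łojasiewicz) condition in the take-away is usually stated as below, in its standard global form. The symbols <math>\mu</math>, <math>\beta</math>, and <math>L^*</math> are notation assumed here and are not taken from these notes. A loss <math>L</math> is <math>\mu</math>-PL if

<math>\frac{1}{2}\|\nabla L(w)\|^2 \geq \mu \left(L(w) - L^*\right) \quad \text{for all } w, \qquad L^* = \inf_w L(w).</math>

If <math>L</math> is also <math>\beta</math>-smooth, one GD step with step size <math>1/\beta</math> gives <math>L(w_{t+1}) \leq L(w_t) - \frac{1}{2\beta}\|\nabla L(w_t)\|^2</math>, and combining this with the PL inequality yields

<math>L(w_{t+1}) - L^* \leq \left(1 - \frac{\mu}{\beta}\right)\left(L(w_t) - L^*\right).</math>

The loss therefore decreases geometrically, and neither convexity nor linearity of the model is needed for this step.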
Line 229: | Line 229: | ||
Other papers: | Other papers: | ||
* Simon Du ''et al.''<ref name="du2019gradient"></ref> | * Simon Du ''et al.''<ref name="du2019gradient"></ref> | ||
===Start of Lecture 4 (September 10)=== | |||
This lecture is about Soudry ''et al.''<ref name="soudry2018implicit"></ref>. | |||
Setup: | |||
* Binary classification | |||
* Data is linearly separable | |||
* No bias term (b=0) | |||
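The setup above can be tried numerically with a minimal sketch. Several things here are assumptions not stated in these notes: the labels are taken in {-1, +1}, the loss is the logistic loss, and the toy dataset, step size, and function names (<code>logistic_loss_and_grad</code>, <code>run_gd</code>) are hypothetical. On separable data with no bias term, the loss should keep decreasing while the norm of the weights keeps growing, which is why the direction of <code>w</code> is the quantity of interest rather than <code>w</code> itself.

```python
import numpy as np


def logistic_loss_and_grad(w, X, y):
    """Mean logistic loss log(1 + exp(-y * <w, x>)) and its gradient (no bias term)."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    # sigmoid(-m) = 1 / (1 + exp(m)), written with tanh for numerical stability
    sig_neg = 0.5 * (1.0 - np.tanh(margins / 2.0))
    grad = -(X * (sig_neg * y)[:, None]).mean(axis=0)
    return loss, grad


def run_gd(X, y, lr=0.1, steps=2000):
    """Plain gradient descent from w = 0; records the loss and ||w|| before each step."""
    w = np.zeros(X.shape[1])
    losses, norms = [], []
    for _ in range(steps):
        loss, grad = logistic_loss_and_grad(w, X, y)
        losses.append(loss)
        norms.append(np.linalg.norm(w))
        w = w - lr * grad
    return w, np.array(losses), np.array(norms)


# Hypothetical toy data: linearly separable through the origin (e.g. by w = (1, 1)).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, losses, norms = run_gd(X, y)
print("final loss:", losses[-1])
print("final ||w||:", norms[-1])
print("direction w / ||w||:", w / np.linalg.norm(w))
```

Running it for more steps makes the norm larger and the loss smaller, while the printed direction changes more and more slowly.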
==Misc== | ==Misc== |