Deep Learning: Difference between revisions

No edit summary
Line 223: Line 223:
GD converges even though the model does not approach a linear model.


===Take-away===
Over-parameterization does not lead to linearization.
Over-parameterization leads to good conditioning, which implies the PL (Polyak-Łojasiewicz) condition and, in turn, convergence of GD/SGD.
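
For reference, one common statement of the PL condition is sketched here. The symbols (loss <math>L</math>, minimum value <math>L^*</math>, PL constant <math>\mu > 0</math>, smoothness constant <math>\beta</math>, step size <math>\eta</math>) are assumed for illustration and may differ from the notation used elsewhere in these notes.

:<math>\frac{1}{2}\|\nabla L(w)\|^2 \geq \mu \left(L(w) - L^*\right) \quad \text{for all } w.</math>

If <math>L</math> is also <math>\beta</math>-smooth and GD uses a step size <math>\eta \leq 1/\beta</math>, the standard descent argument gives a linear rate:

:<math>L(w_{t+1}) - L^* \leq (1 - \eta\mu)\left(L(w_t) - L^*\right).</math>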
Line 229: Line 229:
Other papers:
* Simon Du ''et al.''<ref name="du2019gradient"></ref>
===Start of Lecture 4 (September 10)===
This lecture is about Soudry ''et al.''<ref name="soudry2018implicit"></ref>.
Setup:
* Binary classification
* Data is linearly separable
* No bias term (b=0)
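
The setup above can be made concrete with a small toy experiment. The following is a minimal sketch, not the construction from the paper: the logistic loss, the 2-D data, the step size, and all function names (<code>make_separable_data</code>, <code>loss_and_grad</code>, <code>gradient_descent</code>) are assumptions chosen for illustration. The data is separable by a line through the origin, so no bias term is needed.

```python
import math
import random


def make_separable_data(n=40, margin=0.2, seed=0):
    """Sample 2-D points that a line through the origin separates (so b = 0)."""
    rng = random.Random(seed)
    w_true = (1.0, 2.0)  # hypothetical separating direction, chosen arbitrarily
    data = []
    while len(data) < n:
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        score = w_true[0] * x[0] + w_true[1] * x[1]
        if abs(score) < margin:
            continue  # reject points too close to the decision boundary
        data.append((x, 1 if score > 0 else -1))
    return data, w_true


def loss_and_grad(w, data):
    """Mean logistic loss log(1 + exp(-y * <w, x>)) and its gradient in w."""
    loss, grad = 0.0, [0.0, 0.0]
    for x, y in data:
        m = y * (w[0] * x[0] + w[1] * x[1])  # signed margin of this point
        # s = sigmoid(-m), computed in a numerically stable way
        if m >= 0:
            e = math.exp(-m)
            loss += math.log1p(e)
            s = e / (1.0 + e)
        else:
            e = math.exp(m)
            loss += -m + math.log1p(e)
            s = 1.0 / (1.0 + e)
        grad[0] -= s * y * x[0]
        grad[1] -= s * y * x[1]
    n = len(data)
    return loss / n, [g / n for g in grad]


def gradient_descent(data, lr=0.5, steps=2000):
    """Plain GD from w = 0; returns the final weights and the loss history."""
    w = [0.0, 0.0]
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(w, data)
        losses.append(loss)
        w = [w[0] - lr * grad[0], w[1] - lr * grad[1]]
    return w, losses


if __name__ == "__main__":
    data, _ = make_separable_data()
    w, losses = gradient_descent(data)
    print("final loss:", losses[-1], "||w||:", math.hypot(w[0], w[1]))
```

Printing the norm of <code>w</code> alongside the loss is only a convenience for observing how the weights behave as training continues on separable data.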


==Misc==