Deep Learning: Difference between revisions

No change in size, 9 September 2020


Instead of convexity, we use the PL condition (Polyak-Lojasiewicz, 1963):
For <math>w \in B</math>, <math>\frac{1}{2}\Vert \nabla L(w) \Vert^2 \geq \mu L(w)</math> for some <math>\mu > 0</math>. If <math>L \geq 0</math> is also <math>\beta</math>-smooth, this implies exponential (linear) convergence of GD: with step size <math>\eta = 1/\beta</math>, each step satisfies <math>L(w_{t+1}) \leq (1 - \mu/\beta)\, L(w_t)</math> while the iterates remain in <math>B</math>.
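An illustrative sketch, not from the original notes: overparameterized linear least squares <math>L(w) = \frac{1}{2}\Vert Xw - y \Vert^2</math> with <math>X \in \mathbb{R}^{n \times d}</math>, <math>n < d</math> and full row rank. Writing <math>r = Xw - y</math>, one has <math>\Vert \nabla L(w) \Vert^2 = r^T X X^T r \geq 2\lambda_{\min}(XX^T)\, L(w)</math>, so the PL condition holds on all of <math>\mathbb{R}^d</math> with <math>\mu = \lambda_{\min}(XX^T)</math>, even though the minimizers form an affine subspace and <math>L</math> is not strongly convex. The code assumes random Gaussian data, step size <math>1/\beta</math> with <math>\beta = \lambda_{\max}(XX^T)</math> (the smoothness constant of this <math>L</math>), and a hypothetical helper <code>run_gd_least_squares</code>; it checks the PL inequality along the trajectory and reports the contraction factor <math>1 - \mu/\beta</math>.

```python
import numpy as np


def run_gd_least_squares(n=20, d=100, steps=200, seed=0):
    """Hypothetical illustration: GD on L(w) = 1/2 ||Xw - y||^2 with d > n parameters."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))  # full row rank with probability 1 when n < d
    y = rng.standard_normal(n)

    # With r = Xw - y: ||grad L(w)||^2 = r^T (X X^T) r >= 2 * lambda_min(X X^T) * L(w),
    # so PL holds everywhere with mu = lambda_min(X X^T).
    eigs = np.linalg.eigvalsh(X @ X.T)  # eigenvalues in ascending order
    mu, beta = eigs[0], eigs[-1]        # beta = lambda_max(X^T X) is the smoothness constant
    eta = 1.0 / beta                    # step size 1/beta

    w = np.zeros(d)
    losses = []
    pl_holds = True
    for _ in range(steps):
        r = X @ w - y
        loss = 0.5 * float(r @ r)
        grad = X.T @ r
        losses.append(loss)
        # PL check: 1/2 ||grad||^2 >= mu * L(w), with a tiny relative slack for rounding
        pl_holds = pl_holds and bool(0.5 * float(grad @ grad) >= mu * loss * (1 - 1e-9))
        w = w - eta * grad
    return losses, float(mu), float(beta), pl_holds


losses, mu, beta, pl_holds = run_gd_least_squares()
rate = 1 - mu / beta  # guaranteed per-step contraction factor from the PL argument
print(f"mu={mu:.3f}, beta={beta:.3f}, rate={rate:.3f}, PL holds: {pl_holds}")
print(f"loss: {losses[0]:.3e} -> {losses[-1]:.3e}")
```

For this quadratic loss the residual evolves exactly as <math>r_{t+1} = (I - \eta XX^T)\, r_t</math>, so the observed per-step decrease is at least as fast as <math>(1-\mu/\beta)^2</math>; the PL argument only uses the weaker factor <math>1-\mu/\beta</math>.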


===Tangent Kernels===