Deep Learning: Difference between revisions

No change in size, 9 September 2020

@@ Line 79: / Line 79: @@
 Instead of convexity, we use the PL condition (Polyak-Łojasiewicz, 1963):
-For <math>w \in B</math>, <math>\frac{1}{2}\Vert \nabla L(w) \Vert^2 \leq \mu L(w)</math> which implies exponential (linear) convergence of GD.
+For <math>w \in B</math>, <math>\frac{1}{2}\Vert \nabla L(w) \Vert^2 \geq \mu L(w)</math> which implies exponential (linear) convergence of GD.
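The convergence claim can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the article: the loss is <math>\beta</math>-smooth with minimum value 0, and GD uses step size <math>1/\beta</math>. Under those assumptions the descent lemma gives <math>L(w_{t+1}) \leq L(w_t) - \frac{1}{2\beta}\Vert \nabla L(w_t) \Vert^2</math>, and the PL condition then yields <math>L(w_{t+1}) \leq (1 - \mu/\beta) L(w_t)</math>. The matrix <code>A</code>, the vector <code>b</code>, and all function names are hypothetical. The example is an underdetermined least-squares problem, which is not strongly convex but satisfies the PL condition with <math>\mu = \lambda_{\min}(AA^T)</math>.

```python
import math

# Hypothetical overparameterized problem: 2 equations, 3 unknowns.
# L(w) = 0.5 * ||A w - b||^2 has minimum value 0 because A has full row rank.
A = [[1.0, 0.0, 1.0],
     [0.0, 2.0, 1.0]]
b = [1.0, 2.0]


def residual(w):
    """Return A w - b."""
    return [sum(a * x for a, x in zip(row, w)) - bi for row, bi in zip(A, b)]


def loss(w):
    """Least-squares loss L(w) = 0.5 * ||A w - b||^2."""
    return 0.5 * sum(r * r for r in residual(w))


def grad(w):
    """Gradient A^T (A w - b)."""
    r = residual(w)
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(w))]


# Eigenvalues of the 2x2 symmetric matrix A A^T in closed form.
g11 = sum(x * x for x in A[0])
g22 = sum(x * x for x in A[1])
g12 = sum(x * y for x, y in zip(A[0], A[1]))
trace, det = g11 + g22, g11 * g22 - g12 ** 2
disc = math.sqrt(trace * trace - 4.0 * det)
mu = (trace - disc) / 2.0    # PL constant: smallest eigenvalue of A A^T
beta = (trace + disc) / 2.0  # smoothness constant: largest eigenvalue


def run_gd(w0, steps):
    """Run gradient descent with step size 1/beta and record the loss."""
    w = list(w0)
    losses = [loss(w)]
    for _ in range(steps):
        g = grad(w)
        w = [wi - gi / beta for wi, gi in zip(w, g)]
        losses.append(loss(w))
    return losses


w0 = [3.0, -1.0, 2.0]
losses = run_gd(w0, 30)
rate = 1.0 - mu / beta  # per-step contraction factor guaranteed by the bound
```

Each recorded loss should be at most <code>rate</code> times the previous one, which is the exponential (linear) convergence stated above. With the inequality reversed, as in the earlier revision, the condition would only bound the gradient from above and would not give this guarantee.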
 ===Tangent Kernels===