Deep Learning



===Why do neural networks satisfy the ''conditioning'' assumptions?===
;Hessian Control
We can show <math>\mu</math>-PL by showing two things: the smallest eigenvalue of the tangent kernel at initialization is bounded below, <math>\lambda_{\min}(K(w_0)) \geq \mu</math>, and the Hessian of <math>F</math> is bounded over a ball <math>B</math> around the initialization, i.e. <math>\sup_{B} \Vert H(F)\Vert</math> is small.
The tangent kernel is <math>K(w) = \nabla F(w) \nabla F(w)^T</math>.
If the Hessian is bounded, then the gradients do not change too quickly, so if the loss is <math>\mu</math>-PL at the initialization, it remains <math>\mu</math>-PL in a ball around the initialization.
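A minimal sketch of this check, assuming JAX; the model <code>F</code>, the helper <code>tangent_kernel</code>, and the toy data are illustrative and not from the source. It forms the Jacobian <math>\nabla F(w_0)</math>, builds <math>K(w_0) = \nabla F(w_0) \nabla F(w_0)^T</math>, and inspects its smallest eigenvalue.

<syntaxhighlight lang="python">
import jax
import jax.numpy as jnp

def F(w, X):
    # Illustrative model F: R^p -> R^n (vector of predictions on the n training inputs).
    return jnp.tanh(X @ w)

def tangent_kernel(w, X):
    J = jax.jacobian(F)(w, X)   # Jacobian of F with respect to w, shape (n, p)
    return J @ J.T              # K(w) = grad F(w) grad F(w)^T, shape (n, n)

n, p = 20, 100
X = jax.random.normal(jax.random.PRNGKey(0), (n, p))   # toy inputs (assumed)
w0 = jax.random.normal(jax.random.PRNGKey(1), (p,))    # initialization w_0

K0 = tangent_kernel(w0, X)
mu = jnp.linalg.eigvalsh(K0)[0]    # smallest eigenvalue of K(w_0)
print("lambda_min(K(w_0)) =", mu)  # mu-PL near w_0 if this stays bounded away from 0
</syntaxhighlight>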
Suppose we have a two-layer NN mapping <math>x \in \mathbb{R}</math> to <math>y \in \mathbb{R}</math>:
<math>f(w, x) = \frac{1}{\sqrt{m}}\sum_{i=1}^{m} v_i \sigma(w_i x)</math>.
;Can we prove convergence of GD for this NN?
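A minimal sketch of this question, assuming JAX, <math>\sigma = \tanh</math>, fixed random outer weights <math>v_i \in \{\pm 1\}</math>, and synthetic 1-D data; all of these choices are illustrative, not from the source. It runs plain gradient descent on the squared loss for the width-<math>m</math> network above and reports the loss before and after training.

<syntaxhighlight lang="python">
import jax
import jax.numpy as jnp

m = 512                                       # network width (assumed)
kw, kv, kx = jax.random.split(jax.random.PRNGKey(0), 3)
w0 = jax.random.normal(kw, (m,))              # trainable inner weights w_i
v = jnp.sign(jax.random.normal(kv, (m,)))     # fixed outer weights v_i in {+1, -1}
x = jax.random.normal(kx, (16,))              # 16 scalar inputs (synthetic)
y = jnp.sin(2.0 * x)                          # synthetic targets

def f(w, x):
    # f(w, x) = (1 / sqrt(m)) * sum_i v_i * sigma(w_i * x), with sigma = tanh
    return (v @ jnp.tanh(jnp.outer(w, x))) / jnp.sqrt(m)

def loss(w):
    # squared loss over the training set
    return 0.5 * jnp.sum((f(w, x) - y) ** 2)

grad_loss = jax.jit(jax.grad(loss))
w, lr = w0, 0.1
for t in range(1000):
    w = w - lr * grad_loss(w)                 # plain gradient descent

print("loss at w_0:", loss(w0), "loss after GD:", loss(w))
</syntaxhighlight>

In this overparameterized regime the training loss typically decreases toward zero, which is the behavior the <math>\mu</math>-PL argument above is meant to explain.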


==Misc==