Deep Learning



===Why do neural networks satisfy the ''conditioning'' assumptions?===
;Hessian Control
We can show <math>\mu</math>-PL by showing two things: the smallest eigenvalue of the tangent kernel at initialization is bounded below, <math>\lambda_{\min}(K(w_0)) \geq \mu</math>, and the Hessian of <math>F</math> is bounded over a ball <math>B</math> around the initialization, i.e. <math>\sup_{B} \Vert H(F)\Vert</math> is small.
The tangent kernel is <math>K(w) = \nabla F(w) \nabla F(w)^T</math>.
If the Hessian is bounded, then the gradients do not change too quickly, so if the loss is <math>\mu</math>-PL at the initialization, it remains <math>\mu</math>-PL in a ball around the initialization.
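A minimal sketch of this check, assuming JAX; the model <code>F</code>, the helper <code>tangent_kernel</code>, and the toy data are illustrative and not from the source. It forms the Jacobian <math>\nabla F(w_0)</math>, builds <math>K(w_0) = \nabla F(w_0) \nabla F(w_0)^T</math>, and inspects its smallest eigenvalue.

<syntaxhighlight lang="python">
import jax
import jax.numpy as jnp

def F(w, X):
    # Illustrative model F: R^p -> R^n (vector of predictions on the n training inputs).
    return jnp.tanh(X @ w)

def tangent_kernel(w, X):
    J = jax.jacobian(F)(w, X)   # Jacobian of F with respect to w, shape (n, p)
    return J @ J.T              # K(w) = grad F(w) grad F(w)^T, shape (n, n)

n, p = 20, 100
X = jax.random.normal(jax.random.PRNGKey(0), (n, p))   # toy inputs (assumed)
w0 = jax.random.normal(jax.random.PRNGKey(1), (p,))    # initialization w_0

K0 = tangent_kernel(w0, X)
mu = jnp.linalg.eigvalsh(K0)[0]    # smallest eigenvalue of K(w_0)
print("lambda_min(K(w_0)) =", mu)  # mu-PL near w_0 if this stays bounded away from 0
</syntaxhighlight>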
Suppose we have a two-layer NN mapping <math>x \in \mathbb{R}</math> to <math>y \in \mathbb{R}</math>:
<math>f(w, x) = \frac{1}{\sqrt{m}}\sum_{i=1}^{m} v_i \sigma(w_i x)</math>.
;Can we prove convergence of GD for this NN?
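A minimal sketch of this question, assuming JAX, <math>\sigma = \tanh</math>, fixed random outer weights <math>v_i \in \{\pm 1\}</math>, and synthetic 1-D data; all of these choices are illustrative, not from the source. It runs plain gradient descent on the squared loss for the width-<math>m</math> network above and reports the loss before and after training.

<syntaxhighlight lang="python">
import jax
import jax.numpy as jnp

m = 512                                       # network width (assumed)
kw, kv, kx = jax.random.split(jax.random.PRNGKey(0), 3)
w0 = jax.random.normal(kw, (m,))              # trainable inner weights w_i
v = jnp.sign(jax.random.normal(kv, (m,)))     # fixed outer weights v_i in {+1, -1}
x = jax.random.normal(kx, (16,))              # 16 scalar inputs (synthetic)
y = jnp.sin(2.0 * x)                          # synthetic targets

def f(w, x):
    # f(w, x) = (1 / sqrt(m)) * sum_i v_i * sigma(w_i * x), with sigma = tanh
    return (v @ jnp.tanh(jnp.outer(w, x))) / jnp.sqrt(m)

def loss(w):
    # squared loss over the training set
    return 0.5 * jnp.sum((f(w, x) - y) ** 2)

grad_loss = jax.jit(jax.grad(loss))
w, lr = w0, 0.1
for t in range(1000):
    w = w - lr * grad_loss(w)                 # plain gradient descent

print("loss at w_0:", loss(w0), "loss after GD:", loss(w))
</syntaxhighlight>

In this overparameterized regime the training loss typically decreases toward zero, which is the behavior the <math>\mu</math>-PL argument above is meant to explain.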


==Misc==