Deep Learning
===Why do neural networks satisfy the ''conditioning'' assumptions?===
;Hessian Control
We can show <math>\mu</math>-PL by showing two things: the smallest eigenvalue of the tangent kernel at initialization is bounded below, <math>\lambda_{\min}(K(w_0)) \geq \mu</math>, and the Hessian of <math>F</math> is bounded over a ball <math>B</math> around the initialization, i.e. <math>\sup_{w \in B} \Vert H(F)(w)\Vert</math> is controlled.
The tangent kernel is <math>K(w) = \nabla F(w) \nabla F(w)^T</math>.
If the Hessian is bounded, then the gradients do not change too quickly, so if the loss is <math>\mu</math>-PL at the initialization, then it is also <math>\mu</math>-PL in a ball around the initialization.
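A minimal numerical sketch (not from these notes) of checking the condition at initialization: stack the model outputs over the data into <math>F(w) \in \mathbb{R}^n</math>, form the tangent kernel <math>K(w_0) = \nabla F(w_0) \nabla F(w_0)^T</math> from the Jacobian, and read off its smallest eigenvalue. The placeholder model <code>f</code>, the data, and the dimensions are illustrative assumptions.
<syntaxhighlight lang="python">
import jax
import jax.numpy as jnp

def f(w, x):
    # placeholder scalar model; swap in the network of interest
    return jnp.tanh(w @ x)

def F(w, X):
    # stacked model outputs over the whole dataset: F(w) in R^n
    return jax.vmap(lambda x: f(w, x))(X)

n, d = 20, 5
X = jax.random.normal(jax.random.PRNGKey(0), (n, d))
w0 = jax.random.normal(jax.random.PRNGKey(1), (d,))

J = jax.jacobian(F)(w0, X)        # Jacobian dF(w_0), shape (n, d)
K = J @ J.T                       # tangent kernel K(w_0), shape (n, n)
mu = jnp.linalg.eigvalsh(K)[0]    # smallest eigenvalue lambda_min(K(w_0))
print("lambda_min(K(w_0)) =", mu)
</syntaxhighlight>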
Suppose we have a neural network mapping <math>x \in \mathbb{R}</math> to <math>y</math>:
<math>f(w, x) = \frac{1}{\sqrt{m}}\sum_{i=1}^{m} v_i \sigma(w_i x)</math>.
;Can we prove convergence of GD for this NN? | |||
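As a hedged sketch of the setup (not a proof), one can run plain gradient descent on this width-<math>m</math> network with squared loss, training only the inner weights <math>w</math> and keeping the outer weights <math>v</math> fixed; the activation <math>\sigma = \tanh</math>, the toy data, the width, and the step size below are illustrative assumptions.
<syntaxhighlight lang="python">
import jax
import jax.numpy as jnp

m = 1024                                                   # network width
w = jax.random.normal(jax.random.PRNGKey(0), (m,))        # inner weights w_i
v = jax.random.choice(jax.random.PRNGKey(1),
                      jnp.array([-1.0, 1.0]), (m,))        # fixed outer weights v_i

X = jnp.linspace(-1.0, 1.0, 10)                            # toy 1-d inputs
y = jnp.sin(3.0 * X)                                       # toy targets

def f(w, x):
    # f(w, x) = (1/sqrt(m)) * sum_i v_i * sigma(w_i * x), with sigma = tanh
    return (v * jnp.tanh(w * x)).sum() / jnp.sqrt(m)

def loss(w):
    preds = jax.vmap(lambda x: f(w, x))(X)
    return 0.5 * jnp.sum((preds - y) ** 2)

grad_loss = jax.jit(jax.grad(loss))
lr = 0.5
for t in range(500):
    w = w - lr * grad_loss(w)      # plain gradient descent step

print("final loss:", loss(w))      # loss should decrease on this toy problem
</syntaxhighlight>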
==Misc==