
Lemma: If <math>\theta^*</math> is linearly stable and moreover <math>\rho(J(\theta^*)) < 1</math>, then <math>\theta^*</math> is asymptotically stable.
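A minimal numerical sketch of this lemma (the matrix <math>J</math> below is an assumed toy Jacobian, not from the notes): for a linear update <math>\theta_{t+1} = J \theta_t</math> with fixed point <math>\theta^* = 0</math> and spectral radius below one, iterates started nearby contract toward the fixed point.
<syntaxhighlight lang="python">
import numpy as np

# Assumed toy Jacobian J of a linearized update theta_{t+1} = J @ theta_t
# with fixed point theta* = 0; its eigenvalues are 0.6 +/- 0.1i, so rho(J) < 1.
J = np.array([[0.5, 0.2],
              [-0.1, 0.7]])

rho = max(abs(np.linalg.eigvals(J)))
print("rho(J) =", rho)                           # about 0.61 < 1

theta = np.array([1.0, -1.0])                    # start away from theta*
for _ in range(50):
    theta = J @ theta                            # linearized update
print("||theta_50|| =", np.linalg.norm(theta))   # ~0, consistent with asymptotic stability
</syntaxhighlight>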


====Strongly local min-max====
Definition: a stationary point <math>\theta^* = (x^*, y^*)</math> is a strongly local min-max if, at <math>\theta^*</math>,
<math>
\begin{cases}
\lambda_{min}(\nabla^2_{xx} f) > 0 \\
\lambda_{max}(\nabla^2_{yy} f) < 0
\end{cases}
</math>
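As an illustration (the objective below is an assumed toy example, not from the notes), <math>f(x,y) = x^2 + 2xy - y^2</math> has a strongly local min-max at the origin; the sketch checks both eigenvalue conditions numerically.
<syntaxhighlight lang="python">
import numpy as np

# Assumed toy objective f(x, y) = x^2 + 2*x*y - y^2; (0, 0) is a stationary point.
# Hessian blocks at the origin (constant, since f is quadratic):
H_xx = np.array([[2.0]])    # d^2 f / dx^2
H_yy = np.array([[-2.0]])   # d^2 f / dy^2

# Strongly local min-max conditions:
cond_x = np.linalg.eigvalsh(H_xx).min() > 0   # lambda_min(H_xx) > 0
cond_y = np.linalg.eigvalsh(H_yy).max() < 0   # lambda_max(H_yy) < 0
print(cond_x and cond_y)                      # True
</syntaxhighlight>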
Simultaneous GDA: the Jacobian of the underlying vector field <math>(-\nabla_x f, \nabla_y f)</math> is
<math>H =
\begin{pmatrix}
- \nabla_{xx}^2 f & -\nabla_{xy}^2 f\\
(\nabla_{xy}^2 f)^\top & \nabla_{yy}^2 f\\
\end{pmatrix}
</math>
Consider <math>\theta^*</math> a strongly local min-max. Then both diagonal blocks (<math>-\nabla^2_{xx} f</math> and <math>\nabla^2_{yy} f</math>) are negative definite.
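A short sketch (continuing the assumed toy <math>f(x,y) = x^2 + 2xy - y^2</math>) that assembles <math>H</math> from the Hessian blocks and confirms both diagonal blocks are negative definite at the min-max point.
<syntaxhighlight lang="python">
import numpy as np

# Hessian blocks of the assumed toy f(x, y) = x^2 + 2*x*y - y^2 at (0, 0).
H_xx = np.array([[2.0]])   # positive definite
H_xy = np.array([[2.0]])   # cross block d^2 f / dx dy
H_yy = np.array([[-2.0]])  # negative definite

# Jacobian of the simultaneous GDA vector field (-grad_x f, +grad_y f).
H = np.block([[-H_xx,   -H_xy],
              [H_xy.T,   H_yy]])
print(H)

# Both diagonal blocks are negative definite at the strongly local min-max.
print(np.linalg.eigvalsh(-H_xx).max() < 0)  # True
print(np.linalg.eigvalsh(H_yy).max() < 0)   # True
</syntaxhighlight>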
Lemma:
Every eigenvalue of <math>H</math> has negative real part: <math>Re(\lambda(H)) < 0</math>.
Why?
Write <math>H</math> in block form with <math>A = -\nabla^2_{xx} f</math>, <math>B = -\nabla^2_{xy} f</math>, <math>C = \nabla^2_{yy} f</math>, and let <math>(v, u)</math> be an eigenvector of <math>H</math> with eigenvalue <math>\lambda</math>:
<math>
\begin{pmatrix}
A & B\\
-B^T & C
\end{pmatrix}
\begin{pmatrix}
v \\ u
\end{pmatrix}
=
\lambda
\begin{pmatrix}
v \\ u
\end{pmatrix}
</math>
Multiplying the first block row by <math>v^H</math>, the second by <math>u^H</math>, and adding gives:
<math>
\begin{aligned}
&(v^H A v + u^H C u) + (v^H B u - u^H B^T v) = \lambda (\Vert v \Vert^2 + \Vert u \Vert^2)\\
\implies &Re(v^H A v + u^H C u) = Re(\lambda)(\Vert v \Vert^2 + \Vert u \Vert^2) < 0\\
\implies &Re(\lambda) < 0
\end{aligned}
</math>
The cross term <math>v^H B u - u^H B^T v</math> is purely imaginary, so it vanishes when taking real parts, while <math>v^H A v + u^H C u < 0</math> because <math>A</math> and <math>C</math> are negative definite.
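Continuing the assumed toy example, the lemma can be verified numerically: every eigenvalue of <math>H</math> has negative real part, so the continuous-time simultaneous GDA dynamics are locally attracted to the strongly local min-max.
<syntaxhighlight lang="python">
import numpy as np

# GDA Jacobian H of the assumed toy example (see above).
H = np.array([[-2.0, -2.0],
              [ 2.0, -2.0]])

eigvals = np.linalg.eigvals(H)
print(eigvals)                   # -2 + 2j and -2 - 2j
print(np.all(eigvals.real < 0))  # True: Re(lambda(H)) < 0
</syntaxhighlight>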