Suppose our model is \(F(w)=y\) where \(w \in \mathbb{R}^m\) and \(y \in \mathbb{R}^n\).   
Then our tangent kernel is:   
\[K(w) = \nabla F(w) \nabla F(w)^T \in \mathbb{R}^{n \times n}\]
where \(\nabla F(w) \in \mathbb{R}^{n \times m}\) is the Jacobian of \(F\) at \(w\).
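A minimal sketch of computing \(K(w) = \nabla F(w) \nabla F(w)^T\) numerically. It assumes NumPy and uses a central finite-difference Jacobian rather than automatic differentiation. The one-hidden-layer tanh model and the names <code>model</code>, <code>jacobian</code> and <code>tangent_kernel</code> are hypothetical and chosen only for illustration; any differentiable \(F: \mathbb{R}^m \to \mathbb{R}^n\) would do.

```python
import numpy as np

def model(w, X):
    """Hypothetical toy model F(w): one hidden tanh layer, one output per input row."""
    d, h = X.shape[1], 3
    W1 = w[: d * h].reshape(d, h)   # first-layer weights, shape (d, h)
    a = w[d * h :]                  # output weights, shape (h,)
    return np.tanh(X @ W1) @ a      # F(w), shape (n,)

def jacobian(f, w, eps=1e-6):
    """Central finite-difference approximation of the Jacobian of f at w, shape (n, m)."""
    f0 = f(w)
    J = np.zeros((f0.size, w.size))
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        J[:, j] = (f(w + e) - f(w - e)) / (2 * eps)
    return J

def tangent_kernel(f, w):
    """K(w) = grad F(w) grad F(w)^T, shape (n, n)."""
    J = jacobian(f, w)              # grad F(w), shape (n, m)
    return J @ J.T

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))         # n = 4 inputs of dimension 2
w = rng.normal(size=2 * 3 + 3)      # m = 9 parameters
K = tangent_kernel(lambda v: model(v, X), w)
print(K.shape)                      # (4, 4)
print(np.linalg.eigvalsh(K).min())  # lambda_min of K(w)
```

\(K(w)\) is always symmetric positive semidefinite. Since \(\operatorname{rank} K(w) \leq \min(n, m)\), it can only be positive definite (\(\lambda_{\min} > 0\)) when \(m \geq n\); with fewer parameters than outputs, \(\lambda_{\min}(K(w)) = 0\).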


;Lemma
If \(\lambda_{\min}(K(w)) \geq \mu\) for all \(w \in B\), then the loss satisfies \(\mu\)-PL on \(B\).
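A minimal numerical sanity check of the lemma. It assumes the square loss \(\mathcal{L}(w) = \tfrac{1}{2}\|F(w) - y\|^2\) and the convention that \(\mu\)-PL means \(\tfrac{1}{2}\|\nabla \mathcal{L}(w)\|^2 \geq \mu\, \mathcal{L}(w)\); both are assumptions here, since this section does not restate them. Under these assumptions the key step is \(\nabla \mathcal{L}(w) = \nabla F(w)^T (F(w) - y)\), so \(\tfrac{1}{2}\|\nabla \mathcal{L}(w)\|^2 = \tfrac{1}{2}(F(w) - y)^T K(w) (F(w) - y) \geq \lambda_{\min}(K(w))\, \mathcal{L}(w) \geq \mu\, \mathcal{L}(w)\). The model \(F(w) = \tanh(Aw)\), the random data, and the ball \(B\) of radius 0.5 are hypothetical choices. The code only checks sampled points of \(B\), so it illustrates the inequality rather than proving it.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 20                       # over-parameterized: m > n
A = rng.normal(size=(n, m))        # hypothetical fixed data matrix
y = rng.normal(size=n)             # hypothetical targets

def F(w):
    """Hypothetical smooth model F: R^m -> R^n."""
    return np.tanh(A @ w)

def jac_F(w):
    """Analytic Jacobian of F: diag(1 - tanh(Aw)^2) A, shape (n, m)."""
    return (1.0 - np.tanh(A @ w) ** 2)[:, None] * A

def loss(w):
    """Square loss L(w) = 1/2 ||F(w) - y||^2 (assumed)."""
    r = F(w) - y
    return 0.5 * r @ r

def grad_loss(w):
    """Chain rule: grad L(w) = J(w)^T (F(w) - y)."""
    return jac_F(w).T @ (F(w) - y)

# Sample points from a ball B of radius 0.5 around w0 (B chosen arbitrarily).
w0 = 0.1 * rng.normal(size=m)
points = []
for _ in range(200):
    d = rng.normal(size=m)
    points.append(w0 + 0.5 * rng.uniform() * d / np.linalg.norm(d))

# mu = smallest lambda_min(K(w)) over the sampled points of B.
mu = min(np.linalg.eigvalsh(jac_F(w) @ jac_F(w).T).min() for w in points)

# Check the mu-PL inequality 1/2 ||grad L||^2 >= mu * L at every sample.
holds = all(0.5 * grad_loss(w) @ grad_loss(w) >= mu * loss(w) - 1e-12
            for w in points)
print(mu > 0, holds)
```

Taking \(\mu\) as the minimum of \(\lambda_{\min}(K(w))\) over the samples mirrors the hypothesis of the lemma: one uniform lower bound on \(B\) yields one PL constant valid on all of \(B\).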


==Misc==