5,337
edits
Line 89: | Line 89: | ||
;Lemma | ;Lemma | ||
If \(\lambda_{\min} K(w) \geq \mu \implies \mu\text{-PL}\) on \(B\). | If \(\lambda_{\min} (K(w)) \geq \mu \implies \mu\text{-PL}\) on \(B\). | ||
{{ hidden | Proof | | |||
<math> | |||
\begin{aligned} | |||
\frac{1}{2}\Vert \nabla f(w) \Vert^2 &= \frac{1}{2}\Vert (F(w)-y)^T \nabla F(w)\Vert^2\\ | |||
&=\frac{1}{2}(F(w)=y)^T \nabla F(w) \nabla F(w)^T (F(w)-y)\\ | |||
&\geq \frac{1}{2} \lambda_{\min}(K(w)) \Vert F(w)-y\Vert ^2\\ | |||
&= \lambda_{\min}(K(w)) L(w) | |||
\end{aligned} | |||
</math> | |||
}} | |||
Informal convergence result: | |||
GD converges exp fast to a solution with the rate controlled by a condition number: | |||
\[ \kappa_{F} = \frac{\sup_{B} \lambda_{\max}(H)}{\inf_{B} \lambda_{\min}(K)}\] | |||
Ideally we want this to be small. | |||
{{ hidden | Example | | |||
<math>F(w) = Aw = y \in \mathbb{R}^n</math> | |||
<math>L(w) = \frac{1}{2} \Vert Aw-y\Vert^2</math> | |||
Our tangent kernel is: | |||
<math>K(w)=AA^T</math> | |||
and our hessian is <math>H=A^TA</math>. | |||
<math>\kappa_{F} = \frac{\lambda_{\max}(K)}{\lambda_{\min}(K)} \geq 1</math>. | |||
Our intuition is: | |||
<math>K=AA^T \implies E[\log K_F] \approx \log(\frac{M}{|m-n|+1})</math>. | |||
So as <math>m \to \infty</math>, <math>\kappa_{F} \to 1</math>. | |||
}} | |||
{{hidden | Convergence proof: (Theorem 4.2 <ref name="liu2020towards"></ref>) | | |||
Local PL implies existence of a solution + fast convergence. | |||
Assume <math>L(w)</math> is <math display="inline">\beta</math>-smooth and satisfies <math display="inline">\mu</math>-PL condition. | |||
Assume these are true around a ball <math>B(w_0, R)</math> with <math>R = \frac{2\sqrt{w\beta L(w_0)}{\mu}</math>. | |||
We want to prove: | |||
* A solution exists. | |||
* <math>L(w_t) \leq (1-\eta \mu)^t L(w_0)</math> | |||
Note that <math>\eta \mu</math> is the inverse of the condition number. | |||
}} | |||
==Misc== | ==Misc== |