* Simon Du ''et al.''<ref name="du2019gradient"></ref>
===Start of Lecture 4 (September 10, 2020)===
This lecture is about Soudry ''et al.''<ref name="soudry2018implicit"></ref>.
This implies <math>\Vert w(t) \Vert</math> goes to infinity and the loss converges to 0.
Since the loss converges to 0, GD converges to a global min.
Assumption 3: The loss function has exponential tails.
The exponential and logistic losses both satisfy this assumption.
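For intuition (the paper's precise tail condition is more general), both canonical examples decay like <math>e^{-u}</math> for large margins <math>u</math>:
<math>
\begin{aligned}
l_{\exp}(u) &= e^{-u},\\
l_{\log}(u) &= \log(1 + e^{-u}) \approx e^{-u} \text{ as } u \to \infty
\end{aligned}
</math>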
;Theorem
Under assumptions 1-3, GD behaves as:
<math>w(t) = w_{svm} \log(t) + \rho(t)</math> where <math>\Vert \rho(t) \Vert = o(\log \log t)</math>.
This implies <math>\lim_{t \to \infty} \frac{w(t)}{\Vert w(t) \Vert} = \frac{w_{svm}}{\Vert w_{svm} \Vert}</math>.
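A minimal numerical sketch of the theorem (the toy dataset and all names below are my own, not from the lecture): running GD on the exponential loss over separable data, <math>\Vert w(t) \Vert / \log t</math> stabilizes while the direction <math>w(t) / \Vert w(t) \Vert</math> converges.
<syntaxhighlight lang="python">
import numpy as np

# Toy separable data; labels are absorbed into the x_i,
# so separability means w^T x_i > 0 for all i (hypothetical example).
X = np.array([[1.0, 0.5],
              [2.0, 1.0],
              [1.5, -0.2],
              [0.8, 1.2]])

w = np.zeros(2)
lr = 0.1
for t in range(1, 100001):
    margins = X @ w
    # Gradient of L(w) = sum_i exp(-w^T x_i).
    grad = -(np.exp(-margins)[:, None] * X).sum(axis=0)
    w -= lr * grad
    if t in (100, 1000, 10000, 100000):
        # Norm grows ~ ||w_svm|| log t; direction converges to w_svm / ||w_svm||.
        print(t, np.linalg.norm(w) / np.log(t), w / np.linalg.norm(w))
</syntaxhighlight>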
{{ hidden | Proof |
Consider the exponential loss:
<math>l(u) = e^{-u}</math>
GD in the asymptotic regime: <math>w(t)^t x_i \to \infty</math>.
Let's assume the normalized <math>w</math> converges to a limit point <math>w_{\infty}</math>:
<math>\frac{w(t)}{\Vert w(t) \Vert} \to w_{\infty}</math>.
This implies <math>w(t) = g(t) w_{\infty} + \rho(t)</math> for some scaling <math>g(t) \to \infty</math>.
<math>
\begin{aligned}
-\nabla L(w) &= \sum_{i=1}^{N} \exp(-w(t)^t x_i) x_i\\
&\to \sum_{i=1}^{N} \exp(-g(t) w_{\infty}^t x_i) x_i
\end{aligned}
</math>
Here the exponents <math>-g(t) w_{\infty}^t x_i</math> go to negative infinity, since <math>g(t) \to \infty</math> and <math>w_{\infty}^t x_i > 0</math>.
Only the samples with the smallest margin <math>w_{\infty}^t x_i</math> contribute to the gradient in the limit:
those attaining <math>\operatorname{argmin}_{i} w_{\infty}^t x_i</math>, which are exactly the support vectors.
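To see this, let <math>m_{\min} = \min_j w_{\infty}^t x_j</math>. For any sample with <math>w_{\infty}^t x_i > m_{\min}</math>,
<math>
\frac{\exp(-g(t) w_{\infty}^t x_i)}{\exp(-g(t) m_{\min})} = \exp\left(-g(t)(w_{\infty}^t x_i - m_{\min})\right) \to 0,
</math>
so the non-support terms become exponentially negligible relative to the support-vector terms.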
Thus:
<math>
\begin{aligned}
&-\nabla L(w) = \sum_{i \in SV} \alpha_i' x_i\\
\implies &w_{\infty} = \sum_{i \in SV} \alpha_i'' x_i
\end{aligned}
</math>
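The last line matches the hard-margin SVM (this connecting step is implicit in the notes): the problem
<math>
w_{svm} = \operatorname{argmin}_{w} \tfrac{1}{2} \Vert w \Vert^2 \quad \text{s.t. } w^t x_i \geq 1 \;\forall i
</math>
has KKT stationarity condition <math>w_{svm} = \sum_{i \in SV} \alpha_i x_i</math> with <math>\alpha_i \geq 0</math>, the same form as <math>w_{\infty}</math>, so <math>w_{\infty} \propto w_{svm}</math>.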
}}
==Misc==