Deep Learning
* Simon Du ''et al.''<ref name="du2019gradient"></ref>


===Start of Lecture 4 (September 10, 2020)===
This lecture is about Soudry ''et al.''<ref name="soudry2018implicit"></ref>.


This implies <math>\Vert w(t) \Vert</math> goes to infinity and the loss converges to 0.
Since the loss converges to 0, GD converges to a global min.
Assumption 3: The loss function has an exponential tail.
The exponential and logistic losses satisfy this assumption.
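For instance (a standard computation, not spelled out in these notes), the logistic loss <math>l(u) = \log(1 + e^{-u})</math> has an exponential tail because, for large <math>u</math>,
<math>
\log(1 + e^{-u}) = e^{-u} - \frac{1}{2} e^{-2u} + \dots \approx e^{-u}.
</math>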
;Theorem
Under assumptions 1-3, GD behaves as:
<math>w(t) = w_{svm} \log(t) + \rho(t)</math> where <math>\Vert \rho(t) \Vert = o(\log \log t)</math>. 
This implies <math>\lim_{t \to \infty} \frac{w(t)}{\Vert w(t) \Vert} = \frac{w_{svm}}{\Vert w_{svm} \Vert}</math>.
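To see why the last implication holds (a short check, not spelled out in these notes), divide the numerator and denominator by <math>\log(t)</math> and use <math>\Vert \rho(t) \Vert = o(\log t)</math>:
<math>
\frac{w(t)}{\Vert w(t) \Vert} = \frac{w_{svm} + \rho(t)/\log(t)}{\left\Vert w_{svm} + \rho(t)/\log(t) \right\Vert} \to \frac{w_{svm}}{\Vert w_{svm} \Vert}.
</math>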
{{ hidden | Proof |
Consider exponential loss: 
<math>l(u) = e^{-u}</math> 
In the asymptotic regime of GD on separable data, <math>w(t)^t x_i \to \infty</math> for every <math>i</math>. 
Assume the normalized iterates converge to a limit point <math>w_{\infty}</math>:
<math>\frac{w(t)}{\Vert w(t) \Vert} \to w_{\infty}</math>. 
This implies <math>w(t) = g(t) w_{\infty} + \rho(t)</math>, where <math>g(t) \to \infty</math> and <math>\rho(t)</math> is a lower-order residual.
<math>
\begin{aligned}
-\nabla L(w) &= \sum_{i=1}^{N} \exp(-w(t)^t x_i) x_i\\
&\to \sum_{i=1}^{N} \exp(-g(t)w_{\infty}^t x_i) x_i\\
\end{aligned}
</math>
Here the exponents <math>-g(t) w_{\infty}^t x_i</math> go to negative infinity, since <math>g(t) \to \infty</math> and <math>w_{\infty}^t x_i > 0</math> for every <math>i</math>. 
Asymptotically, only the samples with the smallest margin <math>w_{\infty}^t x_i</math>, i.e. <math>\operatorname{argmin}_{i} w_{\infty}^t x_i</math>, contribute to the gradient; these are exactly the support vectors. 
Thus: 
<math>
\begin{aligned}
&-\nabla L(w) = \sum_{i \in SV} \alpha_i' x_i\\
\implies &w_{\infty} = \sum_{i \in SV} \alpha_i'' x_i
\end{aligned}
</math>
with <math>\alpha_i', \alpha_i'' \geq 0</math>. Since <math>w_{\infty}</math> is a non-negative combination of the support vectors, which all attain the minimal margin, it satisfies the KKT conditions of the hard-margin SVM, so <math>w_{\infty} \propto w_{svm}</math>.
}}
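A minimal numerical sketch of this behavior (not from the lecture; the dataset, learning rate, and iteration counts below are illustrative choices): run GD with the exponential loss on a tiny separable dataset whose hard-margin SVM solution through the origin can be checked by hand to be <math>w_{svm} = (0.5, 0.5)</math>, with the single support vector <math>(1, 1)</math>. The normalized iterate should drift, logarithmically slowly, toward <math>w_{svm}/\Vert w_{svm} \Vert</math> while <math>\Vert w(t) \Vert</math> keeps growing, matching the <math>\log(t)</math> scaling in the theorem.
<syntaxhighlight lang="python">
import numpy as np

# Tiny separable dataset (labels y in {+1, -1}); the separator passes through the origin.
# For these points the hard-margin SVM solution (no bias) is w_svm = (0.5, 0.5):
# the single support vector is x = (1, 1) with y = +1, and the other two points
# satisfy their margin constraints strictly (KKT conditions checked by hand).
X = np.array([[ 1.0,  1.0],
              [ 3.0,  3.0],
              [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0])
w_svm_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)  # max-margin direction

def grad(w):
    """Gradient of the exponential loss L(w) = sum_i exp(-y_i w^T x_i).
    (Labels are written explicitly here; the notes absorb y_i into x_i.)"""
    margins = y * (X @ w)
    return -(np.exp(-margins) * y) @ X

w = np.zeros(2)
lr = 0.01
checkpoints = {10**2, 10**3, 10**4, 10**5, 10**6}
for t in range(1, 10**6 + 1):
    w -= lr * grad(w)
    if t in checkpoints:
        w_dir = w / np.linalg.norm(w)
        angle = np.arccos(np.clip(w_dir @ w_svm_dir, -1.0, 1.0))
        print(f"t={t:>7}  ||w(t)||={np.linalg.norm(w):7.3f}  "
              f"w/||w||={np.round(w_dir, 4)}  angle to w_svm={angle:.4f}")
</syntaxhighlight>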


==Misc==