Deep Learning: Difference between revisions

* Sharpness-based bounds (PAC-Bayesian)
* Norm-based bounds
==Neural Tangent Kernels (NTKs)==
Beginning of Lecture 7 (Sept. 22, 2020)
===Linear Regression===
Assume we have a dataset: 
<math>\{(x_i, y_i)\}_{i=1}^{n}</math> 
<math>y_i \in \mathbb{R}</math> 
<math>x_i \in \mathbb{R}^d</math> 
<math>f(w, x) = w^\top x</math>
<math>L(w) = \frac{1}{2} \sum_{i=1}^{n}(y_i - f(w, x_i))^2</math> 
<math>\min_{w} L(w)</math> 
GD: <math>w(t+1) = w(t) - \eta_{t} \nabla L(w(t))</math> where our gradient is: 
<math>\nabla L(w) = -\sum_{i=1}^{n}(y_i - f(w, x_i)) \nabla_{w} f(w, x_i) = -\sum_{i=1}^{n}(y_i - f(w, x_i)) x_i</math>
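The gradient descent update for this least-squares problem can be sketched in NumPy. This is a minimal illustration and not part of the lecture: the constant step size <code>eta</code>, the number of steps, the zero initialization, and the synthetic noiseless data are all assumptions chosen so the example converges.

```python
import numpy as np

def loss(w, X, y):
    # L(w) = 1/2 * sum_i (y_i - w^T x_i)^2, with the x_i stored as rows of X
    residual = y - X @ w
    return 0.5 * np.sum(residual ** 2)

def grad(w, X, y):
    # grad L(w) = -sum_i (y_i - w^T x_i) x_i
    return -X.T @ (y - X @ w)

def gradient_descent(X, y, eta=0.005, steps=500):
    # Assumed setup: constant step size eta and zero initialization.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - eta * grad(w, X, y)
    return w

# Synthetic, noiseless data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w_hat = gradient_descent(X, y)
print(w_hat, loss(w_hat, X, y))
```

Because the data are generated without noise, the minimizer of <math>L</math> is the generating weight vector, so <code>w_hat</code> should approach <code>w_true</code> as long as the step size is small enough for the iteration to be stable.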
===Kernel Method===
<math>x_i \in \mathbb{R}^d \to \phi(x_i) \in \mathbb{R}^{D}</math> with <math>D \gg d</math>
Suppose <math>d=3</math> and <math>x =
\begin{bmatrix}
x_1\\x_2\\x_3
\end{bmatrix}
\to
\phi(x)=
\begin{bmatrix}
x_1\\
x_2\\
x_3\\
x_1 x_2\\
x_1 x_3\\
x_2 x_3
\end{bmatrix}
</math>
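The model is now <math>f(w, x) = w^\top \phi(x)</math> with <math>w \in \mathbb{R}^{D}</math>. 
Is this model linear in <math>w</math>? Yes! 
Is this model linear in <math>x</math>? No! (The features <math>x_1 x_2</math>, <math>x_1 x_3</math>, <math>x_2 x_3</math> are products of inputs.)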
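The two answers above can be checked numerically. The sketch below implements the <math>d=3</math> feature map from the example and tests additivity in each argument; the function names <code>phi</code> and <code>f</code> and the specific test vectors are illustrative choices, not from the lecture.

```python
import numpy as np

def phi(x):
    # Feature map from the d = 3 example: original coordinates
    # followed by the three pairwise products (D = 6).
    x1, x2, x3 = x
    return np.array([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])

def f(w, x):
    # Model that is linear in the features: f(w, x) = w^T phi(x)
    return w @ phi(x)

# Hypothetical test vectors chosen only for illustration.
w = np.ones(6)
v = np.array([1.0, 0.0, -1.0, 2.0, 0.0, 1.0])
x = np.array([1.0, 2.0, 3.0])
z = np.array([1.0, 1.0, 1.0])

# Additivity in w holds: f(w + v, x) == f(w, x) + f(v, x)
linear_in_w = np.isclose(f(w + v, x), f(w, x) + f(v, x))

# Additivity in x fails because of the cross terms x_i * x_j
linear_in_x = np.isclose(f(w, x + z), f(w, x) + f(w, z))

print(linear_in_w, linear_in_x)
```

Since the model stays linear in <math>w</math>, the gradient with respect to <math>w</math> is just <math>\phi(x)</math>, so the linear regression machinery applies with <math>x_i</math> replaced by <math>\phi(x_i)</math>.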


==Misc==