;Kernel Trick:
We may have a closed-form expression for <math>\langle \phi(x_i), \phi(x_j) \rangle</math>.<br>
This is called the kernel function <math>K(x_i, x_j)</math>; evaluated on all pairs of training points it gives the kernel matrix <math>K \in \mathbb{R}^{n \times n}</math>.<br>
<math display="inline">K</math> is a PSD (positive semi-definite) matrix.


Idea: In many cases we can compute <math>K(x_i, x_j)</math> without explicitly computing <math>\phi(x_i)</math>.
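As a small illustration (the Gaussian kernel and the parameter values below are assumptions chosen for this sketch, not part of the notes above): the kernel matrix can be formed from pairwise kernel evaluations alone, and its eigenvalues confirm that it is PSD.

<syntaxhighlight lang="python">
# Minimal sketch (illustrative): the Gaussian kernel
# K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)) corresponds to an
# infinite-dimensional phi, yet each K(x_i, x_j) is an O(d) computation
# and the resulting kernel matrix is PSD up to numerical error.
import numpy as np

def gaussian_kernel(x_i, x_j, sigma=1.0):
    return np.exp(-np.sum((x_i - x_j) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # n = 5 points in R^d with d = 3

# Kernel matrix K in R^{n x n}: K[i, j] = K(x_i, x_j)
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

print(np.linalg.eigvalsh(K))         # all eigenvalues >= 0, so K is PSD
</syntaxhighlight>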


;Polynomial Kernels:
<math>K(x_i, x_j) = (c + x_i^T x_j)^k</math> with <math>\phi(x_i) \in \mathbb{R}^D</math><br>
Here <math>D = O(d^k)</math>, but evaluating <math>K(x_i, x_j)</math> directly takes only <math>O(d)</math> time.
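For instance (a minimal sketch with <math>d = 2</math>, <math>k = 2</math>, and <math>c = 1</math>; the specific feature map is an illustrative assumption): the explicit degree-2 feature map has <math>D = 6</math> coordinates, while the kernel value is a single <math>O(d)</math> dot product, and the two agree.

<syntaxhighlight lang="python">
# Minimal sketch (illustrative): explicit feature map vs. direct kernel
# evaluation for the degree-2 polynomial kernel in d = 2.
import numpy as np

def phi(x, c=1.0):
    # Explicit feature map with <phi(x), phi(z)> = (c + x^T z)^2
    x1, x2 = x
    return np.array([c,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z, c=1.0):
    return (c + x @ z) ** 2          # O(d) work, no phi needed

x = np.array([0.5, -1.0])
z = np.array([2.0, 0.3])

print(phi(x) @ phi(z))               # explicit inner product in R^6
print(poly_kernel(x, z))             # same value, computed directly
</syntaxhighlight>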


Many classical techniques can be ''kernelized'' (a kernel ridge regression sketch follows the list):
* SVM to Kernel SVM
* Ridge regression to Kernel ridge regression
* PCA to Kernel PCA
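For example, kernel ridge regression replaces the explicit feature map with kernel evaluations: the dual coefficients solve <math>(K + \lambda I)\alpha = y</math> and predictions are <math>f(x) = \sum_i \alpha_i K(x_i, x)</math>. A minimal sketch (the polynomial kernel, <math>\lambda</math>, and the synthetic data are illustrative choices, not prescribed by the notes):

<syntaxhighlight lang="python">
# Minimal kernel ridge regression sketch (illustrative).
# Ridge regression in feature space has the dual solution
# alpha = (K + lam I)^{-1} y, and predictions need only kernel evaluations.
import numpy as np

def poly_kernel(A, B, c=1.0, k=2):
    # Kernel matrix between the rows of A and the rows of B
    return (c + A @ B.T) ** k

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)    # noisy targets

lam = 0.1
K = poly_kernel(X, X)                              # n x n kernel matrix
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = rng.normal(size=(5, 3))
y_pred = poly_kernel(X_test, X) @ alpha            # f(x) = sum_i alpha_i K(x_i, x)
print(y_pred)
</syntaxhighlight>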


===Neural Networks===