;Kernel Trick:
We may have a closed-form expression for <math>\langle \phi(x_i), \phi(x_j) \rangle</math>.<br>
This is called the kernel function <math>K(x_i, x_j)</math>; evaluated on all pairs of training points it gives the kernel matrix <math>K \in \mathbb{R}^{n \times n}</math>.<br>
<math display="inline">K</math> is a PSD (positive semi-definite) matrix.


Idea: In many cases we can compute <math>K(x_i, x_j)</math> without explicitly computing <math>\phi(x_i)</math>.
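As a small illustration (the Gaussian kernel and the parameter values below are assumptions chosen for this sketch, not part of the notes above): the kernel matrix can be formed from pairwise kernel evaluations alone, and its eigenvalues confirm that it is PSD.

<syntaxhighlight lang="python">
# Minimal sketch (illustrative): the Gaussian kernel
# K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)) corresponds to an
# infinite-dimensional phi, yet each K(x_i, x_j) is an O(d) computation
# and the resulting kernel matrix is PSD up to numerical error.
import numpy as np

def gaussian_kernel(x_i, x_j, sigma=1.0):
    return np.exp(-np.sum((x_i - x_j) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # n = 5 points in R^d with d = 3

# Kernel matrix K in R^{n x n}: K[i, j] = K(x_i, x_j)
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

print(np.linalg.eigvalsh(K))         # all eigenvalues >= 0, so K is PSD
</syntaxhighlight>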


;Polynomial Kernels:
<math>K(x_i, x_j) = (c + x_i^T x_j)^k</math> with <math>\phi(x_i) \in \mathbb{R}^D</math><br>
Here <math>D = O(d^k)</math>, but evaluating <math>K(x_i, x_j)</math> directly takes only <math>O(d)</math> time.
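For instance (a minimal sketch with <math>d = 2</math>, <math>k = 2</math>, and <math>c = 1</math>; the specific feature map is an illustrative assumption): the explicit degree-2 feature map has <math>D = 6</math> coordinates, while the kernel value is a single <math>O(d)</math> dot product, and the two agree.

<syntaxhighlight lang="python">
# Minimal sketch (illustrative): explicit feature map vs. direct kernel
# evaluation for the degree-2 polynomial kernel in d = 2.
import numpy as np

def phi(x, c=1.0):
    # Explicit feature map with <phi(x), phi(z)> = (c + x^T z)^2
    x1, x2 = x
    return np.array([c,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z, c=1.0):
    return (c + x @ z) ** 2          # O(d) work, no phi needed

x = np.array([0.5, -1.0])
z = np.array([2.0, 0.3])

print(phi(x) @ phi(z))               # explicit inner product in R^6
print(poly_kernel(x, z))             # same value, computed directly
</syntaxhighlight>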


Many classical techniques can be ''kernelized'' (a kernel ridge regression sketch follows the list):
* SVM to Kernel SVM
* Ridge regression to Kernel ridge regression
* PCA to Kernel PCA
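For example, kernel ridge regression replaces the explicit feature map with kernel evaluations: the dual coefficients solve <math>(K + \lambda I)\alpha = y</math> and predictions are <math>f(x) = \sum_i \alpha_i K(x_i, x)</math>. A minimal sketch (the polynomial kernel, <math>\lambda</math>, and the synthetic data are illustrative choices, not prescribed by the notes):

<syntaxhighlight lang="python">
# Minimal kernel ridge regression sketch (illustrative).
# Ridge regression in feature space has the dual solution
# alpha = (K + lam I)^{-1} y, and predictions need only kernel evaluations.
import numpy as np

def poly_kernel(A, B, c=1.0, k=2):
    # Kernel matrix between the rows of A and the rows of B
    return (c + A @ B.T) ** k

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)    # noisy targets

lam = 0.1
K = poly_kernel(X, X)                              # n x n kernel matrix
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = rng.normal(size=(5, 3))
y_pred = poly_kernel(X_test, X) @ alpha            # f(x) = sum_i alpha_i K(x_i, x)
print(y_pred)
</syntaxhighlight>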


===Neural Networks===