;Kernel Trick:
We may have a closed-form solution for <math>\langle \phi(x_i), \phi(x_j) \rangle</math>.<br>
This is called the kernel function <math>K(x_i, x_j)</math> or kernel matrix <math>K \in \mathbb{R}^{n \times n}</math>.<br>
<math display="inline">K</math> is a positive semi-definite (PSD) matrix.
Idea: In many cases we can compute <math>K(x_i, x_j)</math> without explicitly computing <math>\phi(x_i)</math>.
;Polynomial Kernels
<math>K(x_i, x_j) = (1 + x_i^t x_j)^k</math> with <math>\phi(x_i) \in \mathbb{R}^D</math><br>
Here <math>D=O(d^k)</math>, so computing <math>\phi(x_i)</math> explicitly is expensive, but evaluating <math>K(x_i, x_j)</math> directly takes only <math>O(d)</math> time.
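A minimal sketch of this cost gap, assuming the common choice of constant <math>1</math> and degree <math>k=2</math>: the explicit degree-2 feature map has <math>O(d^2)</math> entries, yet its inner product equals the <math>O(d)</math> kernel evaluation.

```python
import numpy as np

def poly_kernel(xi, xj, k=2):
    # Polynomial kernel K(x_i, x_j) = (1 + x_i^t x_j)^k, computed in O(d).
    return (1.0 + xi @ xj) ** k

def phi(x):
    # Explicit feature map for (1 + x^t y)^2: entries 1, sqrt(2)*x_a,
    # x_a^2, and sqrt(2)*x_a*x_b for a < b -- O(d^2) features in total.
    d = len(x)
    feats = [1.0]
    feats += [np.sqrt(2.0) * x[a] for a in range(d)]
    feats += [x[a] * x[a] for a in range(d)]
    feats += [np.sqrt(2.0) * x[a] * x[b] for a in range(d) for b in range(a + 1, d)]
    return np.array(feats)

rng = np.random.default_rng(0)
xi, xj = rng.standard_normal(5), rng.standard_normal(5)
# The O(d) kernel evaluation matches the O(d^2) explicit inner product.
print(np.isclose(poly_kernel(xi, xj), phi(xi) @ phi(xj)))  # True
```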
Many classical techniques can be ''kernelized'':
* SVM to Kernel SVM
* Ridge regression to Kernel ridge regression
* PCA to Kernel PCA
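As one example from the list above, kernel ridge regression replaces the primal weights with dual coefficients <math>\alpha = (K + \lambda I)^{-1} y</math> and predicts via <math>f(x) = \sum_i \alpha_i K(x_i, x)</math>, touching the data only through the kernel. A hedged sketch (the RBF kernel and data below are illustrative choices, not from the notes):

```python
import numpy as np

def kernel_ridge_fit(X, y, lam, kernel):
    # Dual solution: alpha = (K + lam*I)^{-1} y with K_ij = kernel(x_i, x_j).
    n = len(X)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    return np.linalg.solve(K + lam * np.eye(n), y)

def kernel_ridge_predict(X_train, alpha, kernel, x):
    # f(x) = sum_i alpha_i * K(x_i, x) -- data enters only through the kernel.
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train))

# Illustrative usage with an RBF kernel on a tiny 1-d dataset.
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
X = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
y = np.array([1.0, 2.0, 0.0])
alpha = kernel_ridge_fit(X, y, 1e-8, rbf)
print(kernel_ridge_predict(X, alpha, rbf, X[1]))  # close to 2.0
```

With <math>\lambda \to 0</math> and an invertible <math>K</math>, the fit interpolates the training targets, which is why the prediction at a training point recovers its label.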
===Neural Networks===