===Learning Rate===
==Kernel Trick==
Linear classifiers such as the perceptron and SVM often fail to classify data for which the true decision boundary is non-linear.<br>
One way to get around this is to apply a non-linear feature map <math>\phi(x)</math> to the data before training.<br>
For example, <math>\phi(x) = \begin{bmatrix}x \\ x^2 \\ x^3\end{bmatrix}</math>.<br>
If the original model and training algorithm access the data only through inner products <math>\langle x, z\rangle</math>, then after the transformation they only need <math>\phi(x)^T\phi(z)</math>; this inner product can often be computed directly without ever forming <math>\phi(x)</math> explicitly.<br>
A kernel <math>K(x,z)</math> is a function that can be expressed as <math>K(x,z)=\phi(x)^T\phi(z)</math> for some feature map <math>\phi</math>.<br>
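As a concrete numerical check of this identity, the sketch below (using NumPy; the function names are illustrative) compares a degree-2 polynomial kernel <math>K(x,z) = (1 + x^T z)^2</math> in <math>\mathbb{R}^2</math> against an explicit inner product in the corresponding 6-dimensional feature space:

```python
import numpy as np

def poly_kernel(x, z):
    # Degree-2 polynomial kernel: K(x, z) = (1 + x.z)^2,
    # computed directly in the original 2-dimensional space.
    return (1.0 + x @ z) ** 2

def phi(x):
    # Explicit feature map for this kernel in R^2, chosen so that
    # phi(x).phi(z) expands to exactly (1 + x.z)^2.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(poly_kernel(x, z))   # kernel evaluated in R^2
print(phi(x) @ phi(z))     # same value, evaluated in R^6
```

Both calls return the same number, but the kernel form never constructs the 6-dimensional vectors; for higher-degree kernels or higher-dimensional inputs this saving grows rapidly.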
===Identifying if a function is a kernel===
Basic check:
Since a kernel is an inner product, it must satisfy the axioms of inner products; in particular, <math>K(x,z)=K(z,x)</math>. If this symmetry fails, the function is not a kernel.<br>
====Mercer Conditions====
A symmetric function <math>K(x,z)</math> is a valid kernel if and only if, for every finite set of points <math>\{x_1, \dots, x_n\}</math>, the Gram matrix <math>G</math> with entries <math>G_{ij} = K(x_i, x_j)</math> is positive semi-definite.<br>
==Learning Theory==