==SVM==
[http://cs229.stanford.edu/notes/cs229-notes3.pdf Andrew Ng Notes]<br>
Support Vector Machine<br>
An SVM is a linear classifier, like the perceptron, except the goal is not just to classify the data correctly but also to maximize the margin.<br>
<math>h_{w,b}(x) = g(w^T x + b)</math> where <math>g(z) = I[z \geq 0] - I[z < 0]</math> is the sign function.<br>
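To make the decision rule concrete, here is a minimal NumPy sketch (not from the notes; the values of <code>w</code>, <code>b</code>, and <code>x</code> are made up):
<syntaxhighlight lang="python">
import numpy as np

def h(w, b, x):
    # Linear classifier h_{w,b}(x) = g(w^T x + b), with g the sign function.
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical parameters and example point.
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 3.0])
print(h(w, b, x))  # -1, since 2*1 - 1*3 + 0.5 = -0.5 < 0
</syntaxhighlight>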
===Margins===
The margin, denoted by <math>\gamma</math>, is the distance between our decision boundary and the closest training point.<br>
;Functional Margin
The margin corresponding to one example is:<br>
<math>\hat{\gamma}^{(i)} = y^{(i)}(w^Tx^{(i)}+b)</math>
The functional margin for our entire sample is the smallest of the per-example margins: <math>\hat{\gamma} = \min_i \hat{\gamma}^{(i)}</math>.
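As a sketch of the definition (the sample below is made up, not from the notes), the functional margins can be computed in a vectorized way:
<syntaxhighlight lang="python">
import numpy as np

# Hypothetical linearly separable sample: rows of X, labels y in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.0]])
y = np.array([1, 1, -1])
w, b = np.array([1.0, 1.0]), -1.0  # assumed classifier parameters

# Per-example functional margins y^(i) (w^T x^(i) + b).
margins = y * (X @ w + b)
print(margins)        # [2.  1.5 3. ]
print(margins.min())  # functional margin of the whole sample: 1.5
</syntaxhighlight>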
;Geometric Margin
The geometric margin is the actual Euclidean distance to the hyperplane.<br>
<math>\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\Vert w \Vert}\right)^T x^{(i)} + \frac{b}{\Vert w \Vert}\right)</math><br>
* <math>w</math> is the normal vector of our hyperplane, so <math>\left(\frac{w}{\Vert w \Vert}\right)^T x^{(i)}</math> is the length of the projection of <math>x^{(i)}</math> onto the normal vector.
: Together with the offset <math>\frac{b}{\Vert w \Vert}</math>, this gives the signed distance from <math>x^{(i)}</math> to our hyperplane.
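Since the geometric margin is just the functional margin divided by <math>\Vert w \Vert</math>, the earlier sketch extends directly (same made-up sample):
<syntaxhighlight lang="python">
import numpy as np

# Same hypothetical sample and parameters as in the functional-margin sketch.
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.0]])
y = np.array([1, 1, -1])
w, b = np.array([1.0, 1.0]), -1.0

# Geometric margins y^(i) ((w/||w||)^T x^(i) + b/||w||),
# i.e. the functional margins scaled by 1/||w||.
geo_margins = y * (X @ w + b) / np.linalg.norm(w)
print(geo_margins.min())  # ~1.0607 = 1.5 / sqrt(2)
</syntaxhighlight>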
===Lagrangians===
===Kernel Trick===
Oftentimes, using linear classifiers such as perceptron and SVM may fail to classify data for which the true decision boundary is non-linear.<br>