Machine Learning

===Learning Rate===
==SVM==
[http://cs229.stanford.edu/notes/cs229-notes3.pdf Andrew Ng Notes]<br>
Support Vector Machine<br>
This is a linear classifier, like the perceptron, except the goal is not just to classify the data correctly but also to maximize the margin.<br>
<math>h_{w,b}(x) = g(w^Tx+b)</math> where <math>g(z) = I[z \geq 0]-I[z<0]</math> is the sign function, so predictions are in <math>\{-1,+1\}</math>.<br>
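As a rough sketch of this hypothesis in NumPy (the weights <code>w</code>, bias <code>b</code>, and points below are made up purely for illustration):
<syntaxhighlight lang="python">
import numpy as np

def svm_predict(w, b, X):
    """Return +1/-1 labels for the linear hypothesis h_{w,b}(x) = sign(w^T x + b)."""
    scores = X @ w + b
    # Non-negative scores map to +1, negative scores to -1 (the sign function g above).
    return np.where(scores >= 0, 1, -1)

# Hypothetical weights and points, chosen only to exercise the function.
w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[1.0, 1.0],
              [-1.0, 2.0]])
print(svm_predict(w, b, X))  # [ 1 -1 ]
</syntaxhighlight>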
===Margins===
The margin, denoted <math>\gamma</math>, is the distance between our decision boundary and the closest training point.<br>
;Functional Margin
The margin corresponding to one example is:<br>
<math>\hat{\gamma}^{(i)} = y^{(i)}(w^Tx^{(i)}+b)</math><br>
The functional margin of the entire training set is the smallest functional margin over all examples: <math>\hat{\gamma} = \min_i \hat{\gamma}^{(i)}</math>.
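A minimal sketch of this computation, assuming labels <math>y^{(i)} \in \{-1,+1\}</math> stored in a NumPy array (the function names are illustrative, not from the notes):
<syntaxhighlight lang="python">
import numpy as np

def functional_margins(w, b, X, y):
    """Per-example functional margins y^(i) * (w^T x^(i) + b); y entries are +1/-1."""
    return y * (X @ w + b)

def functional_margin(w, b, X, y):
    """Functional margin of the whole sample: the smallest per-example margin."""
    return np.min(functional_margins(w, b, X, y))
</syntaxhighlight>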
;Geometric Margin
The geometric margin is the actual distance.<br>
<math>\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\Vert w \Vert}\right)^Tx^{(i)}+\frac{b}{\Vert w \Vert}\right)</math><br>
* <math>\mathbf{w}</math> is the normal vector of our hyperplane, so <math>\left(\frac{w}{\Vert w \Vert}\right)^Tx^{(i)}</math> is the length of the projection of <math>x^{(i)}</math> onto the normal vector.
: Adding <math>\frac{b}{\Vert w \Vert}</math> accounts for the hyperplane's offset, giving the signed distance from <math>x^{(i)}</math> to the hyperplane; multiplying by <math>y^{(i)}</math> makes it positive when the point is classified correctly.
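The same kind of sketch for the geometric margin, which simply rescales the functional margin by <math>\frac{1}{\Vert w \Vert}</math> (again with made-up names):
<syntaxhighlight lang="python">
import numpy as np

def geometric_margins(w, b, X, y):
    """Per-example geometric margins: the functional margin scaled by 1/||w||,
    i.e. the signed Euclidean distance from each point to the hyperplane."""
    return y * (X @ w + b) / np.linalg.norm(w)

def geometric_margin(w, b, X, y):
    """Geometric margin of the whole sample: the smallest per-example distance."""
    return np.min(geometric_margins(w, b, X, y))
</syntaxhighlight>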
===Lagrangians===
===Kernel Trick===
Oftentimes, linear classifiers such as the perceptron and SVM fail to classify data for which the true decision boundary is non-linear.<br>
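As a hedged illustration of the idea, a non-linear kernel such as the Gaussian (RBF) kernel lets the decision function use inner products in a high-dimensional feature space without ever computing the feature map explicitly; the function and parameter names below are assumptions for the sketch, not code from the notes above:
<syntaxhighlight lang="python">
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_decision(alphas, b, X_train, y_train, x, gamma=1.0):
    """Kernelized decision value sum_i alpha_i y_i K(x_i, x) + b.
    Only kernel evaluations are needed; the non-linear feature map is never built."""
    k = np.array([rbf_kernel(x_i, x, gamma) for x_i in X_train])
    return float(np.sum(alphas * y_train * k) + b)
</syntaxhighlight>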