Machine Learning
Here are some selected topics from the ML classes CMSC422 and CMSC726 at UMD.


==Loss functions==
The cross entropy loss is
* <math>J(\theta) = -\sum_i [y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))]</math>
;Notes
* This is the negative sum of the log probabilities of picking the correct class (i.e. p if y=1 or 1-p if y=0).
* If our model is <math>h_\theta(x^{(i)}) = g(\theta^Tx^{(i)})</math> where <math>g(x)</math> is the sigmoid function <math>\frac{e^x}{1+e^x}</math> then this is convex in <math>\theta</math>.
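
The loss above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names <code>sigmoid</code> and <code>cross_entropy</code> and the clipping constant <code>eps</code> are assumptions introduced here, and the sum is taken as the negative log-likelihood so that smaller values are better.

```python
import math

def sigmoid(z):
    """Sigmoid g(z) = e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def cross_entropy(theta, xs, ys, eps=1e-12):
    """Negative log-likelihood of labels ys in {0, 1} under h(x) = g(theta^T x).

    eps is a hypothetical clipping constant that keeps log() away from zero.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Toy data: two examples with two features each.
xs = [[1.0, 2.0], [1.0, -1.0]]
ys = [1, 0]
loss_at_zero = cross_entropy([0.0, 0.0], xs, ys)  # every prediction is 0.5
loss_better = cross_entropy([0.0, 1.0], xs, ys)   # pushes both examples the right way
```

With <math>\theta = 0</math> every prediction is 0.5, so the loss equals <math>n\log 2</math>; a parameter vector that moves the predictions toward the correct labels gives a smaller value.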


==SVM==
[https://see.stanford.edu/materials/aimlcs229/cs229-notes3.pdf Andrew Ng Notes]<br>
Support Vector Machine<br>
This is a linear classifier like the perceptron, except that the goal is not just to classify the data correctly but also to maximize the margin.<br>
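
To make the idea of the margin concrete, the sketch below compares two linear classifiers that both separate a toy dataset. It assumes labels in <math>\{-1, +1\}</math> and measures the geometric margin as the smallest signed distance <math>y^{(i)}(w^Tx^{(i)} + b)/\Vert w \Vert</math>. The function name <code>geometric_margin</code> and the data are made up for illustration.

```python
import math

def geometric_margin(w, b, xs, ys):
    """Smallest signed distance from any point to the hyperplane w^T x + b = 0.

    Positive only if every point is on the correct side (labels assumed in {-1, +1}).
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(xs, ys)
    )

# Toy linearly separable data (hypothetical).
xs = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
ys = [1, 1, -1, -1]

# Both hyperplanes classify every point correctly, but with different margins.
margin_a = geometric_margin((1.0, 1.0), 0.0, xs, ys)
margin_b = geometric_margin((1.0, 0.0), 0.0, xs, ys)
```

A perceptron would accept either hyperplane, since both classify all points correctly. An SVM prefers the one with the larger margin, here <code>margin_a</code>.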

===Bias-Variance Tradeoff===
* Let <math>L_D(h)</math> be the true loss of hypothesis <math>h</math> and <math>L_s(h)</math> be the empirical loss of hypothesis <math>h</math>.
** Here D is the true distribution and s is the training sample.
* <math>L_D(h_s^*) = L_D(h_D^*) + [L_D(h_s^*) - L_D(h_D^*)]</math>
* The term <math>L_D(h_D^*)</math> is called the bias
Let <math>X_1,...,X_n</math> be independent random variables bounded in (a,b) and let <math>\bar{X}</math> be their sample mean<br>
Then for any <math>t > 0</math>, <math>P(|\bar{X}-E[\bar{X}]| \geq t) \leq 2\exp(-\frac{2nt^2}{(b-a)^2})</math>
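
The bound above (Hoeffding's inequality) can be checked numerically. The sketch below is an illustration under stated assumptions: the samples are i.i.d. uniform on [0,1), so a=0, b=1 and <math>E[\bar{X}] = 0.5</math>, and the function names, trial count and seed are arbitrary choices made here.

```python
import math
import random

def hoeffding_bound(n, t, a, b):
    """Right-hand side of the inequality: 2 exp(-2 n t^2 / (b - a)^2)."""
    return 2.0 * math.exp(-2.0 * n * t * t / (b - a) ** 2)

def empirical_tail(n, t, trials=2000, seed=0):
    """Monte Carlo estimate of P(|mean - 0.5| >= t) for n uniform [0, 1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) >= t:
            hits += 1
    return hits / trials

bound = hoeffding_bound(50, 0.1, 0.0, 1.0)  # 2 * exp(-1)
estimate = empirical_tail(50, 0.1)
```

The estimated tail probability should fall below the bound, usually by a wide margin, because the inequality holds for every bounded distribution and is therefore loose for any particular one.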
==See Also==
* [[Supervised Learning]]
* [[Unsupervised Learning]]
* [[Deep Learning]]