Machine Learning: Difference between revisions

(4 intermediate revisions by the same user not shown)

Machine Learning

Here are some selected topics from the ML classes CMSC422 and CMSC726 at UMD.

==Loss functions==

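The cross-entropy loss for labels <math>y^{(i)} \in \{0,1\}</math> is

* <math>J(\theta) = -\sum_i \left[y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]</math>

;Notes

* This is the negative of the sum of the log probabilities of picking the correct class (i.e. <math>p</math> if <math>y=1</math> or <math>1-p</math> if <math>y=0</math>, where <math>p = h_\theta(x^{(i)})</math>).

* If our model is <math>h_\theta(x^{(i)}) = g(\theta^Tx^{(i)})</math>, where <math>g(x)</math> is the sigmoid function <math>\frac{e^x}{1+e^x}</math>, then <math>J(\theta)</math> is convex in <math>\theta</math>.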
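A minimal Python sketch of the cross-entropy loss for the sigmoid model, using only the standard library. It computes <math>J(\theta)</math> as the negative log-likelihood (with a leading minus sign, so it is a loss to minimize). The helper names (<code>sigmoid</code>, <code>softplus</code>, <code>cross_entropy</code>, <code>gradient</code>) and the toy data are hypothetical. Each term is evaluated as <math>-\log g(z) = \log(1+e^{-z})</math> or <math>-\log(1-g(z)) = \log(1+e^{z})</math>, which avoids taking the log of a probability that has rounded to 0. The sketch also computes the gradient <math>\nabla J(\theta) = \sum_i (h_\theta(x^{(i)}) - y^{(i)})x^{(i)}</math>.

```python
import math

def sigmoid(z):
    """g(z) = e^z / (1 + e^z), evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def softplus(z):
    """log(1 + e^z), evaluated without overflow for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dot(theta, x):
    return sum(t * xj for t, xj in zip(theta, x))

def cross_entropy(theta, X, y):
    """J(theta) = -sum_i [y_i log g(theta.x_i) + (1 - y_i) log(1 - g(theta.x_i))], y_i in {0, 1}."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = dot(theta, x_i)
        # -log g(z) = softplus(-z) and -log(1 - g(z)) = softplus(z)
        total += softplus(-z) if y_i == 1 else softplus(z)
    return total

def gradient(theta, X, y):
    """Gradient of J: sum_i (g(theta.x_i) - y_i) x_i."""
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        residual = sigmoid(dot(theta, x_i)) - y_i
        for j, xj in enumerate(x_i):
            grad[j] += residual * xj
    return grad

# Hypothetical toy data; the constant first feature plays the role of an intercept.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]]
y = [1, 0, 1, 0]
print(cross_entropy([0.0, 0.0], X, y))  # about 2.7726, i.e. 4 log 2, since every p = 1/2
```

Because <math>J</math> is convex, a finite-difference check of <code>gradient</code> and a midpoint check <math>J(\tfrac{a+b}{2}) \le \tfrac{1}{2}(J(a)+J(b))</math> are cheap sanity tests for an implementation like this.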

==SVM==

[https://see.stanford.edu/materials/aimlcs229/cs229-notes3.pdf Andrew Ng Notes]

Support Vector Machine

This is a linear classifier, like the perceptron, except that the goal is not just to classify the training data correctly but also to maximize the margin, i.e. the distance from the decision boundary to the closest training points.
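A minimal Python sketch of the margin idea, assuming labels in <math>\{+1,-1\}</math> and a hyperplane <math>w \cdot x + b = 0</math>. <code>geometric_margin</code> computes <math>\min_i y^{(i)}(w \cdot x^{(i)} + b)/\|w\|</math>. <code>train_linear_svm</code> is a swapped-in training method: full-batch subgradient descent on the soft-margin primal objective <math>\frac{\lambda}{2}\|w\|^2 + \frac{1}{n}\sum_i \max(0, 1 - y^{(i)}(w \cdot x^{(i)} + b))</math>, rather than the Lagrange-dual approach developed in the linked notes. The function names, <math>\lambda</math>, step size, epoch count, and toy data are all hypothetical.

```python
import math

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def geometric_margin(w, b, X, y):
    """Smallest signed distance y_i (w.x_i + b) / ||w|| from any point to the hyperplane."""
    norm = math.sqrt(dot(w, w))
    return min(yi * (dot(w, xi) + b) / norm for xi, yi in zip(X, y))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Full-batch subgradient descent on the soft-margin primal objective
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i (w.x_i + b)), with y_i in {+1, -1}.
    The bias b is not regularized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge term is active for this point
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Hypothetical, linearly separable toy data.
X = [[2.0, 1.0], [3.0, 2.0], [2.5, 3.0], [-1.0, -1.0], [-2.0, 0.0], [0.0, -2.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print(geometric_margin([1.0, 0.0], -0.5, X, y))  # 0.5 for the line x1 = 0.5
print(geometric_margin(w, b, X, y))
```

Both hyperplanes separate the toy data, which is all a perceptron asks for, but the line <math>x_1 = 0.5</math> has margin 0.5. The trained hyperplane should come out much wider, near the largest achievable margin for this data, <math>2.5/\sqrt{2} \approx 1.77</math> (the line <math>x_1 + x_2 = 0.5</math>); with a fixed step size, subgradient descent only gets close to that optimum, not exactly to it.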

===Bias-Variance Tradeoff===

* Let <math>L_D(h)</math> be the true loss of hypothesis <math>h</math> and <math>L_s(h)</math> be the empirical loss of hypothesis <math>h</math>.

** Here D is the true distribution and s is the training sample.

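* <math>L_D(h_s^*) = L_D(h_D^*) + [L_D(h_s^*) - L_D(h_D^*)]</math>, where <math>h_s^* = \arg\min_{h \in \mathcal{H}} L_s(h)</math> is the hypothesis in the class <math>\mathcal{H}</math> with the lowest empirical loss and <math>h_D^* = \arg\min_{h \in \mathcal{H}} L_D(h)</math> is the one with the lowest true loss.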

* The term <math>L_D(h_D^*)</math> is called the bias
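A small Python simulation of the decomposition under hypothetical assumptions: <math>x</math> is uniform on <math>[0,1]</math>, the label is <math>1[x \geq 0.35]</math> flipped with probability 0.1, the hypothesis class is the 11 threshold rules <math>h_t(x) = 1[x \geq t]</math> for <math>t \in \{0, 0.1, \ldots, 1\}</math>, and the loss is 0-1 loss. A threshold <math>t</math> disagrees with the noise-free rule on an interval of length <math>|t - 0.35|</math>, where it errs with probability 0.9, and errs with probability 0.1 elsewhere, so <math>L_D(h_t) = 0.1 + 0.8\,|t - 0.35|</math>. That closed form lets the sketch find <math>h_D^*</math> (the true-loss minimizer over the class) exactly, while <math>h_s^*</math> is taken to be the empirical-loss minimizer on a random sample <math>s</math>. All names are illustrative.

```python
import random

TRUE_T, NOISE = 0.35, 0.1                  # hypothetical distribution D
THRESHOLDS = [i / 10 for i in range(11)]   # hypothesis class: h_t(x) = 1[x >= t]

def true_loss(t):
    """L_D(h_t) in closed form: error 1 - NOISE on the interval where h_t
    disagrees with the noise-free rule, error NOISE everywhere else."""
    return NOISE + (1 - 2 * NOISE) * abs(t - TRUE_T)

def draw_sample(n, rng):
    """n i.i.d. pairs (x, y) from D."""
    s = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= TRUE_T)
        if rng.random() < NOISE:   # label noise
            y = 1 - y
        s.append((x, y))
    return s

def empirical_loss(t, s):
    """L_s(h_t): fraction of the sample that h_t misclassifies."""
    return sum(int(x >= t) != y for x, y in s) / len(s)

def decompose(n, seed=0):
    """Return L_D(h_s*), the bias L_D(h_D*), and the gap L_D(h_s*) - L_D(h_D*)."""
    rng = random.Random(seed)
    s = draw_sample(n, rng)
    h_s = min(THRESHOLDS, key=lambda t: empirical_loss(t, s))  # empirical-loss minimizer
    h_D = min(THRESHOLDS, key=true_loss)                        # true-loss minimizer
    bias = true_loss(h_D)
    return true_loss(h_s), bias, true_loss(h_s) - bias

for n in (10, 100, 1000):
    total, bias, gap = decompose(n)
    print(n, round(total, 3), round(bias, 3), round(gap, 3))
```

Here the bias is 0.14 regardless of the sample size, since no threshold in the grid equals 0.35. Only the bracketed term depends on <math>s</math>, and it can never be negative because <math>h_D^*</math> minimizes <math>L_D</math> over the class.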

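Let <math>X_1,\ldots,X_n</math> be independent random variables bounded in <math>[a,b]</math>, and let <math>\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i</math>.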

Then <math>P(|\bar{X}-E[\bar{X}]| \geq t) \leq 2\exp(-\frac{2nt^2}{(b-a)^2})</math>
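A quick Monte Carlo check of the bound, assuming i.i.d. <math>X_i \sim \text{Uniform}(0,1)</math>, so that <math>a=0</math>, <math>b=1</math>, and <math>E[\bar{X}] = 0.5</math>. The function names, sample size, and trial count are hypothetical; the point is only that the observed tail frequency stays below <math>2\exp(-2nt^2/(b-a)^2)</math>.

```python
import math
import random

def hoeffding_bound(n, t, a, b):
    """Right-hand side of Hoeffding's inequality: 2 exp(-2 n t^2 / (b - a)^2)."""
    return 2 * math.exp(-2 * n * t * t / (b - a) ** 2)

def empirical_tail(n, t, trials=5000, seed=0):
    """Monte Carlo estimate of P(|Xbar - E[Xbar]| >= t) for X_i ~ Uniform(0, 1) i.i.d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) >= t:   # E[Xbar] = 0.5 for Uniform(0, 1)
            hits += 1
    return hits / trials

n, t = 100, 0.1
print(hoeffding_bound(n, t, 0.0, 1.0))   # 2 e^{-2}, about 0.271
print(empirical_tail(n, t))
```

For <math>n=100</math> and <math>t=0.1</math> the bound is <math>2e^{-2} \approx 0.27</math>, while the simulated frequency is far smaller. Hoeffding's inequality uses only the range <math>[a,b]</math> of the variables, not their variance, so it is often loose.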

==See Also==

* [[Supervised Learning]]

* [[Unsupervised Learning]]

* [[Deep Learning]]
