Machine Learning
Hyperparameters
Batch Size
A Medium post empirically evaluating the effect of batch_size.
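Below is a minimal sketch of minibatch SGD (plain NumPy; not code from the linked post) showing batch_size as the hyperparameter being varied. The linear model, data, learning rate, and epoch count are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # synthetic inputs
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)   # noisy linear targets

def train(batch_size, lr=0.05, epochs=20):
    """Minibatch SGD on least squares; returns the final training MSE."""
    w = np.zeros(5)
    for _ in range(epochs):
        idx = rng.permutation(len(X))          # reshuffle each epoch
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # MSE gradient on the batch
            w -= lr * grad
    return np.mean((X @ w - y) ** 2)

for bs in (8, 64, 512):                        # small vs. large batches
    print(bs, train(bs))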
Learning Rate
Learning Theory
PAC Learning
Probably Approximately Correct (PAC)
A hypothesis class \(\displaystyle H\) is PAC learnable if for every \(\displaystyle 0 \lt \epsilon, \delta \lt 1\) there is some sample-complexity function \(\displaystyle m(\epsilon, \delta)\), polynomial in \(\displaystyle 1/\epsilon\) and \(\displaystyle 1/\delta\), such that if we train on a sample of size at least \(\displaystyle m(\epsilon, \delta)\), then with probability at least \(\displaystyle 1-\delta\) the learned hypothesis has true error at most \(\displaystyle \epsilon\).
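As a concrete instance (a standard bound for finite classes, not derived on this page): in the realizable setting, any finite hypothesis class \(\displaystyle H\) is PAC learnable with sample complexity
\(\displaystyle m(\epsilon, \delta) \leq \left\lceil \frac{\log(|H|/\delta)}{\epsilon} \right\rceil, \)
since the probability that some \(\displaystyle h \in H\) with true error greater than \(\displaystyle \epsilon\) remains consistent with all \(\displaystyle m\) samples is at most \(\displaystyle |H|(1-\epsilon)^m \leq |H|e^{-\epsilon m}\), which is at most \(\displaystyle \delta\) for this \(\displaystyle m\).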
Uniform Convergence
If for every hypothesis \(\displaystyle h \in H\) we have \(\displaystyle |L_S(h)-L_D(h)| \leq \epsilon\), where \(\displaystyle L_S\) is the empirical risk on \(\displaystyle S\) and \(\displaystyle L_D\) is the true risk under the data distribution \(\displaystyle D\), then the training set \(\displaystyle S\) is called \(\displaystyle \epsilon\)-representative.
Consequently, if \(\displaystyle S\) is \(\displaystyle (\epsilon/2)\)-representative, \(\displaystyle h_S\) minimizes the empirical risk \(\displaystyle L_S\), and \(\displaystyle h_D\) minimizes the true risk \(\displaystyle L_D\) over \(\displaystyle H\), then
\(\displaystyle L_D(h_S) \leq L_S(h_S) + \epsilon / 2 \leq L_S(h_D) + \epsilon / 2 \leq L_D(h_D) + \epsilon .\)
The first and last inequalities use representativeness; the middle one holds because \(\displaystyle h_S\) minimizes \(\displaystyle L_S\). Hence empirical risk minimization on an \(\displaystyle (\epsilon/2)\)-representative sample returns a hypothesis within \(\displaystyle \epsilon\) of the best in the class.
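A small simulation sketch of this chain (my own illustration, not from the page; the threshold-classifier class and all names are assumptions). A large held-out sample stands in for the true risk \(\displaystyle L_D\), and \(\displaystyle \epsilon\) is chosen so that \(\displaystyle S\) is \(\displaystyle (\epsilon/2)\)-representative by construction, so the final check prints True.

import numpy as np

rng = np.random.default_rng(1)
thresholds = np.linspace(0, 1, 21)            # finite hypothesis class H of threshold classifiers
true_t = 0.3                                  # labeling rule the data follows

def risk(t, x, y):
    return np.mean((x >= t) != y)             # 0-1 loss of the classifier 1[x >= t]

x_train = rng.uniform(size=200)
y_train = x_train >= true_t                   # noiseless labels
x_test = rng.uniform(size=100_000)            # large sample used as a proxy for L_D
y_test = x_test >= true_t

L_S = np.array([risk(t, x_train, y_train) for t in thresholds])
L_D = np.array([risk(t, x_test, y_test) for t in thresholds])

eps = 2 * np.max(np.abs(L_S - L_D))           # makes S (eps/2)-representative by construction
h_S = thresholds[np.argmin(L_S)]              # ERM hypothesis
h_D = thresholds[np.argmin(L_D)]              # best hypothesis in the class
print(risk(h_S, x_test, y_test) <= risk(h_D, x_test, y_test) + eps)   # True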
A hypothesis class \(\displaystyle H\) has the uniform convergence property if there exists \(\displaystyle m^{UC}(\epsilon, \delta)\) such that for every \(\displaystyle \epsilon, \delta\), if we draw an i.i.d. sample \(\displaystyle S\) of size at least \(\displaystyle m^{UC}(\epsilon, \delta)\), then with probability at least \(\displaystyle 1-\delta\), \(\displaystyle S\) is \(\displaystyle \epsilon\)-representative.
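For example (the standard Hoeffding-plus-union-bound argument, not shown on this page): any finite \(\displaystyle H\) with loss bounded in \(\displaystyle [0, 1]\) has uniform convergence with
\(\displaystyle m^{UC}(\epsilon, \delta) \leq \left\lceil \frac{\log(2|H|/\delta)}{2\epsilon^2} \right\rceil, \)
since Hoeffding's inequality gives \(\displaystyle P\big(|L_S(h)-L_D(h)| \gt \epsilon\big) \leq 2e^{-2m\epsilon^2}\) for each fixed \(\displaystyle h\), and a union bound over the \(\displaystyle |H|\) hypotheses bounds the total failure probability by \(\displaystyle \delta\).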