Machine Learning
Hyperparameters
Batch Size
A Medium post empirically evaluating the effect of batch_size.
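Below is a minimal sketch of minibatch SGD (plain NumPy; not code from the linked post) showing batch_size as the hyperparameter being varied. The linear model, data, learning rate, and epoch count are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # synthetic inputs
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)   # noisy linear targets

def train(batch_size, lr=0.05, epochs=20):
    """Minibatch SGD on least squares; returns the final training MSE."""
    w = np.zeros(5)
    for _ in range(epochs):
        idx = rng.permutation(len(X))          # reshuffle each epoch
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # MSE gradient on the batch
            w -= lr * grad
    return np.mean((X @ w - y) ** 2)

for bs in (8, 64, 512):                        # small vs. large batches
    print(bs, train(bs))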
Learning Rate
Learning Theory
PAC Learning
Probably Approximately Correct (PAC)
A hypothesis class \(\displaystyle H\) is PAC learnable if for every \(\displaystyle 0 \lt \epsilon, \delta \lt 1\) there is some sample-complexity function \(\displaystyle m(\epsilon, \delta)\), polynomial in \(\displaystyle 1/\epsilon\) and \(\displaystyle 1/\delta\), such that if we train on a sample of size at least \(\displaystyle m(\epsilon, \delta)\), then with probability at least \(\displaystyle 1-\delta\) the learned hypothesis has true error at most \(\displaystyle \epsilon\).
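As a concrete instance (a standard bound for finite classes, not derived on this page): in the realizable setting, any finite hypothesis class \(\displaystyle H\) is PAC learnable with sample complexity
\(\displaystyle m(\epsilon, \delta) \leq \left\lceil \frac{\log(|H|/\delta)}{\epsilon} \right\rceil, \)
since the probability that some \(\displaystyle h \in H\) with true error greater than \(\displaystyle \epsilon\) remains consistent with all \(\displaystyle m\) samples is at most \(\displaystyle |H|(1-\epsilon)^m \leq |H|e^{-\epsilon m}\), which is at most \(\displaystyle \delta\) for this \(\displaystyle m\).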
Uniform Convergence
If for every hypothesis \(\displaystyle h \in H\) we have \(\displaystyle |L_S(h)-L_D(h)| \leq \epsilon\), where \(\displaystyle L_S\) is the empirical risk on \(\displaystyle S\) and \(\displaystyle L_D\) is the true risk under the data distribution \(\displaystyle D\), then the training set \(\displaystyle S\) is called \(\displaystyle \epsilon\)-representative.
Consequently, if \(\displaystyle S\) is \(\displaystyle (\epsilon/2)\)-representative, \(\displaystyle h_S\) minimizes the empirical risk \(\displaystyle L_S\), and \(\displaystyle h_D\) minimizes the true risk \(\displaystyle L_D\) over \(\displaystyle H\), then
\(\displaystyle L_D(h_S) \leq L_S(h_S) + \epsilon / 2 \leq L_S(h_D) + \epsilon / 2 \leq L_D(h_D) + \epsilon .\)
The first and last inequalities use representativeness; the middle one holds because \(\displaystyle h_S\) minimizes \(\displaystyle L_S\). Hence empirical risk minimization on an \(\displaystyle (\epsilon/2)\)-representative sample returns a hypothesis within \(\displaystyle \epsilon\) of the best in the class.
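A small simulation sketch of this chain (my own illustration, not from the page; the threshold-classifier class and all names are assumptions). A large held-out sample stands in for the true risk \(\displaystyle L_D\), and \(\displaystyle \epsilon\) is chosen so that \(\displaystyle S\) is \(\displaystyle (\epsilon/2)\)-representative by construction, so the final check prints True.

import numpy as np

rng = np.random.default_rng(1)
thresholds = np.linspace(0, 1, 21)            # finite hypothesis class H of threshold classifiers
true_t = 0.3                                  # labeling rule the data follows

def risk(t, x, y):
    return np.mean((x >= t) != y)             # 0-1 loss of the classifier 1[x >= t]

x_train = rng.uniform(size=200)
y_train = x_train >= true_t                   # noiseless labels
x_test = rng.uniform(size=100_000)            # large sample used as a proxy for L_D
y_test = x_test >= true_t

L_S = np.array([risk(t, x_train, y_train) for t in thresholds])
L_D = np.array([risk(t, x_test, y_test) for t in thresholds])

eps = 2 * np.max(np.abs(L_S - L_D))           # makes S (eps/2)-representative by construction
h_S = thresholds[np.argmin(L_S)]              # ERM hypothesis
h_D = thresholds[np.argmin(L_D)]              # best hypothesis in the class
print(risk(h_S, x_test, y_test) <= risk(h_D, x_test, y_test) + eps)   # True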
A hypothesis class \(\displaystyle H\) has the uniform convergence property if there exists \(\displaystyle m^{UC}(\epsilon, \delta)\) such that for every \(\displaystyle \epsilon, \delta\), if we draw an i.i.d. sample \(\displaystyle S\) of size at least \(\displaystyle m^{UC}(\epsilon, \delta)\), then with probability at least \(\displaystyle 1-\delta\), \(\displaystyle S\) is \(\displaystyle \epsilon\)-representative.
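For example (the standard Hoeffding-plus-union-bound argument, not shown on this page): any finite \(\displaystyle H\) with loss bounded in \(\displaystyle [0, 1]\) has uniform convergence with
\(\displaystyle m^{UC}(\epsilon, \delta) \leq \left\lceil \frac{\log(2|H|/\delta)}{2\epsilon^2} \right\rceil, \)
since Hoeffding's inequality gives \(\displaystyle P\big(|L_S(h)-L_D(h)| \gt \epsilon\big) \leq 2e^{-2m\epsilon^2}\) for each fixed \(\displaystyle h\), and a union bound over the \(\displaystyle |H|\) hypotheses bounds the total failure probability by \(\displaystyle \delta\).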