Machine Learning

A model <math>f</math> parameterized by <math>\theta</math> is said to shatter a set of points <math>\{x_1, ..., x_n\}</math> if, for every possible assignment of labels to those points, there exists a <math>\theta</math> such that <math>f</math> classifies them with no errors.
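To make the definition concrete, here is a minimal sketch (my own construction, not from the article) that checks shattering for the class of one-dimensional threshold classifiers <math>h_t(x) = +1</math> if <math>x > t</math>, else <math>-1</math>, by enumerating all <math>2^n</math> labelings. The function name and test points are illustrative.
<syntaxhighlight lang="python">
from itertools import product

def shatters(points):
    """True if threshold classifiers h_t(x) = (+1 if x > t else -1)
    realize every labeling of `points`."""
    xs = sorted(points)
    # Only thresholds below, between, and above the points matter.
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    def realizable(labels):
        return any(
            all((1 if x > t else -1) == y for x, y in zip(points, labels))
            for t in thresholds
        )
    return all(realizable(labels) for labels in product([-1, 1], repeat=len(points)))

print(shatters([0.5]))       # True: any single point is shattered
print(shatters([0.2, 0.8]))  # False: the labeling (+1, -1) cannot be realized
</syntaxhighlight>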
====Definition====
Intuitively, the VC dimension of a hypothesis set measures how complex the models it contains can be.<br>
Stolen from Wikipedia:<br>
The VC dimension of a model <math>f</math> is the maximum number of points that can be arranged so that <math>f</math> shatters them.
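For example, interval classifiers <math>h_{a,b}(x) = +1</math> iff <math>a \le x \le b</math> have VC dimension 2: some two-point set is shattered, but no interval can realize the labeling <math>(+1, -1, +1)</math> on any three points. The brute-force check below is a sketch of my own (class and point sets are illustrative), not from the article.
<syntaxhighlight lang="python">
from itertools import product

def interval_predict(a, b, x):
    return 1 if a <= x <= b else -1

def shattered_by_intervals(points):
    """True if interval classifiers realize every labeling of `points`."""
    xs = sorted(points)
    # Candidate endpoints: midpoints plus sentinels outside the range.
    cands = [xs[0] - 1] + [(p + q) / 2 for p, q in zip(xs, xs[1:])] + [xs[-1] + 1]
    for labels in product([-1, 1], repeat=len(points)):
        if not any(
            all(interval_predict(a, b, x) == y for x, y in zip(points, labels))
            for a in cands for b in cands if a <= b
        ):
            return False
    return True

print(shattered_by_intervals([0.3, 0.7]))       # True: 2 points shattered
print(shattered_by_intervals([0.2, 0.5, 0.8]))  # False: (+1, -1, +1) fails
</syntaxhighlight>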
* A larger hypothesis class yields smaller bias but larger variance; see the sketch after this list.
* Overfitting vs. underfitting
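A quick illustration of both bullets (a toy example of my own, not from the article): fitting polynomials of increasing degree to noisy samples of a sine curve. Low degrees underfit (high bias, high train and test error); high degrees overfit (low train error, high test error from variance).
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 15)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in [1, 3, 9]:
    # Higher degree = larger hypothesis class.
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
</syntaxhighlight>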
===Rademacher Complexity===
====Definition====
The Rademacher complexity, like the VC dimension, measures how "rich" the hypothesis space is.<br>
Given a set <math>A \subset \mathbb{R}^m</math> the Rademacher complexity is:<br>
<math>R(A) = \frac{1}{m}E_{\sigma} [\sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i]</math><br>
where each <math>\sigma_i</math> is drawn independently and uniformly from <math>\{-1, +1\}</math>.<br>
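Since the definition is just an expectation over sign vectors, <math>R(A)</math> for a finite set can be estimated by Monte Carlo sampling of <math>\sigma</math>. A minimal sketch (function name and toy set are my own):
<syntaxhighlight lang="python">
import numpy as np

def rademacher_complexity(A, num_draws=10000, seed=0):
    """A: array of shape (num_vectors, m). Monte Carlo estimate of R(A)."""
    A = np.asarray(A, dtype=float)
    m = A.shape[1]
    rng = np.random.default_rng(seed)
    # Each sigma_i is uniform on {-1, +1}.
    sigma = rng.choice([-1.0, 1.0], size=(num_draws, m))
    # For each draw, sup over a in A of sum_i sigma_i * a_i; then average.
    sups = (sigma @ A.T).max(axis=1)
    return sups.mean() / m

# Toy set: two vectors in R^4; the exact value here works out to 0.25.
A = [[1, 1, 1, 1], [1, -1, 1, -1]]
print(rademacher_complexity(A))  # approximately 0.25
</syntaxhighlight>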
Given a sample <math>S=\{z_1,...,z_m\}</math> and a function class <math>F</math>, the empirical Rademacher complexity is:<br>
<math>R(F \circ S)</math><br>
where <math>F \circ S = \{(f(z_1),...,f(z_m)) \mid f \in F\}</math><br>
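The empirical Rademacher complexity therefore just applies <math>R(\cdot)</math> to the set of behavior vectors <math>F \circ S</math>. A small sketch under the same Monte Carlo assumption, with an illustrative finite class of threshold functions:
<syntaxhighlight lang="python">
import numpy as np

def empirical_rademacher(F, S, num_draws=10000, seed=0):
    """F: list of callables, S: list of sample points. Estimates R(F o S)."""
    # Build F o S: one row of predictions per function f in F.
    FS = np.array([[f(z) for z in S] for f in F], dtype=float)
    m = len(S)
    rng = np.random.default_rng(seed)
    sigma = rng.choice([-1.0, 1.0], size=(num_draws, m))
    return (sigma @ FS.T).max(axis=1).mean() / m

S = [0.1, 0.4, 0.6, 0.9]
# A small finite class of threshold classifiers (illustrative).
F = [lambda z, t=t: 1.0 if z > t else -1.0 for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(empirical_rademacher(F, S))
</syntaxhighlight>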