A model <math>f</math> parameterized by <math>\theta</math> is said to shatter a set of points <math>\{x_1, ..., x_n\}</math> if, for every possible labeling of the points, there exists a <math>\theta</math> such that <math>f</math> makes no errors.
====Definition====
Intuitively, the VC dimension of a hypothesis set measures how complex its models can be.<br>
From Wikipedia:<br>
The VC dimension of a model <math>f</math> is the maximum number of points that can be arranged so that <math>f</math> shatters them.
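The definition above can be checked by brute force for a small, finite hypothesis class: a set of points is shattered iff every <math>\pm 1</math> labeling of the points is realized by some classifier. The sketch below uses a hypothetical class of 1-D threshold classifiers <math>f_t(x) = \text{sign}(x - t)</math> (not from the article), whose VC dimension is 1.

```python
def shatters(classifiers, points):
    """Return True if `classifiers` shatters `points`: every +1/-1
    labeling of the points is realized by some classifier."""
    realized = {tuple(f(x) for x in points) for f in classifiers}
    return len(realized) == 2 ** len(points)

# Hypothetical finite class: 1-D thresholds f_t(x) = sign(x - t)
thresholds = [t / 10 for t in range(-20, 21)]
classifiers = [lambda x, t=t: 1 if x > t else -1 for t in thresholds]

print(shatters(classifiers, [0.5]))       # True: one point can be shattered
print(shatters(classifiers, [0.3, 0.7]))  # False: labeling (+1, -1) is unrealizable
```

Any two points fail because a threshold can never label the smaller point +1 and the larger one -1, so the VC dimension of this class is 1.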
* A larger hypothesis class gives smaller bias but larger variance.
* Overfitting vs. underfitting
===Rademacher Complexity===
====Definition====
The Rademacher complexity, like the VC dimension, measures how "rich" the hypothesis space is.<br>
Given a set <math>A \subset \mathbb{R}^m</math>, the Rademacher complexity is:<br>
<math>R(A) = \frac{1}{m}E_{\sigma} [\sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i]</math><br>
where the <math>\sigma_i</math> are independent and drawn uniformly from <math>\{-1, 1\}</math>.<br>
Given a sample <math>S=\{z_1,...,z_m\}</math> and a function class <math>F</math>, the empirical Rademacher complexity is:<br>
<math>R(F \circ S)</math><br>
where <math>F \circ S = \{(f(z_1),...,f(z_m)) \mid f \in F\}</math><br>
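Since <math>F \circ S</math> is just a set of vectors in <math>\mathbb{R}^m</math>, the empirical Rademacher complexity can be estimated by Monte Carlo: sample random sign vectors <math>\sigma</math> and average the supremum. A minimal sketch, using a toy two-element set <math>A</math> (an assumption for illustration, not from the article):

```python
import random

def empirical_rademacher(vectors, trials=2000, seed=0):
    """Monte Carlo estimate of R(A) = (1/m) E_sigma[ sup_{a in A} sum_i sigma_i a_i ]
    for a finite set A of vectors in R^m."""
    rng = random.Random(seed)
    m = len(vectors[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * a for s, a in zip(sigma, vec)) for vec in vectors)
    return total / (trials * m)

# Toy F o S for two constant functions f(z) = 1 and f(z) = -1 with m = 4:
A = [(1.0,) * 4, (-1.0,) * 4]
print(empirical_rademacher(A))  # close to E|sum sigma_i| / 4 = 1.5 / 4 = 0.375
```

For this toy set the supremum equals <math>|\sum_i \sigma_i|</math>, so the exact value is <math>\frac{1}{4}E|\sum_{i=1}^4 \sigma_i| = 0.375</math>, which the estimate should approach as the number of trials grows.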