A model <math>f</math> parameterized by <math>\theta</math> is said to shatter a set of points <math>\{x_1, ..., x_n\}</math> if, for every possible labeling of the points, there exists a <math>\theta</math> such that <math>f</math> makes no errors.
====Definition====
Intuitively, the VC dimension of a hypothesis set measures how complex its models can be.<br>
From Wikipedia:<br>
The VC dimension of a model <math>f</math> is the maximum number of points that can be arranged so that <math>f</math> shatters them.
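The definition above can be checked by brute force for a small, finite hypothesis class: a set of points is shattered iff every <math>\pm 1</math> labeling of the points is realized by some classifier. The sketch below uses a hypothetical class of 1-D threshold classifiers <math>f_t(x) = \text{sign}(x - t)</math> (not from the article), whose VC dimension is 1.

```python
def shatters(classifiers, points):
    """Return True if `classifiers` shatters `points`: every +1/-1
    labeling of the points is realized by some classifier."""
    realized = {tuple(f(x) for x in points) for f in classifiers}
    return len(realized) == 2 ** len(points)

# Hypothetical finite class: 1-D thresholds f_t(x) = sign(x - t)
thresholds = [t / 10 for t in range(-20, 21)]
classifiers = [lambda x, t=t: 1 if x > t else -1 for t in thresholds]

print(shatters(classifiers, [0.5]))       # True: one point can be shattered
print(shatters(classifiers, [0.3, 0.7]))  # False: labeling (+1, -1) is unrealizable
```

Any two points fail because a threshold can never label the smaller point +1 and the larger one -1, so the VC dimension of this class is 1.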
* A larger hypothesis class gives smaller bias but larger variance.
* Overfitting vs. underfitting
===Rademacher Complexity===
====Definition====
The Rademacher complexity, like the VC dimension, measures how "rich" the hypothesis space is.<br>
Given a set <math>A \subset \mathbb{R}^m</math>, the Rademacher complexity is:<br>
<math>R(A) = \frac{1}{m}E_{\sigma} [\sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i]</math><br>
where the <math>\sigma_i</math> are independent and drawn uniformly from <math>\{-1, 1\}</math>.<br>
Given a sample <math>S=\{z_1,...,z_m\}</math> and a function class <math>F</math>, the empirical Rademacher complexity is:<br>
<math>R(F \circ S)</math><br>
where <math>F \circ S = \{(f(z_1),...,f(z_m)) \mid f \in F\}</math><br>
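Since <math>F \circ S</math> is just a set of vectors in <math>\mathbb{R}^m</math>, the empirical Rademacher complexity can be estimated by Monte Carlo: sample random sign vectors <math>\sigma</math> and average the supremum. A minimal sketch, using a toy two-element set <math>A</math> (an assumption for illustration, not from the article):

```python
import random

def empirical_rademacher(vectors, trials=2000, seed=0):
    """Monte Carlo estimate of R(A) = (1/m) E_sigma[ sup_{a in A} sum_i sigma_i a_i ]
    for a finite set A of vectors in R^m."""
    rng = random.Random(seed)
    m = len(vectors[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * a for s, a in zip(sigma, vec)) for vec in vectors)
    return total / (trials * m)

# Toy F o S for two constant functions f(z) = 1 and f(z) = -1 with m = 4:
A = [(1.0,) * 4, (-1.0,) * 4]
print(empirical_rademacher(A))  # close to E|sum sigma_i| / 4 = 1.5 / 4 = 0.375
```

For this toy set the supremum equals <math>|\sum_i \sigma_i|</math>, so the exact value is <math>\frac{1}{4}E|\sum_{i=1}^4 \sigma_i| = 0.375</math>, which the estimate should approach as the number of trials grows.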