<math>\implies G \approx 2 R(F \circ S)</math>
===Theorem===
With probability <math>1 - \delta</math>,
<math>L_D(h) - L_{S}(h) \leq 2 R(F \circ S) + c \sqrt{\frac{\log(4/\delta)}{n}}</math>
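To make the quantity <math>R(F \circ S)</math> concrete, here is a minimal Monte Carlo sketch that estimates the empirical Rademacher complexity of a ''finite'' hypothesis class, given as a matrix of its values on the sample. The function name <code>empirical_rademacher</code> and the all-sign-patterns example class are illustrative assumptions, not part of these notes.

<syntaxhighlight lang="python">
import numpy as np

def empirical_rademacher(values, n_draws=2000, seed=None):
    """Monte Carlo estimate of R(F o S) for a finite class F.

    `values` is a (|F|, n) array whose rows hold f(z_1), ..., f(z_n)
    for each f in F, evaluated on the fixed sample S of size n.
    """
    rng = np.random.default_rng(seed)
    _, n = values.shape
    est = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        est += np.max(values @ sigma) / n        # sup over the finite class
    return est / n_draws

# Illustration: a class rich enough to realize every +/-1 labeling of
# the sample has R(F o S) = 1, so the bound above becomes vacuous.
n = 10
all_patterns = np.array([[1.0 if (k >> i) & 1 else -1.0 for i in range(n)]
                         for k in range(2 ** n)])
print(empirical_rademacher(all_patterns, n_draws=500, seed=0))  # prints 1.0
</syntaxhighlight>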
Because a realistic neural network class can fit arbitrary (even random) labels on the training sample, as the theorem below makes precise, the empirical Rademacher complexity <math>R(F \circ S)</math> is close to its maximum value and the bound above is vacuous. This shows that Rademacher complexity and VC-dimension are not useful for explaining generalization in neural networks.
===Theorem===
There exists a two-layer neural network with ReLU activations and <math>2n+d</math> parameters that can represent any function on a sample of size <math>n</math> in <math>d</math> dimensions.
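The proof is constructive, and the sketch below (variable names and the random-projection choice are ours, following the standard lower-triangular argument) builds such a network explicitly: all <math>n</math> hidden ReLU units share one input weight vector <math>a \in \mathbb{R}^d</math> but have individual biases <math>b \in \mathbb{R}^n</math>, and the output layer has weights <math>w \in \mathbb{R}^n</math>, for <math>2n+d</math> parameters in total.

<syntaxhighlight lang="python">
import numpy as np

def fit_two_layer_relu(X, y, seed=None):
    """Build f(x) = sum_j w_j * relu(<a, x> - b_j), a width-n ReLU net
    with d + n + n = 2n + d parameters interpolating the n points (X, y).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # A random direction a gives distinct projections z_i = <a, x_i>
    # almost surely, provided the x_i are distinct.
    a = rng.standard_normal(d)
    order = np.argsort(X @ a)
    z = (X @ a)[order]
    # Thresholds b_j placed between consecutive z's make the matrix
    # A_ij = relu(z_i - b_j) lower triangular with positive diagonal,
    # hence invertible, so the labels can be matched exactly.
    b = np.concatenate(([z[0] - 1.0], (z[:-1] + z[1:]) / 2.0))
    A = np.maximum(z[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])
    return a, b, w

def predict(x, a, b, w):
    return float(np.maximum(x @ a - b, 0.0) @ w)

# Usage: the network fits completely arbitrary (here random) labels.
rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
a, b, w = fit_two_layer_relu(X, y, seed=1)
fitted = np.array([predict(x, a, b, w) for x in X])
assert np.allclose(fitted, y)  # exact interpolation of every label
</syntaxhighlight>

Since the labels <math>y</math> here are arbitrary, such a network can in particular memorize random labels, which is exactly why the uniform-convergence bound above cannot distinguish networks that generalize from ones that do not.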