<math>\implies G \approx 2 R(F \circ S)</math>
===Theorem===
With probability <math>1 - \delta</math>,
<math>L_D(h) - L_{S}(h) \leq 2 R(F \circ S) + c \sqrt{\frac{\log(4/\delta)}{n}}</math>
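To make the quantity <math>R(F \circ S)</math> concrete, here is a minimal Monte Carlo sketch that estimates the empirical Rademacher complexity of a ''finite'' hypothesis class, given as a matrix of its values on the sample. The function name <code>empirical_rademacher</code> and the all-sign-patterns example class are illustrative assumptions, not part of these notes.

<syntaxhighlight lang="python">
import numpy as np

def empirical_rademacher(values, n_draws=2000, seed=None):
    """Monte Carlo estimate of R(F o S) for a finite class F.

    `values` is a (|F|, n) array whose rows hold f(z_1), ..., f(z_n)
    for each f in F, evaluated on the fixed sample S of size n.
    """
    rng = np.random.default_rng(seed)
    _, n = values.shape
    est = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        est += np.max(values @ sigma) / n        # sup over the finite class
    return est / n_draws

# Illustration: a class rich enough to realize every +/-1 labeling of
# the sample has R(F o S) = 1, so the bound above becomes vacuous.
n = 10
all_patterns = np.array([[1.0 if (k >> i) & 1 else -1.0 for i in range(n)]
                         for k in range(2 ** n)])
print(empirical_rademacher(all_patterns, n_draws=500, seed=0))  # prints 1.0
</syntaxhighlight>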
Because a realistic neural network class can fit arbitrary (even random) labels on the training sample, as the theorem below makes precise, the empirical Rademacher complexity <math>R(F \circ S)</math> is close to its maximum value and the bound above is vacuous. This shows that Rademacher complexity and VC-dimension are not useful for explaining generalization in neural networks.
===Theorem===
There exists a two-layer neural network with ReLU activations and <math>2n+d</math> parameters that can represent any function on a sample of size <math>n</math> in <math>d</math> dimensions.
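The proof is constructive, and the sketch below (variable names and the random-projection choice are ours, following the standard lower-triangular argument) builds such a network explicitly: all <math>n</math> hidden ReLU units share one input weight vector <math>a \in \mathbb{R}^d</math> but have individual biases <math>b \in \mathbb{R}^n</math>, and the output layer has weights <math>w \in \mathbb{R}^n</math>, for <math>2n+d</math> parameters in total.

<syntaxhighlight lang="python">
import numpy as np

def fit_two_layer_relu(X, y, seed=None):
    """Build f(x) = sum_j w_j * relu(<a, x> - b_j), a width-n ReLU net
    with d + n + n = 2n + d parameters interpolating the n points (X, y).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # A random direction a gives distinct projections z_i = <a, x_i>
    # almost surely, provided the x_i are distinct.
    a = rng.standard_normal(d)
    order = np.argsort(X @ a)
    z = (X @ a)[order]
    # Thresholds b_j placed between consecutive z's make the matrix
    # A_ij = relu(z_i - b_j) lower triangular with positive diagonal,
    # hence invertible, so the labels can be matched exactly.
    b = np.concatenate(([z[0] - 1.0], (z[:-1] + z[1:]) / 2.0))
    A = np.maximum(z[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])
    return a, b, w

def predict(x, a, b, w):
    return float(np.maximum(x @ a - b, 0.0) @ w)

# Usage: the network fits completely arbitrary (here random) labels.
rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
a, b, w = fit_two_layer_relu(X, y, seed=1)
fitted = np.array([predict(x, a, b, w) for x in X])
assert np.allclose(fitted, y)  # exact interpolation of every label
</syntaxhighlight>

Since the labels <math>y</math> here are arbitrary, such a network can in particular memorize random labels, which is exactly why the uniform-convergence bound above cannot distinguish networks that generalize from ones that do not.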