
<math>\implies G \approx 2 R(F \circ S)</math>


===Theorem===
With probability <math>1 - \delta</math>, for every <math>h \in F</math>,
<math>L_D(h) - L_{S}(h) \leq 2 R(F \circ S) + c \sqrt{\frac{\log(4/\delta)}{n}}</math>
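For intuition, <math>R(F \circ S)</math> can be estimated by Monte Carlo over the sign vectors <math>\sigma</math>. The sketch below is an illustrative choice of ours, not part of the notes: it uses a norm-bounded linear class, for which the inner supremum has a closed form. For a neural network class the supremum has no closed form and would have to be approximated, e.g. by training to fit each sign pattern.

<syntaxhighlight lang="python">
import numpy as np

def empirical_rademacher_linear(X, B=1.0, n_draws=2000, seed=0):
    """Monte Carlo estimate of R(F o S) for F = {x -> <w, x> : ||w||_2 <= B}.

    For this class the supremum over F has a closed form:
      sup_{||w|| <= B} (1/n) sum_i sigma_i <w, x_i> = (B/n) ||sum_i sigma_i x_i||_2.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher signs
    sups = B * np.linalg.norm(sigma @ X, axis=1) / n    # closed-form suprema
    return sups.mean()                                  # average over sign draws

# The estimate shrinks roughly like O(1/sqrt(n)) as the sample size grows.
X = np.random.default_rng(1).standard_normal((200, 5))
print(empirical_rademacher_linear(X))
</syntaxhighlight>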
This shows that Rademacher complexity and VC-dimension are not useful for explaining the generalization of neural networks.
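To see why the bound above becomes vacuous (a one-line check in the theorem's notation, under the assumption that the network class can realize every sign pattern on the sample, i.e. <math>F \circ S \supseteq \{-1,+1\}^n</math>):

<math>R(F \circ S) = \mathbb{E}_\sigma\left[\sup_{v \in F \circ S} \frac{1}{n}\sum_{i=1}^n \sigma_i v_i\right] \geq \mathbb{E}_\sigma\left[\frac{1}{n}\sum_{i=1}^n \sigma_i \sigma_i\right] = 1,</math>

so the right-hand side of the bound is at least 2 and says nothing about the gap <math>L_D(h) - L_S(h)</math>.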


===Theorem===
There exists a two-layer neural network with ReLU activations and <math>2n+d</math> parameters that can represent any function on a sample of size <math>n</math> in <math>d</math> dimensions.
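A constructive sketch of this theorem in NumPy (the helper name and variable names are ours): a single shared projection <math>a \in \mathbb{R}^d</math> gives <math>d</math> parameters, plus <math>n</math> biases and <math>n</math> output weights, for <math>2n+d</math> in total. Placing each bias strictly between consecutive sorted projections makes the hidden-activation matrix lower triangular with a positive diagonal, so the output weights can be solved for exactly.

<syntaxhighlight lang="python">
import numpy as np

def interpolating_relu_net(X, y, seed=0):
    """Two-layer ReLU net f(x) = sum_j w_j * relu(<a, x> - b_j) with
    exactly 2n + d parameters (a: d, b: n, w: n) fitting (X, y) exactly."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    a = rng.standard_normal(d)          # random projection: the z_i are
    z = X @ a                           # distinct with probability 1
    order = np.argsort(z)
    z_sorted = z[order]
    b = np.empty(n)                     # biases interleave the sorted z's:
    b[0] = z_sorted[0] - 1.0            # b_1 < z_(1) < b_2 < z_(2) < ...
    b[1:] = (z_sorted[:-1] + z_sorted[1:]) / 2.0
    # A[i, j] = relu(z_(i) - b_j) is lower triangular with positive diagonal,
    # hence invertible: solve A w = y for the output weights.
    A = np.maximum(z_sorted[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])
    return lambda x: np.maximum(x @ a - b, 0.0) @ w

# Exact fit of arbitrary (here random) labels on n = 50 points in d = 10 dims.
rng = np.random.default_rng(1)
X, y = rng.standard_normal((50, 10)), rng.standard_normal(50)
f = interpolating_relu_net(X, y)
assert np.allclose(np.array([f(x) for x in X]), y)
</syntaxhighlight>

Since such a small network can memorize any labeling, including a random one, this is consistent with the observation above that capacity-based measures cannot by themselves explain generalization.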