Deep Learning

<math>H = \{ h(x) \mid h \text{ is a NN with some structure}\}</math>
If <math>R(H \circ S)</math> is small, then by the theorem we get good generalization performance.
Zhang ''et al.''<ref name="zhang2017understanding"></ref> perform a randomization test:
they replace the true training labels with uniformly random labels and observe that standard neural networks can still fit the training set perfectly.
Recall the empirical Rademacher complexity:
<math>R(H \circ S) = \frac{1}{n} E_{\sigma} \left[ \sup_{h \in H} \sum_{i=1}^{n} \sigma_i h(x_i) \right].</math>
Since neural networks can fit any sign pattern <math>\sigma</math>, for every <math>\sigma</math> there is some <math>h \in H</math> with <math>h(x_i) \approx \sigma_i</math> for all <math>i</math>, so the supremum is <math>\approx n</math> and <math>R(H \circ S) \approx 1</math>.
This shows that Rademacher complexity and VC-dimension give vacuous bounds for neural networks and cannot explain their generalization.
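A minimal numerical sketch of this effect (a toy version under assumed settings, not the paper's CIFAR-10/ImageNet experiments; the sizes <code>n</code>, <code>d</code>, the width, and the optimizer below are illustrative choices): training a bounded-output two-layer ReLU network to fit random signs <math>\sigma</math> directly maximizes <math>\frac{1}{n} \sum_i \sigma_i h(x_i)</math>, which lower-bounds the empirical Rademacher complexity and comes out close to 1.
<syntaxhighlight lang="python">
# Toy randomization test: maximize (1/n) * sum_i sigma_i * h(x_i) over an
# over-parameterized two-layer ReLU network with outputs squashed to [-1, 1].
# The achieved value lower-bounds the empirical Rademacher complexity R(H o S).
import torch

torch.manual_seed(0)
n, d, width = 200, 20, 1000                        # illustrative sizes
X = torch.randn(n, d)                              # a fixed sample S
sigma = torch.randint(0, 2, (n,)).float() * 2 - 1  # random +/-1 signs

model = torch.nn.Sequential(
    torch.nn.Linear(d, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
    torch.nn.Tanh(),                               # keep h(x) in [-1, 1]
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(3000):
    h = model(X).squeeze(1)
    loss = -(sigma * h).mean()                     # maximize the correlation
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"(1/n) sum_i sigma_i h(x_i) = {-loss.item():.3f}")  # close to 1
</syntaxhighlight>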
;Theorem
There exists a two-layer neural network with ReLU activations and <math>2n+d</math> parameters that can represent any function on a sample of size <math>n</math> in <math>d</math> dimensions.
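A sketch of the construction behind this theorem (following Zhang ''et al.''<ref name="zhang2017understanding"></ref>; choosing <math>w</math> generically is one sufficient way to make the projections distinct): pick <math>w \in \mathbb{R}^d</math> so that the projections <math>z_i = w^T x_i</math> are all distinct, relabel the samples so that <math>z_1 < z_2 < \dots < z_n</math>, and choose offsets <math>b_j</math> with <math>b_1 < z_1 < b_2 < z_2 < \dots < b_n < z_n</math>. Consider

<math>h(x) = \sum_{j=1}^{n} a_j \max(w^T x - b_j, 0),</math>

which has <math>2n+d</math> parameters: <math>a \in \mathbb{R}^n</math>, <math>b \in \mathbb{R}^n</math>, and <math>w \in \mathbb{R}^d</math>. On the sample, <math>\max(z_i - b_j, 0) > 0</math> exactly when <math>j \le i</math>, so <math>h(x_i) = \sum_{j \le i} a_j (z_i - b_j)</math> is a lower-triangular linear system in <math>a</math> with positive diagonal entries <math>z_i - b_i</math>; it can therefore be solved for <math>a</math> to match any target values <math>y_1, \dots, y_n</math>.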


==Misc==
<ref name="du2019gradient">Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh (2019). Gradient Descent Provably Optimizes Over-parameterized Neural Networks (ICLR 2019) [https://arxiv.org/abs/1810.02054 https://arxiv.org/abs/1810.02054]</ref>
<ref name="du2019gradient">Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh (2019). Gradient Descent Provably Optimizes Over-parameterized Neural Networks (ICLR 2019) [https://arxiv.org/abs/1810.02054 https://arxiv.org/abs/1810.02054]</ref>
<ref name="soudry2018implicit">Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro (2018) The Implicit Bias of Gradient Descent on Separable Data ''The Journal of Machine Learning Research'' 2018 [https://arxiv.org/abs/1710.10345 https://arxiv.org/abs/1710.10345]</ref>
<ref name="soudry2018implicit">Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro (2018) The Implicit Bias of Gradient Descent on Separable Data ''The Journal of Machine Learning Research'' 2018 [https://arxiv.org/abs/1710.10345 https://arxiv.org/abs/1710.10345]</ref>
<ref name="zhang2017understanding">Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (2017) Understanding deep learning requires rethinking generalization (ICLR 2017) [https://arxiv.org/abs/1611.03530 https://arxiv.org/abs/1611.03530]</ref>
}}