<math>H = \{ h(x) \mid h \text{ is a NN with some structure}\}</math>
If <math>R(H \circ S)</math> is small, then by the theorem we can expect good generalization performance.
Zhang ''et al.''<ref name="zhang2017understanding"></ref> perform a randomization test.
They assign uniformly random labels to the training data and observe that standard neural networks can still fit them perfectly.
Recall <math>R(H \circ S) = \frac{1}{n} E_{\sigma} \left[ \sup_{h \in H} \sum_{i=1}^{n} \sigma_i h(x_i) \right]</math>.
Since the network can fit any assignment of random labels <math>\sigma_i</math>, the supremum is close to <math>n</math>, so <math>R(H \circ S) \approx 1</math>.
This shows that Rademacher complexity and VC-dimension are not useful for explaining the generalization of neural networks: the resulting bounds are vacuous.
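A miniature version of this randomization test can be run directly. The following is a sketch on synthetic data with an assumed small MLP, not the CIFAR-10 experiments of the paper:
<syntaxhighlight lang="python">
import torch
from torch import nn

# Randomization test in miniature: inputs are random and labels are *random*
# (they carry no signal), yet an over-parameterized MLP fits them anyway.
torch.manual_seed(0)
n, d, classes = 256, 32, 10
X = torch.randn(n, d)
y = torch.randint(classes, (n,))  # uniformly random labels

model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, classes))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):  # full-batch gradient descent
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"train accuracy on random labels: {acc:.3f}")  # typically close to 1.0
</syntaxhighlight>
Despite the labels carrying no information, the training accuracy approaches 1, which is exactly the behavior the <math>R(H \circ S) \approx 1</math> computation above formalizes.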
;Theorem | |||
There exists a two-layer NN with ReLU activations and <math>2n+d</math> parameters that can represent any function on a sample of size <math>n</math> in <math>d</math> dimensions.
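One way to see this is with an explicit construction (a sketch with illustrative function names, not code from the paper): project the data onto a random direction <math>a \in \mathbb{R}^d</math>, place one ReLU breakpoint <math>b_j</math> between consecutive sorted projections, and solve the resulting lower-triangular system for the output weights <math>w</math>, giving <math>d + n + n = 2n + d</math> parameters in total:
<syntaxhighlight lang="python">
import numpy as np

def fit_two_layer_relu(Z, y, seed=0):
    """Build c(z) = sum_j w_j * relu(a.z - b_j) with exactly 2n + d
    parameters (a in R^d, b in R^n, w in R^n) so that c(z_i) = y_i."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    a = rng.standard_normal(d)          # random direction: projections
    x = Z @ a                           # a.z_i are distinct almost surely
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    b = np.empty(n)
    b[0] = xs[0] - 1.0                  # b_1 below the smallest projection
    b[1:] = (xs[:-1] + xs[1:]) / 2.0    # one breakpoint between neighbors
    # relu(xs[i] - b[j]) is positive iff j <= i, so the interpolation
    # constraints form a lower-triangular system with positive diagonal.
    A = np.maximum(xs[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, ys)
    return a, b, w

def predict(Z, a, b, w):
    return np.maximum(Z @ a[:, None] - b[None, :], 0.0) @ w

# The network represents arbitrary (even random) labels on the sample.
rng = np.random.default_rng(1)
Z, y = rng.standard_normal((50, 10)), rng.standard_normal(50)
a, b, w = fit_two_layer_relu(Z, y)
assert np.allclose(predict(Z, a, b, w), y)
</syntaxhighlight>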
==Misc==
<ref name="du2019gradient">Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh (2019). Gradient Descent Provably Optimizes Over-parameterized Neural Networks (ICLR 2019) [https://arxiv.org/abs/1810.02054 https://arxiv.org/abs/1810.02054]</ref> | <ref name="du2019gradient">Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh (2019). Gradient Descent Provably Optimizes Over-parameterized Neural Networks (ICLR 2019) [https://arxiv.org/abs/1810.02054 https://arxiv.org/abs/1810.02054]</ref> | ||
<ref name="soudry2018implicit">Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro (2018) The Implicit Bias of Gradient Descent on Separable Data ''The Journal of Machine Learning Research'' 2018 [https://arxiv.org/abs/1710.10345 https://arxiv.org/abs/1710.10345]</ref> | <ref name="soudry2018implicit">Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro (2018) The Implicit Bias of Gradient Descent on Separable Data ''The Journal of Machine Learning Research'' 2018 [https://arxiv.org/abs/1710.10345 https://arxiv.org/abs/1710.10345]</ref> | ||
<ref name="zhang2017understanding">Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (2017) Understanding deep learning requires rethinking generalization (ICLR 2017) [https://arxiv.org/abs/1611.03530 https://arxiv.org/abs/1611.03530]</ref> | |||
}}