5,337
edits
Line 237: | Line 237: | ||
{{reflist|refs= | {{reflist|refs= | ||
<ref name="liu2020towards">Chaoyue Liu, Libin Zhu, Mikhail Belkin (2020). Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning [https://arxiv.org/abs/2003.00307 https://arxiv.org/abs/2003.00307]</ref> | <ref name="liu2020towards">Chaoyue Liu, Libin Zhu, Mikhail Belkin (2020). Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning [https://arxiv.org/abs/2003.00307 https://arxiv.org/abs/2003.00307]</ref> | ||
<ref name="du2019gradient">Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh (2019). Gradient Descent Provably Optimizes Over-parameterized Neural Networks (ICLR 2019) [https://arxiv.org/abs/1810.02054]</ref> | <ref name="du2019gradient">Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh (2019). Gradient Descent Provably Optimizes Over-parameterized Neural Networks (ICLR 2019) [https://arxiv.org/abs/1810.02054 https://arxiv.org/abs/1810.02054]</ref> | ||
}} | }} |