Thus we see a geometric or exponential decrease in our loss function with convergence rate <math>(1-\eta \mu)</math>.
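As a minimal sketch of the standard PL argument (the smoothness constant <math display="inline">\beta</math>, the step-size bound <math display="inline">\eta \le 1/\beta</math>, and the notation <math display="inline">L(w_t)</math>, <math display="inline">L^*</math> for the loss and its minimum are assumptions made here for illustration), each gradient descent step contracts the suboptimality:

<math display="block">L(w_{t+1}) - L^* \le (1-\eta\mu)\bigl(L(w_t) - L^*\bigr) \quad\Longrightarrow\quad L(w_t) - L^* \le (1-\eta\mu)^t \bigl(L(w_0) - L^*\bigr).</math>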


{{hidden | Q&A |
If we don't observe convergence, we cannot conclude that the <math display="inline">\mu</math>-PL condition is violated.
It is possible that one of the other assumptions is violated instead (e.g. the learning rate is too large); see the sketch below.
These arguments should also hold for ReLU networks, since the non-differentiable points have measure zero, but that would require a more careful analysis.
}}
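As a hedged illustration of the Q&A point above (this example is not from the notes; the quadratic loss, symbols, and step sizes are chosen only for illustration), the sketch below shows that gradient descent can fail to converge purely because the learning rate is too large, even on a problem where the <math display="inline">\mu</math>-PL condition holds.

<syntaxhighlight lang="python">
# Gradient descent on the 1-D quadratic L(w) = 0.5 * w**2,
# which satisfies the mu-PL condition with mu = 1 (and is 1-smooth).
def gd_losses(eta, steps=20, w0=1.0):
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * w ** 2)
        w = w - eta * w  # gradient of 0.5 * w^2 is w
    return losses

# With eta <= 1 the loss contracts by (1 - eta)^2 <= (1 - eta * mu) per step.
print(gd_losses(eta=0.1)[-1])  # small value: geometric convergence
# With eta > 2 the iterates diverge even though the PL condition still holds,
# so non-convergence alone does not identify which assumption failed.
print(gd_losses(eta=2.5)[-1])  # huge value: divergence
</syntaxhighlight>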


==Misc==