Thus we see a geometric or exponential decrease in our loss function with convergence rate <math>(1-\eta \mu)</math>.
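
A minimal numerical sketch of this rate, assuming a simple quadratic loss <math display="inline">L(w) = \tfrac{1}{2} w^\top A w</math> whose eigenvalues lie between <math display="inline">\mu</math> and a smoothness constant <math display="inline">\beta</math> (so it satisfies the <math display="inline">\mu</math>-PL condition); the constants and variable names below are illustrative, not taken from the derivation above:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative quadratic loss L(w) = 1/2 w^T A w with eigenvalues in [mu, beta].
# It satisfies the mu-PL condition, so gradient descent with eta <= 1/beta
# should keep L(w_t) at or below (1 - eta*mu)^t * L(w_0).
mu, beta = 0.5, 2.0              # assumed PL and smoothness constants
A = np.diag([mu, beta])
eta = 1.0 / beta                 # step size within the smoothness bound

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

w = np.array([1.0, 1.0])
envelope = loss(w)
for t in range(10):
    w = w - eta * grad(w)        # gradient descent step
    envelope *= 1 - eta * mu     # geometric envelope (1 - eta*mu)^t * L(w_0)
    print(f"step {t + 1}: loss = {loss(w):.6f}, bound = {envelope:.6f}")
# The observed loss stays at or below the geometric bound at every step.
</syntaxhighlight>
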
{{hidden | Q&A |
If we don't observe convergence, we cannot immediately conclude that the <math display="inline">\mu</math>-PL condition is violated.
It is possible that one of the other assumptions is violated instead (e.g. the learning rate is too large).
These arguments should hold for ReLU networks, since the non-differentiable points have measure 0, but this would require a more careful analysis.
}}
==Misc==