GD converges even though our model does not go to a linear model.
===Takeaway===
Over-parameterization does not lead to linearization.
Over-parameterization leads to good conditioning which leads to PL and convergence of GD/SGD.
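
As a sketch of the last step in this chain (PL implies convergence of GD), here is the standard argument. The symbols <math>\mu</math>, <math>\beta</math>, <math>\eta</math> and <math>L^*</math> are notational assumptions made for this sketch and may differ from the notation used elsewhere in these notes. Suppose the loss <math>L</math> satisfies the PL condition with constant <math>\mu > 0</math>:

:<math>\tfrac{1}{2}\|\nabla L(w)\|^2 \ge \mu\,\bigl(L(w) - L^*\bigr)</math>

If <math>L</math> is also <math>\beta</math>-smooth, then one GD step <math>w_{t+1} = w_t - \eta \nabla L(w_t)</math> with <math>\eta = 1/\beta</math> gives

:<math>L(w_{t+1}) - L^* \le L(w_t) - L^* - \tfrac{1}{2\beta}\|\nabla L(w_t)\|^2 \le \left(1 - \tfrac{\mu}{\beta}\right)\bigl(L(w_t) - L^*\bigr)</math>

so the loss gap shrinks geometrically, <math>L(w_t) - L^* \le (1 - \mu/\beta)^t \bigl(L(w_0) - L^*\bigr)</math>. Neither inequality assumes that the model is linear in <math>w</math>, which is consistent with the first point above.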