===Batch Size===
[https://medium.com/mini-distill/effect-of-batch-size-on-training-dynamics-21c14f7a716e A medium post empirically evaluating the effect of batch_size]
==Loss functions==
===(Mean) Squared Error===
The squared error is:<br>
<math>J(\theta) = \sum_i |h_{\theta}(x^{(i)}) - y^{(i)}|^2</math><br>
If our model is linear regression <math>h(x)=w^T x</math> then this is convex.<br>
{{hidden|Proof|
<math>
\begin{aligned}
\nabla_{w} J(w) &= \nabla_{w} \sum_i (w^T x^{(i)} - y^{(i)})^2\\
&= 2\sum_i (w^T x^{(i)} - y^{(i)})x^{(i)} \\
\implies \nabla_{w}^2 J(w) &= \nabla_{w}\, 2\sum_i (w^T x^{(i)}-y^{(i)})x^{(i)}\\
&= 2\sum_i x^{(i)} (x^{(i)})^T
\end{aligned}
</math><br>
so the Hessian is positive semi-definite
}}
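The convexity argument above can be checked numerically: for linear regression the Hessian of the squared error is <math>2\sum_i x^{(i)} (x^{(i)})^T</math>, which is independent of <math>w</math> and has no negative eigenvalues. A minimal sketch, using randomly generated data (the data and shapes here are assumptions for illustration):

```python
import numpy as np

# Random design matrix X (rows are examples x^{(i)}) and targets y.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
w = rng.normal(size=3)

# Gradient of J(w) = sum_i (w^T x_i - y_i)^2:  2 * sum_i (w^T x_i - y_i) x_i
grad = 2 * X.T @ (X @ w - y)

# Hessian: 2 * sum_i x_i x_i^T = 2 X^T X, independent of w.
H = 2 * X.T @ X

# All eigenvalues are nonnegative (up to floating-point error),
# so the Hessian is positive semi-definite and J is convex in w.
eigvals = np.linalg.eigvalsh(H)
print(bool(np.all(eigvals >= -1e-10)))
```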
===Cross Entropy===
===Hinge Loss===
==Optimization==