Machine Learning: Difference between revisions

===Batch Size===
[https://medium.com/mini-distill/effect-of-batch-size-on-training-dynamics-21c14f7a716e A Medium post empirically evaluating the effect of batch_size]
==Loss functions==
===(Mean) Squared Error===
The squared error is:<br>
<math>J(\theta) = \sum_{i} |h_{\theta}(x^{(i)}) - y^{(i)}|^2</math><br>
If our model is linear regression, <math>h(x)=w^T x</math>, then <math>J</math> is convex in <math>w</math>.<br>
{{hidden|Proof|
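<math>
\begin{aligned}
\nabla_{w} J(w) &= \nabla_{w} \sum_{i} (w^T x^{(i)} - y^{(i)})^2\\
&= 2\sum_{i} (w^T x^{(i)} - y^{(i)})x^{(i)} \\
\implies \nabla_{w}^2 J(w) &= \nabla_{w} 2\sum_{i} (w^T x^{(i)}-y^{(i)})x^{(i)}\\
&= 2\sum_{i} x^{(i)}(x^{(i)})^T
\end{aligned}
</math><br>
For any vector <math>v</math>, <math>v^T \nabla_{w}^2 J(w) v = 2\sum_{i} (v^T x^{(i)})^2 \geq 0</math>, so the Hessian is positive semi-definite and <math>J</math> is convex.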
}}
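The proof above can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the function names, the random data, and the matrix <code>X</code> (whose rows are the inputs) are assumptions made for illustration. It forms the gradient and Hessian from the proof and computes the smallest Hessian eigenvalue, which should be non-negative up to rounding.

```python
import numpy as np

def squared_error(w, X, y):
    """Sum of squared residuals J(w) = sum_i (w^T x_i - y_i)^2."""
    r = X @ w - y
    return float(r @ r)

def squared_error_grad(w, X, y):
    """Gradient from the proof: 2 * sum_i (w^T x_i - y_i) x_i."""
    return 2.0 * X.T @ (X @ w - y)

def squared_error_hessian(X):
    """Hessian from the proof: 2 * sum_i x_i x_i^T (independent of w)."""
    return 2.0 * X.T @ X

# Hypothetical data: 50 examples with 3 features each (rows of X).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
w = rng.normal(size=3)

H = squared_error_hessian(X)
# eigvalsh is for symmetric matrices; all eigenvalues >= 0 means PSD.
min_eig = np.linalg.eigvalsh(H).min()
print("smallest Hessian eigenvalue:", min_eig)
```

Because the Hessian does not depend on <code>w</code>, a single eigenvalue check covers the whole parameter space.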
===Cross Entropy===
===Hinge Loss===


==Optimization==