Machine Learning: Difference between revisions
Replaced content with "Machine Learning ==Loss functions== ===(Mean) Squared Error=== The squared error is:<br> <math>J(\theta) = \sum|h_{\theta}(x^{(i)}) - y^{(i)}|^2</math><br> If our model i..." Tag: Replaced |
No edit summary |
||
Line 1: | Line 1: | ||
== | ===Cross Entropy=== | ||
The cross entropy loss is | |||
The | * <math>J(\theta) = \sum [(y^{(i)})\log(h_\theta(x)) + (1-y^{(i)})\log(1-h_\theta(x))]</math> | ||
<math>J(\theta) = \sum | ;Notes | ||
If our model is | * If our model is <math>g(\theta^Tx^{(i)})</math> where <math>g(x)</math> is the sigmoid function <math>\frac{e^x}{1+e^x}</math> then this is convex | ||
{{hidden|Proof| | {{hidden | Proof | | ||
<math> | <math> | ||
\begin{aligned} | \begin{aligned} | ||
\nabla_{ | \nabla_\theta J(\theta) &= -\nabla_\theta \sum [(y^{(i)})\log(g(\theta^t x^{(i)})) + (1-y^{(i)})\log(1-g(\theta^t x^{(i)}))]\\ | ||
&= | &= -\sum [(y^{(i)})\frac{g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{g(\theta^t x^{(i)})}x^{(i)} + (1-y^{(i)})\frac{-g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{1-g(\theta^t x^{(i)})}x^{(i)}]\\ | ||
\ | &= -\sum [(y^{(i)})(1-g(\theta^t x^{(i)}))x^{(i)} - (1-y^{(i)})g(\theta^t x^{(i)})x^{(i)}]\\ | ||
&= | &= -\sum [(y^{(i)})x^{(i)} -(y^{(i)}) g(\theta^t x^{(i)}))x^{(i)} - g(\theta^t x^{(i)})x^{(i)} + y^{(i)}g(\theta^t x^{(i)})x^{(i)}]\\ | ||
&= -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}]\\ | |||
\implies \nabla^2_\theta J(\theta) &= \nabla_\theta -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}]\\ | |||
&= \sum g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)})) x^{(i)} (x^{(i)})^T\\ | |||
\end{aligned} | \end{aligned} | ||
</math><br> | </math><br> | ||
which is a PSD matrix | |||
}} | }} |
Revision as of 16:12, 10 November 2019
Cross Entropy
The cross entropy loss is
- \(\displaystyle J(\theta) = \sum [(y^{(i)})\log(h_\theta(x)) + (1-y^{(i)})\log(1-h_\theta(x))]\)
- Notes
- If our model is \(\displaystyle g(\theta^Tx^{(i)})\) where \(\displaystyle g(x)\) is the sigmoid function \(\displaystyle \frac{e^x}{1+e^x}\) then this is convex
\(\displaystyle
\begin{aligned}
\nabla_\theta J(\theta) &= -\nabla_\theta \sum [(y^{(i)})\log(g(\theta^t x^{(i)})) + (1-y^{(i)})\log(1-g(\theta^t x^{(i)}))]\\
&= -\sum [(y^{(i)})\frac{g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{g(\theta^t x^{(i)})}x^{(i)} + (1-y^{(i)})\frac{-g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{1-g(\theta^t x^{(i)})}x^{(i)}]\\
&= -\sum [(y^{(i)})(1-g(\theta^t x^{(i)}))x^{(i)} - (1-y^{(i)})g(\theta^t x^{(i)})x^{(i)}]\\
&= -\sum [(y^{(i)})x^{(i)} -(y^{(i)}) g(\theta^t x^{(i)}))x^{(i)} - g(\theta^t x^{(i)})x^{(i)} + y^{(i)}g(\theta^t x^{(i)})x^{(i)}]\\
&= -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}]\\
\implies \nabla^2_\theta J(\theta) &= \nabla_\theta -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}]\\
&= \sum g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)})) x^{(i)} (x^{(i)})^T\\
\end{aligned}
\)
which is a PSD matrix