Machine Learning

From David's Wiki
Revision as of 12:12, 10 November 2019 by David (talk | contribs)
Jump to navigation Jump to search
\( \newcommand{\P}[]{\unicode{xB6}} \newcommand{\AA}[]{\unicode{x212B}} \newcommand{\empty}[]{\emptyset} \newcommand{\O}[]{\emptyset} \newcommand{\Alpha}[]{Α} \newcommand{\Beta}[]{Β} \newcommand{\Epsilon}[]{Ε} \newcommand{\Iota}[]{Ι} \newcommand{\Kappa}[]{Κ} \newcommand{\Rho}[]{Ρ} \newcommand{\Tau}[]{Τ} \newcommand{\Zeta}[]{Ζ} \newcommand{\Mu}[]{\unicode{x039C}} \newcommand{\Chi}[]{Χ} \newcommand{\Eta}[]{\unicode{x0397}} \newcommand{\Nu}[]{\unicode{x039D}} \newcommand{\Omicron}[]{\unicode{x039F}} \DeclareMathOperator{\sgn}{sgn} \def\oiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x222F}\,}{\unicode{x222F}}{\unicode{x222F}}{\unicode{x222F}}}\,}\nolimits} \def\oiiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x2230}\,}{\unicode{x2230}}{\unicode{x2230}}{\unicode{x2230}}}\,}\nolimits} \)

Cross Entropy

The cross entropy loss is

  • \(\displaystyle J(\theta) = \sum [(y^{(i)})\log(h_\theta(x)) + (1-y^{(i)})\log(1-h_\theta(x))]\)
Notes
  • If our model is \(\displaystyle g(\theta^Tx^{(i)})\) where \(\displaystyle g(x)\) is the sigmoid function \(\displaystyle \frac{e^x}{1+e^x}\) then this is convex
Proof

\(\displaystyle \begin{aligned} \nabla_\theta J(\theta) &= -\nabla_\theta \sum [(y^{(i)})\log(g(\theta^t x^{(i)})) + (1-y^{(i)})\log(1-g(\theta^t x^{(i)}))]\\ &= -\sum [(y^{(i)})\frac{g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{g(\theta^t x^{(i)})}x^{(i)} + (1-y^{(i)})\frac{-g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{1-g(\theta^t x^{(i)})}x^{(i)}]\\ &= -\sum [(y^{(i)})(1-g(\theta^t x^{(i)}))x^{(i)} - (1-y^{(i)})g(\theta^t x^{(i)})x^{(i)}]\\ &= -\sum [(y^{(i)})x^{(i)} -(y^{(i)}) g(\theta^t x^{(i)}))x^{(i)} - g(\theta^t x^{(i)})x^{(i)} + y^{(i)}g(\theta^t x^{(i)})x^{(i)}]\\ &= -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}]\\ \implies \nabla^2_\theta J(\theta) &= \nabla_\theta -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}]\\ &= \sum g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)})) x^{(i)} (x^{(i)})^T\\ \end{aligned} \)
which is a PSD matrix