Unsupervised Learning

\implies \log\left[E_{Q}\left(\frac{Pr(x^{(i)}, z^{(i)}; \theta)}{Q^{(i)}(j)}\right)\right] \geq E_{Q} \left[ \log \left(\frac{Pr(x^{(i)}, z^{(i)}; \theta)}{Q^{(i)}(j)} \right)\right]
</math><br>
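The bound above is Jensen's inequality applied to the concave <math>\log</math>: <math>\log E[X] \geq E[\log X]</math> for any positive random variable <math>X</math>. The following minimal sketch checks this numerically (the samples and names are illustrative, not part of the derivation):
<syntaxhighlight lang="python">
import numpy as np

# Numerical check of Jensen's inequality for the concave log:
# log(E[X]) >= E[log(X)] for a positive random variable X.
# Here X plays the role of the ratio Pr(x, z; theta) / Q(j).
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 5.0, size=100_000)  # positive samples

lhs = np.log(x.mean())       # log of the (empirical) expectation
rhs = np.log(x).mean()       # (empirical) expectation of the log
print(lhs, rhs, lhs >= rhs)  # holds exactly for the empirical distribution
</syntaxhighlight>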
Let <math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br>
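For a concrete instance, the sketch below evaluates <math>J</math> under an assumed Gaussian mixture model with parameters <code>pi</code>, <code>mu</code>, <code>sigma</code> (these names and the model choice are assumptions for illustration, not from the derivation):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import multivariate_normal

# J(theta, Q) for an assumed Gaussian mixture with k components.
# X is (m, d); Q is (m, k) with each row a pmf over the k components.
# Assumes Q > 0 entrywise so the log is finite.
def elbo(X, Q, pi, mu, sigma):
    m, k = Q.shape
    J = 0.0
    for j in range(k):
        # joint Pr(x^(i), z^(i)=j; theta) for all i at once
        joint = pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=sigma[j])
        J += np.sum(Q[:, j] * np.log(joint / Q[:, j]))
    return J
</syntaxhighlight>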
The EM algorithm is an iterative algorithm which alternates between an E-Step and an M-Step.
; E-Step
Since <math>Q^{(i)}</math> is a pmf, we have <math>Q^{(i)}(j) = \frac{P(x^{(i)}, z^{(i)} = j ; \theta)}{P(x^{(i)}; \theta)} = P(z^{(i)} = j \mid x^{(i)}; \theta)</math><br>
Q is updated with <math>Q^{(i)}(j) = \frac{Pr(z^{(i)}=j)Pr(x^{(i)}|z^{(i)}=j)}{Pr(x^{(i)})}</math>
{{hidden | Maximization w.r.t. Q |
<math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br>
We are assuming that <math>Q^{(i)}</math> is a pmf, so <math>\sum_j Q^{(i)}(j) = 1</math><br>
Our Lagrangian is:<br>
<math>
\max_{Q} \min_{\beta} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right) + \sum_{i=1}^{m} \beta_i \left(\sum_{j=1}^{k} Q^{(i)}(j) - 1\right)\right]
</math><br>
Setting the derivative with respect to <math>Q^{(i)}(j)</math> to zero gives <math>\log\left(\frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)}\right) - 1 + \beta_i = 0</math>, so <math>Q^{(i)}(j) \propto Pr(x^{(i)}, z^{(i)}=j;\theta)</math>; normalizing over <math>j</math> gives the posterior update above.
}}
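To make the E-Step concrete, here is a minimal sketch under the same assumed Gaussian mixture model (again, the names are illustrative):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import multivariate_normal

# E-Step sketch: Q^(i)(j) = Pr(z=j) Pr(x^(i) | z=j) / Pr(x^(i)),
# i.e. the posterior responsibility of component j for point x^(i).
def e_step(X, pi, mu, sigma):
    m, k = X.shape[0], len(pi)
    Q = np.zeros((m, k))
    for j in range(k):
        # joint Pr(x, z=j) = Pr(z=j) * Pr(x | z=j)
        Q[:, j] = pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=sigma[j])
    Q /= Q.sum(axis=1, keepdims=True)  # divide by Pr(x); each row now sums to 1
    return Q
</syntaxhighlight>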
; M-Step
We will fix <math>Q</math> and maximize <math>J</math> with respect to <math>\theta</math>.
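Under the same assumed Gaussian mixture model, fixing <math>Q</math> and maximizing <math>J</math> over <math>\theta = (\pi, \mu, \Sigma)</math> has the closed-form updates sketched below:
<syntaxhighlight lang="python">
import numpy as np

# M-Step sketch: with Q fixed, the maximizing Gaussian mixture parameters
# are Q-weighted counts, means, and covariances.
def m_step(X, Q):
    m, k = Q.shape
    Nj = Q.sum(axis=0)                # effective number of points per component
    pi = Nj / m                       # mixing weights Pr(z=j)
    mu = (Q.T @ X) / Nj[:, None]      # responsibility-weighted means
    sigma = []
    for j in range(k):
        d = X - mu[j]
        sigma.append((Q[:, j, None] * d).T @ d / Nj[j])  # weighted covariance
    return pi, mu, np.array(sigma)
</syntaxhighlight>
Alternating <code>e_step</code> and <code>m_step</code> until <math>J</math> stops improving gives the full EM loop.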