5,321
edits
(→E-Step) |
|||
Line 77: | Line 77: | ||
</math><br> | </math><br> | ||
<math> | <math> | ||
\implies \log\left[E_{Q}\left(\frac{ | \implies \log\left[E_{Q}\left(\frac{P(x^{(i)}, q^{(i)}; \theta)}{Q^{(i)}_{(j)}}\right)\right] \geq E_{Q} \left[ \log \left(\frac{P(x^{(i)}, q^{(i)}; \theta)}{Q^{(i)}(j)} \right)\right] | ||
</math><br> | </math><br> | ||
Let <math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log \left( \frac{ | Let <math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br> | ||
The EM algorithm is an iterative algorithm which alternates between an E-Step and an M-Step. | The EM algorithm is an iterative algorithm which alternates between an E-Step and an M-Step. | ||
=====E-Step===== | =====E-Step===== | ||
Line 88: | Line 88: | ||
This implies <math>Q^{(i)}(j) = c * P(x^{(i)}, z^{(i)} = j ; \theta)</math>.<br> | This implies <math>Q^{(i)}(j) = c * P(x^{(i)}, z^{(i)} = j ; \theta)</math>.<br> | ||
Since Q is a pmf, we have <math>Q^{(i)}(j) = \frac{1}{P(x^{(i)})} * P(x^{(i)}, z^{(i)} = j ; \theta) = P(z^{(i)} ; x^{(i)}, \theta)</math><br> | Since Q is a pmf, we have <math>Q^{(i)}(j) = \frac{1}{P(x^{(i)})} * P(x^{(i)}, z^{(i)} = j ; \theta) = P(z^{(i)} ; x^{(i)}, \theta)</math><br> | ||
Q is updated with <math>Q^{(i)}(j) = \frac{ | Q is updated with <math>Q^{(i)}(j) = \frac{P(z^{(i)}=j)P(x^{(i)}|z^{(i)}=j)}{P(x^{(i)})}</math> | ||
{{hidden | Maximization w.r.t Q | | {{hidden | Maximization w.r.t Q | | ||
<math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log \left( \frac{ | <math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br> | ||
We are assuming that Q is a pmf so <math>\sum_j Q = 1 </math><br> | We are assuming that Q is a pmf so <math>\sum_j Q = 1 </math><br> | ||
Our lagrangian is:<br> | Our lagrangian is:<br> | ||
<math> | <math> | ||
\max_{Q} \min_{\beta} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log \left( \frac{ | \max_{Q} \min_{\beta} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right) +\beta (\sum_j Q^{(i)}_{(j)} - 1)\right] | ||
</math> | </math> | ||
We can maximize each <math>Q^{(i)}</math> independently.<br> | We can maximize each <math>Q^{(i)}</math> independently.<br> | ||
Taking the derivative wrt Q we get:<br> | Taking the derivative wrt Q we get:<br> | ||
<math> | <math> | ||
\frac{\partial}{\partial Q^{(i)}_{(j)}} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log \left( \frac{ | \frac{\partial}{\partial Q^{(i)}_{(j)}} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right) +\beta (\sum_j Q^{(i)}_{(j)} - 1) | ||
</math><br> | </math><br> | ||
<math> | <math> | ||
= \log(\frac{ | = \log(\frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q}) - Q \frac{Q}{P(x^{(i)}, z^{(i)}=j;\theta)} (P(x^{(i)}, z^{(i)}=j;\theta))(Q^{-2}) + \beta | ||
</math> | </math> | ||
</math><br> | </math><br> | ||
<math> | <math> | ||
= \log(\frac{ | = \log(\frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q}) - 1 + \beta = 0 | ||
</math><br> | </math><br> | ||
<math> | <math> | ||
\implies Q^{(i)}_{(j)} = (\frac{1}{exp(1-\beta)}) | \implies Q^{(i)}_{(j)} = (\frac{1}{exp(1-\beta)})P(x^{(i)}, z^{(i)}=j;\theta) | ||
</math><br> | </math><br> | ||
Since Q is a pmf, we know it sums to 1 so we get the same result replacing <math>(\frac{1}{exp(1-\beta)})</math> with <math> | Since Q is a pmf, we know it sums to 1 so we get the same result replacing <math>(\frac{1}{exp(1-\beta)})</math> with <math>P(x^{(i)}</math> | ||
}} | }} | ||
=====M-Step===== | =====M-Step===== | ||
We will fix <math>Q</math> and maximize J wrt <math>\theta</math> | We will fix <math>Q</math> and maximize J wrt <math>\theta</math> | ||
<math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log \left( \frac{ | <math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br> | ||
Assume <math>\Sigma_j = I</math> for simplicity.<br> | Assume <math>\Sigma_j = I</math> for simplicity.<br> | ||
Then<br> | Then<br> | ||
<math> | <math> | ||
J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log \left( \frac{ | J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right) | ||
</math><br> | </math><br> | ||
<math> | <math> | ||
= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( | = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( P(x^{(i)}, z^{(i)}=j;\theta)) + C_1 | ||
</math><br> | </math><br> | ||
<math> | <math> | ||
= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( | = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( P(x^{(i)} \mid z^{(i)}=j) P(z^{(i)}=j)) + C_1 | ||
</math><br> | </math><br> | ||
<math> | <math> | ||
= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( | = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( P(x^{(i)} \mid z^{(i)}=j)) - Q^{(i)}_{(j)} \log( P(z^{(i)}=j)) + C_1 | ||
</math><br> | </math><br> | ||
<math> | <math> |