</math><br>
<math>
\implies \log\left[E_{Q}\left(\frac{Pr(x^{(i)}, z^{(i)}; \theta)}{Q^{(i)}(j)}\right)\right] \geq E_{Q} \left[ \log \left(\frac{Pr(x^{(i)}, z^{(i)}; \theta)}{Q^{(i)}(j)} \right)\right]
</math><br>
Let <math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br>
The EM algorithm is an iterative algorithm which alternates between an E-Step and an M-Step.
; E-Step
We will fix <math>\theta</math> and maximize J wrt <math>Q</math>.<br>
Jensen's inequality holds with equality iff either the function is linear or if the random variable is degenerate.<br>
We will assume <math>\frac{P(x^{(i)}, z^{(i)}=j; \theta)}{Q^{(i)}(j)}</math> is constant over <math>j</math>, so that equality holds.<br>
This implies <math>Q^{(i)}(j) = c \cdot P(x^{(i)}, z^{(i)} = j ; \theta)</math>.<br>
Since Q is a pmf, we have <math>Q^{(i)}(j) = \frac{1}{P(x^{(i)})} P(x^{(i)}, z^{(i)} = j ; \theta) = P(z^{(i)}=j \mid x^{(i)}; \theta)</math><br>
Q is updated with <math>Q^{(i)}(j) = \frac{Pr(z^{(i)}=j)Pr(x^{(i)}\mid z^{(i)}=j)}{Pr(x^{(i)})}</math>
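The E-Step update above can be sketched numerically. The following is a minimal NumPy sketch (not from the notes), assuming a Gaussian mixture with identity covariances; the function name `e_step` and its argument names are illustrative:

```python
import numpy as np

def e_step(X, mu, phi):
    """E-Step: Q[i, j] = P(z^(i)=j | x^(i)) via Bayes' rule,
    assuming spherical Gaussians with Sigma_j = I."""
    # log P(x_i | z=j) up to the shared (2*pi)^(-n/2) constant
    log_lik = -0.5 * np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)
    log_joint = log_lik + np.log(phi)[None, :]         # log P(x_i, z=j)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    Q = np.exp(log_joint)
    # normalize over j so each row of Q is a pmf
    return Q / Q.sum(axis=1, keepdims=True)
```

Each row of the returned matrix is the posterior <math>Q^{(i)}</math>, so the rows sum to one by construction.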
; M-Step
We will fix <math>Q</math> and maximize J wrt <math>\theta</math>.
<math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br>
Assume <math>\Sigma_j = I</math> for simplicity.<br>
Then<br>
<math>
J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)
</math><br>
<math>
= \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log ( Pr(x^{(i)}, z^{(i)}=j;\theta)) + C_1
</math><br>
<math>
= \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log ( Pr(x^{(i)} \mid z^{(i)}=j) Pr(z^{(i)}=j)) + C_1
</math><br>
<math>
= \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log ( Pr(x^{(i)} \mid z^{(i)}=j)) + Q^{(i)}(j) \log( Pr(z^{(i)}=j)) + C_1
</math><br>
<math>
= \sum_{i=1}^{m}\left[ \sum_{j=1}^{k} Q^{(i)}(j) \log ( (2\pi)^{-n/2}\exp(-\Vert x^{(i)} - \mu_j \Vert^2 / 2)) + Q^{(i)}(j) \log( \phi_j) \right]+ C_1
</math><br>
<math>
= \sum_{i=1}^{m}\left[ \sum_{j=1}^{k} -Q^{(i)}(j)\Vert x^{(i)} - \mu_j \Vert^2 / 2 + Q^{(i)}(j) \log( \phi_j) \right]+ C_2
</math><br>
Maximizing wrt <math>\mu</math>, we get <math>\mu_j^* = \left(\sum_{i} Q^{(i)}(j)x^{(i)}\right) / \left(\sum_{i}Q^{(i)}(j)\right)</math>.<br>
Maximizing wrt <math>\phi</math>, we get <math>\phi_j^* = \frac{1}{m} \sum_{i=1}^{m}Q^{(i)}(j)</math><br>
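The two closed-form maximizers above can be sketched in NumPy; this is an illustrative helper (the name `m_step` is not from the notes), again under the identity-covariance assumption:

```python
import numpy as np

def m_step(X, Q):
    """M-Step: closed-form maximizers of J for Sigma_j = I.
    mu_j* = (sum_i Q^(i)(j) x^(i)) / (sum_i Q^(i)(j))
    phi_j* = (1/m) sum_i Q^(i)(j)"""
    m = X.shape[0]
    weights = Q.sum(axis=0)            # sum_i Q^(i)(j), one entry per component j
    mu = (Q.T @ X) / weights[:, None]  # responsibility-weighted means
    phi = weights / m                  # average responsibility per component
    return mu, phi
```

With hard (0/1) responsibilities this reduces to the usual per-cluster sample means and cluster proportions, which is a quick sanity check on the formulas.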
{{hidden | Maximization wrt phi|
We want to maximize <math>\sum_{i=1}^{m}\sum_{j=1}^{k} Q^{(i)}(j) \log( \phi_j)</math> subject to the constraint <math>\sum_{j=1}^{k} \phi_j = 1</math>, since <math>\phi_j = P(z^{(i)}=j)</math> is a pmf.<br>
The Lagrangian for this is <math>\max_{\phi_1,...,\phi_k} \min_{\beta} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k}Q^{(i)}(j) \log( \phi_j) + \beta(\sum_{j=1}^{k}\phi_j - 1) \right]</math><br>
This can be rewritten as <math>\max_{\phi_1,...,\phi_k} \min_{\beta} \sum_{j=1}^{k}\left[\log( \phi_j) \sum_{i=1}^{m} Q^{(i)}(j) + \beta(\phi_j - 1/k) \right]</math><br>
The dual of this problem is <math> \min_{\beta} \sum_{j=1}^{k} \max_{\phi_1,...,\phi_k} \left[\log( \phi_j) \sum_{i=1}^{m} Q^{(i)}(j) + \beta(\phi_j - 1/k) \right]</math><br>
Taking the gradient w.r.t. <math>\phi_j</math>, we get <math>\frac{1}{\phi_j}\sum_{i}Q^{(i)}(j) + \beta = 0 \implies \phi_j = \frac{-1}{\beta}\sum_{i}Q^{(i)}(j)</math><br>
Plugging this into our dual problem we get:<br>
<math> \min_{\beta} \sum_{j=1}^{k} \left[\log\left(\frac{-1}{\beta}\sum_{i}Q^{(i)}(j)\right) \sum_{i=1}^{m} Q^{(i)}(j) + \beta\left(\frac{-1}{\beta}\sum_{i}Q^{(i)}(j) - 1/k\right) \right]</math><br>
<math>= \min_{\beta} \sum_{j=1}^{k} \left[\log\left(\frac{-1}{\beta}\sum_{i}Q^{(i)}(j)\right) \sum_{i=1}^{m} Q^{(i)}(j) - \sum_{i}Q^{(i)}(j) - \frac{\beta}{k} \right]</math><br>
Taking the derivative with respect to <math>\beta</math>, the derivative of each log term is <math>\frac{-1}{\beta}</math>, which is multiplied by its factor <math>\sum_{i}Q^{(i)}(j)</math>, so we get:<br>
<math>
\sum_{j=1}^{k}\left[\frac{-1}{\beta}\sum_{i}Q^{(i)}(j) - \frac{1}{k}\right] = \frac{-1}{\beta}\sum_{j=1}^{k}\sum_{i=1}^{m}Q^{(i)}(j) - 1 = \frac{-m}{\beta} - 1 = 0
</math><br>
since each <math>Q^{(i)}</math> is a pmf over <math>j</math>, so <math>\sum_{j}\sum_{i}Q^{(i)}(j) = m</math>.<br>
<math>
\implies \beta = -m
</math><br>
Plugging <math>\beta = -m</math> into our equation for <math>\phi_j</math> we get <math>\phi_j = \frac{1}{m}\sum_{i=1}^{m}Q^{(i)}(j)</math>
}} |
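Putting the two steps together, a self-contained toy run (illustrative; the data, initialization, and iteration count are made up, and identity covariances are assumed throughout as above) that alternates the E-Step and M-Step updates:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two well-separated spherical clusters in 2-D (60 and 40 points)
X = np.vstack([rng.normal(-4.0, 1.0, (60, 2)),
               rng.normal(4.0, 1.0, (40, 2))])
m, n = X.shape
k = 2
mu = np.array([[-1.0, -1.0], [1.0, 1.0]])  # illustrative initial means
phi = np.full(k, 1.0 / k)                  # uniform initial mixing weights

for _ in range(50):
    # E-Step: Q[i, j] proportional to phi_j * exp(-||x_i - mu_j||^2 / 2)
    log_joint = -0.5 * np.sum((X[:, None] - mu[None]) ** 2, axis=2) + np.log(phi)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    Q = np.exp(log_joint)
    Q /= Q.sum(axis=1, keepdims=True)
    # M-Step: closed-form updates derived above
    w = Q.sum(axis=0)          # sum_i Q^(i)(j)
    mu = (Q.T @ X) / w[:, None]
    phi = w / m
```

After convergence the mixing weights should roughly recover the 60/40 split of the toy data and the means should sit near the two cluster centers.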