Expectation Maximization<br>
The key idea is to introduce intermediate coefficients to apply Jensen's inequality.<br>
=====Maximum Likelihood=====
Let <math>\theta_j = [\phi_j, \mu_j, \sigma_j]</math> denote our parameters.<br>
The likelihood is <math>L(\theta) = \prod_{i=1}^{m} P(x^{(i)}; \theta)</math> where <math>P</math> is our pdf/pmf.<br>
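In practice one works with the log-likelihood <math>\log L(\theta) = \sum_{i=1}^{m} \log P(x^{(i)}; \theta)</math>, since the product underflows numerically. As a minimal sketch, assuming a one-dimensional Gaussian mixture with parameter arrays <code>phi</code>, <code>mu</code>, <code>sigma</code> mirroring <math>\phi_j, \mu_j, \sigma_j</math> (these names are illustrative, not from the original notes):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

def log_likelihood(x, phi, mu, sigma):
    # densities[i, j] = phi_j * N(x_i; mu_j, sigma_j), the joint Pr(x_i, z_i = j)
    densities = phi * norm.pdf(x[:, None], loc=mu, scale=sigma)
    # log L(theta) = sum_i log sum_j densities[i, j]
    return np.sum(np.log(densities.sum(axis=1)))
</syntaxhighlight>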
Let <math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br>
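Multiplying and dividing by <math>Q^{(i)}(j)</math> inside each log of <math>\log L(\theta)</math> and applying Jensen's inequality to the concave <math>\log</math> shows that <math>J</math> lower-bounds the log-likelihood:<br>
<math>
\log L(\theta) = \sum_{i=1}^{m} \log \sum_{j=1}^{k} Q^{(i)}(j) \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \geq J(\theta, Q)
</math><br>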
The EM algorithm is an iterative algorithm which alternates between an E-Step and an M-Step.
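This alternation can be written as a short driver loop; <code>e_step</code> and <code>m_step</code> are hypothetical helpers (sketched in the subsections below), not part of the original notes:
<syntaxhighlight lang="python">
def em(x, phi, mu, sigma, n_iters=100):
    # Alternate the two coordinate-ascent steps on J(theta, Q).
    for _ in range(n_iters):
        Q = e_step(x, phi, mu, sigma)   # E-Step: maximize J over Q with theta fixed
        phi, mu, sigma = m_step(x, Q)   # M-Step: maximize J over theta with Q fixed
    return phi, mu, sigma
</syntaxhighlight>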
=====E-Step=====
We will fix <math>\theta</math> and maximize <math>J</math> wrt <math>Q</math>.<br>
Jensen's inequality holds with equality iff the function is linear or the random variable is degenerate.<br>
<math>
\max_{Q} \min_{\beta} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right) + \sum_{i=1}^{m} \beta_i \left( \sum_{j=1}^{k} Q^{(i)}(j) - 1 \right) \right]
</math><br>
with one multiplier <math>\beta_i</math> per normalization constraint <math>\sum_j Q^{(i)}(j) = 1</math>.<br>
We can maximize each <math>Q^{(i)}</math> independently.<br>
Taking the derivative wrt <math>Q^{(i)}(j)</math> we get:<br>
<math>
\frac{\partial}{\partial Q^{(i)}(j)} \left[ \sum_{j'=1}^{k} Q^{(i)}(j') \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j';\theta)}{Q^{(i)}(j')} \right) + \beta_i \left( \sum_{j'} Q^{(i)}(j') - 1 \right) \right]
</math><br>
<math>
= \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right) + Q^{(i)}(j) \left( -\frac{1}{Q^{(i)}(j)} \right) + \beta_i
= \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right) - 1 + \beta_i
</math><br>
Setting this to zero gives <math>Q^{(i)}(j) \propto Pr(x^{(i)}, z^{(i)}=j;\theta)</math>, and the constraint <math>\sum_j Q^{(i)}(j) = 1</math> then yields the posterior <math>Q^{(i)}(j) = Pr(z^{(i)}=j \mid x^{(i)}; \theta)</math>.
}}
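A minimal sketch of this E-Step for the one-dimensional Gaussian mixture, computing the posterior responsibilities derived above (array names follow the earlier sketch and are assumptions of this sketch):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

def e_step(x, phi, mu, sigma):
    # joint[i, j] = Pr(x_i, z_i = j; theta) = phi_j * N(x_i; mu_j, sigma_j)
    joint = phi * norm.pdf(x[:, None], loc=mu, scale=sigma)
    # Normalizing over j gives Q[i, j] = Pr(z_i = j | x_i; theta)
    return joint / joint.sum(axis=1, keepdims=True)
</syntaxhighlight>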
=====M-Step=====
We will fix <math>Q</math> and maximize <math>J</math> wrt <math>\theta</math>.<br>
<math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}(j) \log \left( \frac{Pr(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br>
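For the Gaussian mixture this maximization has closed-form solutions, the familiar responsibility-weighted maximum-likelihood updates; a sketch under the same assumptions as the E-Step code:
<syntaxhighlight lang="python">
import numpy as np

def m_step(x, Q):
    # Nj[j] = sum_i Q[i, j], the effective number of points in component j
    Nj = Q.sum(axis=0)
    phi = Nj / len(x)                       # phi_j = Nj / m
    mu = (Q * x[:, None]).sum(axis=0) / Nj  # responsibility-weighted means
    sigma = np.sqrt((Q * (x[:, None] - mu) ** 2).sum(axis=0) / Nj)
    return phi, mu, sigma
</syntaxhighlight>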