Unsupervised Learning

===Soft K-means===
We will develop a model for how our data is generated.<br>
We will then find the probability of each element belonging to each cluster (à la the Bayesian paradigm).<br>
Given <math>k</math> clusters, the probability of a point being from cluster <math>j</math> is <math>\phi_j = P(z^{(i)} = j)</math>.<br>
We will assume the points in cluster <math>j</math> are drawn from a normal distribution <math>N(\mu_j, \sigma_j)</math>.
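To make the generative story concrete, here is a minimal sampling sketch, assuming one-dimensional data and made-up values for <math>\phi</math>, <math>\mu</math>, and <math>\sigma</math> (the parameter values are hypothetical, not from these notes):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for k = 2 clusters.
phi = np.array([0.3, 0.7])     # phi_j = P(z = j), mixing weights
mu = np.array([-2.0, 3.0])     # cluster means
sigma = np.array([0.5, 1.0])   # cluster standard deviations

m = 1000  # number of points

# For each point: draw a latent label z from phi, then x ~ N(mu_z, sigma_z).
z = rng.choice(len(phi), size=m, p=phi)
x = rng.normal(loc=mu[z], scale=sigma[z])
</syntaxhighlight>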
====EM Algorithm====
Expectation Maximization<br>
The key idea is to introduce intermediate coefficients to apply Jensen's inequality.<br>
;Maximum Likelihood
Let <math>\theta_j = [\phi_j, \mu_j, \sigma_j]</math> denote our parameters.<br>
The likelihood is <math>L(\theta) = \prod_{i=1}^{m} P(x^{(i)}; \theta)</math> where <math>P</math> is our pdf/pmf.<br>
Then the log-likelihood is <math>l(\theta) = \sum_{i=1}^{m} \log P(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{j=1}^{k}P(x^{(i)}, z^{(i)}=j; \theta)</math><br>
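Under the mixture model above, each term is <math>P(x^{(i)}, z^{(i)}=j; \theta) = \phi_j \, N(x^{(i)}; \mu_j, \sigma_j)</math>, so the log-likelihood can be evaluated directly. A sketch, reusing the hypothetical <code>phi</code>, <code>mu</code>, <code>sigma</code>, and sampled <code>x</code> from the block above:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

# joint[i, j] = P(x_i, z_i = j; theta) = phi_j * N(x_i; mu_j, sigma_j)
joint = phi * norm.pdf(x[:, None], loc=mu, scale=sigma)

# l(theta) = sum_i log sum_j P(x_i, z_i = j; theta)
log_likelihood = np.log(joint.sum(axis=1)).sum()
</syntaxhighlight>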
By introducing an extra variable <math>Q^{(i)}_{(j)}</math> we'll be able to apply Jensen's inequality to the concave log function.<br>
Assume <math>Q^{(i)}_{(j)}</math> is a probability mass function over <math>j</math>.<br>
<math>
l(\theta) = \sum_{i=1}^{m} \log \sum_{j=1}^{k}P(x^{(i)}, z^{(i)}=j; \theta)
</math><br>
<math>
=\sum_{i=1}^{m} \log \sum_{j=1}^{k} Q^{(i)}_{(j)} \frac{P(x^{(i)}, z^{(i)}=j; \theta)}{Q^{(i)}_{(j)}}
</math><br>
<math>
\geq \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log\left(\frac{P(x^{(i)}, z^{(i)}=j; \theta)}{Q^{(i)}_{(j)}}\right)
</math><br>
where the last step applies Jensen's inequality to the concave <math>\log</math> of an expectation over <math>j \sim Q^{(i)}</math>.
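As a quick numerical sanity check of the Jensen step for a single point <math>i</math> (with made-up values for the joint probabilities and for <math>Q^{(i)}</math>):
<syntaxhighlight lang="python">
import numpy as np

p = np.array([0.05, 0.12, 0.02])  # p_j = P(x, z = j; theta), made-up values
q = np.array([0.2, 0.5, 0.3])     # q_j = Q(j), an arbitrary pmf

lhs = np.log(np.sum(q * (p / q)))  # log of the expectation of p/q under q
rhs = np.sum(q * np.log(p / q))    # expectation of the log
assert lhs >= rhs                  # Jensen: log E[X] >= E[log X] for concave log
</syntaxhighlight>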
Jensen's inequality holds with equality iff the function is linear or the random variable is degenerate (i.e. constant).<br>
Since <math>\log</math> is not linear, to make the bound tight we choose <math>Q^{(i)}</math> so that <math>\frac{P(x^{(i)}, z^{(i)}=j; \theta)}{Q^{(i)}_{(j)}}</math> is constant in <math>j</math>.<br>
This implies <math>Q^{(i)}_{(j)} = c \cdot P(x^{(i)}, z^{(i)} = j; \theta)</math>.<br>
Since <math>Q^{(i)}</math> is a pmf, summing over <math>j</math> gives <math>c = \frac{1}{P(x^{(i)}; \theta)}</math>, so <math>Q^{(i)}_{(j)} = \frac{P(x^{(i)}, z^{(i)} = j; \theta)}{P(x^{(i)}; \theta)} = P(z^{(i)} = j \mid x^{(i)}; \theta)</math>.
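This posterior is exactly the E-step of the algorithm: the "soft" assignment of each point to each cluster. A sketch, again reusing the hypothetical parameters and data from the earlier blocks:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

# E-step: Q[i, j] = P(z_i = j | x_i; theta), cluster j's responsibility for x_i.
joint = phi * norm.pdf(x[:, None], loc=mu, scale=sigma)  # P(x_i, z_i = j; theta)
Q = joint / joint.sum(axis=1, keepdims=True)             # divide by P(x_i; theta)

# Each row of Q is a pmf over the k clusters.
assert np.allclose(Q.sum(axis=1), 1.0)
</syntaxhighlight>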