Unsupervised Learning

===Soft K-means===
We will develop a model for how our data is generated.<br>
We will then find the probability of each element belonging to each cluster (à la the Bayesian paradigm).<br>
Given <math>k</math> clusters, the probability of a point being from cluster <math>j</math> is <math>\phi_j = P(z^{(i)} = j)</math>.<br>
We will assume the points in cluster <math>j</math> are drawn from a normal distribution <math>N(\mu_j, \sigma_j)</math>.
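To make the generative story concrete, here is a minimal sampling sketch, assuming one-dimensional data and made-up values for <math>\phi</math>, <math>\mu</math>, and <math>\sigma</math> (the parameter values are hypothetical, not from these notes):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for k = 2 clusters.
phi = np.array([0.3, 0.7])     # phi_j = P(z = j), mixing weights
mu = np.array([-2.0, 3.0])     # cluster means
sigma = np.array([0.5, 1.0])   # cluster standard deviations

m = 1000  # number of points

# For each point: draw a latent label z from phi, then x ~ N(mu_z, sigma_z).
z = rng.choice(len(phi), size=m, p=phi)
x = rng.normal(loc=mu[z], scale=sigma[z])
</syntaxhighlight>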
====EM Algorithm====
Expectation Maximization<br>
The key idea is to introduce intermediate coefficients to apply Jensen's inequality.<br>
;Maximum Likelihood
Let <math>\theta_j = [\phi_j, \mu_j, \sigma_j]</math> denote our parameters.<br>
The likelihood is <math>L(\theta) = \prod_{i=1}^{m} P(x^{(i)}; \theta)</math> where <math>P</math> is our pdf/pmf.<br>
Then the log-likelihood is <math>l(\theta) = \sum_{i=1}^{m} \log P(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{j=1}^{k}P(x^{(i)}, z^{(i)}=j; \theta)</math><br>
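Under the mixture model above, each term is <math>P(x^{(i)}, z^{(i)}=j; \theta) = \phi_j \, N(x^{(i)}; \mu_j, \sigma_j)</math>, so the log-likelihood can be evaluated directly. A sketch, reusing the hypothetical <code>phi</code>, <code>mu</code>, <code>sigma</code>, and sampled <code>x</code> from the block above:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

# joint[i, j] = P(x_i, z_i = j; theta) = phi_j * N(x_i; mu_j, sigma_j)
joint = phi * norm.pdf(x[:, None], loc=mu, scale=sigma)

# l(theta) = sum_i log sum_j P(x_i, z_i = j; theta)
log_likelihood = np.log(joint.sum(axis=1)).sum()
</syntaxhighlight>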
By introducing an extra variable <math>Q^{(i)}_{(j)}</math> we'll be able to apply Jensen's inequality to the concave log function.<br>
Assume <math>Q^{(i)}_{(j)}</math> is a probability mass function over <math>j</math>.<br>
<math>
l(\theta) = \sum_{i=1}^{m} \log \sum_{j=1}^{k}P(x^{(i)}, z^{(i)}=j; \theta)
</math><br>
<math>
=\sum_{i=1}^{m} \log \sum_{j=1}^{k} Q^{(i)}_{(j)} \frac{P(x^{(i)}, z^{(i)}=j; \theta)}{Q^{(i)}_{(j)}}
</math><br>
<math>
\geq \sum_{i=1}^{m} \sum_{j=1}^{k} Q^{(i)}_{(j)} \log\left(\frac{P(x^{(i)}, z^{(i)}=j; \theta)}{Q^{(i)}_{(j)}}\right)
</math><br>
where the last step applies Jensen's inequality to the concave <math>\log</math> of an expectation over <math>j \sim Q^{(i)}</math>.
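As a quick numerical sanity check of the Jensen step for a single point <math>i</math> (with made-up values for the joint probabilities and for <math>Q^{(i)}</math>):
<syntaxhighlight lang="python">
import numpy as np

p = np.array([0.05, 0.12, 0.02])  # p_j = P(x, z = j; theta), made-up values
q = np.array([0.2, 0.5, 0.3])     # q_j = Q(j), an arbitrary pmf

lhs = np.log(np.sum(q * (p / q)))  # log of the expectation of p/q under q
rhs = np.sum(q * np.log(p / q))    # expectation of the log
assert lhs >= rhs                  # Jensen: log E[X] >= E[log X] for concave log
</syntaxhighlight>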
Jensen's inequality holds with equality iff the function is linear or the random variable is degenerate (i.e. constant).<br>
Since <math>\log</math> is not linear, to make the bound tight we choose <math>Q^{(i)}</math> so that <math>\frac{P(x^{(i)}, z^{(i)}=j; \theta)}{Q^{(i)}_{(j)}}</math> is constant in <math>j</math>.<br>
This implies <math>Q^{(i)}_{(j)} = c \cdot P(x^{(i)}, z^{(i)} = j; \theta)</math>.<br>
Since <math>Q^{(i)}</math> is a pmf, summing over <math>j</math> gives <math>c = \frac{1}{P(x^{(i)}; \theta)}</math>, so <math>Q^{(i)}_{(j)} = \frac{P(x^{(i)}, z^{(i)} = j; \theta)}{P(x^{(i)}; \theta)} = P(z^{(i)} = j \mid x^{(i)}; \theta)</math>.
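This posterior is exactly the E-step of the algorithm: the "soft" assignment of each point to each cluster. A sketch, again reusing the hypothetical parameters and data from the earlier blocks:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

# E-step: Q[i, j] = P(z_i = j | x_i; theta), cluster j's responsibility for x_i.
joint = phi * norm.pdf(x[:, None], loc=mu, scale=sigma)  # P(x_i, z_i = j; theta)
Q = joint / joint.sum(axis=1, keepdims=True)             # divide by P(x_i; theta)

# Each row of Q is a pmf over the k clusters.
assert np.allclose(Q.sum(axis=1), 1.0)
</syntaxhighlight>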