Unsupervised Learning: Difference between revisions

 
(4 intermediate revisions by the same user not shown)
Line 8: Line 8:
Here we wish to cluster them to minimize the distance to the mean of their cluster.<br>
Here we wish to cluster them to minimize the distance to the mean of their cluster.<br>
In our formulation we have k clusters.<br>
In our formulation we have k clusters.<br>
The mean of each cluster <math>\mu_i</math> is called the centroid.</math><br>
The mean of each cluster <math>\mu_i</math> is called the centroid.<br>
====Optimization====
====Optimization====
Let <math>\mathbf{\mu}</math> denote the centroids and let <math>\mathbf{z}</math> denote the cluster labels for our data.<br>
Let <math>\mathbf{\mu}</math> denote the centroids and let <math>\mathbf{z}</math> denote the cluster labels for our data.<br>
Line 103: Line 103:
<math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br>
<math>J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)</math><br>
Assume <math>\Sigma_j = I</math> for simplicity.<br>
Assume <math>\Sigma_j = I</math> for simplicity.<br>
Then<br>
Then:
<math>
<math display="block">
J(\theta, Q) = \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)
\begin{aligned}
</math><br>
J(\theta, Q) &= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log \left( \frac{P(x^{(i)}, z^{(i)}=j;\theta)}{Q^{(i)}(j)} \right)\\
<math>
&= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( P(x^{(i)}, z^{(i)}=j;\theta)) + C_1 \\
= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( P(x^{(i)}, z^{(i)}=j;\theta)) + C_1  
&= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( P(x^{(i)} \mid z^{(i)}=j) P(z^{(i)}=j)) + C_1\\
</math><br>
&= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( P(x^{(i)} \mid z^{(i)}=j)) -  Q^{(i)}_{(j)} \log( P(z^{(i)}=j)) + C_1\\
<math>
&= \sum_{i=1}^{m}\left[ \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( (2\pi)^{-n/2}exp(-\Vert x^{(i)} + \mu_j \Vert^2 / 2)) -  Q^{(i)}_{(j)} \log( \phi_j) \right]+ C_1 \\
= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( P(x^{(i)} \mid z^{(i)}=j) P(z^{(i)}=j)) + C_1
&= \sum_{i=1}^{m}\left[ \sum_{j=1}^{m} Q^{(i)}_{(j)} -\Vert x^{(i)} - \mu_j \Vert^2 / 2) +  Q^{(i)}_{(j)} \log( \phi_j) \right]+ C_2  
</math><br>
\end{aligned}
<math>
= \sum_{i=1}^{m} \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( P(x^{(i)} \mid z^{(i)}=j)) -  Q^{(i)}_{(j)} \log( P(z^{(i)}=j)) + C_1
</math><br>
<math>
= \sum_{i=1}^{m}\left[ \sum_{j=1}^{m} Q^{(i)}_{(j)} \log ( (2\pi)^{-n/2}exp(-\Vert x^{(i)} + \mu_j \Vert^2 / 2)) -  Q^{(i)}_{(j)} \log( \phi_j) \right]+ C_1  
</math><br>
<math>
= \sum_{i=1}^{m}\left[ \sum_{j=1}^{m} Q^{(i)}_{(j)} -\Vert x^{(i)} - \mu_j \Vert^2 / 2) +  Q^{(i)}_{(j)} \log( \phi_j) \right]+ C_2  
</math><br>
</math><br>
Maximizing wrt <math>\mu</math>, we get <math>\mu_j^* = (\sum_{i} Q^{(i)}_{(i)}x^{(i)}) / (\sum_{i}Q^{(i)}_{(j)})</math>.<br>
Maximizing wrt <math>\mu</math>, we get <math>\mu_j^* = (\sum_{i} Q^{(i)}_{(i)}x^{(i)}) / (\sum_{i}Q^{(i)}_{(j)})</math>.<br>
Line 134: Line 126:
<math>= \min_{\beta} \sum_{j=1}^{k} \left[\log(\frac{-1}{\beta}(\sum_{i}Q^{(i)}_{(j)})) \sum_{i=1}^{m} Q^{(i)}_{(j)}  -(\sum_{i}Q^{(i)}_{(j)}) - (\beta/k) \right]</math><br>
<math>= \min_{\beta} \sum_{j=1}^{k} \left[\log(\frac{-1}{\beta}(\sum_{i}Q^{(i)}_{(j)})) \sum_{i=1}^{m} Q^{(i)}_{(j)}  -(\sum_{i}Q^{(i)}_{(j)}) - (\beta/k) \right]</math><br>
Taking the derivative with respect to <math>\beta</math>, we get:<br>
Taking the derivative with respect to <math>\beta</math>, we get:<br>
<math>
<math display="block">
\sum_{j=1}^{k} [(\frac{1}{(-1/\beta)(\sum Q)})(-\sum Q)(-\beta^{-2})(\sum Q) - \frac{1}{k}]  
\begin{aligned}
</math><br>
\sum_{j=1}^{k} [(\frac{1}{(-1/\beta)(\sum Q)})(-\sum Q)(-\beta^{-2})(\sum Q) - \frac{1}{k}]
<math>
&= \sum_{j=1}^{k} [ (\beta)(-\beta^{-2})(\sum Q) - \frac{1}{k}] \\
=\sum_{j=1}^{k} [ (\beta)(-\beta^{-2})(\sum Q) - \frac{1}{k}]  
&= \sum_{j=1}^{k} [\frac{-1}{\beta}(\sum_{i=1}^{m} Q) - \frac{1}{k}]\\
</math><br>
&= [\sum_{i=1}^{m} \frac{-1}{\beta} \sum_{j=1}^{k}P(z^{(i)} = j | x^{(i)}) - \sum_{j=1}^{k}\frac{1}{k}]\\
<math>
&= [\frac{-1}{\beta}\sum_{i=1}^{m}1 - 1]\\
= \sum_{j=1}^{k} [\frac{-1}{\beta}(\sum_{i=1}^{m} Q) - \frac{1}{k}]
&= \frac{-m}{\beta} - 1 = 0\\
</math><br>
\implies \beta &= -m
<math>
\end{aligned}
=[\sum_{i=1}^{m} \frac{-1}{\beta} \sum_{j=1}^{k}P(z^{(i)} = j | x^{(i)}) - \sum_{j=1}^{k}\frac{1}{k}]
</math><br>
<math>
= [\frac{-1}{\beta}\sum_{i=1}^{m}1 - 1]
</math><br>
<math>
= \frac{-m}{\beta} - 1 = 0
</math><br>
<math>
\implies \beta = -m
</math><br>
</math><br>
Plugging in <math>\beta = -m</math> into our equation for <math>\phi_j</math> we get <math>\phi_j = \frac{1}{m}\sum_{i=1}^{m}Q^{(i)}_{(j)}</math>
Plugging in <math>\beta = -m</math> into our equation for <math>\phi_j</math> we get <math>\phi_j = \frac{1}{m}\sum_{i=1}^{m}Q^{(i)}_{(j)}</math>
Line 191: Line 173:
We know from Baye's rule that <math>P(z|X) = \frac{P(X|z)P(z)}{P(X)}</math>.<br>
We know from Baye's rule that <math>P(z|X) = \frac{P(X|z)P(z)}{P(X)}</math>.<br>
Plugging this into the equation for <math>KL(Q_i(z) \Vert P(z|X))</math> yields our inequality.<br>
Plugging this into the equation for <math>KL(Q_i(z) \Vert P(z|X))</math> yields our inequality.<br>
<math>KL(Q_i(z) \Vert P(z|X)) = E_{Q} \left[ \log(\frac{Q_i(z)}{P(z|X)}) \right]</math><br>
<math display="block">
<math>=E_Q(\log(\frac{Q_i(z) P(X^{(i)})}{P(X_z)P(z)})</math><br>
\begin{aligned}
<math>=E_Q(\log(\frac{Q_i(z)}{P(z)})) + \log(P(x^{(i)})) - E_Q(\log(P(X|z))</math><br>
KL(Q_i(z) \Vert P(z|X)) &= E_{Q} \left[ \log(\frac{Q_i(z)}{P(z|X)}) \right]\\
<math>=KL(Q_i(z) \Vert P(z)) + \log(P(x^{(i)}) - E_Q(\log(P(X|z))</math><br>
&=E_Q(\log(\frac{Q_i(z) P(X^{(i)})}{P(X|z)P(z)})\\
&=E_Q(\log(\frac{Q_i(z)}{P(z)})) + \log(P(x^{(i)})) - E_Q(\log(P(X|z))\\
&=KL(Q_i(z) \Vert P(z)) + \log(P(x^{(i)}) - E_Q(\log(P(X|z))
\end{aligned}
</math>
Rearranging terms we get:<br>
Rearranging terms we get:<br>
<math>\log P(x^{(i)}) - KL(Q_i(z) \Vert P(z|X)) = E_Q(\log(P(X|z)) - KL(Q_i(z) \Vert P(z))</math><br>
<math>\log P(x^{(i)}) - KL(Q_i(z) \Vert P(z|X)) = E_Q(\log(P(X|z)) - KL(Q_i(z) \Vert P(z))</math><br>