Unsupervised Learning

If we fix <math>\mathbf{z}</math>, then we need to minimize <math>L(\mu, \mathbf{z})</math> with respect to <math>\mu</math>.<br>
Taking the gradient and setting it to zero, we get:<br>
<math display="block">
\begin{aligned}
\nabla_{\mu} L(\mu, \mathbf{z}) &= \nabla_{\mu} \sum_{i} \Vert x^{(i)} - \mu_{z^{(i)}} \Vert ^2\\
&= \nabla_{\mu} \sum_{j=1}^{k} \sum_{i \mid z^{(i)}=j} \Vert x^{(i)} - \mu_{z^{(i)}} \Vert ^2\\
&= \nabla_{\mu} \sum_{j=1}^{k} \sum_{i \mid z^{(i)}=j} \Vert x^{(i)} - \mu_{j} \Vert ^2\\
&= \sum_{j=1}^{k} \sum_{i \mid z^{(i)}=j} \nabla_{\mu} \Vert x^{(i)} - \mu_{j} \Vert ^2\\
&= -\sum_{j=1}^{k} \sum_{i \mid z^{(i)}=j} 2(x^{(i)} - \mu_{j}) = 0\\
\implies \mu_{j} &= \frac{\sum_{i \mid z^{(i)}=j} x^{(i)}}{\sum_{i \mid z^{(i)}=j} 1} \quad \forall j
\end{aligned}
</math>
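That is, each <math>\mu_j</math> is the mean of the points currently assigned to cluster <math>j</math>. As a minimal sketch of this update step (assuming, for illustration, that the points are stored as rows of a NumPy array <code>X</code> and the assignments in an integer array <code>z</code>; the function name <code>update_means</code> is hypothetical):

<syntaxhighlight lang="python">
import numpy as np

def update_means(X, z, k):
    """Set each mu_j to the mean of the points assigned to cluster j.

    X: (n, d) array of data points (names assumed for illustration)
    z: (n,) array of cluster assignments in {0, ..., k-1}
    """
    mu = np.zeros((k, X.shape[1]))
    for j in range(k):
        members = X[z == j]        # points i with z^(i) = j
        if len(members) > 0:       # guard against empty clusters
            mu[j] = members.mean(axis=0)
    return mu
</syntaxhighlight>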