If we fix <math>\mathbf{z}</math>, then we need to minimize <math>L(\mu, \mathbf{z})</math> with respect to <math>\mu</math>.<br>
Taking the gradient and setting it to 0, we get:<br>
<math display="block">
\begin{aligned}
\nabla_{\mu} L(\mu, \mathbf{z}) &= \nabla_{\mu} \sum_{i} \Vert x^{(i)} - \mu_{z^{(i)}} \Vert ^2\\
&= \nabla_{\mu} \sum_{j=1}^{k} \sum_{i\mid z^{(i)}=j} \Vert x^{(i)} - \mu_{z^{(i)}} \Vert ^2\\
&= \nabla_{\mu} \sum_{j=1}^{k} \sum_{i\mid z^{(i)}=j} \Vert x^{(i)} - \mu_{j} \Vert ^2\\
&= \sum_{j=1}^{k} \sum_{i\mid z^{(i)}=j} \nabla_{\mu} \Vert x^{(i)} - \mu_{j} \Vert ^2\\
&= -\sum_{j=1}^{k} \sum_{i\mid z^{(i)}=j} 2(x^{(i)} - \mu_{j}) = 0\\
\implies \mu_{j} &= (\sum_{i\mid z^{(i)}=j} x^{(i)})/(\sum_{i\mid z^{(i)}=j} 1) \quad \forall j
\end{aligned}
</math>
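The result of this derivation is that, for fixed assignments, each <math>\mu_j</math> is the mean of the points assigned to cluster <math>j</math>. A minimal NumPy sketch of that update is given here as an illustration only. The function names <code>update_centroids</code> and <code>loss</code>, the 0-based cluster labels, and the handling of empty clusters (left at zero) are assumptions that do not come from the derivation.

```python
import numpy as np


def update_centroids(X, z, k):
    """For fixed assignments z, return the centroids that minimize
    sum_i ||x_i - mu_{z_i}||^2, i.e. the per-cluster means."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(z)
    mu = np.zeros((k, X.shape[1]))
    for j in range(k):
        members = X[z == j]  # points with z_i = j
        if len(members) > 0:
            # (sum of assigned points) / (number of assigned points)
            mu[j] = members.mean(axis=0)
        # Assumption: an empty cluster keeps a zero centroid; the
        # derivation does not cover this case.
    return mu


def loss(X, z, mu):
    """L(mu, z) = sum_i ||x_i - mu_{z_i}||^2."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(z)
    return float(np.sum((X - mu[z]) ** 2))


X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
z = np.array([0, 0, 1])
mu = update_centroids(X, z, k=2)
print(mu)
print(loss(X, z, mu))
```

Here the first two points share cluster 0, so its centroid is their midpoint, and the third point is its own centroid. Moving any centroid away from its cluster mean increases the loss, which is what setting the gradient to 0 predicts.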