Unsupervised Learning

Taking the gradient with respect to each <math>\mu_j</math> and setting it to 0, we get:<br>
<math>
\nabla L(\mu, \mathbf{z}) = \nabla \sum_{i} \Vert x^{(i)} - \mu_{z^{(i)}} \Vert ^2
</math><br>
<math>
= \nabla \sum_{j=1}^{k} \sum_{i \mid z^{(i)} = j} \Vert x^{(i)} - \mu_{z^{(i)}} \Vert ^2
</math><br>
<math>
= \nabla \sum_{j=1}^{k} \sum_{i \mid z^{(i)} = j} \Vert x^{(i)} - \mu_{j} \Vert ^2
</math><br>
<math>
= \sum_{j=1}^{k} \sum_{i \mid z^{(i)} = j} \nabla \Vert x^{(i)} - \mu_{j} \Vert ^2
</math><br>
<math>
= \sum_{j=1}^{k} \sum_{i \mid z^{(i)} = j} -2(x^{(i)} - \mu_{j})
</math><br>
<math>
\implies \mu_{j} = \frac{\sum_{i \mid z^{(i)} = j} x^{(i)}}{\sum_{i \mid z^{(i)} = j} 1} \quad \forall j
</math>
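
As a concrete illustration of this update, below is a minimal NumPy sketch of one k-means iteration: each point is assigned to its nearest centroid, and each centroid is then recomputed as the mean of its assigned points, which is exactly the <math>\mu_j</math> derived above. The names <code>X</code>, <code>mu</code>, <code>z</code>, and <code>kmeans_step</code> are illustrative choices, not taken from these notes.

<syntaxhighlight lang="python">
import numpy as np

def kmeans_step(X, mu):
    """One k-means iteration: assign points, then recompute centroids.

    X  : (n, d) data matrix, rows are the points x^{(i)}
    mu : (k, d) current centroids mu_j
    """
    # Assignment step: z^{(i)} = argmin_j ||x^{(i)} - mu_j||^2
    dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
    z = dists.argmin(axis=1)                                     # (n,) cluster labels

    # Update step: mu_j = (sum of x^{(i)} with z^{(i)} = j) / (number of such i)
    new_mu = np.array([
        X[z == j].mean(axis=0) if np.any(z == j) else mu[j]  # leave empty clusters unchanged
        for j in range(mu.shape[0])
    ])
    return z, new_mu

# Tiny usage example with two well-separated groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
mu = np.array([[0.0, 0.0], [5.0, 5.0]])
z, mu = kmeans_step(X, mu)
print(z)   # cluster assignment of each point, e.g. [0 0 1 1]
print(mu)  # each centroid is the mean of its assigned points
</syntaxhighlight>

Alternating the assignment step and this mean update until the assignments stop changing is the standard k-means (Lloyd's) procedure; each step cannot increase the loss <math>L(\mu, \mathbf{z})</math>.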