==Dimension Reduction==
Goal: Reduce the dimension of a dataset.<br>
If each example <math>x \in \mathbb{R}^n</math>, we want to reduce each example to be in <math>\mathbb{R}^k</math> where <math>k < n</math>.
===PCA===
Principal Component Analysis<br>
Preprocessing: Subtract the sample mean from each example so that the new sample mean is 0.<br>
Goal: Find a vector <math>v_1</math> such that the projection <math>v_1 \cdot x</math> has maximum variance.<br>
Result: These principal components are the eigenvectors of <math>X^T X</math>, where <math>X</math> is the matrix whose rows are the examples <math>x^{(i)}</math>.<br>
Idea: Maximize the variance of the projections, subject to <math>\Vert v_1 \Vert = 1</math>.<br>
<math>\max_{\Vert v_1 \Vert = 1} \frac{1}{m}\sum_i (v_1 \cdot x^{(i)})^2</math><br>
Note that <math>\sum (v_1 \cdot x^{(i)})^2 = \sum v_1^T x^{(i)} (x^{(i)})^T v_1 = v_1^T (\sum x^{(i)} (x^{(i)})^T) v_1</math>.<br>
Thus our Lagrangian is <math>L(v_1, \alpha) = v_1^T \left(\sum_i x^{(i)} (x^{(i)})^T\right) v_1 - \alpha(\Vert v_1 \Vert^2 - 1)</math>.<br>
Taking the gradient with respect to <math>v_1</math> and setting it to zero:<br>
<math>2\left(\sum_i x^{(i)} (x^{(i)})^T\right) v_1 - 2\alpha v_1 = 0 \implies \left(\sum_i x^{(i)} (x^{(i)})^T\right) v_1 = \alpha v_1</math><br>
So <math>v_1</math> must be an eigenvector of <math>\sum_i x^{(i)} (x^{(i)})^T = X^T X</math>, and the variance <math>v_1^T X^T X v_1 = \alpha</math> is the corresponding eigenvalue, so the first principal component is the eigenvector with the largest eigenvalue.
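Below is a minimal numpy sketch of this procedure; the function name <code>pca</code>, the variable names, and the random example data are illustrative assumptions rather than part of the article.
<syntaxhighlight lang="python">
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    X: (m, n) data matrix, one example per row.
    k: target dimension, k < n.
    """
    # Preprocessing: subtract the sample mean so each feature has mean 0.
    X_centered = X - X.mean(axis=0)

    # Principal components are eigenvectors of X^T X (symmetric), so use eigh.
    eigvals, eigvecs = np.linalg.eigh(X_centered.T @ X_centered)

    # eigh returns eigenvalues in ascending order; keep the k largest.
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # shape (n, k)

    # Each reduced example is the vector of projections v_j . x.
    return X_centered @ top_k                           # shape (m, k)

# Illustrative usage: reduce 5-dimensional examples to 2 dimensions.
X = np.random.randn(100, 5)
Z = pca(X, k=2)
print(Z.shape)  # (100, 2)
</syntaxhighlight>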


===Kernel PCA===
===Autoencoder===