Unsupervised Learning: Difference between revisions

(6 intermediate revisions by the same user not shown)

Line 194:

* Generate latent variables <math>z^{(1)},...,z^{(m)} \in \mathbb{R}^r</math> iid where dimension r is less than n.

** We assume <math>Z^{(i)} \sim N(\mathbf{0},\mathbf{I})</math>

* Generate <math>x^{(i)}</math> where <math>X^{(i)} \vert Z^{(i)} \~~sin~~ N(g_{\theta}(z), \sigma^2 \mathbf{I})</math>

* Generate <math>x^{(i)}</math> where <math>X^{(i)} \vert Z^{(i)} \sim N(g_{\theta_1}(z^{(i)}), \sigma^2 \mathbf{I})</math>

** For some function <math>g_{\theta_1}</math> parameterized by <math>\theta_1</math>
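The two generation steps above can be sketched in NumPy. This is only an illustration: the function name <code>sample_latent_model</code>, the choice of <code>g</code> as a tanh of a linear map, and the values of <code>r</code>, <code>n</code> and <code>sigma</code> are assumptions made for the example, not part of the model definition.

```python
import numpy as np

def sample_latent_model(g, m, r, sigma, rng):
    """Draw m samples (z, x) with z ~ N(0, I_r) and x | z ~ N(g(z), sigma^2 I)."""
    Z = rng.standard_normal(size=(m, r))       # latent variables, one per row
    mean = g(Z)                                # g applied to every row, shape (m, n)
    X = mean + sigma * rng.standard_normal(size=mean.shape)  # add isotropic Gaussian noise
    return Z, X

rng = np.random.default_rng(0)
r, n, sigma = 2, 5, 0.1                        # hypothetical sizes, with r < n
A = rng.standard_normal(size=(r, n))           # hypothetical parameters theta_1

def g(Z):
    # Hypothetical choice of g_{theta_1}: a linear map followed by tanh
    return np.tanh(Z @ A)

Z, X = sample_latent_model(g, m=5000, r=r, sigma=sigma, rng=rng)
```

Each row of <code>X</code> is one observation <math>x^{(i)}</math> and the matching row of <code>Z</code> is the latent <math>z^{(i)}</math> that produced it.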

Line 228:

;Notes

* Also known as Wasserstein metric

==Dimension Reduction==

Goal: Reduce the dimension of a dataset.

If each example is <math>x \in \mathbb{R}^n</math>, we want to map it to a point in <math>\mathbb{R}^k</math> where <math>k < n</math>.

===PCA===

Principal Component Analysis

Preprocessing: Subtract the sample mean from each example so that the new sample mean is 0.

Goal: Find a unit vector <math>v_1</math> such that the projection <math>v_1 \cdot x</math> has maximum variance.

Result: The principal components are the eigenvectors of <math>X^TX</math>, ordered by decreasing eigenvalue, where the rows of <math>X</math> are the centered examples.

Idea: Maximize the variance of the projections.

<math>\max \frac{1}{m}\sum (v_1 \cdot x^{(i)})^2</math>

Note that <math>\sum (v_1 \cdot x^{(i)})^2 = \sum v_1^T x^{(i)} (x^{(i)})^T v_1 = v_1^T (\sum x^{(i)} (x^{(i)})^T) v_1</math>.
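A minimal NumPy sketch of this procedure follows. It assumes the examples are stored as rows of a matrix <code>X</code>. The function name <code>pca</code> and the synthetic data are illustrative only.

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions (as columns) and the projected data."""
    X_centered = X - X.mean(axis=0)              # preprocessing: make the sample mean 0
    scatter = X_centered.T @ X_centered          # sum_i x^(i) (x^(i))^T = X^T X
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    components = eigvecs[:, order]               # shape (n, k), unit-norm columns
    return components, X_centered @ components

# Synthetic data whose variance is largest along the first coordinate axis
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])

components, Z = pca(X, k=1)
v1 = components[:, 0]                            # direction of maximum projected variance
```

Because the data were scaled most strongly along the first axis, <code>v1</code> should point almost exactly along that axis, up to sign.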

<!--

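Thus our Lagrangian is <math>L(v_1, \alpha) = v_1^T (\sum x^{(i)} (x^{(i)})^T) v_1 + \alpha(\Vert v_1 \Vert^2 - 1)</math>

Taking the gradient of this with respect to <math>v_1</math> we get:

<math>\nabla_{v_1} \left( v_1^T (\sum x^{(i)} (x^{(i)})^T) v_1 + \alpha(\Vert v_1 \Vert^2 - 1) \right)</math>

<math>= 2(\sum x^{(i)} (x^{(i)})^T) v_1 + 2\alpha v_1</math>

Setting this to zero gives <math>(\sum x^{(i)} (x^{(i)})^T) v_1 = -\alpha v_1</math>, so <math>v_1</math> is an eigenvector of <math>\sum x^{(i)} (x^{(i)})^T</math>.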

-->

===Kernel PCA===

===Autoencoder===

You have an encoder and a decoder, both of which are neural networks.
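As a rough illustration of the encoder/decoder structure, here is a small NumPy sketch. Every specific in it is an assumption made for the example: a single tanh layer as the encoder, a linear decoder, squared reconstruction error, and plain gradient descent. The name <code>train_autoencoder</code> is hypothetical.

```python
import numpy as np

def train_autoencoder(X, k, steps=500, lr=0.01, seed=0):
    """Fit a one-layer tanh encoder and a linear decoder by gradient descent."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W_enc = rng.normal(scale=0.1, size=(n, k))   # encoder weights
    W_dec = rng.normal(scale=0.1, size=(k, n))   # decoder weights
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W_enc)                   # encoder output: (m, k) codes
        X_hat = H @ W_dec                        # decoder output: (m, n) reconstructions
        err = X_hat - X
        losses.append(np.mean(np.sum(err ** 2, axis=1)))  # mean squared reconstruction error
        # Backpropagate the loss through the decoder and then the encoder
        grad_X_hat = 2.0 * err / m
        grad_W_dec = H.T @ grad_X_hat
        grad_H = grad_X_hat @ W_dec.T
        grad_W_enc = X.T @ (grad_H * (1.0 - H ** 2))      # tanh'(a) = 1 - tanh(a)^2
        W_enc -= lr * grad_W_enc
        W_dec -= lr * grad_W_dec
    return W_enc, W_dec, losses

# Synthetic 5-dimensional data that lies on a 2-dimensional subspace
rng = np.random.default_rng(0)
X = rng.standard_normal(size=(200, 2)) @ rng.standard_normal(size=(2, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize each feature

W_enc, W_dec, losses = train_autoencoder(X, k=2)
codes = np.tanh(X @ W_enc)                       # reduced representation in R^2
```

The rows of <code>codes</code> are the reduced examples in <math>\mathbb{R}^k</math>. The decoder is only needed to define the training objective.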

@@ Line 194: / Line 194: @@
 * Generate latent variables <math>z^{(1)},...,z^{(m)} \in \mathbb{R}^r</math> iid where dimension r is less than n.
 ** We assume <math>Z^{(i)} \sim N(\mathbf{0},\mathbf{I})</math>
-* Generate <math>x^{(i)}</math> where <math>X^{(i)} \vert Z^{(i)} \sin N(g_{\theta}(z), \sigma^2 \mathbf{I})</math>
+* Generate <math>x^{(i)}</math> where <math>X^{(i)} \vert Z^{(i)} \sim N(g_{\theta_1}(z^{(i)}), \sigma^2 \mathbf{I})</math>
 ** For some function <math>g_{\theta_1}</math> parameterized by <math>\theta_1</math>
@@ Line 228: / Line 228: @@
 ;Notes
 * Also known as Wasserstein metric
+==Dimension Reduction==
+Goal: Reduce the dimension of a dataset.<br>
+If each example is <math>x \in \mathbb{R}^n</math>, we want to map it to a point in <math>\mathbb{R}^k</math> where <math>k < n</math>.
+===PCA===
+Principal Component Analysis<br>
+Preprocessing: Subtract the sample mean from each example so that the new sample mean is 0.<br>
+Goal: Find a unit vector <math>v_1</math> such that the projection <math>v_1 \cdot x</math> has maximum variance.<br>
+Result: The principal components are the eigenvectors of <math>X^TX</math>, ordered by decreasing eigenvalue, where the rows of <math>X</math> are the centered examples.<br>
+Idea: Maximize the variance of the projections.<br>
+<math>\max \frac{1}{m}\sum (v_1 \cdot x^{(i)})^2</math><br>
+Note that <math>\sum (v_1 \cdot x^{(i)})^2 = \sum v_1^T x^{(i)} (x^{(i)})^T v_1 = v_1^T (\sum x^{(i)} (x^{(i)})^T) v_1</math>.<br>
+<!--
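+Thus our Lagrangian is <math>L(v_1, \alpha) = v_1^T (\sum x^{(i)} (x^{(i)})^T) v_1 + \alpha(\Vert v_1 \Vert^2 - 1)</math><br>
+Taking the gradient of this with respect to <math>v_1</math> we get:<br>
+<math>\nabla_{v_1} \left( v_1^T (\sum x^{(i)} (x^{(i)})^T) v_1 + \alpha(\Vert v_1 \Vert^2 - 1) \right)</math><br>
+<math>= 2(\sum x^{(i)} (x^{(i)})^T) v_1 + 2\alpha v_1</math><br>
+Setting this to zero gives <math>(\sum x^{(i)} (x^{(i)})^T) v_1 = -\alpha v_1</math>, so <math>v_1</math> is an eigenvector of <math>\sum x^{(i)} (x^{(i)})^T</math>.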
+-->
+===Kernel PCA===
+{{main | Wikipedia: Kernel principal component analysis}}
+===Autoencoder===
+You have an encoder and a decoder, both of which are neural networks.