
Unsupervised Learning: Difference between revisions

(3 intermediate revisions by the same user not shown)
Line 189:
====Model====
Our model for how the data is generated is as follows:
* Generate latent variables <math>z^{(1)},\ldots,z^{(m)} \in \mathbb{R}^r</math> iid, where the dimension <math>r</math> is less than <math>n</math>.
** We assume <math>Z^{(i)} \sim N(\mathbf{0},\mathbf{I})</math>
* Generate <math>x^{(i)}</math> where <math>X^{(i)} \vert Z^{(i)} \sim N(g_{\theta}(z^{(i)}), \sigma^2 \mathbf{I})</math>
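
The two generation steps can be sketched in a few lines of Python. This is only an illustration: the decoder <math>g_{\theta}</math> is not specified here, so <code>g_theta</code> below is a hypothetical single tanh layer with random weights, and the names <code>sample_dataset</code>, <code>weights</code>, and <code>seed</code> are assumptions made for the example.

```python
import math
import random


def g_theta(z, weights):
    """Hypothetical decoder: a single tanh layer mapping R^r -> R^n."""
    return [math.tanh(sum(w * zj for w, zj in zip(row, z))) for row in weights]


def sample_dataset(m, r, n, sigma, seed=0):
    """Draw m (z, x) pairs from the latent variable model."""
    rng = random.Random(seed)
    # Assumed decoder parameters theta: a fixed random n x r weight matrix.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    zs, xs = [], []
    for _ in range(m):
        # Latent variable: z ~ N(0, I) in R^r.
        z = [rng.gauss(0.0, 1.0) for _ in range(r)]
        # Observation: x | z ~ N(g_theta(z), sigma^2 I) in R^n.
        mean = g_theta(z, weights)
        x = [mu + sigma * rng.gauss(0.0, 1.0) for mu in mean]
        zs.append(z)
        xs.append(x)
    return zs, xs


zs, xs = sample_dataset(m=5, r=2, n=4, sigma=0.1)
```

Here <code>zs</code> holds the latent samples and <code>xs</code> the generated data points; in practice only <code>xs</code> would be observed.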
Line 196:
====Variational Bound====
The variational bound is:
* <math>\log P(x^{(i)}) \geq E_{Z \sim Q_i}[\log P(x^{(i)} \vert Z)] - KL(Q_i(z) \Vert P(z))</math>
{{hidden | Derivation |
We know from Bayes' rule that <math>P(z \vert x^{(i)}) = \frac{P(x^{(i)} \vert z)P(z)}{P(x^{(i)})}</math>.<br>
Plugging this into the expression for <math>KL(Q_i(z) \Vert P(z \vert x^{(i)}))</math> yields our inequality.<br>
<math>KL(Q_i(z) \Vert P(z \vert x^{(i)})) = E_{Q_i} \left[ \log\left(\frac{Q_i(z)}{P(z \vert x^{(i)})}\right) \right]</math><br>
<math>=E_{Q_i} \left[ \log\left(\frac{Q_i(z) P(x^{(i)})}{P(x^{(i)} \vert z)P(z)}\right) \right]</math><br>
<math>=E_{Q_i} \left[ \log\left(\frac{Q_i(z)}{P(z)}\right) \right] + \log P(x^{(i)}) - E_{Q_i} \left[ \log P(x^{(i)} \vert z) \right]</math><br>
<math>=KL(Q_i(z) \Vert P(z)) + \log P(x^{(i)}) - E_{Q_i} \left[ \log P(x^{(i)} \vert z) \right]</math><br>
Rearranging terms we get:<br>
<math>\log P(x^{(i)}) - KL(Q_i(z) \Vert P(z \vert x^{(i)})) = E_{Q_i} \left[ \log P(x^{(i)} \vert z) \right] - KL(Q_i(z) \Vert P(z))</math><br>
Since the KL divergence is greater than or equal to 0, our variational bound follows.
}}
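
The bound can be checked numerically in a case where every term has a closed form. The sketch below assumes a one-dimensional linear decoder <math>g_{\theta}(z) = az</math> and a Gaussian <math>Q_i = N(\mu, s^2)</math>; both choices, and the names <code>log_evidence</code> and <code>elbo</code>, are assumptions made for illustration and are not part of the model above.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def log_evidence(x, a, sigma):
    """log P(x) for z ~ N(0, 1), x | z ~ N(a z, sigma^2): x ~ N(0, a^2 + sigma^2)."""
    return log_normal_pdf(x, 0.0, a * a + sigma * sigma)


def elbo(x, a, sigma, mu, s2):
    """E_{z~Q}[log P(x | z)] - KL(Q || P(z)) with Q = N(mu, s2)."""
    # E_Q[(x - a z)^2] = (x - a mu)^2 + a^2 s2 for Gaussian Q.
    expected_ll = (-0.5 * math.log(2 * math.pi * sigma ** 2)
                   - ((x - a * mu) ** 2 + a * a * s2) / (2 * sigma ** 2))
    # KL(N(mu, s2) || N(0, 1)) in closed form.
    kl = 0.5 * (s2 + mu * mu - 1.0 - math.log(s2))
    return expected_ll - kl


x, a, sigma = 1.3, 2.0, 0.5

# An arbitrary Q gives a strict lower bound on log P(x).
loose = elbo(x, a, sigma, mu=0.0, s2=1.0)

# The exact posterior P(z | x) makes KL(Q || P(z | x)) zero, so the bound is tight.
post_var = sigma ** 2 / (a * a + sigma ** 2)
post_mean = a * x / (a * a + sigma ** 2)
tight = elbo(x, a, sigma, mu=post_mean, s2=post_var)

exact = log_evidence(x, a, sigma)
```

With these values <code>loose</code> falls below <code>exact</code>, while <code>tight</code> matches it up to floating-point error. The gap between the two sides of the bound is exactly <math>KL(Q_i(z) \Vert P(z \vert x^{(i)}))</math>.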
==GANs==
{{main | Generative adversarial network}}
===Wasserstein GAN===
[https://arxiv.org/abs/1701.07875 Paper]<br>
The main idea is to ensure that the discriminator is Lipschitz continuous and to limit its Lipschitz constant (i.e. to bound how quickly its output can change with its input).<br>
If the correct answer is 1.0 and the generator produces 1.0001, we don't want the discriminator to give us a very high loss.<br>
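
The linked paper enforces this constraint by clipping the discriminator (critic) weights to a small interval <math>[-c, c]</math>. The sketch below illustrates the effect on a hypothetical linear critic <math>f(x) = w \cdot x + b</math>; the critic form, the example weights, and the function names are assumptions chosen for illustration, not the paper's architecture.

```python
def clip_weights(weights, c):
    """Clamp every weight to the interval [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


def critic(x, weights, bias=0.0):
    """Hypothetical linear critic f(x) = sum_j w_j * x_j + bias."""
    return sum(w * xj for w, xj in zip(weights, x)) + bias


c = 0.01
raw = [5.0, -3.0, 0.2]           # unconstrained weights (made up for the example)
clipped = clip_weights(raw, c)   # every entry now lies in [-c, c]

real = [1.0, 0.0, 0.0]
fake = [1.0001, 0.0, 0.0]        # differs from `real` by 0.0001 in one coordinate

# Change in critic output between the two nearly identical inputs.
gap_raw = abs(critic(fake, raw) - critic(real, raw))
gap_clipped = abs(critic(fake, clipped) - critic(real, clipped))
```

For a linear critic with all weights in <math>[-c, c]</math>, the output can change by at most <math>c</math> times the L1 distance between inputs, so <code>gap_clipped</code> is bounded by <math>c \cdot 0.0001</math>, whereas <code>gap_raw</code> scales with the unconstrained weight.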