<math>
\begin{aligned}
&\log P(x) - D_{KL}[Q(z|x) \Vert P(z|x)] = E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]\\
\implies &\log P(x) \geq E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]
\end{aligned}
</math>
This is known as the variational lower bound or ''ELBO''.
* We first have the encoder output a mean <math>\mu_{z|x}</math> and the diagonal of a covariance matrix <math>\Sigma_{z|x}</math>.
* For the ELBO we want to maximize <math>E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]</math>.
* Our first loss term is <math>D_{KL}(N(\mu_{z|x}, \Sigma_{z|x}) \Vert N(0, I))</math>, which has a closed form for Gaussians.
* We sample z from <math>N(\mu_{z|x}, \Sigma_{z|x})</math> and pass it to the decoder, which outputs <math>\mu_{x|z}, \Sigma_{x|z}</math>.
* Sample <math>\hat{x}</math> from this distribution and use the reconstruction loss <math>\Vert x - \hat{x} \Vert^2</math>.
* Most blog posts forget to sample from <math>P(x|z)</math> and use the decoder mean directly. (A sketch of one training step follows this list.)
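Below is a minimal sketch of one training step, assuming PyTorch; the layer sizes, the 784-dimensional input, and the random stand-in batch are illustrative assumptions, not from the notes.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

latent_dim, data_dim, hidden = 20, 784, 400

encoder = nn.Sequential(nn.Linear(data_dim, hidden), nn.ReLU(),
                        nn.Linear(hidden, 2 * latent_dim))  # outputs mu and log-variance
decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                        nn.Linear(hidden, data_dim))        # outputs mu_{x|z}

x = torch.rand(32, data_dim)  # stand-in batch

# Encoder gives mu_{z|x} and the diagonal of Sigma_{z|x} (as log-variance).
mu, logvar = encoder(x).chunk(2, dim=1)

# KL(N(mu, Sigma) || N(0, I)) in closed form for diagonal Gaussians.
kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1)

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
eps = torch.randn_like(mu)
z = mu + (0.5 * logvar).exp() * eps

# Decoder models P(x|z); under the iid Gaussian assumption the
# reconstruction term reduces to an L2 loss on the decoder mean.
x_hat = decoder(z)
recon = (x - x_hat).pow(2).sum(dim=1)

loss = (recon + kl).mean()  # negative ELBO (up to constants)
loss.backward()
</syntaxhighlight>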
;Modeling P(x|z)
Let <math>f(z)</math> be the network output.
* Assume <math>P(x|z)</math> is iid Gaussian.
* <math>\hat{x} = f(z) + \eta</math> where <math>\eta \sim N(0, I)</math>
* This simplifies the reconstruction term to an L2 loss <math>\Vert x - f(z) \Vert^2</math> (derivation below).
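Concretely, with unit variance the negative log-likelihood factorizes over coordinates and reduces to the L2 loss up to additive constants:
<math>
-\log P(x|z) = -\sum_i \log \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_i - f(z)_i)^2}{2}\right) = \frac{1}{2} \Vert x - f(z) \Vert^2 + \text{const}
</math>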
The importance weighted VAE (IWAE) instead uses <math>N</math> samples for the loss.
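Specifically, the standard importance weighted bound is
<math>
\log P(x) \geq E_{z_1, \ldots, z_N \sim Q(z|x)}\left[\log \frac{1}{N} \sum_{i=1}^N \frac{P(x, z_i)}{Q(z_i|x)}\right],
</math>
which recovers the ELBO at <math>N = 1</math> and tightens as <math>N</math> grows.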
;Reparameterization trick
To sample from the latent space, you compute <math>z = \mu + \sigma \varepsilon</math> where <math>\varepsilon \sim N(0, I)</math>.
This way, you can backprop through the sampling step, since the randomness is isolated in the input <math>\varepsilon</math> rather than in the operation itself.
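A toy check, assuming PyTorch, that gradients indeed flow to <math>\mu</math> and <math>\sigma</math> through a reparameterized sample; the tensors and downstream loss are illustrative.
<syntaxhighlight lang="python">
import torch

mu = torch.tensor([0.5], requires_grad=True)
sigma = torch.tensor([1.0], requires_grad=True)

eps = torch.randn(1)        # noise drawn outside the computation graph
z = mu + sigma * eps        # z is a differentiable function of mu and sigma

z.pow(2).sum().backward()   # any downstream loss works
print(mu.grad, sigma.grad)  # both populated; naive sampling would give no grads
</syntaxhighlight>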
;Conditional VAE
Simply provide the condition (e.g., a class label) as an additional input to both the encoder and the decoder, as in the sketch below.
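A minimal sketch, assuming PyTorch, of where the condition enters; the one-hot label <math>y</math> and the single-layer networks are illustrative.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

data_dim, cond_dim, latent_dim = 784, 10, 20
encoder = nn.Linear(data_dim + cond_dim, 2 * latent_dim)  # q(z|x, y)
decoder = nn.Linear(latent_dim + cond_dim, data_dim)      # p(x|z, y)

x = torch.rand(32, data_dim)
y = torch.eye(cond_dim)[torch.randint(0, cond_dim, (32,))]  # one-hot labels

mu, logvar = encoder(torch.cat([x, y], dim=1)).chunk(2, dim=1)
z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
x_hat = decoder(torch.cat([z, y], dim=1))  # condition enters both networks
</syntaxhighlight>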
;Pros
* Principled approach to generative models
* Allows inference of <math>q(z|x)</math>, which can be used as a feature representation
;Cons
* Maximizes only a lower bound on the likelihood (the ELBO), not the likelihood itself
* Samples are blurrier than those from GANs
Why are samples blurry?
* Samples from <math>P(x|z)</math> are not blurry but noisy
** This is the difference between drawing a sample from <math>P(x|z)</math> and displaying its mean/expected value <math>f(z)</math>
* The L2 loss corresponds to a Gaussian likelihood, which encourages the decoder mean to average over plausible outputs
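A toy illustration of the sample-vs-mean distinction, assuming NumPy; the stand-in decoder mean is illustrative. A single draw from <math>P(x|z) = N(f(z), I)</math> deviates a lot from <math>f(z)</math> (noisy), while the average of many draws is close to <math>f(z)</math> itself (the smooth, "blurry" image most implementations display).
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
f_z = np.zeros((28, 28))                         # stand-in decoder mean f(z)
samples = f_z + rng.normal(size=(1000, 28, 28))  # draws from P(x|z) = N(f(z), I)

print(np.abs(samples[0] - f_z).mean())            # ~0.8: a single sample is noisy
print(np.abs(samples.mean(axis=0) - f_z).mean())  # ~0.025: the mean is smooth
</syntaxhighlight>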
==Will be on the exam==