Visual Learning and Recognition

<math>
\begin{aligned}
&\log P(x) - D_{KL}[Q(z|x) \Vert P(z|x)] = E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]\\
\implies &\log P(x) \geq E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]
\end{aligned}
</math>
This is known as the variational lower bound, or ''ELBO'' (evidence lower bound).
* We first have the encoder output a mean <math>\mu_{z|x}</math> and the diagonal of a covariance matrix <math>\Sigma_{z|x}</math>. 
* For ELBO we want to optimize <math>E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]</math>. 
* Our first loss is <math>D_{KL}(N(\mu_{z|x}, \Sigma_{z|x}) \Vert N(0, I))</math>. 
* We sample z from <math>N(\mu_{z|x}, \Sigma_{z|x})</math> and pass it to the decoder which outputs <math>\mu_{x|z}, \Sigma_{x|z}</math>. 
* Sample <math>\hat{x}</math> from the distribution and have reconstruction loss <math>\Vert x - \hat{x} \Vert^2</math>. 
* Most blog posts forget to sample from <math>P(x|z)</math> and instead show the decoder mean directly (see the sketch after this list).
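Below is a minimal sketch of this training loss in PyTorch, assuming a flattened input and hypothetical <code>encoder</code> and <code>decoder</code> modules where the encoder returns the mean and log-variance of the diagonal Gaussian <math>Q(z|x)</math>; this is an illustration under those assumptions, not a reference implementation.
<syntaxhighlight lang="python">
import torch

def vae_loss(x, encoder, decoder):
    # Parameters of Q(z|x); encoder/decoder are hypothetical modules
    mu_z, logvar_z = encoder(x)
    # Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dims
    kl = 0.5 * torch.sum(logvar_z.exp() + mu_z**2 - 1.0 - logvar_z, dim=1)
    # Reparameterized sample z ~ Q(z|x)
    eps = torch.randn_like(mu_z)
    z = mu_z + torch.exp(0.5 * logvar_z) * eps
    # Decoder mean mu_{x|z}; with unit-variance Gaussian P(x|z) the
    # reconstruction term reduces to an L2 loss (derived below)
    x_hat = decoder(z)
    recon = torch.sum((x - x_hat)**2, dim=1)
    return (recon + kl).mean()
</syntaxhighlight>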
;Modeling P(x|z)
Let f(z) be the network output.
* Assume <math>P(x|z)</math> is iid Gaussian.
* <math>\hat{x} = f(z) + \eta</math> where <math>\eta \sim N(0, I)</math>
* Simplifies to an L2 loss <math>\Vert x - f(z) \Vert^2</math> (see the derivation after this list)
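To see the simplification, write out the log-density of a unit-variance Gaussian; the terms that do not depend on <math>f(z)</math> are constant:
<math>
\log P(x|z) = \log N(x; f(z), I) = -\tfrac{1}{2}\Vert x - f(z) \Vert^2 + \text{const}
</math>
So maximizing <math>E_{z \sim Q(z|x)}[\log P(x|z)]</math> is equivalent to minimizing the squared reconstruction error.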
The importance-weighted VAE (IWAE) uses <math>N</math> samples of <math>z</math> per data point to form a tighter bound.
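Concretely, the IWAE bound (Burda et al., ''Importance Weighted Autoencoders'') averages <math>N</math> importance weights inside the log, which is at least as tight as the ELBO and tightens as <math>N</math> grows:
<math>
\log P(x) \geq E_{z_1, \ldots, z_N \sim Q(z|x)}\left[\log \frac{1}{N}\sum_{i=1}^{N} \frac{P(x, z_i)}{Q(z_i|x)}\right]
</math>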
;Reparameterization trick
To sample from the latent space, compute <math>z = \mu + \sigma \odot \varepsilon</math> where <math>\varepsilon \sim N(0, I)</math>. 
This way, you can backprop through the sampling step, because the randomness is isolated in <math>\varepsilon</math> and <math>z</math> is a deterministic, differentiable function of <math>\mu</math> and <math>\sigma</math>.
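A small sketch showing that gradients indeed reach the distribution parameters (PyTorch; the variable names are illustrative):
<syntaxhighlight lang="python">
import torch

mu = torch.zeros(3, requires_grad=True)
log_sigma = torch.zeros(3, requires_grad=True)
eps = torch.randn(3)                 # randomness isolated from the parameters
z = mu + torch.exp(log_sigma) * eps  # z ~ N(mu, sigma^2) by construction
z.sum().backward()                   # gradients flow to mu and log_sigma
print(mu.grad, log_sigma.grad)
</syntaxhighlight>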
;Conditional VAE
Concatenate the condition (e.g. a class label) onto the inputs of both the encoder and the decoder, so the model learns <math>Q(z|x,c)</math> and <math>P(x|z,c)</math>.
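A minimal sketch of this conditioning, assuming hypothetical <code>encoder</code> and <code>decoder</code> modules and a condition vector <code>c</code> (e.g. a one-hot label):
<syntaxhighlight lang="python">
import torch

def cvae_forward(x, c, encoder, decoder):
    # Condition both networks by concatenating c onto their inputs
    mu, logvar = encoder(torch.cat([x, c], dim=1))        # Q(z|x, c)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    return decoder(torch.cat([z, c], dim=1))              # mean of P(x|z, c)
</syntaxhighlight>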
;Pros
* Principled approach to generative models
* Allows inference of <math>q(z|x)</math> which can be used as a feature representation
;Cons
* Maximizes only a lower bound (ELBO) on the log-likelihood rather than the likelihood itself
* Samples are blurrier than GANs
Why are samples blurry?
* Samples from <math>P(x|z)</math> are not blurry but noisy; the blur comes from displaying the mean <math>\mu_{x|z}</math> instead of an actual sample.
** Sample vs mean/expected value (see the sketch after this list)
* The L2 loss corresponds to a Gaussian likelihood, so the optimal output is an average over plausible images, which looks blurry.
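A small sketch contrasting the two, assuming a hypothetical <code>decoder</code> and a fixed noise scale <code>sigma</code>:
<syntaxhighlight lang="python">
import torch

def generate(decoder, z, sigma=0.1):
    mean = decoder(z)                               # mu_{x|z}: smooth, what is usually shown
    sample = mean + sigma * torch.randn_like(mean)  # actual draw from P(x|z): noisy, not blurry
    return mean, sample
</syntaxhighlight>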


==Will be on the exam==