<math>
\begin{aligned}
&\log P(x) - D_{KL}[Q(z|x) \Vert P(z|x)] = E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]\\
\implies &\log P(x) \geq E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]
\end{aligned}
</math>
This is known as the variational lower bound or ''ELBO''.
* We first have the encoder output a mean <math>\mu_{z|x}</math> and the diagonal of a covariance matrix <math>\Sigma_{z|x}</math>.
* For the ELBO we want to maximize <math>E_{z \sim Q(z|x)}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]</math>.
* Our first loss term is <math>D_{KL}(N(\mu_{z|x}, \Sigma_{z|x}) \Vert N(0, I))</math>, which has a closed form for Gaussians.
* We sample z from <math>N(\mu_{z|x}, \Sigma_{z|x})</math> and pass it to the decoder, which outputs <math>\mu_{x|z}, \Sigma_{x|z}</math>.
* Sample <math>\hat{x}</math> from this distribution and use the reconstruction loss <math>\Vert x - \hat{x} \Vert^2</math>.
* Most blog posts forget to sample from <math>P(x|z)</math> and use the decoder mean directly. (A sketch of one training step follows this list.)
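Below is a minimal sketch of one training step, assuming PyTorch; the layer sizes, the 784-dimensional input, and the random stand-in batch are illustrative assumptions, not from the notes.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

latent_dim, data_dim, hidden = 20, 784, 400

encoder = nn.Sequential(nn.Linear(data_dim, hidden), nn.ReLU(),
                        nn.Linear(hidden, 2 * latent_dim))  # outputs mu and log-variance
decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                        nn.Linear(hidden, data_dim))        # outputs mu_{x|z}

x = torch.rand(32, data_dim)  # stand-in batch

# Encoder gives mu_{z|x} and the diagonal of Sigma_{z|x} (as log-variance).
mu, logvar = encoder(x).chunk(2, dim=1)

# KL(N(mu, Sigma) || N(0, I)) in closed form for diagonal Gaussians.
kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1)

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
eps = torch.randn_like(mu)
z = mu + (0.5 * logvar).exp() * eps

# Decoder models P(x|z); under the iid Gaussian assumption the
# reconstruction term reduces to an L2 loss on the decoder mean.
x_hat = decoder(z)
recon = (x - x_hat).pow(2).sum(dim=1)

loss = (recon + kl).mean()  # negative ELBO (up to constants)
loss.backward()
</syntaxhighlight>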
;Modeling P(x|z)
Let <math>f(z)</math> be the network output.
* Assume <math>P(x|z)</math> is iid Gaussian.
* <math>\hat{x} = f(z) + \eta</math> where <math>\eta \sim N(0, I)</math>
* This simplifies the reconstruction term to an L2 loss <math>\Vert x - f(z) \Vert^2</math> (derivation below).
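Concretely, with unit variance the negative log-likelihood factorizes over coordinates and reduces to the L2 loss up to additive constants:
<math>
-\log P(x|z) = -\sum_i \log \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_i - f(z)_i)^2}{2}\right) = \frac{1}{2} \Vert x - f(z) \Vert^2 + \text{const}
</math>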
The importance weighted VAE (IWAE) instead uses <math>N</math> samples for the loss.
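Specifically, the standard importance weighted bound is
<math>
\log P(x) \geq E_{z_1, \ldots, z_N \sim Q(z|x)}\left[\log \frac{1}{N} \sum_{i=1}^N \frac{P(x, z_i)}{Q(z_i|x)}\right],
</math>
which recovers the ELBO at <math>N = 1</math> and tightens as <math>N</math> grows.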
;Reparameterization trick
To sample from the latent space, you compute <math>z = \mu + \sigma \varepsilon</math> where <math>\varepsilon \sim N(0, I)</math>.
This way, you can backprop through the sampling step, since the randomness is isolated in the input <math>\varepsilon</math> rather than in the operation itself.
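A toy check, assuming PyTorch, that gradients indeed flow to <math>\mu</math> and <math>\sigma</math> through a reparameterized sample; the tensors and downstream loss are illustrative.
<syntaxhighlight lang="python">
import torch

mu = torch.tensor([0.5], requires_grad=True)
sigma = torch.tensor([1.0], requires_grad=True)

eps = torch.randn(1)        # noise drawn outside the computation graph
z = mu + sigma * eps        # z is a differentiable function of mu and sigma

z.pow(2).sum().backward()   # any downstream loss works
print(mu.grad, sigma.grad)  # both populated; naive sampling would give no grads
</syntaxhighlight>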
;Conditional VAE
Simply provide the condition (e.g., a class label) as an additional input to both the encoder and the decoder, as in the sketch below.
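A minimal sketch, assuming PyTorch, of where the condition enters; the one-hot label <math>y</math> and the single-layer networks are illustrative.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

data_dim, cond_dim, latent_dim = 784, 10, 20
encoder = nn.Linear(data_dim + cond_dim, 2 * latent_dim)  # q(z|x, y)
decoder = nn.Linear(latent_dim + cond_dim, data_dim)      # p(x|z, y)

x = torch.rand(32, data_dim)
y = torch.eye(cond_dim)[torch.randint(0, cond_dim, (32,))]  # one-hot labels

mu, logvar = encoder(torch.cat([x, y], dim=1)).chunk(2, dim=1)
z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
x_hat = decoder(torch.cat([z, y], dim=1))  # condition enters both networks
</syntaxhighlight>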
;Pros
* Principled approach to generative models
* Allows inference of <math>q(z|x)</math>, which can be used as a feature representation
;Cons
* Maximizes only a lower bound on the likelihood (the ELBO), not the likelihood itself
* Samples are blurrier than those from GANs
Why are samples blurry?
* Samples from <math>P(x|z)</math> are not blurry but noisy
** This is the difference between drawing a sample from <math>P(x|z)</math> and displaying its mean/expected value <math>f(z)</math>
* The L2 loss corresponds to a Gaussian likelihood, which encourages the decoder mean to average over plausible outputs
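A toy illustration of the sample-vs-mean distinction, assuming NumPy; the stand-in decoder mean is illustrative. A single draw from <math>P(x|z) = N(f(z), I)</math> deviates a lot from <math>f(z)</math> (noisy), while the average of many draws is close to <math>f(z)</math> itself (the smooth, "blurry" image most implementations display).
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
f_z = np.zeros((28, 28))                         # stand-in decoder mean f(z)
samples = f_z + rng.normal(size=(1000, 28, 28))  # draws from P(x|z) = N(f(z), I)

print(np.abs(samples[0] - f_z).mean())            # ~0.8: a single sample is noisy
print(np.abs(samples.mean(axis=0) - f_z).mean())  # ~0.025: the mean is smooth
</syntaxhighlight>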
==Will be on the exam==