We use SGD to optimize <math>\theta, \phi</math>.
Using the reparameterization trick, we write <math>z = \mu + \Sigma^{1/2}\epsilon</math> for <math>\epsilon \sim N(0, I)</math>, so the randomness is isolated in <math>\epsilon</math> and gradients can flow through <math>\mu</math> and <math>\Sigma</math> to the encoder parameters.
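A minimal PyTorch sketch of the trick for a diagonal-covariance Gaussian, assuming the encoder outputs the mean and log-variance (the function name is illustrative):
<syntaxhighlight lang="python">
import torch

def reparameterize(mu, logvar):
    """Draw z = mu + Sigma^{1/2} * eps with eps ~ N(0, I)."""
    std = torch.exp(0.5 * logvar)   # Sigma^{1/2} for a diagonal Gaussian
    eps = torch.randn_like(std)     # the only source of randomness
    return mu + std * eps           # gradients flow through mu and std
</syntaxhighlight>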
;ELBO
<math>\max_{\theta, \phi} E_{z \sim q(z|x)}[\log P(x|z)] - KL(q(z|x) \Vert P(z))</math>
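A sketch of the corresponding training loss, assuming a Bernoulli decoder and a diagonal Gaussian <math>q(z|x)</math> so the KL term has a closed form (names are illustrative):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def negative_elbo(x, x_recon, mu, logvar):
    # Reconstruction term: Monte Carlo estimate of E_{z~q}[log P(x|z)],
    # assuming a Bernoulli decoder over values in [0, 1].
    recon = F.binary_cross_entropy(x_recon, x, reduction='sum')
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian q.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimizing this maximizes the ELBO
</syntaxhighlight>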
Issue: Posterior collapse.
In practice, the approximate posterior sometimes stops depending on <math>x</math> (<math>q(z|x) \approx q(z)</math>), typically collapsing to the prior <math>P(z)</math>. This happens when the decoder is expressive enough to model <math>x</math> while ignoring <math>z</math>: the KL term then drives <math>q(z|x)</math> toward the prior and the latent code carries no information.
===β-VAE===
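β-VAE reweights the KL term of the ELBO with a coefficient <math>\beta</math>:
<math>\max_{\theta, \phi} E_{z \sim q(z|x)}[\log P(x|z)] - \beta \, KL(q(z|x) \Vert P(z))</math>
Setting <math>\beta > 1</math> penalizes deviation from the prior more strongly, which encourages disentangled latent factors at the cost of reconstruction quality.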


==Misc==