Probabilistic Model:
Suppose our dataset is <math>\{x_i\}_{i=1}^{n}</math> with <math>x_i \in \mathbb{R}^d</math>.
# Generate latent variables <math>z_1,...,z_n \in \mathbb{R}^r</math> where <math>r \ll d</math>.
# Assume <math>X=x_i | Z = z_i \sim N \left( g_{\theta}(z_i), \sigma^2 I \right)</math>.
#* Here <math>g_\theta</math> is called the ''generator'' or ''decoder'' function.
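The generative process above can be sketched as follows. The linear map standing in for <math>g_\theta</math> and the standard normal latent distribution are placeholder assumptions for illustration; in practice <math>g_\theta</math> is a neural network:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n, sigma = 5, 2, 4, 0.1  # ambient dim, latent dim, sample count, noise scale

# Placeholder decoder g_theta: a fixed linear map (a neural net in practice).
W = rng.normal(size=(d, r))
def g_theta(z):
    return W @ z

# Step 1: latent variables z_1, ..., z_n in R^r (standard normal here).
Z = rng.normal(size=(n, r))

# Step 2: x_i | z_i ~ N(g_theta(z_i), sigma^2 I).
X = np.stack([g_theta(z) + sigma * rng.normal(size=d) for z in Z])

print(X.shape)  # (4, 5): n points in R^d
```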
Q: How can we pick good model parameters <math>\theta</math>?
Using maximum likelihood (equivalently, maximizing the log-likelihood, since <math>\log</math> is monotone):
<math>
\begin{align*}
\max_{\theta} \log P(\{x_i\}; \theta) &= \max_{\theta} \log \prod_{i=1}^{n} P_{\theta}(x_i)\\
&= \max_{\theta} \sum_{i=1}^{n} \log P_{\theta}(x_i)\\
&= \max_{\theta} \sum_{i=1}^{n} \log \left( \int_{z} P(z) P_{\theta}(x_i|z) \, dz \right)
\end{align*}
</math>
This integral is hard to compute.
Instead we calculate a lower bound and maximize the lower bound:
<math> \max_{\theta} l(\theta) \geq \max_{\theta, \phi} J(\theta, \phi)</math>
;ELBO / Variational lower bound:
<math>
\begin{aligned}
&P(x_i | z) = \frac{P(z | x_i) P(x_i)}{P(z)}\\
\implies& \log P(x_i | z) + \log P(z) = \log P(z | x_i) + \log P(x_i)\\
\implies& E_{z \sim q_i}[\log P(x_i)] = E_{z \sim q_i}[ \log P(x_i | z) + \log P(z) - \log P(z | x_i)] \\
\implies& \log P(x_i) = E_{z \sim q_i}[\log P_{\theta}(x_i | z)] + E_q[\log P(z)] - E_q[\log P(z|x_i)] + (E_q[\log q_i(z)] - E_q[\log q_i(z)])\\
\implies& \log P(x_i) = E_{z \sim q_i}[\log P_{\theta}(x_i | z)] + (E_q[\log q_i(z)] - E_q[\log P(z|x_i)]) - (E_q[\log q_i(z)] - E_q[\log P(z)])\\
\implies& \log P(x_i) = E_{z \sim q_i} \left[\log P_{\theta}(x_i | z) \right] + KL \left(q_i \Vert P(z|x_i) \right) - KL \left(q_i \Vert P(z) \right)
\end{aligned}
</math>
The second term, <math>KL \left(q_i \Vert P(z|x_i) \right)</math>, involves the unknown posterior and is hard to compute; but KL divergence is nonnegative, so dropping it leaves a lower bound.
Thus:
<math>\log P(x_i) \geq E_{z \sim q_i} \left[\log P_{\theta}(x_i | z) \right] - KL \left(q_i \Vert P(z) \right)</math>
Optimization:
<math>\max_{\theta, \phi} \sum_{i=1}^{n} E_{z \sim q_i} \left[\log P_{\theta}(x_i | z) \right] - KL \left(q_i \Vert P(z) \right)</math>
where <math>q_i(z) = q(z|x_i) = N\left( f_{\phi}(x_i), \sigma^2 I \right)</math>.
Here, <math>f_{\phi}(x)</math> is called the ''encoder''.
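A Monte Carlo sketch of this objective, written as <math>E_{z \sim q_i}[\log P_{\theta}(x_i|z) + \log P(z) - \log q_i(z)]</math>. The linear maps standing in for <math>f_\phi</math> and <math>g_\theta</math> and the standard normal prior <math>P(z) = N(0, I)</math> are illustrative assumptions (the notes leave the prior implicit):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, sigma = 5, 2, 0.5

# Hypothetical linear encoder/decoder standing in for f_phi and g_theta.
A = rng.normal(size=(r, d))
W = rng.normal(size=(d, r))
def f_phi(x):
    return A @ x
def g_theta(z):
    return W @ z

def log_gauss(v, mean, var):
    # Log density of N(mean, var * I), summed over dimensions.
    k = v.size
    return -0.5 * (k * np.log(2 * np.pi * var) + np.sum((v - mean) ** 2) / var)

def elbo_estimate(x, n_samples=200):
    # E_{z~q}[log P(x|z)] - KL(q || P(z)), estimated by sampling z ~ q(z|x)
    # and averaging log P(x|z) + log P(z) - log q(z|x).
    mu = f_phi(x)
    total = 0.0
    for _ in range(n_samples):
        z = mu + sigma * rng.normal(size=r)          # z ~ N(f_phi(x), sigma^2 I)
        log_px_z = log_gauss(x, g_theta(z), sigma**2)  # reconstruction term
        log_pz = log_gauss(z, np.zeros(r), 1.0)        # prior term (assumed N(0, I))
        log_qz = log_gauss(z, mu, sigma**2)            # variational density
        total += log_px_z + log_pz - log_qz
    return total / n_samples

x = rng.normal(size=d)
print(elbo_estimate(x))
```

Summing this estimate over the dataset gives the quantity being maximized over <math>\theta, \phi</math>.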
The claim is that <math>KL \left(q_i \Vert P(z) \right)</math> is easy to compute, since both distributions are Gaussian. Taking the prior <math>P(z) = N(0, I)</math> and dropping additive constants from the log densities:
<math>
\begin{align*}
&\max_{\theta, \phi} \sum_{i=1}^{n} E_{z \sim q_i} \left[\log P_{\theta}(x_i | z) \right] - KL \left(q_i \Vert P(z) \right)\\
=&\max_{\theta, \phi} \sum_{i=1}^{n} E_{z \sim q_i} \left[ \log P_{\theta}(x_i | z) + \log P(z) - \log q_i(z) \right]\\
=&\max_{\theta, \phi} \sum_{i=1}^{n} E_{z \sim q_i} \left[ -\Vert x_i - g_{\theta}(z) \Vert^2 /(2\sigma^2) - \Vert z \Vert^2 / 2 + \Vert z - f_{\phi}(x_i) \Vert^2 /(2\sigma^2) \right]
\end{align*}
</math>
We use SGD to optimize <math>\theta, \phi</math>.
Using the reparameterization trick, we sample <math>z = \mu + \Sigma^{1/2}\epsilon</math> for <math>\epsilon \sim N(0, I)</math>; here <math>\mu = f_{\phi}(x_i)</math> and <math>\Sigma = \sigma^2 I</math>, so gradients can flow through the sampling step to <math>\phi</math>.
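A quick numerical check of the reparameterization identity: <math>z = \mu + \Sigma^{1/2}\epsilon</math> with <math>\epsilon \sim N(0, I)</math> has distribution <math>N(\mu, \Sigma)</math>, with all randomness isolated in <math>\epsilon</math>. The particular <math>\mu</math> and <math>\Sigma</math> below are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
L = np.linalg.cholesky(Sigma)  # one valid square root Sigma^{1/2}

# z = mu + L @ eps, eps ~ N(0, I)  =>  z ~ N(mu, Sigma).
eps = rng.normal(size=(100_000, 2))
Z = mu + eps @ L.T

print(np.round(Z.mean(axis=0), 1))  # close to mu
print(np.round(np.cov(Z.T), 1))     # close to Sigma
```

Because the sample is a deterministic function of <math>(\mu, \Sigma)</math> given <math>\epsilon</math>, SGD can differentiate the ELBO estimate with respect to the encoder parameters.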
==Misc==