==Flow-based Generative Models==
Lecture 16 (October 22, 2020)
Suppose we have a dataset <math>\{x_1,..., x_n\} \subset \mathbb{R}^d</math>.
Our probabilistic model is:
<math>z \sim N(0,I)</math> and <math>x=g_{\theta}(z)</math>, where <math>g_{\theta}</math> is bijective.
We assume the inverse <math>f_{\theta} = g_{\theta}^{-1}</math> is differentiable.
Generation or sampling goes from <math>z</math> to <math>x</math>.
Inference goes from <math>x</math> to <math>z</math>.
Change of variables in 1d: <math>P(z)|dz| = P(x)|dx| \implies P(x) = P(z) \left| \frac{dz}{dx} \right|</math>.
In high dimensions: <math>P(x) = P(z) \left| \det\left(\frac{dz}{dx}\right) \right|</math>.
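As a hypothetical 1-D illustration of the change-of-variables formula (the map <math>g(z)=e^z</math> below is my own example, not from the lecture), take <math>z \sim N(0,1)</math> and <math>x = e^z</math>, so <math>z = f(x) = \log x</math> and <math>|dz/dx| = 1/x</math>; the resulting <math>P(x)</math> is the log-normal density:

```python
import numpy as np

# Hypothetical example: z ~ N(0,1), x = g(z) = e^z, so the inverse is
# z = f(x) = log x and |dz/dx| = 1/x.
# Change of variables: P(x) = P(z) |dz/dx|, the log-normal density.
def p_z(z):
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

def p_x(x):
    z = np.log(x)            # z = f(x)
    dz_dx = 1.0 / x          # derivative of the inverse map
    return p_z(z) * np.abs(dz_dx)

# Sanity check: the transformed density still integrates to ~1.
xs = np.linspace(1e-6, 50.0, 200_000)
ys = p_x(xs)
integral = np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs))
print(round(integral, 3))  # ≈ 1.0
```

Dropping the <math>|dz/dx|</math> factor would leave a function that no longer integrates to 1, which is why the Jacobian term is essential.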
;Maximum Likelihood
<math>P_{\theta}(x) = P(z) \left| \det\left(\frac{dz}{dx}\right) \right|</math> where <math>z = f_{\theta}(x)</math>.
<math>
\begin{aligned}
&\max_{\theta} \frac{1}{n} \sum_{i=1}^{n} \log P_{\theta}(x_i) \\
&= \min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \left[ -\log P_{\theta}(x_i) \right]\\
&= \min_{\theta} \frac{-1}{n} \sum_{i=1}^{n} \left[ \log P(z_i) + \log \left| \det\left(\frac{dz}{dx}\right) \right| \right]
\end{aligned}
</math>
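A minimal sketch of this objective for a 1-D affine flow <math>x = az + b</math> (a hypothetical toy family, not from the lecture), where <math>z = (x-b)/a</math> and <math>\log|dz/dx| = -\log|a|</math>:

```python
import numpy as np

# Toy negative log-likelihood for the flow x = a*z + b, z ~ N(0,1).
# Then z = (x - b)/a and log|dz/dx| = -log|a|.  (a, b are hypothetical
# parameters; for this family the MLE is just the sample mean/std.)
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)

def nll(a, b, x):
    z = (x - b) / a
    log_pz = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)   # log N(z; 0, 1)
    log_det = -np.log(np.abs(a))                     # log |dz/dx|
    return -np.mean(log_pz + log_det)

# The loss is lower near the true parameters (a ≈ 2, b ≈ 3)
# than at a mismatched setting.
print(nll(2.0, 3.0, data) < nll(1.0, 0.0, data))  # True
```

In a real flow, <math>\theta</math> would be optimized by gradient descent on this same loss; here the point is only that the objective combines the base log-density and the log-determinant term.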
;Issues
* How do we design a bijective function?
* Computing <math>\det(J)</math> can be very expensive (<math>O(d^3)</math> for a dense Jacobian).
;Idea
* Design the mappings <math>f, g</math> so that <math>\det(J)</math> is easy to compute.
{{hidden | warm-up |
If <math>x=Az+b</math> then <math>z=A^{-1}(x-b)</math>.
Here <math>J = A^{-1}</math>, so <math>\det(J) = 1/\det(A)</math>, which is expensive to compute for a general <math>A</math>.
}}
If <math>J</math> is a diagonal matrix, then <math>\det(J) = \prod_i J_{ii}</math>.
An <math>f</math> which acts element-wise has a diagonal Jacobian, but such a map is not very expressive since the dimensions never interact.
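For a concrete (hypothetical) element-wise map, take <math>f(x)_i = \tanh(x_i)</math>; its Jacobian is diagonal with entries <math>1 - \tanh(x_i)^2</math>, so the determinant is an <math>O(d)</math> product:

```python
import numpy as np

# Element-wise f(x)_i = tanh(x_i) has a diagonal Jacobian:
# J_ii = 1 - tanh(x_i)^2 and J_ij = 0 for i != j,
# so det(J) = prod_i (1 - tanh(x_i)^2), computable in O(d).
x = np.array([0.5, -1.0, 2.0])
diag = 1.0 - np.tanh(x)**2        # the d diagonal entries
det_fast = np.prod(diag)          # O(d) product

# Compare against the dense determinant of the full Jacobian, O(d^3).
J = np.diag(diag)
det_dense = np.linalg.det(J)
print(np.isclose(det_fast, det_dense))  # True
```

The two agree, but since each output dimension depends only on its own input, a stack of such maps can never mix information across dimensions.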
RealNVP by [Dinh et al.] uses a triangular Jacobian.
In this case, the determinant is still just the product of the diagonal entries.
What do <math>f</math> and <math>g</math> look like? Split <math>x</math> into two blocks <math>(x_1, x_2)</math>:
<math>
\begin{cases}
z_1 = x_1\\
z_2 = s_\theta(x_1) \odot x_2 + t_\theta(x_1)
\end{cases}
</math>
where <math>s_\theta, t_\theta</math> are arbitrary (e.g. neural network) functions of <math>x_1</math> only, and <math>\odot</math> is element-wise multiplication.
Now our Jacobian is:
<math>
J = \begin{pmatrix}
I & 0\\
\frac{dz_2}{dx_1} & \operatorname{diag}(s_\theta)
\end{pmatrix}
</math>
and <math>\det(J) = \prod_i (s_\theta)_i</math>, regardless of the off-diagonal block <math>\frac{dz_2}{dx_1}</math>.
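The coupling layer above can be sketched as follows. This is a minimal illustration, not RealNVP itself: the functions <math>s_\theta, t_\theta</math> are stand-ins (fixed nonlinearities rather than learned networks), and the split of <math>x</math> into halves is an assumption.

```python
import numpy as np

# Sketch of one affine coupling layer: z_1 = x_1, z_2 = s(x_1)*x_2 + t(x_1).
# s_theta and t_theta are hypothetical stand-ins for learned networks;
# any functions of x_1 alone work, since invertibility never requires
# inverting them.
def s_theta(x1):
    return np.exp(np.tanh(x1))   # positive scale, function of x_1 only

def t_theta(x1):
    return 0.5 * x1              # shift, function of x_1 only

def forward(x):                  # f: x -> z
    x1, x2 = np.split(x, 2)
    z1 = x1
    z2 = s_theta(x1) * x2 + t_theta(x1)
    log_det = np.sum(np.log(s_theta(x1)))  # log|det J| = sum_i log s_i
    return np.concatenate([z1, z2]), log_det

def inverse(z):                  # g: z -> x, exact inverse
    z1, z2 = np.split(z, 2)
    x1 = z1
    x2 = (z2 - t_theta(z1)) / s_theta(z1)
    return np.concatenate([x1, x2])

x = np.array([0.3, -1.2, 0.7, 2.0])
z, log_det = forward(x)
print(np.allclose(inverse(z), x))  # True
```

Because <math>z_1 = x_1</math> passes through unchanged, the inverse recovers <math>x_2</math> by simple subtraction and division; in practice, layers alternate which half is held fixed so every dimension eventually gets transformed.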
==Misc==