Deep Learning


==Flow-based Generative Models==
Lecture 16 (October 22, 2020)
Suppose we have a dataset <math>\{x_1,..., x_n\} \subset \mathbb{R}^d</math>. 
Our probabilistic model is: 
<math>z \sim N(0,I)</math> and <math>x=g_{\theta}(z)</math>, where <math>g_{\theta}</math> is bijective with inverse <math>f = g_{\theta}^{-1}</math>. 
We assume <math>f</math> is differentiable. 
Generation (sampling) goes from <math>z</math> to <math>x</math>; 
inference goes from <math>x</math> to <math>z</math>.
Change of variables in 1d: <math>P(z)\,dz = P(x)\,dx \implies P(x) = P(z) \left| \frac{dz}{dx} \right|</math>. 
In high dimensions: <math>P(x) = P(z) \left| \det\left(\frac{dz}{dx}\right) \right|</math>.
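As a quick numerical sanity check of the 1d formula (an illustrative example, not from the lecture), take the bijection <math>x = g(z) = e^z</math>, so <math>z = \log x</math> and <math>\frac{dz}{dx} = \frac{1}{x}</math>:

```python
import numpy as np

# Sanity check of the 1-d change of variables P(x) = P(z) |dz/dx|,
# using the bijection x = g(z) = exp(z), so z = log(x) and dz/dx = 1/x.
# (exp is an illustrative choice of g, not from the lecture.)

def p_z(z):
    # standard normal density
    return np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)

def p_x(x):
    z = np.log(x)                     # inference: x -> z
    return p_z(z) * np.abs(1.0 / x)   # P(z) |dz/dx|

# If the formula is right, p_x is a valid density and integrates to ~1.
xs = np.linspace(1e-4, 60.0, 400_000)
mass = np.sum(p_x(xs)) * (xs[1] - xs[0])  # Riemann sum
print(mass)  # close to 1
```

The resulting <math>P(x)</math> is the log-normal density, and the numerical integral confirms it has total mass one.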
;Maximum Likelihood
<math>P_{\theta}(x) = P(z) \left| \det\left(\frac{dz}{dx}\right) \right|</math>. 
<math>
\begin{aligned}
\max_{\theta} \frac{1}{n} \sum_{i=1}^{n} \log P_{\theta}(x_i)
&= \min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \left[ -\log P_{\theta}(x_i) \right]\\
&= \min_{\theta} \frac{-1}{n} \sum_{i=1}^{n} \left[ \log P(z_i) + \log \left| \det\left(\frac{dz}{dx}\right) \right| \right]
\end{aligned}
</math>
;Issues
* How do we design a bijective function?
* Computing <math>\det(J)</math> can be very expensive (<math>O(d^3)</math> for a general matrix).
;Idea
* Design <math>J</math> (i.e. the <math>f</math>/<math>g</math> mappings) such that <math>\det(J)</math> is easy to compute.
{{hidden | warm-up |
If <math>x=Az+b</math> then <math>z=A^{-1}(x-b)</math>, 
so the Jacobian <math>J = \frac{dz}{dx} = A^{-1}</math> is expensive to compute (a <math>d \times d</math> matrix inverse).
}}
If <math>J</math> is a diagonal matrix then <math>\det(J) = \prod_i J_{ii}</math>. 
An <math>f</math> that acts element-wise has a diagonal Jacobian, but such a map is not very expressive. 
RealNVP by [Dinh et al.] instead uses a triangular Jacobian. 
In this case, the determinant is still just the product of the diagonal entries. 
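A quick numerical illustration (assumed example, not from the lecture): for a triangular matrix, the determinant collapses to the product of the diagonal, an <math>O(d)</math> computation instead of <math>O(d^3)</math>.

```python
import numpy as np

# det of a triangular matrix equals the product of its diagonal entries.
rng = np.random.default_rng(0)
J = np.tril(rng.standard_normal((5, 5)))  # lower-triangular, like the coupling-layer Jacobian
det_full = np.linalg.det(J)               # general O(d^3) routine
det_diag = np.prod(np.diag(J))            # O(d) shortcut along the diagonal
print(np.isclose(det_full, det_diag))     # True
```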
What do <math>f</math> and <math>g</math> look like? Split <math>x</math> into two blocks <math>(x_1, x_2)</math>: 
<math>
\begin{cases}
z_1 = x_1\\
z_2 = s_\theta(x_1) \odot x_2 + t_\theta(x_1)
\end{cases}
</math> 
Now our Jacobian is: 
<math>
J = \begin{pmatrix}
I & 0\\
\frac{dz_2}{dx_1} & \operatorname{diag}(s_\theta)
\end{pmatrix}
</math>
and <math>\det(J) = \prod_i (s_\theta)_i</math>.
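The map above can be sketched as a single affine coupling layer. This is a minimal NumPy sketch under stated assumptions: <code>s_net</code> and <code>t_net</code> are hypothetical stand-ins (fixed linear maps with a tanh) for the small neural networks used in practice, and, as in RealNVP, the scale is parameterized as <math>e^s</math> so it is never zero and <math>\log|\det J| = \sum_i s_i</math>.

```python
import numpy as np

# Sketch of one RealNVP-style affine coupling layer.
# s_net / t_net are hypothetical stand-ins for small neural nets,
# kept as fixed linear maps so the example is self-contained.
rng = np.random.default_rng(0)
d = 4                                  # total dimension, split into two halves
W_s = 0.1 * rng.standard_normal((d // 2, d // 2))
W_t = 0.1 * rng.standard_normal((d // 2, d // 2))

def s_net(x1):                         # log-scale s_theta(x_1)
    return np.tanh(x1 @ W_s)

def t_net(x1):                         # shift t_theta(x_1)
    return x1 @ W_t

def forward(x):                        # inference: x -> z, plus log|det J|
    x1, x2 = x[:d // 2], x[d // 2:]
    s = s_net(x1)
    z1 = x1                            # identity on the first block
    z2 = np.exp(s) * x2 + t_net(x1)    # affine transform of the second block
    log_det = np.sum(s)                # det J = prod_i exp(s_i)
    return np.concatenate([z1, z2]), log_det

def inverse(z):                        # generation: z -> x
    z1, z2 = z[:d // 2], z[d // 2:]
    x1 = z1
    x2 = (z2 - t_net(x1)) * np.exp(-s_net(x1))
    return np.concatenate([x1, x2])

x = rng.standard_normal(d)
z, log_det = forward(x)
x_rec = inverse(z)                     # recovers x up to floating-point error
```

Note the design choice this illustrates: the inverse never inverts <code>s_net</code> or <code>t_net</code> themselves, only the elementwise affine map, so the coupling functions can be arbitrarily complex networks while the layer stays exactly invertible.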


==Misc==