==Flow-based Generative Models==
Lecture 16 (October 22, 2020)
Suppose we have a dataset <math>\{x_1,..., x_n\} \subset \mathbb{R}^d</math>.
Our probabilistic model is:
<math>z \sim N(0,I)</math> and <math>x=g_{\theta}(z)</math>, where <math>g_{\theta}</math> is bijective.
We assume the inverse <math>f_{\theta} = g_{\theta}^{-1}</math> is differentiable.
Generation or sampling goes from <math>z</math> to <math>x</math>.
Inference goes from <math>x</math> to <math>z</math>.
Change of variables in 1d: <math>P(z)|dz| = P(x)|dx| \implies P(x) = P(z) \left| \frac{dz}{dx} \right|</math>.
In high dimensions: <math>P(x) = P(z) \left| \det\left(\frac{dz}{dx}\right) \right|</math>.
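As a hypothetical 1-D illustration of the change-of-variables formula (the map <math>g(z)=e^z</math> below is my own example, not from the lecture), take <math>z \sim N(0,1)</math> and <math>x = e^z</math>, so <math>z = f(x) = \log x</math> and <math>|dz/dx| = 1/x</math>; the resulting <math>P(x)</math> is the log-normal density:

```python
import numpy as np

# Hypothetical example: z ~ N(0,1), x = g(z) = e^z, so the inverse is
# z = f(x) = log x and |dz/dx| = 1/x.
# Change of variables: P(x) = P(z) |dz/dx|, the log-normal density.
def p_z(z):
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

def p_x(x):
    z = np.log(x)            # z = f(x)
    dz_dx = 1.0 / x          # derivative of the inverse map
    return p_z(z) * np.abs(dz_dx)

# Sanity check: the transformed density still integrates to ~1.
xs = np.linspace(1e-6, 50.0, 200_000)
ys = p_x(xs)
integral = np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs))
print(round(integral, 3))  # ≈ 1.0
```

Dropping the <math>|dz/dx|</math> factor would leave a function that no longer integrates to 1, which is why the Jacobian term is essential.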
;Maximum Likelihood
<math>P_{\theta}(x) = P(z) \left| \det\left(\frac{dz}{dx}\right) \right|</math> where <math>z = f_{\theta}(x)</math>.
<math>
\begin{aligned}
&\max_{\theta} \frac{1}{n} \sum_{i=1}^{n} \log P_{\theta}(x_i) \\
&= \min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \left[ -\log P_{\theta}(x_i) \right]\\
&= \min_{\theta} \frac{-1}{n} \sum_{i=1}^{n} \left[ \log P(z_i) + \log \left| \det\left(\frac{dz}{dx}\right) \right| \right]
\end{aligned}
</math>
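A minimal sketch of this objective for a 1-D affine flow <math>x = az + b</math> (a hypothetical toy family, not from the lecture), where <math>z = (x-b)/a</math> and <math>\log|dz/dx| = -\log|a|</math>:

```python
import numpy as np

# Toy negative log-likelihood for the flow x = a*z + b, z ~ N(0,1).
# Then z = (x - b)/a and log|dz/dx| = -log|a|.  (a, b are hypothetical
# parameters; for this family the MLE is just the sample mean/std.)
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)

def nll(a, b, x):
    z = (x - b) / a
    log_pz = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)   # log N(z; 0, 1)
    log_det = -np.log(np.abs(a))                     # log |dz/dx|
    return -np.mean(log_pz + log_det)

# The loss is lower near the true parameters (a ≈ 2, b ≈ 3)
# than at a mismatched setting.
print(nll(2.0, 3.0, data) < nll(1.0, 0.0, data))  # True
```

In a real flow, <math>\theta</math> would be optimized by gradient descent on this same loss; here the point is only that the objective combines the base log-density and the log-determinant term.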
;Issues
* How do we design a bijective function?
* Computing <math>\det(J)</math> can be very expensive (<math>O(d^3)</math> for a dense Jacobian).
;Idea
* Design the mappings <math>f, g</math> so that <math>\det(J)</math> is easy to compute.
{{hidden | warm-up |
If <math>x=Az+b</math> then <math>z=A^{-1}(x-b)</math>.
Here <math>J = A^{-1}</math>, so <math>\det(J) = 1/\det(A)</math>, which is expensive to compute for a general <math>A</math>.
}}
If <math>J</math> is a diagonal matrix, then <math>\det(J) = \prod_i J_{ii}</math>.
An <math>f</math> which acts element-wise has a diagonal Jacobian, but such a map is not very expressive since the dimensions never interact.
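For a concrete (hypothetical) element-wise map, take <math>f(x)_i = \tanh(x_i)</math>; its Jacobian is diagonal with entries <math>1 - \tanh(x_i)^2</math>, so the determinant is an <math>O(d)</math> product:

```python
import numpy as np

# Element-wise f(x)_i = tanh(x_i) has a diagonal Jacobian:
# J_ii = 1 - tanh(x_i)^2 and J_ij = 0 for i != j,
# so det(J) = prod_i (1 - tanh(x_i)^2), computable in O(d).
x = np.array([0.5, -1.0, 2.0])
diag = 1.0 - np.tanh(x)**2        # the d diagonal entries
det_fast = np.prod(diag)          # O(d) product

# Compare against the dense determinant of the full Jacobian, O(d^3).
J = np.diag(diag)
det_dense = np.linalg.det(J)
print(np.isclose(det_fast, det_dense))  # True
```

The two agree, but since each output dimension depends only on its own input, a stack of such maps can never mix information across dimensions.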
RealNVP by [Dinh et al.] uses a triangular Jacobian.
In this case, the determinant is still just the product of the diagonal entries.
What do <math>f</math> and <math>g</math> look like? Split <math>x</math> into two blocks <math>(x_1, x_2)</math>:
<math>
\begin{cases}
z_1 = x_1\\
z_2 = s_\theta(x_1) \odot x_2 + t_\theta(x_1)
\end{cases}
</math>
where <math>s_\theta, t_\theta</math> are arbitrary (e.g. neural network) functions of <math>x_1</math> only, and <math>\odot</math> is element-wise multiplication.
Now our Jacobian is:
<math>
J = \begin{pmatrix}
I & 0\\
\frac{dz_2}{dx_1} & \operatorname{diag}(s_\theta)
\end{pmatrix}
</math>
and <math>\det(J) = \prod_i (s_\theta)_i</math>, regardless of the off-diagonal block <math>\frac{dz_2}{dx_1}</math>.
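The coupling layer above can be sketched as follows. This is a minimal illustration, not RealNVP itself: the functions <math>s_\theta, t_\theta</math> are stand-ins (fixed nonlinearities rather than learned networks), and the split of <math>x</math> into halves is an assumption.

```python
import numpy as np

# Sketch of one affine coupling layer: z_1 = x_1, z_2 = s(x_1)*x_2 + t(x_1).
# s_theta and t_theta are hypothetical stand-ins for learned networks;
# any functions of x_1 alone work, since invertibility never requires
# inverting them.
def s_theta(x1):
    return np.exp(np.tanh(x1))   # positive scale, function of x_1 only

def t_theta(x1):
    return 0.5 * x1              # shift, function of x_1 only

def forward(x):                  # f: x -> z
    x1, x2 = np.split(x, 2)
    z1 = x1
    z2 = s_theta(x1) * x2 + t_theta(x1)
    log_det = np.sum(np.log(s_theta(x1)))  # log|det J| = sum_i log s_i
    return np.concatenate([z1, z2]), log_det

def inverse(z):                  # g: z -> x, exact inverse
    z1, z2 = np.split(z, 2)
    x1 = z1
    x2 = (z2 - t_theta(z1)) / s_theta(z1)
    return np.concatenate([x1, x2])

x = np.array([0.3, -1.2, 0.7, 2.0])
z, log_det = forward(x)
print(np.allclose(inverse(z), x))  # True
```

Because <math>z_1 = x_1</math> passes through unchanged, the inverse recovers <math>x_2</math> by simple subtraction and division; in practice, layers alternate which half is held fixed so every dimension eventually gets transformed.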
==Misc==