VQ-VAE (vector quantized VAE) performs quantization of the latent space. 
The quantization is non-differentiable, but gradients can be copied from the decoder input back to the encoder output (the straight-through estimator).
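Below is a minimal sketch of the gradient-copying trick in PyTorch; the tensor shapes and function name are illustrative, not from the original notes.

<syntaxhighlight lang="python">
import torch

def vq_straight_through(z_e, codebook):
    """Quantize encoder outputs z_e to their nearest codebook vectors.

    z_e:      (batch, dim) encoder outputs
    codebook: (K, dim) learned embedding vectors
    """
    # Nearest-neighbor lookup (non-differentiable).
    dists = torch.cdist(z_e, codebook)   # (batch, K) pairwise distances
    idx = dists.argmin(dim=1)            # (batch,) codebook indices
    z_q = codebook[idx]                  # (batch, dim) quantized vectors

    # Straight-through: the forward pass uses z_q, but the backward pass
    # treats quantization as the identity, so gradients flow to z_e.
    z_q_st = z_e + (z_q - z_e).detach()
    return z_q_st, idx
</syntaxhighlight>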
==Generative Adversarial Networks (GANs)==
Given data <math>\{y_1,...,y_n\}</math>. 
The goal of the generator is to take random noise <math>\{x_1,...,x_n\}</math> and generate fake data <math>\{\hat{y}_1,...,\hat{y}_n\}</math> whose distribution matches that of the real data. 
A discriminator takes in the real samples <math>\{y_i\}</math> and the fake samples <math>\{\hat{y}_i\}</math> and guides the generator by learning to tell them apart. 
In practice, both are implemented as deep neural networks. 
The optimization is the minimax problem <math>\min_{G} \max_{D} f(G, D)</math>.
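For instance, in the original GAN of Goodfellow et al., <math>f</math> is the cross-entropy objective: 
<math>f(G, D) = E_{Y}[\log D(Y)] + E_{X}[\log(1 - D(G(X)))]</math>.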
GAN training is challenging. 
Oftentimes, there are convergence issues. 
There can also be mode collapse, where the generator covers only a few modes of the data distribution. 
Generalization can be poor, and performance evaluation is largely subjective.
A common approach for training GANs is alternating gradient descent: alternate between a gradient ascent step on the discriminator and a gradient descent step on the generator. 
However, this usually does not converge to the optimal generator <math>G^*</math>.
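Below is a minimal sketch of alternating gradient descent in PyTorch; the architectures, noise dimension, and hyperparameters are illustrative placeholders, not from the original notes.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Placeholder networks; any generator/discriminator pair can be used.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(y_real):
    n = y_real.size(0)
    x = torch.randn(n, 16)  # random noise input to the generator

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    y_fake = G(x).detach()  # detach so this step only updates D
    loss_D = bce(D(y_real), torch.ones(n, 1)) + \
             bce(D(y_fake), torch.zeros(n, 1))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: update G so that D labels its samples as real.
    loss_G = bce(D(G(x)), torch.ones(n, 1))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
</syntaxhighlight>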
===Reducing unsupervised to supervised===
;Formulating GANs
Given real data <math>\{y_i\}</math> and random noise <math>\{x_i\}</math>, 
we need to find a generator <math>G</math> s.t. <math>G(X) \stackrel{\text{dist}}{\approx} Y</math>.
Given some data <math>\{y_i\}</math>, generate some randomness <math>\{x_i\}</math>. 
Pick a ''coupling'' <math>\pi</math> (a permutation of the indices) to create paired examples <math>\{(x_{\pi(i)}, y_i)\}</math>. 
Then we have the supervised problem:
<math>\min_{\pi} \min_{G} \frac{1}{n} \sum_{i=1}^{n} l(y_i, G(x_{\pi(i)}))</math> 
We can replace the discrete coupling with a joint distribution (the expectation replaces the empirical average):
<math>\min_{\mathbb{P}_{X,Y}} \min_{G} E_{\mathbb{P}_{X,Y}}[ l(Y, G(X))]</math>. 
By switching the order of the minimizations and substituting <math>\hat{Y} = G(X)</math>: 
<math>\min_{G} \min_{\mathbb{P}_{Y,\hat{Y}}} E_{\mathbb{P}_{Y,\hat{Y}}}[l(Y, \hat{Y})]</math>. 
The inner minimization, over joint distributions with fixed marginals <math>P_Y</math> and <math>P_{\hat{Y}}</math>, is the optimal transport distance.
====Optimal Transport (Earth-Mover)====
Optimal transport gives a non-parametric distance between probability measures. 
It is well-defined even when the two distributions do not share support. 
The distance is the cost of ''transporting'' the generated points onto the data points:
<math>\min_{\pi} \frac{1}{n} \sum_{i=1}^{n} l(y_i, \hat{y}_{\pi(i)})</math>. 
If <math>l</math> is the Euclidean (<math>\ell_2</math>) distance, then <math>dist(P_{Y},P_{\hat{Y}}) = W(P_{Y}, P_{\hat{Y}})</math>, the Wasserstein distance.
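For two equal-size empirical samples, the minimization over couplings <math>\pi</math> is an assignment problem, which can be solved exactly; this is a small illustrative sketch with toy data, not part of the original notes.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import linear_sum_assignment

def earth_mover(y, y_hat):
    """Empirical OT cost: min over permutations pi of
    (1/n) * sum_i l(y_i, y_hat_{pi(i)}), with l squared Euclidean."""
    # Pairwise squared-Euclidean cost matrix, shape (n, n).
    cost = ((y[:, None, :] - y_hat[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)  # optimal permutation
    return cost[rows, cols].mean()

# Toy usage: two small 2-D point clouds.
rng = np.random.default_rng(0)
y = rng.normal(size=(8, 2))
y_hat = rng.normal(size=(8, 2)) + 1.0
print(earth_mover(y, y_hat))
</syntaxhighlight>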
;Optimization
The primal form is <math>dist(P_Y, P_{\hat{Y}}) = \min_{\mathbb{P}_{Y,\hat{Y}}} E[l(Y, \hat{Y})]</math>, where the minimum is over couplings, i.e. joint distributions with marginals <math>P_Y</math> and <math>P_{\hat{Y}}</math>. 
;WGAN Formulation
By Kantorovich-Rubinstein duality, the dual of <math>\min_{G} W_1(P_Y, P_{\hat{Y}})</math> is <math>\min_{G} \max_{D: \|D\|_L \leq 1} \left[ E[D(Y)] - E[D(\hat{Y})] \right]</math>, where the maximum is over 1-Lipschitz discriminators (critics). 
The Lipschitz constraint on the discriminator can be enforced by weight clipping.
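A minimal sketch of one critic update with weight clipping in PyTorch; the function name is illustrative, and the threshold 0.01 is the default from the WGAN paper.

<syntaxhighlight lang="python">
import torch

CLIP = 0.01  # clipping threshold; 0.01 is the WGAN paper's default

def critic_step(D, opt_D, y_real, y_fake):
    # y_fake should already be detached from the generator's graph.
    # Maximize E[D(Y)] - E[D(Y_hat)] by minimizing its negation.
    loss = -(D(y_real).mean() - D(y_fake).mean())
    opt_D.zero_grad()
    loss.backward()
    opt_D.step()

    # Enforce a (crude) Lipschitz bound by clipping every weight.
    with torch.no_grad():
        for p in D.parameters():
            p.clamp_(-CLIP, CLIP)
</syntaxhighlight>

Weight clipping is a blunt way to bound the Lipschitz constant; later work replaces it with a gradient penalty (WGAN-GP).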


==Misc==