The dual of <math>\min_{G} W_1(P_Y, P_{\hat{Y}})</math> is <math>\min_{G} \max_{D:\, \Vert D \Vert_L \leq 1} \left[ E[D(Y)] - E[D(\hat{Y})] \right]</math>, where the maximum is taken over 1-Lipschitz discriminators.   
The Lipschitz constraint on the discriminator can be enforced by weight clipping.
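As a rough illustration, here is a minimal sketch of one WGAN critic update with weight clipping, written in PyTorch; the names <code>critic</code>, <code>generator</code>, <code>real</code>, <code>z</code>, and <code>opt</code> are hypothetical placeholders, not part of the original notes.
<syntaxhighlight lang="python">
import torch

def critic_step(critic, generator, real, z, opt, clip=0.01):
    """One critic update for the dual objective E[D(Y)] - E[D(Y_hat)]."""
    opt.zero_grad()
    fake = generator(z).detach()  # do not backprop into the generator here
    # The critic ascends the dual objective, so we minimize its negation.
    loss = -(critic(real).mean() - critic(fake).mean())
    loss.backward()
    opt.step()
    # Enforce (a crude form of) the Lipschitz constraint by clipping
    # every critic weight into [-clip, clip].
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip, clip)
    return loss.item()
</syntaxhighlight>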
===How to evaluate GANs?===
;Inception Score
Use a pre-trained network (Inception-v3) to map a generated image to its class probabilities <math>p(y|x)</math>. 
<math>IS(G) = \exp \left( E_{x \sim P_{\hat{X}}} KL( p(y|x) \Vert p(y) ) \right)</math>
Mutual Information interpretation: 
<math>\log(IS(G)) = I(G(z); y) = H(y) - H(y|G(z))</math>
* The first term <math>H(y)</math> is large when the generated labels are diverse.
* The second term <math>H(y|G(z))</math> is small when the classifier assigns labels with high confidence.
IS can be misleading: a generator that produces only a single image per class still attains high label diversity, so the score hides the lack of intra-class diversity.
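For concreteness, here is a small sketch of computing IS from a matrix of predicted class probabilities; it assumes the probabilities have already been obtained from a pre-trained classifier such as Inception-v3.
<syntaxhighlight lang="python">
import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: (N, C) array of class probabilities p(y|x) for N generated images.
    p_y = probs.mean(axis=0)  # marginal label distribution p(y)
    # KL(p(y|x) || p(y)) for each image, then average over images.
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))  # IS(G) = exp(E_x[KL])
</syntaxhighlight>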
;FID Score
Use a pre-trained network (Inception) to extract features from an intermediate layer.
Then model the feature distribution as a multivariate Gaussian with mean <math>\mu</math> and covariance <math>\Sigma</math>. 
FID is the Fréchet Inception Distance: 
<math>FID(x, g) = \Vert \mu_{x} - \mu_{g} \Vert_2^2 + Tr\left(\Sigma_{x} + \Sigma_g - 2(\Sigma_x \Sigma_g)^{1/2}\right)</math>
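As an illustration, a minimal FID computation from two feature matrices might look like the following; <code>feats_x</code> and <code>feats_g</code> (real and generated features) are assumed inputs.
<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_x, feats_g):
    # feats_x, feats_g: (N, d) intermediate-layer features for real
    # and generated images, respectively.
    mu_x, mu_g = feats_x.mean(axis=0), feats_g.mean(axis=0)
    sigma_x = np.cov(feats_x, rowvar=False)
    sigma_g = np.cov(feats_g, rowvar=False)
    # Matrix square root of the covariance product; sqrtm can return a
    # complex array due to numerical error, so keep the real part.
    covmean = sqrtm(sigma_x @ sigma_g).real
    return float(np.sum((mu_x - mu_g) ** 2)
                 + np.trace(sigma_x + sigma_g - 2.0 * covmean))
</syntaxhighlight>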
===A Statistical Approach to GANs===
GANs do not have explicit probability models. 
This is in contrast to maximum-likelihood models such as VAEs. 
GANs instead focus on minimizing a distance between distributions. 
This yields high-quality samples but no way to compute sample likelihoods.
VAEs maximize a lower bound on the likelihood, but tend to produce blurry samples.
The key idea is to posit an explicit model for the data: 
<math>f_{Y}(y|X=x) \propto \exp(-\ell(y, G(x))/\lambda)</math>
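Under this model, the unnormalized log-likelihood of an observation is just a scaled loss. A minimal sketch, assuming a squared-error loss <math>\ell</math> and ignoring the normalization constant:
<syntaxhighlight lang="python">
import numpy as np

def unnormalized_log_likelihood(y, gx, lam=1.0):
    # Observation model f_Y(y|x) proportional to exp(-l(y, G(x)) / lambda),
    # with a squared-error loss; gx = G(x) is the generator output.
    loss = np.sum((y - gx) ** 2)
    return -loss / lam  # log of the unnormalized density
</syntaxhighlight>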
;Theorem (BHCF 2019)
...
Entropic GANs meet VAEs.
===Distributionally Robust Wasserstein===
Robust Wasserstein:
<math>\min_{P_{\tilde{X}}, P_{\tilde{Y}}}</math>


==Misc==