Visual Learning and Recognition

;CycleGAN
* Unpaired image-to-image translation
* Cycle-consistency loss
;BicycleGAN
* First learns to reconstruct images through a meaningful latent code (the cVAE-GAN step).
* The main difference is that the mapping between the two domains is multimodal (multi-modal image-to-image): one input can map to many plausible outputs.
;MUNIT
* Multimodal UNsupervised Image-to-image Translation
* Maps images from each domain into a shared content space and domain-specific style space.
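A minimal sketch of that decomposition, with toy linear modules standing in for MUNIT's actual convolutional encoders/decoders (all names, dimensions, and modules here are illustrative assumptions, not the paper's API):
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Toy stand-ins for MUNIT's encoders/decoders (the real models are conv
# nets with AdaIN); names, dims, and modules are illustrative assumptions.
D, C, S = 256, 64, 8                 # flattened image, content, style dims
content_enc = {1: nn.Linear(D, C), 2: nn.Linear(D, C)}   # shared content space
style_enc   = {1: nn.Linear(D, S), 2: nn.Linear(D, S)}   # domain-specific style
decoder     = {1: nn.Linear(C + S, D), 2: nn.Linear(C + S, D)}

x1 = torch.randn(4, D)                       # batch of domain-1 "images"
c1 = content_enc[1](x1)                      # domain-invariant content code
s2 = torch.randn(4, S)                       # style drawn from domain-2's prior
x12 = decoder[2](torch.cat([c1, s2], dim=1)) # translate: x1's content, domain-2 style
</syntaxhighlight>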
===Training Problems with GANs===
* Instability
* Difficult to keep generator and discriminator in sync.
** Discriminator cannot be too good or too bad. Same with generator.
** Tricks: LR scheduling, keeping the discriminator small, updating the generator more frequently (see the loop sketch below).
* Mode collapse
Mode collapse happens when the generator maps many inputs to only a few outputs, modeling only part of the data distribution.
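A minimal training-loop sketch of the sync tricks above (a smaller learning rate for the discriminator, several generator steps per discriminator step); the toy models, data, and G_STEPS value are illustrative assumptions, not a prescribed recipe:
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)   # keep D's updates gentler
bce = nn.BCEWithLogitsLoss()
G_STEPS = 2                                         # update G more frequently

for step in range(1000):
    real = torch.randn(64, 2) + 3.0                 # toy "real" distribution
    # --- one discriminator step ---
    opt_d.zero_grad()
    fake = G(torch.randn(64, 16)).detach()          # don't backprop into G here
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()
    # --- several generator steps ---
    for _ in range(G_STEPS):
        opt_g.zero_grad()
        g_loss = bce(D(G(torch.randn(64, 16))), torch.ones(64, 1))
        g_loss.backward()
        opt_g.step()
</syntaxhighlight>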
;DCGAN architecture guidelines
* Use strided convolutions instead of pooling in the discriminator (and fractionally-strided convolutions in the generator).
* Use batchnorm in both the generator and discriminator.
* Remove fully-connected hidden layers.
* Use ReLU for the generator's hidden layers and tanh for its output layer.
* Use LeakyReLU in the discriminator (a sketch of these guidelines follows).
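A sketch following the guidelines above, for 64x64 RGB images; channel widths and layer count are illustrative assumptions, not the paper's exact configuration:
<syntaxhighlight lang="python">
import torch.nn as nn

G = nn.Sequential(                                     # input z: (N, 100, 1, 1)
    nn.ConvTranspose2d(100, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1),  nn.BatchNorm2d(64),  nn.ReLU(),  # 16x16
    nn.ConvTranspose2d(64, 32, 4, 2, 1),   nn.BatchNorm2d(32),  nn.ReLU(),  # 32x32
    nn.ConvTranspose2d(32, 3, 4, 2, 1),    nn.Tanh(),     # 64x64, tanh output
)
D = nn.Sequential(                                     # input x: (N, 3, 64, 64)
    nn.Conv2d(3, 32, 4, 2, 1),                         nn.LeakyReLU(0.2),  # strided
    nn.Conv2d(32, 64, 4, 2, 1),   nn.BatchNorm2d(64),  nn.LeakyReLU(0.2),  # conv, no
    nn.Conv2d(64, 128, 4, 2, 1),  nn.BatchNorm2d(128), nn.LeakyReLU(0.2),  # pooling
    nn.Conv2d(128, 1, 8, 1, 0),                 # 1x1 logit, no FC hidden layers
)
</syntaxhighlight>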
LSGAN and WGAN use modified losses to mitigate these problems, including mode collapse (loss sketches below).
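Sketches of the modified objectives; here d_real and d_fake are raw discriminator/critic outputs on real and generated batches, and the 0/1 targets are one common LSGAN coding:
<syntaxhighlight lang="python">
import torch

def lsgan_losses(d_real, d_fake):
    # Least-squares loss penalizes outputs by distance to the target value,
    # giving smoother gradients than the saturating sigmoid loss.
    d_loss = 0.5 * ((d_real - 1) ** 2).mean() + 0.5 * (d_fake ** 2).mean()
    g_loss = 0.5 * ((d_fake - 1) ** 2).mean()
    return d_loss, g_loss

def wgan_losses(d_real, d_fake):
    # The critic estimates a Wasserstein distance: maximize E[D(real)] -
    # E[D(fake)], which requires a Lipschitz constraint (weight clipping
    # in WGAN, a gradient penalty in WGAN-GP).
    d_loss = d_fake.mean() - d_real.mean()
    g_loss = -d_fake.mean()
    return d_loss, g_loss

# placeholder outputs just to show the call shape
d_loss, g_loss = wgan_losses(torch.randn(64, 1), torch.randn(64, 1))
</syntaxhighlight>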
===Evaluation of GANs===
* Turing test (user study)
* Inception score (defined below)
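For reference, the standard definition (not spelled out above): the Inception score feeds generated samples through an Inception classifier and rewards confident per-sample label distributions <math>p(y|x)</math> together with a diverse marginal <math>p(y)</math>:
<math>IS(G) = \exp\left( E_{x \sim p_g}\left[ D_{KL}\big( p(y|x) \Vert p(y) \big) \right] \right)</math>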
===Variational Auto-encoders (VAEs)===
;Training a VAE
* Data likelihood: <math>P(x) = \int P(x|z) P(z) dz</math>
* Approximate with samples of <math>z</math> during training: <math>P(x) \approx \frac{1}{n} \sum_{i=1}^{n} P(x | z_i)</math> with <math>z_i \sim P(z)</math>
* This is impractical: for most <math>z</math> drawn from the prior, <math>P(x|z)</math> is near zero, so the estimate needs an enormous number of samples.
Instead, assume we can learn a distribution <math>Q(z|x)</math> whose samples <math>z \sim Q(z|x)</math> are likely to have produced <math>x</math>, i.e. give non-negligible <math>P(x|z)</math>.
How does <math>P(x)</math> relate to <math>E_{z \sim Q(z|x)}[\log P(x|z)]</math>? Expand the KL divergence between <math>Q(z|x)</math> and the true posterior, using Bayes' rule <math>P(z|x) = P(x|z)P(z)/P(x)</math>:
<math>
\begin{aligned}
D_{KL}[Q(z|x) \Vert P(z|x)] &= E_{z \sim Q}[\log Q(z|x) - \log P(z|x)]\\
&= E_{z \sim Q}[\log Q(z|x) - \log P(z)] - E_{z \sim Q}[\log P(x|z)] + \log P(x)\\
&= D_{KL}[Q(z|x) \Vert P(z)] - E_{z \sim Q}[\log P(x|z)] + \log P(x)
\end{aligned}
</math>
Rearranging, we get the evidence lower bound (ELBO); the inequality holds because KL divergence is non-negative:
<math>
\begin{aligned}
\log P(x) - D_{KL}[Q(z|x) \Vert P(z|x)] &= E_{z \sim Q}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]\\
\implies \log P(x) &\geq E_{z \sim Q}[\log P(x|z)] - D_{KL}[Q(z|x) \Vert P(z)]
\end{aligned}
</math>
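A minimal sketch of this bound as a training loss, assuming <math>Q(z|x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))</math>, prior <math>P(z) = \mathcal{N}(0, I)</math> (so the KL term has the usual closed form), and a Bernoulli decoder; the encoder/decoder producing mu, logvar, and recon_logits are assumed to exist and are not shown:
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu, sigma
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def neg_elbo(x, recon_logits, mu, logvar):
    # -E_{z~Q}[log P(x|z)], approximated with one reparameterized sample,
    # using a Bernoulli likelihood (binary cross-entropy)
    recon = F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    # D_KL[ N(mu, sigma^2) || N(0, I) ] in closed form
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl        # minimizing this maximizes the ELBO
</syntaxhighlight>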


==Will be on the exam==