Generative adversarial network: Difference between revisions

(21 intermediate revisions by the same user not shown)

Line 2:

Goal: Learn to generate examples from the same distribution as your training set.

==~~Basis~~ Structure==

==Structure==

GANs consist of a generator and a discriminator.

GANs consist of a generator and a discriminator, both of which are usually CNNs.

<pre>

For iteration i

For iteration j

Update ~~Generator~~

Update Discriminator

Update ~~Discriminator~~

Update Generator

</pre>
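The revised update order (several discriminator updates in the inner loop, then one generator update per outer iteration) can be sketched as follows. This is a minimal illustration, not a full training loop: the names <code>train_gan</code>, <code>d_steps</code>, and the two update callbacks are hypothetical placeholders for real optimizer steps.

```python
from typing import Callable, List


def train_gan(num_iters: int, d_steps: int,
              update_discriminator: Callable[[], None],
              update_generator: Callable[[], None]) -> None:
    """Alternate updates: d_steps discriminator steps, then one generator step."""
    for i in range(num_iters):        # outer loop ("iteration i")
        for j in range(d_steps):      # inner loop ("iteration j")
            update_discriminator()
        update_generator()


# Stand-in update functions that only record the order of the calls.
log: List[str] = []
train_gan(num_iters=2, d_steps=3,
          update_discriminator=lambda: log.append("D"),
          update_generator=lambda: log.append("G"))
print(log)
```

In practice each callback would draw a batch, compute the corresponding loss, and take an optimizer step.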

===Generator===

Two popular types of CNNs used in GANs are ResNets and U-Nets.

In both cases, the network is built from convolutional blocks, each consisting of a conv2d layer, a batch norm layer, and an activation (typically ReLU or LeakyReLU).

===Discriminator===

A popular discriminator is the PatchGAN discriminator.

These are typically several convolutional blocks stacked together.

Each convolutional layer in a block typically has a kernel size of 3x3 or 4x4 and a stride of 1 or 2.
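To see how kernel size and stride determine what each discriminator output "sees", the sketch below computes the output size and receptive field of a stack of convolutions. The layer configuration (4x4 kernels, three stride-2 layers followed by two stride-1 layers, padding 1) is an assumption based on the commonly used 70x70 PatchGAN setup; it is not specified on this page.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of one convolution (floor division, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) input steps
        jump *= stride              # distance between adjacent units, in input pixels
    return rf


# Assumed PatchGAN-style configuration: (kernel, stride) per layer, padding 1.
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

size = 256
for kernel, stride in layers:
    size = conv_out_size(size, kernel, stride, padding=1)

patch = receptive_field(layers)
print(size, patch)  # 30 70
```

Under these assumptions a 256x256 input yields a 30x30 grid of real/fake scores, each covering a 70x70 patch of the input.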

==Variations==

===Conditional GAN===

[https://arxiv.org/abs/1411.1784 Paper]

Feed the conditioning data <math>y</math> (for example, a class label) to both the generator and the discriminator.
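One simple way to feed <math>y</math> to both networks is to concatenate it with their usual inputs. The sketch below assumes a one-hot class label and flat vectors; the helper names and sizes are illustrative only, and real models often use learned embeddings or extra image channels instead.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(z, y):
    """Generator sees the noise vector z concatenated with the condition y."""
    return z + y


def discriminator_input(x, y):
    """Discriminator sees the (flattened) sample x concatenated with the same y."""
    return x + y


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(8)]   # noise vector (size is arbitrary)
y = one_hot(3, num_classes=10)                # conditioning label
x = [0.5] * 16                                # stand-in for a flattened sample

g_in = generator_input(z, y)
d_in = discriminator_input(x, y)
print(len(g_in), len(d_in))  # 18 26
```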

===Wasserstein GAN===

[https://arxiv.org/pdf/1704.00028.pdf Paper]

[https://medium.com/@jonathan_hui/gan-wasserstein-gan-wgan-gp-6a1a2aa1b490 Medium post]

This new WGAN-GP loss function improves the stability of training.

Normally, the discriminator is trained with a sigmoid cross-entropy loss.

WGAN instead uses the Wasserstein distance, which is implemented by removing the sigmoid and cross-entropy

and clipping (clamping) the weights of the discriminator to a range <math>[-c, c]</math>.

However, weight clipping limits the capacity of the critic (the WGAN term for the discriminator) and can cause vanishing or exploding gradients.

Instead of clipping, WGAN-GP proposes a gradient penalty to enforce the 1-Lipschitz constraint.
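The three ingredients can be illustrated with a toy linear critic <math>D(x) = w \cdot x</math>, chosen here only because its gradient with respect to the input is simply <math>w</math>, so no automatic differentiation is needed. This is a sketch under that assumption; a real critic is a neural network, and the penalty is evaluated at points interpolated between real and generated samples. The penalty weight of 10 is the value commonly used with WGAN-GP.

```python
import math


def critic(w, x):
    """Toy linear critic D(x) = w . x (no sigmoid on the output)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def wgan_critic_loss(w, real, fake):
    """Critic loss: mean D(fake) - mean D(real), with no cross-entropy."""
    mean_fake = sum(critic(w, x) for x in fake) / len(fake)
    mean_real = sum(critic(w, x) for x in real) / len(real)
    return mean_fake - mean_real


def clip_weights(w, c):
    """Original WGAN: clamp every weight to [-c, c] after each update."""
    return [max(-c, min(c, wi)) for wi in w]


def gradient_penalty(w, lam=10.0):
    """WGAN-GP: penalize deviation of the input-gradient norm from 1.
    For the linear critic the gradient is w at every input point."""
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    return lam * (grad_norm - 1.0) ** 2


w = [3.0, -4.0]
real = [[1.0, 0.0], [2.0, 0.0]]
fake = [[0.0, 1.0], [0.0, 2.0]]

loss = wgan_critic_loss(w, real, fake)
clipped = clip_weights(w, c=0.01)
penalty = gradient_penalty(w)
print(loss, clipped, penalty)  # -10.5 [0.01, -0.01] 160.0
```

With WGAN-GP the critic minimizes <code>loss + penalty</code> and the weights are left unclipped.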

===Progressive Growing of GANs (ProGAN)===

[https://arxiv.org/abs/1710.10196 Paper]

Progressively add layers to the generator and the discriminator of the GAN.

At the beginning, the generator makes a 4x4 image and the discriminator takes the 4x4 image as input.

Then, another layer is faded into the generator and the discriminator for an 8x8 image, and so on.
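Fading in a new layer is usually done by blending the upsampled output of the old, lower-resolution path with the output of the new layer, using a weight <math>\alpha</math> that grows from 0 to 1 during training. The sketch below shows that blend on plain nested lists; the function names are illustrative, and the constant-valued "images" stand in for real layer outputs.

```python
def upsample2x(img):
    """Nearest-neighbour 2x upsampling of a 2-D list (e.g. 4x4 -> 8x8)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def fade_in(old_lowres, new_highres, alpha):
    """Blend: (1 - alpha) * upsampled old output + alpha * new layer output."""
    up = upsample2x(old_lowres)
    return [[(1 - alpha) * o + alpha * n for o, n in zip(row_o, row_n)]
            for row_o, row_n in zip(up, new_highres)]


old = [[1.0] * 4 for _ in range(4)]   # stand-in for the 4x4 output
new = [[3.0] * 8 for _ in range(8)]   # stand-in for the new 8x8 layer output
blended = fade_in(old, new, alpha=0.25)
print(len(blended), len(blended[0]), blended[0][0])  # 8 8 1.5
```

At <code>alpha=0</code> the network behaves like the old 4x4 model (upsampled); at <code>alpha=1</code> only the new layer contributes.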

===Stacked Generative Adversarial Networks===

[https://arxiv.org/abs/1612.04357 Paper]

==Applications==

===CycleGAN===

===InfoGAN===

===SinGAN===

[https://arxiv.org/abs/1905.01164 Paper]


[http://webee.technion.ac.il/people/tomermic/SinGAN/SinGAN.htm Website]

[https://github.com/tamarott/SinGAN Official PyTorch implementation on GitHub]

SinGAN: Learning a Generative Model from a Single Natural Image

===MoCoGAN===

[https://arxiv.org/abs/1707.04993 Paper]

MoCoGAN: Decomposing Motion and Content for Video Generation

===Video Prediction===

* [http://openaccess.thecvf.com/content_iccv_2017/html/Liang_Dual_Motion_GAN_ICCV_2017_paper.html Dual Motion GAN (Liang et al. 2017)]

** Have a frame generator and a motion generator

** Combine the outputs of both generators using a fusing layer

** Trained using a frame discriminator and a motion discriminator. (Each generator is trained with both discriminators.)

===Image and Video Compression===

* [http://openaccess.thecvf.com/content_ICCV_2019/html/Agustsson_Generative_Adversarial_Networks_for_Extreme_Learned_Image_Compression_ICCV_2019_paper.html Image Compression]

* [https://arxiv.org/pdf/1912.10653.pdf Video compression via colorization]

** Colorize with GAN. Only transmit luminance (Y of YUV)

** The paper claims 72% BDBR reduction compared to HM 16.0.
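The "transmit only luminance" step can be illustrated by extracting the Y plane from an RGB image. The sketch assumes the BT.601 luma weights; the page does not say which YUV variant the paper uses, so treat the coefficients and function names as illustrative. The decoder side, where a GAN predicts the missing chroma, is not shown.

```python
def luma(rgb):
    """Y component of one RGB pixel using BT.601 weights (assumed variant)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def encode_luma_only(image):
    """Keep only the Y plane; the chroma planes are dropped before transmission."""
    return [[luma(px) for px in row] for row in image]


image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
y_plane = encode_luma_only(image)
print(y_plane)
```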

===Object Segmentation===

* [https://arxiv.org/abs/1905.11369 Object Discovery with a Copy-Pasting GAN]

===StyleGAN===

==Important Papers==

===Latent Space Exploration===

* [https://arxiv.org/abs/1907.10786 Interpreting the Latent Space of GANs for Semantic Face Editing]

* [https://arxiv.org/abs/1907.07171 On the "steerability" of generative adversarial networks]

** Exploring which directions in the latent space control high-level features such as camera position, object rotation, object hue,...
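Moving along a latent direction amounts to adding a scaled direction vector to the latent code, <math>z' = z + \alpha d</math>, and decoding each <math>z'</math> with the generator. The sketch below shows only the latent arithmetic; the direction <code>d</code> is a made-up vector, whereas in the papers above it is learned or estimated from data.

```python
def steer(z, direction, alpha):
    """Move latent code z by alpha along a direction: z' = z + alpha * d."""
    return [zi + alpha * di for zi, di in zip(z, direction)]


z = [0.0, 1.0, -1.0]          # a latent code (toy size)
d = [1.0, 0.0, 0.5]           # hypothetical direction, e.g. "object hue"
steps = [steer(z, d, a) for a in (-1.0, 0.0, 1.0)]
print(steps)
```

Feeding each entry of <code>steps</code> to the generator would produce a sequence of images in which, ideally, only the targeted attribute changes.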

====Inversion====

How to go from an image to a latent space vector

* [https://arxiv.org/abs/1904.03189 Image2StyleGAN]

** Mostly showing off applications using StyleGAN: morphing, style transfer, expression transfer

** Invert StyleGAN to get style vectors <math>w</math> but with a different style vector per layer.

** Able to get StyleGAN trained on faces to output cats, dogs, cars, ...

** Followup Papers: [https://arxiv.org/abs/1911.11544 Image2StyleGAN++] adds Activation Tensor Manipulation
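Optimization-based inversion starts from some latent vector and follows the gradient of a reconstruction loss until the generator output matches the target image. The sketch below uses a toy linear "generator" and plain mean squared error so the gradient can be written by hand; it is only an illustration of the idea, not the Image2StyleGAN procedure, which optimizes StyleGAN latents with a different loss.

```python
def generate(A, w):
    """Toy linear generator: output = A w (stand-in for a real network)."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]


def invert(A, target, steps=500, lr=0.1):
    """Gradient descent on the mean squared error between generate(A, w) and target."""
    w = [0.0] * len(A[0])
    n = len(A)
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(A, w), target)]
        # Gradient of the MSE with respect to w: (2 / n) * A^T residual
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(n)) / n
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
true_w = [0.5, -1.0]
target = generate(A, true_w)      # the "image" to invert
w = invert(A, target)
print(w)
```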

===Activation Tensor Manipulation===

* [https://arxiv.org/abs/1811.10597 GAN Dissection: Visualizing and Understanding Generative Adversarial Networks]

** Authors: David Bau et al.

** Basically, individual "units" or channels of the intermediate representations correspond to some features like windows or trees in the output

** Dissection: Units that correspond to a feature can be identified by visualizing each channel as a heatmap, thresholding the heatmap so each value is binary (0 or 1), and calculating the IoU (intersection over union) between the thresholded heatmap and the segmented feature in the generated picture.

** Intervention: By zeroing out channels, you can remove windows or trees from the generated image. Alternatively you can add windows or trees at specific locations by activating the neurons at that location of the corresponding window/tree channel.

** This is fairly specific to CNN architectures where there is a locality correspondence between the intermediate representations and the output image.

** Followup Papers: [https://dl.acm.org/doi/abs/10.1145/3306346.3323023 Semantic photo manipulation]
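The dissection and intervention steps for GAN Dissection can be sketched on small nested lists. This assumes the heatmap has already been upsampled to the resolution of the segmentation mask, and the threshold value is arbitrary; the function names are illustrative.

```python
def threshold(heatmap, t):
    """Binarize a channel's activation heatmap."""
    return [[1 if v > t else 0 for v in row] for row in heatmap]


def iou(a, b):
    """Intersection over union of two binary masks of equal shape."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


def ablate(features, channels):
    """Intervention: zero out the selected channels of an activation tensor."""
    return [[[0.0 for _ in row] for row in ch] if c in channels else ch
            for c, ch in enumerate(features)]


heatmap = [[0.9, 0.8, 0.1],
           [0.7, 0.2, 0.0]]       # activations of one unit
mask = [[1, 1, 0],
        [0, 0, 0]]                # segmentation of, e.g., "tree" pixels
unit_mask = threshold(heatmap, 0.5)
score = iou(unit_mask, mask)

features = [heatmap, [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]]
edited = ablate(features, {0})
print(score, edited[0])
```

Units with a high IoU for a class are the candidates to zero out (to remove that feature) or to activate at chosen locations (to insert it).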

==Resources==

* [https://github.com/soumith/ganhacks Tricks for Training GANs]

@@ Line 2: / Line 2: @@
 Goal: Learn to generate examples from the same distribution as your training set.
-==Basis Structure==
+==Structure==
-GANs consist of a generator and a discriminator.
+GANs consist of a generator and a discriminator, both of which are usually CNNs.
 <pre>
 For iteration i
    For iteration j
-     Update Generator
+     Update Discriminator
-   Update Discriminator
+   Update Generator
 </pre>
+===Generator===
+Two popular types of CNNs used in GANs are ResNets and U-Nets.<br>
+In both cases, the network is built from convolutional blocks, each consisting of a conv2d layer, a batch norm layer, and an activation (typically ReLU or LeakyReLU).
+===Discriminator===
+A popular discriminator is the PatchGAN discriminator.<br>
+These are typically several convolutional blocks stacked together.
+Each convolutional layer in a block typically has a kernel size of 3x3 or 4x4 and a stride of 1 or 2.
 ==Variations==
+===Conditional GAN===
+[https://arxiv.org/abs/1411.1784 Paper]<br>
+Feed the conditioning data <math>y</math> (for example, a class label) to both the generator and the discriminator.
+===Wasserstein GAN===
+[https://arxiv.org/pdf/1704.00028.pdf Paper]<br>
+[https://medium.com/@jonathan_hui/gan-wasserstein-gan-wgan-gp-6a1a2aa1b490 Medium post]<br>
+This new WGAN-GP loss function improves the stability of training.<br>
+Normally, the discriminator is trained with a sigmoid cross-entropy loss.<br>
+WGAN instead uses the Wasserstein distance, which is implemented by removing the sigmoid and cross-entropy
+and clipping (clamping) the weights of the discriminator to a range <math>[-c, c]</math>.<br>
+However, weight clipping limits the capacity of the critic (the WGAN term for the discriminator) and can cause vanishing or exploding gradients.<br>
+Instead of clipping, WGAN-GP proposes a gradient penalty to enforce the 1-Lipschitz constraint.
+===Progressive Growing of GANs (ProGAN)===
+[https://arxiv.org/abs/1710.10196 Paper]<br>
+Progressively add layers to the generator and the discriminator of the GAN.<br>
+At the beginning, the generator makes a 4x4 image and the discriminator takes the 4x4 image as input.
+Then, another layer is faded into the generator and the discriminator for an 8x8 image, and so on.
+===Stacked Generative Adversarial Networks===
+[https://arxiv.org/abs/1612.04357 Paper]<br>
+==Applications==
 ===CycleGAN===
 ===InfoGAN===
 ===SinGAN===
-[https://arxiv.org/abs/1905.01164 Paper]
+{{ main | SinGAN}}
+[https://arxiv.org/abs/1905.01164 Paper]<br>
+[http://webee.technion.ac.il/people/tomermic/SinGAN/SinGAN.htm Website]<br>
+[https://github.com/tamarott/SinGAN Official PyTorch implementation on GitHub]<br>
+SinGAN: Learning a Generative Model from a Single Natural Image<br>
+===MoCoGAN===
+{{ main | MoCoGAN}}
+[https://arxiv.org/abs/1707.04993 Paper]<br>
+MoCoGAN: Decomposing Motion and Content for Video Generation<br>
+===Video Prediction===
+* [http://openaccess.thecvf.com/content_iccv_2017/html/Liang_Dual_Motion_GAN_ICCV_2017_paper.html Dual Motion GAN (Liang et al. 2017)]
+** Have a frame generator and a motion generator
+** Combine the outputs of both generators using a fusing layer
+** Trained using a frame discriminator and a motion discriminator. (Each generator is trained with both discriminators.)
+===Image and Video Compression===
+* [http://openaccess.thecvf.com/content_ICCV_2019/html/Agustsson_Generative_Adversarial_Networks_for_Extreme_Learned_Image_Compression_ICCV_2019_paper.html Image Compression]
+* [https://arxiv.org/pdf/1912.10653.pdf Video compression via colorization]
+** Colorize with GAN. Only transmit luminance (Y of YUV)
+** The paper claims 72% BDBR reduction compared to HM 16.0.
+===Object Segmentation===
+* [https://arxiv.org/abs/1905.11369 Object Discovery with a Copy-Pasting GAN]
+===StyleGAN===
+{{ main | StyleGAN }}
+==Important Papers==
+===Latent Space Exploration===
+* [https://arxiv.org/abs/1907.10786 Interpreting the Latent Space of GANs for Semantic Face Editing]
+* [https://arxiv.org/abs/1907.07171 On the "steerability" of generative adversarial networks]
+** Exploring which directions in the latent space control high-level features such as camera position, object rotation, object hue,...
+====Inversion====
+How to go from an image to a latent space vector
+* [https://arxiv.org/abs/1904.03189 Image2StyleGAN]
+** Mostly showing off applications using StyleGAN: morphing, style transfer, expression transfer
+** Invert StyleGAN to get style vectors <math>w</math> but with a different style vector per layer.
+** Able to get StyleGAN trained on faces to output cats, dogs, cars, ...
+** Followup Papers: [https://arxiv.org/abs/1911.11544 Image2StyleGAN++] adds Activation Tensor Manipulation
+===Activation Tensor Manipulation===
+* [https://arxiv.org/abs/1811.10597 GAN Dissection: Visualizing and Understanding Generative Adversarial Networks]
+** Authors: David Bau et al.
+** Basically, individual "units" or channels of the intermediate representations correspond to some features like windows or trees in the output
+** Dissection: Units that correspond to a feature can be identified by visualizing each channel as a heatmap, thresholding the heatmap so each value is binary (0 or 1), and calculating the IoU (intersection over union) between the thresholded heatmap and the segmented feature in the generated picture.
+** Intervention: By zeroing out channels, you can remove windows or trees from the generated image. Alternatively you can add windows or trees at specific locations by activating the neurons at that location of the corresponding window/tree channel.
+** This is fairly specific to CNN architectures where there is a locality correspondence between the intermediate representations and the output image.
+** Followup Papers: [https://dl.acm.org/doi/abs/10.1145/3306346.3323023 Semantic photo manipulation]
+==Resources==
+* [https://github.com/soumith/ganhacks Tricks for Training GANs]