Training Generative Adversarial Networks with Limited Data

Training Generative Adversarial Networks with Limited Data (NeurIPS 2020)

This is a modification of StyleGAN2 by the same authors at NVIDIA.

Authors: Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila
Affiliation: NVIDIA

The core idea is to apply data augmentation when training the discriminator so that GANs can be trained stably without the \(\displaystyle 10^5\) or more images they typically require. They are able to train using only a few thousand images.

Method

During discriminator training, they apply augmentations to every image the discriminator sees, both real and generated, to prevent the discriminator from overfitting.
Each augmentation is applied with a probability \(\displaystyle p\) that is adapted during training based on a heuristic for overfitting. Because the augmentations are differentiable, the generator can still be trained through them (see the sketch after the list below).

Augmentations
  • Pixel blitting (x-flips, 90° rotations, integer translation)
  • Geometric transformations
  • Color transforms
  • Image-space filtering
  • Additive noise
  • Cutout
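
As a rough illustration, here is a minimal PyTorch sketch of applying two of these augmentation categories, each with probability \(\displaystyle p\) independently per image, to everything the discriminator sees. The function name augment and the noise scale are illustrative assumptions, not the paper's implementation.

```python
import torch

def augment(images: torch.Tensor, p: float) -> torch.Tensor:
    """Apply a small subset of the augmentation pipeline, each
    transformation chosen independently per image with probability p.
    Both operations are differentiable w.r.t. `images`, so generator
    gradients survive the augmentation."""
    n = images.shape[0]
    # x-flip (one of the pixel-blitting augmentations).
    flip = torch.rand(n, device=images.device) < p
    images = torch.where(flip[:, None, None, None], images.flip([3]), images)
    # Additive Gaussian noise; the 0.1 scale is an illustrative choice.
    noisy = (torch.rand(n, device=images.device) < p).float()
    images = images + noisy[:, None, None, None] * 0.1 * torch.randn_like(images)
    return images

# The discriminator never sees a non-augmented image: both real and
# generated batches pass through augment(), e.g.
#   logits_real = D(augment(reals, p))
#   logits_fake = D(augment(G(z), p))
```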

Adaptive discriminator augmentation

There are two heuristics that can be used to estimate overfitting, based on the discriminator's outputs on training, validation, and generated images:

  • \(\displaystyle r_v = \frac{E[D_{train}] - E[D_{validation}]}{E[D_{train}] - E[D_{generated}]}\)
  • \(\displaystyle r_t = E[\operatorname{sign}(D_{train})]\)

The heuristic \(\displaystyle r_v\) requires a separate validation set, whereas \(\displaystyle r_t\) does not, so they primarily use \(\displaystyle r_t\). For both, \(\displaystyle r=0\) indicates no overfitting and \(\displaystyle r=1\) indicates complete overfitting. In practice, \(\displaystyle p\) starts at 0 and is adjusted every few minibatches: if \(\displaystyle r_t\) is above a target value (0.6 in the paper), \(\displaystyle p\) is incremented by a fixed amount; otherwise it is decremented, with \(\displaystyle p\) clamped to \(\displaystyle [0,1]\).
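
A minimal NumPy sketch of this feedback loop using the \(\displaystyle r_t\) heuristic: the target 0.6 and the four-minibatch cadence follow the paper, while the class name, method signature, and default batch size are illustrative assumptions.

```python
import numpy as np

class ADAController:
    """Feedback loop for the augmentation probability p using r_t.
    The target (0.6) and update cadence (every 4 minibatches) follow
    the paper; the class itself is illustrative, not NVIDIA's code."""

    def __init__(self, target=0.6, interval=4, batch_size=64, speed_imgs=500_000):
        self.p = 0.0              # augmentation probability, initialized to zero
        self.target = target      # target value for r_t
        self.interval = interval  # adjust p once every `interval` minibatches
        # Step size chosen so p can traverse [0, 1] within `speed_imgs` images.
        self.step = batch_size * interval / speed_imgs
        self._sign_sum, self._count = 0.0, 0

    def update(self, d_train_logits: np.ndarray) -> float:
        """Accumulate sign(D(x)) over real minibatches; returns the new p."""
        self._sign_sum += float(np.sign(d_train_logits).mean())
        self._count += 1
        if self._count == self.interval:
            r_t = self._sign_sum / self.interval  # estimate of E[sign(D_train)]
            # r_t above target -> too much overfitting -> increase p, and vice versa.
            self.p = float(np.clip(self.p + np.sign(r_t - self.target) * self.step,
                                   0.0, 1.0))
            self._sign_sum, self._count = 0.0, 0
        return self.p
```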