SinGAN: Learning a Generative Model from a Single Natural Image


SinGAN Paper
Website
Github Official PyTorch Implementation


Basic Idea

Bootstrap patches of the original image and train a pyramid of GANs that add fine details to blurry patches at different patch sizes.

  • Start by building a GAN to generate low-resolution versions of the original image
  • Then upscale the image and build a GAN to add details to patches of your upscaled image
  • Fix the parameters of the previous GAN. Upscale the outputs and repeat.
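The coarse-to-fine schedule above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the official implementation: `downsample`, `build_pyramid`, and `train_coarse_to_fine` are hypothetical helpers, the downscale factor `scale=0.75` and `min_size=25` are illustrative choices, nearest-neighbour resizing stands in for a proper anti-aliased resize, and `train_one_scale` is a placeholder callback for whatever trains a single GAN.

```python
import numpy as np


def downsample(image, factor):
    """Nearest-neighbour resize of an (H, W, C) image by `factor` (< 1 shrinks).

    Assumption: a real implementation would use an anti-aliased resize.
    """
    h, w = image.shape[:2]
    new_h, new_w = max(1, int(h * factor)), max(1, int(w * factor))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return image[rows][:, cols]


def build_pyramid(image, scale=0.75, min_size=25):
    """Return [x_0, x_1, ..., x_N], where x_n is `image` shrunk by scale**n.

    Stops before the shorter side would drop below `min_size` (hypothetical default).
    """
    pyramid = [image]
    n = 1
    while True:
        x_n = downsample(image, scale ** n)
        if min(x_n.shape[:2]) < min_size:
            break
        pyramid.append(x_n)
        n += 1
    return pyramid


def train_coarse_to_fine(pyramid, train_one_scale):
    """Train G_N first, then G_{N-1}, ..., down to G_0.

    `train_one_scale(n, x_n, frozen)` is a hypothetical callback that trains the
    GAN for scale n, given the already-trained coarser generators, and returns it.
    """
    frozen = []  # generators for scales N, ..., n+1; never updated again
    for n in reversed(range(len(pyramid))):
        generator = train_one_scale(n, pyramid[n], tuple(frozen))
        frozen.append(generator)
    return frozen  # ordered [G_N, ..., G_0]
```

Passing the coarser generators as a tuple is only a reminder that they are frozen; the callback trains nothing but the generator it returns for the current scale.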


Architecture

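They build a pyramid of \(\displaystyle N+1\) GANs, \(\displaystyle G_N, \dots, G_0\), one for each scale of an image pyramid \(\displaystyle x_N, \dots, x_0\), where \(\displaystyle x_n\) is the original image \(\displaystyle x\) downsampled by a factor \(\displaystyle r^n\) for some \(\displaystyle r > 1\).
The coarsest GAN \(\displaystyle G_N\) generates an image from noise alone; each other GAN \(\displaystyle G_n\) adds details to patches of the upsampled image produced by the coarser GAN \(\displaystyle G_{n+1}\) below it.
The final GAN \(\displaystyle G_0\) works at full resolution and adds only the finest details.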
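Sampling a new image walks the pyramid from the coarsest generator to the finest, upsampling each result and handing it to the next generator together with fresh noise. The sketch below is illustrative, not the official code: each generator is assumed to be a callable `G(z, prev_up)`, `upsample` uses nearest-neighbour resizing for simplicity, and `residual_generator` with its placeholder network `psi` is hypothetical. It mirrors the residual form described in the paper, where \(\displaystyle G_n\) outputs the upsampled coarser image plus a learned correction.

```python
import numpy as np


def upsample(image, shape):
    """Nearest-neighbour resize of an (H, W, C) image to `shape` = (H', W')."""
    h, w = image.shape[:2]
    rows = (np.arange(shape[0]) * h / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * w / shape[1]).astype(int)
    return image[rows][:, cols]


def residual_generator(psi):
    """Wrap a hypothetical network `psi` as G(z, up) = up + psi(z + up),
    so each scale only has to learn the details missing from `up`."""
    return lambda z, up: up + psi(z + up)


def sample(generators, shapes, rng, noise_std=1.0):
    """Draw one image from a trained pyramid.

    generators: [G_N, ..., G_0], each a callable G(z, prev_up).
    shapes:     [(H_N, W_N, C), ..., (H_0, W_0, C)], coarsest first.
    """
    x = np.zeros(shapes[0])  # G_N has no coarser input, so it sees only noise
    for G, shape in zip(generators, shapes):
        prev_up = upsample(x, shape[:2])  # coarser result, upsampled to this scale
        z = noise_std * rng.standard_normal(shape)  # fresh noise for this scale
        x = G(z, prev_up)
    return x
```

Because every scale draws its own noise, repeated calls with different random generators give different images that share the patch statistics learned at each scale.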

Generator

Discriminator

Training and Loss Function

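They train each scale with a weighted sum of an adversarial loss (the paper uses the WGAN-GP loss) and a reconstruction loss: \(\displaystyle \min_{G_n} \max_{D_n} L_{adv}(G_n, D_n) + \alpha L_{rec}(G_n)\).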
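As a rough sketch of how the two terms combine in the generator's objective at one scale, the function below assumes a WGAN-style critic, whose generator term is the negated mean critic score on generated samples, and computes the reconstruction term as a mean squared error (a rescaling of the squared norm). `generator_loss` and its default `alpha=10.0` are hypothetical. The gradient penalty only enters the critic's loss and needs automatic differentiation, so it is left out.

```python
import numpy as np


def generator_loss(critic_fake, rec_output, x_n, alpha=10.0):
    """Generator objective at scale n: adversarial term + alpha * L_rec.

    critic_fake: critic scores D_n(.) on images produced by G_n from random noise.
    rec_output:  G_n's output on the fixed reconstruction input.
    x_n:         the real image at scale n.
    alpha:       hypothetical weight of the reconstruction term.
    """
    adv = -np.mean(critic_fake)  # WGAN generator term: push critic scores up
    rec = np.mean((rec_output - x_n) ** 2)  # mean squared reconstruction error
    return float(adv + alpha * rec)
```

Raising `alpha` pulls the reconstruction path toward the original image more strongly, while the adversarial term alone governs the random samples.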

Reconstruction Loss

\(\displaystyle L_{rec} = \Vert G_n(0, (\bar{x}^{rec}_{n+1})\uparrow^r) - x_n \Vert^2\)
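The reconstruction loss ensures that there is a fixed set of input noise maps from which the pyramid regenerates the original image \(\displaystyle x\).
For this loss, rather than feeding random noise to the generators, they input \(\displaystyle \{z_N^{rec}, z_{N-1}^{rec}, \dots, z_0^{rec}\} = \{z^*, 0, \dots, 0\}\), where the initial noise \(\displaystyle z^*\) is drawn once and then kept fixed for the rest of training. Here \(\displaystyle \bar{x}^{rec}_{n+1}\) is the image generated at scale \(\displaystyle n+1\) from these inputs, \(\displaystyle \uparrow^r\) denotes upsampling by a factor \(\displaystyle r\), and at the coarsest scale the loss is \(\displaystyle \Vert G_N(z^*) - x_N \Vert^2\).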
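The reconstruction path can be sketched as a single pass through the pyramid that feeds \(\displaystyle z^*\) to the coarsest generator and zeros to every other one, recording the squared error at each scale. This is a hypothetical helper under stated assumptions, not the official code: generators are callables `G(z, prev_up)`, both lists run coarsest first, and `upsample` uses nearest-neighbour resizing.

```python
import numpy as np


def upsample(image, shape):
    """Nearest-neighbour resize of an (H, W, C) image to `shape` = (H', W')."""
    h, w = image.shape[:2]
    rows = (np.arange(shape[0]) * h / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * w / shape[1]).astype(int)
    return image[rows][:, cols]


def reconstruction_losses(generators, pyramid, z_star):
    """Per-scale reconstruction loss along the fixed input {z*, 0, ..., 0}.

    generators: [G_N, ..., G_0], each a callable G(z, prev_up).
    pyramid:    [x_N, ..., x_0], real images from coarsest to finest.
    z_star:     fixed noise map for G_N, drawn once and reused every step.
    """
    losses = []
    x_rec = np.zeros_like(pyramid[0])  # the coarsest generator has no coarser input
    for i, (G, x_n) in enumerate(zip(generators, pyramid)):
        prev_up = upsample(x_rec, x_n.shape[:2])
        # i == 0 is the coarsest scale N: it gets z*; every finer scale gets zeros
        z = z_star if i == 0 else np.zeros_like(x_n)
        x_rec = G(z, prev_up)
        losses.append(float(np.sum((x_rec - x_n) ** 2)))  # squared L2 norm
    return losses
```

Since the inputs never change, this pass is deterministic for fixed generator weights, which is what lets the loss pin one specific noise setting to the original image.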