SinGAN: Learning a Generative Model from a Single Natural Image

* Then upscale the image and build a GAN to add details to patches of your upscaled image
* Fix the parameters of the previous GAN. Upscale the outputs and repeat.
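A minimal sketch of this coarse-to-fine sampling loop, assuming a list of already trained (frozen) generators <code>generators = [G_N, ..., G_0]</code>, per-scale noise levels <code>sigmas</code>, and the residual formulation in which each generator refines the upsampled output of the coarser scale (all names here are hypothetical, not the authors' code):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def sample(generators, sigmas, coarsest_shape, scale_factor=4 / 3):
    """Generate a new image by running the pyramid from coarsest to finest scale."""
    # Coarsest scale: the generator sees only noise.
    z = sigmas[0] * torch.randn(coarsest_shape)
    img = generators[0](z)
    for G, sigma in zip(generators[1:], sigmas[1:]):
        # Upsample the previous output, then let the next (frozen) GAN
        # add finer details to its patches.
        img = F.interpolate(img, scale_factor=scale_factor,
                            mode='bilinear', align_corners=False)
        z = sigma * torch.randn_like(img)
        # Residual formulation: the generator only adds the missing details.
        img = img + G(z + img)
    return img
</syntaxhighlight>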


==Architecture==
The final GAN <math>G_0</math> adds only fine details.
===Generator===
They use <math>N</math> generators.<br>
Each generator consists of 5 convolutional blocks:<br>
Conv(<math>3 \times 3</math>)-BatchNorm-LeakyReLU.<br>
They use 32 kernels per block at the coarsest scale and double this number every 4 scales.
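A plausible PyTorch rendering of one such generator (a sketch based on the description above; the class name, tanh output, and residual wiring are assumptions, not the authors' exact code):
<syntaxhighlight lang="python">
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Conv(3x3)-BatchNorm-LeakyReLU, as described above.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class SingleScaleGenerator(nn.Module):
    """Fully convolutional generator for one scale of the pyramid."""

    def __init__(self, channels=32):  # 32 at the coarsest scales, doubled every 4 scales
        super().__init__()
        self.body = nn.Sequential(
            conv_block(3, channels),
            conv_block(channels, channels),
            conv_block(channels, channels),
            conv_block(channels, channels),
            nn.Conv2d(channels, 3, kernel_size=3, padding=1),  # 5th block outputs RGB
            nn.Tanh(),
        )

    def forward(self, noise, upsampled_prev):
        # Residual: the network only produces the details missing from the
        # upsampled output of the coarser scale.
        return upsampled_prev + self.body(noise + upsampled_prev)
</syntaxhighlight>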
===Discriminator===
The architecture is the same as that of the generator.<br>
The patch size is <math>11 \times 11</math>.
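Because the discriminator is fully convolutional, it outputs a map of per-patch scores rather than a single scalar; five stacked <math>3 \times 3</math> convolutions give each output unit an <math>11 \times 11</math> receptive field. A rough sketch (assumed names, not the authors' code):
<syntaxhighlight lang="python">
import torch.nn as nn

def make_discriminator(channels=32):
    """Markovian (patch) discriminator: the same Conv-BN-LeakyReLU body as the
    generator, but the last 3x3 conv maps to a 1-channel map of patch scores."""
    layers, in_ch = [], 3
    for _ in range(4):
        layers += [nn.Conv2d(in_ch, channels, kernel_size=3, padding=1),
                   nn.BatchNorm2d(channels),
                   nn.LeakyReLU(0.2, inplace=True)]
        in_ch = channels
    layers.append(nn.Conv2d(channels, 1, kernel_size=3, padding=1))  # per-patch score
    return nn.Sequential(*layers)
</syntaxhighlight>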


==Training and Loss Function==
<math>\min_{G_n} \max_{D_n} \mathcal{L}_{adv}(G_n, D_n) + \alpha \mathcal{L}_{rec}(G_n)</math><br>
They use a combination of an adversarial loss and a reconstruction loss.
===Adversarial Loss===
They use the [https://arxiv.org/abs/1704.00028 WGAN-GP loss].<br>
The discriminator scores patches, and the final adversarial loss is the average over all the patches.<br>
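A minimal sketch of the WGAN-GP critic loss, with the patch score map averaged into a scalar (the helper name and penalty weight are assumptions; a weight of 10 is the WGAN-GP paper's default):
<syntaxhighlight lang="python">
import torch

def critic_loss_wgan_gp(D, real, fake, gp_weight=10.0):
    """WGAN critic loss plus gradient penalty; D returns a map of patch scores.
    `fake` is assumed to be detached from the generator graph for this update."""
    wasserstein = D(fake).mean() - D(real).mean()  # average over all patches

    # Gradient penalty on random interpolates between the real and fake images.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(D(interp).sum(), interp, create_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    penalty = ((grad_norm - 1.0) ** 2).mean()
    return wasserstein + gp_weight * penalty
</syntaxhighlight>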


===Reconstruction Loss===
Rather than inputting noise to the generators, they input
<math>\{z_N^{rec}, z_{N-1}^{rec}, ..., z_0^{rec}\} = \{z^*, 0, ..., 0\}</math>
where the initial noise <math>z^*</math> is drawn once and then fixed during the rest of the training.<br>
The standard deviation <math>\sigma_n</math> of the noise <math>z_n</math> is proportional to the root mean squared error (RMSE) between the upsampled reconstruction and the original image at that scale.
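A sketch of how the reconstruction pass and the per-scale noise level could be computed at one scale, assuming the residual formulation and hypothetical names (<code>G_n</code> is the convolutional body, <code>real_n</code> the training image downsampled to this scale, <code>rec_up</code> the upsampled reconstruction from the coarser scale):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def reconstruction_step(G_n, real_n, rec_up, z_star=None):
    """One scale of the reconstruction pass.

    At the coarsest scale the fixed noise z_star is fed in; at every other
    scale the reconstruction noise is zero, so the generator only refines
    the upsampled reconstruction from the coarser scale.
    """
    z_rec = z_star if z_star is not None else torch.zeros_like(rec_up)
    rec_n = rec_up + G_n(z_rec + rec_up)

    # Reconstruction loss: squared error against the real image at this scale.
    loss_rec = F.mse_loss(rec_n, real_n)

    # Noise level for random samples at this scale: proportional to the RMSE
    # between the upsampled reconstruction and the real image (the constant
    # of proportionality is a hyperparameter).
    sigma_n = torch.sqrt(F.mse_loss(rec_up, real_n)).item()
    return rec_n, loss_rec, sigma_n
</syntaxhighlight>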
 
==Evaluation==
They evaluate their method using an Amazon Mechanical Turk (AMT) user study and the Single Image Frechet Inception Distance (SIFID).
===Amazon Mechanical Turk Study===
===Single Image Frechet Inception Distance===
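SIFID applies the standard FID formula to the internal statistics of deep features taken from a single real image and a single generated image, rather than to feature statistics over a dataset. A rough sketch of the underlying Frechet distance computation, assuming the features have already been extracted (e.g. from an early Inception layer) and flattened to <code>(positions, channels)</code> arrays:
<syntaxhighlight lang="python">
import numpy as np
from scipy import linalg

def frechet_distance(feat_real, feat_fake):
    """Frechet distance between Gaussian fits of two feature sets of shape (n, dim)."""
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts introduced by sqrtm
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
</syntaxhighlight>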
 
 
==Results==
Below are images of their results from their paper and website.