* Then upscale the image and build a GAN to add details to patches of your upscaled image
* Fix the parameters of the previous GAN. Upscale the outputs and repeat.
==Architecture==
The final GAN <math>G_0</math> adds only fine details.
===Generator===
They use <math>N</math> generators.<br>
Each generator consists of 5 convolutional blocks:<br>
Conv(<math>3 \times 3</math>)-BatchNorm-LeakyReLU.<br>
They use 32 kernels per block at the coarsest scale and double the number of kernels every 4 scales.
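The kernel schedule above can be sketched in a few lines of Python (a toy illustration of the doubling rule, not the authors' code; the function name is illustrative):

```python
def kernels_per_block(scale_from_coarsest, base=32):
    """Number of convolutional kernels per block at a given scale.

    Starts at `base` (32 in the paper) at the coarsest scale and
    doubles every 4 scales moving toward finer scales.
    """
    return base * 2 ** (scale_from_coarsest // 4)

# Coarsest 4 scales use 32 kernels, the next 4 use 64, and so on.
print([kernels_per_block(n) for n in range(9)])
# → [32, 32, 32, 32, 64, 64, 64, 64, 128]
```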
===Discriminator===
The architecture is the same as the generator's.<br>
The patch size is <math>11 \times 11</math>.
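The <math>11 \times 11</math> patch size is consistent with the architecture: a stack of five <math>3 \times 3</math> convolutions (stride 1) has a receptive field of <math>1 + 5 \cdot (3 - 1) = 11</math>. A quick check of that arithmetic:

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field of a stack of stride-1 convolutions.

    Each k x k conv (stride 1) grows the receptive field by k - 1.
    """
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1
    return rf

print(receptive_field(5))  # → 11
```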
==Training and Loss Function==
<math>\min_{G_n} \max_{D_n} \mathcal{L}_{adv}(G_n, D_n) + \alpha \mathcal{L}_{rec}(G_n)</math><br>
They use a combination of the standard GAN adversarial loss and a reconstruction loss.
===Adversarial Loss===
They use the [https://arxiv.org/abs/1704.00028 WGAN-GP loss].<br>
The final loss is the average over all the patches.
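To make the WGAN-GP critic objective concrete, here is a toy NumPy sketch with a linear critic <math>D(x) = w \cdot x</math>, whose gradient with respect to the input is simply <math>w</math>, so the gradient penalty has a closed form. All names, sizes, and the penalty weight are illustrative; the real implementation evaluates the gradient with autograd:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)                 # toy linear critic D(x) = w . x

def critic(x):
    return x @ w

real = rng.normal(size=(16, 8))        # stand-ins for real patches
fake = rng.normal(size=(16, 8))        # stand-ins for generated patches

# Wasserstein critic term: E[D(fake)] - E[D(real)]
wasserstein = critic(fake).mean() - critic(real).mean()

# Gradient penalty is evaluated at points interpolated between
# real and fake samples; for a linear critic the input gradient
# equals w at every interpolate, so ||grad D(x_hat)|| = ||w||.
eps = rng.uniform(size=(16, 1))
x_hat = eps * real + (1 - eps) * fake  # where the gradient is taken
grad_norm = np.linalg.norm(w)          # analytic here; autograd in practice
penalty = (grad_norm - 1.0) ** 2

lam = 10.0                             # penalty weight from the WGAN-GP paper
critic_loss = wasserstein + lam * penalty
print(critic_loss)
```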
===Reconstruction Loss===
Rather than inputting noise to the generators, they input
<math>\{z_N^{rec}, z_{N-1}^{rec}, ..., z_0^{rec}\} = \{z^*, 0, ..., 0\}</math>
where the initial noise <math>z^*</math> is drawn once and then fixed during the rest of the training.<br>
The standard deviation <math>\sigma_n</math> of the noise <math>z_n</math> is proportional to the root mean squared error (RMSE) between the reconstructed patch and the original patch.
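A minimal NumPy sketch of this noise schedule (the function name and the unit proportionality constant are illustrative assumptions):

```python
import numpy as np

def noise_std(original, reconstructed, scale=1.0):
    """Std of the noise z_n at a given scale, set proportional to the
    RMSE between the reconstruction and the original at that scale."""
    rmse = np.sqrt(np.mean((original - reconstructed) ** 2))
    return scale * rmse

img = np.zeros((4, 4))
recon = np.full((4, 4), 0.5)   # reconstruction off by 0.5 everywhere
print(noise_std(img, recon))   # → 0.5
```

A perfect reconstruction thus gets zero noise, while scales that lose more detail receive proportionally stronger noise.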
==Evaluation==
They evaluate their method using an Amazon Mechanical Turk (AMT) user study and using the Single Image Frechet Inception Distance.
===Amazon Mechanical Turk Study===
===Frechet Inception Distance===
==Results==
Below are images of their results from their paper and website.