Visual Learning and Recognition

===Pixel-RNN/CNN===
* Fully-visible belief network
* Each pixel depends on its adjacent pixels
* Explicit density model:
* Training:
** Each pixel depends on all previous pixels
** Decompose the likelihood via the chain rule:
** <math>P_{\theta}(x) = \prod_{i=1}^{n} P_{\theta}(x_i | x_1, ..., x_{i-1})</math>
** You need to define what counts as ''previous pixels'' (e.g. all pixels above and to the left)
* Then maximize the likelihood of the training data (see the masked-convolution sketch below)
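To make the ordering constraint concrete, here is a minimal PyTorch sketch of a PixelCNN-style masked convolution, which zeroes kernel weights so each output pixel only sees pixels above it and to its left. The class name, layer sizes, and mask-type convention are illustrative assumptions, not from the lecture.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is masked so each output pixel only
    sees pixels above it and to its left (the 'previous' pixels).
    Mask type 'A' also hides the center pixel (first layer);
    'B' allows the center (later layers)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        # Zero the center pixel (type A only) and everything to its right...
        self.mask[:, :, kh // 2, kw // 2 + (mask_type == 'B'):] = 0
        # ...and all rows below the center row.
        self.mask[:, :, kh // 2 + 1:, :] = 0

    def forward(self, x):
        self.weight.data *= self.mask  # enforce the autoregressive ordering
        return super().forward(x)

# Example: first layer of a tiny PixelCNN on 28x28 grayscale images.
layer = MaskedConv2d('A', in_channels=1, out_channels=16,
                     kernel_size=7, padding=3)
out = layer(torch.randn(1, 1, 28, 28))  # shape: (1, 16, 28, 28)
</syntaxhighlight>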


;Pros:
* Can explicitly compute the likelihood <math>P(x)</math>, which gives a principled evaluation metric
;Cons:
* Sequence generation is slow (pixel by pixel; see the sampling sketch below)
* Optimizing <math>P(x)</math> is hard.
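To see why generation is slow: each conditional needs all previously sampled pixels, so sampling reruns the whole network once per pixel. A minimal sketch, assuming a hypothetical model that maps a partially generated image to 256-way per-pixel logits:

<syntaxhighlight lang="python">
import torch

@torch.no_grad()
def sample(model, shape=(1, 1, 28, 28)):
    """Autoregressive sampling: one forward pass per pixel,
    so an H x W image needs H*W sequential model calls."""
    img = torch.zeros(shape)
    _, _, H, W = shape
    for i in range(H):
        for j in range(W):
            logits = model(img)               # assumed (B, 256, H, W) logits
            probs = logits[:, :, i, j].softmax(dim=-1)
            img[:, :, i, j] = torch.multinomial(probs, 1).float() / 255.0
    return img
</syntaxhighlight>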
Types of ''previous pixels'' connections:
* PixelCNN uses masked convolutions, so each pixel sees only a limited receptive field of previous pixels (fastest)
* Row LSTM has a triangular receptive field (slow)
* Diagonal BiLSTM has the full dependency field (slowest)
;Multi-scale PixelRNN
* Takes subsampled pixels as additional input
* Can capture global information better
* Slightly better results
===Generative Adversarial Networks (GANs)===
* Generator generates images
* Discriminator classifies real or fake
* Loss: <math>\min_{G} \max_{D} E_x[\log D(x)] + E_z[\log(1-D(G(z)))]</math>, where <math>z</math> is random noise (a training-loop sketch follows below)
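A minimal PyTorch sketch of the alternating updates this min-max loss implies. The networks <math>G</math> and <math>D</math> (with <math>D</math> outputting sigmoid probabilities), the noise dimension, and the non-saturating generator objective are assumptions for illustration:

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z_dim=100):
    """One alternating update of the min-max objective:
    D maximizes log D(x) + log(1 - D(G(z))); G minimizes log(1 - D(G(z)))
    (in practice G maximizes log D(G(z)), the non-saturating form)."""
    B = real.size(0)
    z = torch.randn(B, z_dim)

    # --- Discriminator step: push D(real) -> 1, D(fake) -> 0 ---
    fake = G(z).detach()  # detach so this step doesn't update G
    d_loss = F.binary_cross_entropy(D(real), torch.ones(B, 1)) + \
             F.binary_cross_entropy(D(fake), torch.zeros(B, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # --- Generator step: push D(G(z)) -> 1 ---
    g_loss = F.binary_cross_entropy(D(G(z)), torch.ones(B, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
</syntaxhighlight>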
;Image-to-image Conditional GANs
* Add an image encoder which outputs z
;pix2pix
* Adds an L1 reconstruction loss to the GAN loss
* U-Net generator
* PatchGAN discriminator
** PatchGAN outputs an N×N grid of real/fake predictions, one per patch (i.e. each output has a limited receptive field)
* Requires paired samples (a loss sketch follows below)
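A sketch of the resulting generator objective, under the assumptions that <math>D</math> is a PatchGAN taking the input image concatenated with the translated image and that λ = 100 weights the L1 term, as in the pix2pix paper's default:

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def pix2pix_g_loss(G, D, x, y, lam=100.0):
    """Generator loss for paired translation x -> y:
    adversarial term on the PatchGAN output plus weighted L1 to the target."""
    fake = G(x)
    # D is a PatchGAN: it sees (input, output) pairs and returns an
    # N x N grid of per-patch real/fake probabilities.
    patch_pred = D(torch.cat([x, fake], dim=1))
    adv = F.binary_cross_entropy(patch_pred, torch.ones_like(patch_pred))
    rec = F.l1_loss(fake, y)
    return adv + lam * rec
</syntaxhighlight>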
;CycleGAN
* Unpaired image-to-image translation
* Cycle-consistency loss (a sketch follows below)
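A minimal sketch of the cycle-consistency term, assuming two generators G: X→Y and F: Y→X (named <code>F_</code> below to avoid clashing with <code>torch.nn.functional</code>) and an L1 penalty with weight λ = 10 as in the CycleGAN paper:

<syntaxhighlight lang="python">
import torch.nn.functional as F

def cycle_loss(G, F_, x, y, lam=10.0):
    """Cycle consistency: translating to the other domain and back
    should reconstruct the original image, in both directions."""
    return lam * (F.l1_loss(F_(G(x)), x) +   # X -> Y -> X
                  F.l1_loss(G(F_(y)), y))    # Y -> X -> Y
</syntaxhighlight>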


==Will be on the exam==