Visual Learning and Recognition

===Pixel-RNN/CNN===
* Fully-visible belief network
* Each pixel depends on its adjacent pixels
* Explicit density model:
* Training:
** Each pixel depends on all previous pixels
** Decompose the likelihood via the chain rule:
** <math>P_{\theta}(x) = \prod_{i=1}^{n} P_{\theta}(x_i | x_1, ..., x_{i-1})</math>
** You need to define what counts as ''previous pixels'' (e.g. all pixels above and to the left)
* Then maximize the likelihood of the training data (see the masked-convolution sketch below)
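To make the ordering constraint concrete, here is a minimal PyTorch sketch of a PixelCNN-style masked convolution, which zeroes kernel weights so each output pixel only sees pixels above it and to its left. The class name, layer sizes, and mask-type convention are illustrative assumptions, not from the lecture.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is masked so each output pixel only
    sees pixels above it and to its left (the 'previous' pixels).
    Mask type 'A' also hides the center pixel (first layer);
    'B' allows the center (later layers)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        # Zero the center pixel (type A only) and everything to its right...
        self.mask[:, :, kh // 2, kw // 2 + (mask_type == 'B'):] = 0
        # ...and all rows below the center row.
        self.mask[:, :, kh // 2 + 1:, :] = 0

    def forward(self, x):
        self.weight.data *= self.mask  # enforce the autoregressive ordering
        return super().forward(x)

# Example: first layer of a tiny PixelCNN on 28x28 grayscale images.
layer = MaskedConv2d('A', in_channels=1, out_channels=16,
                     kernel_size=7, padding=3)
out = layer(torch.randn(1, 1, 28, 28))  # shape: (1, 16, 28, 28)
</syntaxhighlight>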


;Pros:
* Can explicitly compute the likelihood <math>P(x)</math>, which gives a principled evaluation metric
;Cons:
* Sequence generation is slow (pixel by pixel; see the sampling sketch below)
* Optimizing <math>P(x)</math> is hard.
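To see why generation is slow: each conditional needs all previously sampled pixels, so sampling reruns the whole network once per pixel. A minimal sketch, assuming a hypothetical model that maps a partially generated image to 256-way per-pixel logits:

<syntaxhighlight lang="python">
import torch

@torch.no_grad()
def sample(model, shape=(1, 1, 28, 28)):
    """Autoregressive sampling: one forward pass per pixel,
    so an H x W image needs H*W sequential model calls."""
    img = torch.zeros(shape)
    _, _, H, W = shape
    for i in range(H):
        for j in range(W):
            logits = model(img)               # assumed (B, 256, H, W) logits
            probs = logits[:, :, i, j].softmax(dim=-1)
            img[:, :, i, j] = torch.multinomial(probs, 1).float() / 255.0
    return img
</syntaxhighlight>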
Types of ''previous pixels'' connections:
* PixelCNN uses masked convolutions, so each pixel sees only a limited receptive field of previous pixels (fastest)
* Row LSTM has a triangular receptive field (slow)
* Diagonal BiLSTM has the full dependency field (slowest)
;Multi-scale PixelRNN
* Takes subsampled pixels as additional input
* Can capture global information better
* Slightly better results
===Generative Adversarial Networks (GANs)===
* Generator generates images
* Discriminator classifies real or fake
* Loss: <math>\min_{G} \max_{D} E_x[\log D(x)] + E_z[\log(1-D(G(z)))]</math>, where <math>z</math> is random noise (a training-loop sketch follows below)
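A minimal PyTorch sketch of the alternating updates this min-max loss implies. The networks <math>G</math> and <math>D</math> (with <math>D</math> outputting sigmoid probabilities), the noise dimension, and the non-saturating generator objective are assumptions for illustration:

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z_dim=100):
    """One alternating update of the min-max objective:
    D maximizes log D(x) + log(1 - D(G(z))); G minimizes log(1 - D(G(z)))
    (in practice G maximizes log D(G(z)), the non-saturating form)."""
    B = real.size(0)
    z = torch.randn(B, z_dim)

    # --- Discriminator step: push D(real) -> 1, D(fake) -> 0 ---
    fake = G(z).detach()  # detach so this step doesn't update G
    d_loss = F.binary_cross_entropy(D(real), torch.ones(B, 1)) + \
             F.binary_cross_entropy(D(fake), torch.zeros(B, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # --- Generator step: push D(G(z)) -> 1 ---
    g_loss = F.binary_cross_entropy(D(G(z)), torch.ones(B, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
</syntaxhighlight>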
;Image-to-image Conditional GANs
* Add an image encoder which outputs z
;pix2pix
* Adds an L1 reconstruction loss to the GAN loss
* U-Net generator
* PatchGAN discriminator
** PatchGAN outputs an N×N grid of real/fake predictions, one per patch (i.e. each output has a limited receptive field)
* Requires paired samples (a loss sketch follows below)
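A sketch of the resulting generator objective, under the assumptions that <math>D</math> is a PatchGAN taking the input image concatenated with the translated image and that λ = 100 weights the L1 term, as in the pix2pix paper's default:

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def pix2pix_g_loss(G, D, x, y, lam=100.0):
    """Generator loss for paired translation x -> y:
    adversarial term on the PatchGAN output plus weighted L1 to the target."""
    fake = G(x)
    # D is a PatchGAN: it sees (input, output) pairs and returns an
    # N x N grid of per-patch real/fake probabilities.
    patch_pred = D(torch.cat([x, fake], dim=1))
    adv = F.binary_cross_entropy(patch_pred, torch.ones_like(patch_pred))
    rec = F.l1_loss(fake, y)
    return adv + lam * rec
</syntaxhighlight>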
;CycleGAN
* Unpaired image-to-image translation
* Cycle-consistency loss (a sketch follows below)
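A minimal sketch of the cycle-consistency term, assuming two generators G: X→Y and F: Y→X (named <code>F_</code> below to avoid clashing with <code>torch.nn.functional</code>) and an L1 penalty with weight λ = 10 as in the CycleGAN paper:

<syntaxhighlight lang="python">
import torch.nn.functional as F

def cycle_loss(G, F_, x, y, lam=10.0):
    """Cycle consistency: translating to the other domain and back
    should reconstruct the original image, in both directions."""
    return lam * (F.l1_loss(F_(G(x)), x) +   # X -> Y -> X
                  F.l1_loss(G(F_(y)), y))    # Y -> X -> Y
</syntaxhighlight>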


==Will be on the exam==