==Self-supervised Learning==
Lecture 21 (November 10, 2020)
Given data <math>x</math>, we want to find a good representation <math>f(x)</math>.
We can use <math>f(x)</math> to solve the classification problem more efficiently (e.g. using linear classifiers).
Task 1: Learn a good <math>f(x)</math> from ''unlabeled'' samples.
Task 2: Use <math>f(x)</math> + a few labels to solve the classification problem using linear models.
Note that in semi-supervised learning, you have unlabeled examples and a few labeled examples, and you know what the task is.
In self-supervised learning, we use ''structure'' in unlabeled data to create artificial supervised learning problems solved via deep models.
In this process, the learning method ''hopefully'' will create internal representations of the data <math>f(x)</math> that are useful for downstream tasks.
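As a rough illustration of Task 2, a linear probe can be fit on frozen features. This sketch assumes some pretrained encoder <code>f</code> from Task 1 and uses scikit-learn's logistic regression; all names are illustrative:
<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(f, x_labeled, y_labeled, x_test):
    """Freeze the pretrained encoder f and fit a linear classifier on its features."""
    z_train = np.stack([f(x) for x in x_labeled])   # frozen representations f(x)
    z_test = np.stack([f(x) for x in x_test])
    clf = LogisticRegression(max_iter=1000).fit(z_train, y_labeled)
    return clf.predict(z_test)
</syntaxhighlight>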
===Image embedding===
Surprising observation for image embedding:
[Gidaris ''et al.'' ICLR 2018] + [Zhang ''et al.'' 2019]
# Rotate images and use the angle of rotation as labels (e.g. <math>\theta = 0, 90, 180, 270</math>).
# Train a CNN to predict the rotation angle from images (sketched below).
# Use <math>f(x)</math> with linear classification models for the true labels.
;Why should <math>f(x)</math> be a good representation for images?
This is an open question.
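A rough sketch of the rotation pretext task above; the backbone, data, and loss step are illustrative stand-ins, not the exact setup of Gidaris ''et al.'':
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

def make_rotation_batch(images):
    """images: (B, C, H, W). Returns 4B rotated images and rotation-class labels."""
    rotated, labels = [], []
    for k in range(4):                                   # k quarter-turns: 0, 90, 180, 270 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

# Any CNN backbone works here; its output features serve as f(x) downstream.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
head = nn.Linear(32, 4)                                  # 4 rotation classes
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 32, 32)                       # stand-in for an unlabeled batch
x_rot, y_rot = make_rotation_batch(images)
loss = criterion(head(backbone(x_rot)), y_rot)
loss.backward()
</syntaxhighlight>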
===Contrastive Learning===
[Logeswaran & Lee ICLR 2018] use a text corpus (Wikipedia) to train a deep representation <math>f(x)</math>:
<math>x, x^{+}</math> are adjacent sentences.
<math>x, x^{-}</math> are random sentences.
Optimization:
<math>\min_{f} E[\log(1 + \exp(f(x)^T f(x^-) - f(x)^T f(x^+)))] \approx E[f(x)^T f(x^-) - f(x)^T f(x^+)]</math>.
This is known as contrastive learning.
Sentence embeddings capture human notions of similarity:
E.g. "the tiger rules this jungle" is similar to "a lion hunts in a forest".
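A minimal sketch of the pairwise objective above, assuming the encoder has already mapped sentences to fixed-size vectors (the batching and names are illustrative):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def pair_contrastive_loss(fx, fx_pos, fx_neg):
    """fx, fx_pos, fx_neg: (B, d) embeddings of x, x+ (adjacent) and x- (random)."""
    pos = (fx * fx_pos).sum(dim=1)          # f(x)^T f(x+)
    neg = (fx * fx_neg).sum(dim=1)          # f(x)^T f(x-)
    # log(1 + exp(neg - pos)) written stably as softplus(neg - pos)
    return F.softplus(neg - pos).mean()
</syntaxhighlight>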
;Can we use contrastive learning to obtain powerful data representations for images?
We need pairs of similar images and dissimilar images.
SimCLR [Chen ''et al.'' 2020]
# Create two correlated views of an image <math>x</math>: <math>\tilde{x}_i</math> and <math>\tilde{x}_j</math>.
#* Random cropping + resize
#* Random color distortion
#* Random Gaussian blur
# Use a base encoder (ResNet) to map <math>\tilde{x}_i,\tilde{x}_j</math> to embeddings <math>h_i, h_j</math>.
# Train a projection head <math>g(\cdot)</math> (a one-hidden-layer MLP) to map the h's to z's which maximize the agreement between the z's (see the sketch after this list).
# Agreement is measured by the cosine similarity <math>\mathrm{sim}(z_i, z_j) = \frac{z_i^T z_j}{\Vert z_i \Vert \Vert z_j \Vert}</math>, which enters the loss below.
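A rough sketch of steps 1–3 (augmentations, base encoder, and projection head); the augmentation strengths and the ResNet-18 backbone are illustrative choices, not the exact SimCLR recipe:
<syntaxhighlight lang="python">
import torch.nn as nn
from torchvision import transforms
from torchvision.models import resnet18

# Step 1: two random augmentations of the same image give the correlated views.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),                                    # random crop + resize
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),                                    # color distortion
    transforms.GaussianBlur(kernel_size=23),                              # random Gaussian blur
    transforms.ToTensor(),
])

# Step 2: base encoder maps each view to an embedding h.
base_encoder = resnet18()
base_encoder.fc = nn.Identity()            # h is a 512-d vector

# Step 3: projection head g() maps h to z, where agreement is maximized.
projection_head = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

def two_views(pil_image):
    """Return the projections (z_i, z_j) of two correlated views of one image."""
    xi, xj = augment(pil_image), augment(pil_image)
    hi, hj = base_encoder(xi.unsqueeze(0)), base_encoder(xj.unsqueeze(0))
    return projection_head(hi), projection_head(hj)
</syntaxhighlight>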
Randomly select <math>N</math> samples and add their augmentations to get <math>2N</math> samples.
Compute the similarity matrix <math>S \in \mathbb{R}^{2N \times 2N}</math>.
<math>S_{ij}=\exp(\mathrm{sim}(z_i, z_j)) =
\begin{cases}
e & \text{if } i=j\\
\text{high} & \text{if } (i,j) \text{ is a positive pair (two views of the same image)}\\
\text{low} & \text{otherwise}
\end{cases}
</math>
Training is <math>\min_{f,g} L = \frac{1}{N} \sum_{k=1}^{N} \frac{\ell(2k-1,2k) + \ell(2k, 2k-1)}{2}</math>, where <math>\ell(i,j)</math> is the contrastive (NT-Xent) loss for the positive pair <math>(i,j)</math>.
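Combining the similarity matrix and the per-pair losses, a sketch of the batch loss; the temperature parameter is part of SimCLR's NT-Xent loss but was omitted above, and the convention that the two views of sample <math>k</math> sit at adjacent rows is assumed:
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def nt_xent_loss(z, temperature=0.5):
    """z: (2N, d) projections, with rows 2k and 2k+1 (0-indexed) the two views of sample k."""
    z = F.normalize(z, dim=1)                 # so z_i^T z_j is the cosine similarity sim(z_i, z_j)
    sim = z @ z.t() / temperature             # (2N, 2N) similarity matrix S (before exp)
    sim.fill_diagonal_(float('-inf'))         # drop the trivial i = j entries
    targets = torch.arange(z.size(0)) ^ 1     # the positive of row 2k is row 2k+1, and vice versa
    # l(i, j) = -log( exp(S_ij) / sum_{k != i} exp(S_ik) ); cross_entropy averages over the 2N rows
    return F.cross_entropy(sim, targets)
</syntaxhighlight>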
==Misc==