
Deep Learning: Difference between revisions

2,878 bytes added ,  10 November 2020
==Self-supervised Learning==
Lecture 21 (November 10, 2020)
Given data <math>x</math>, we want to find a good representation <math>f(x)</math>. 
We can use <math>f(x)</math> to solve the classification problem more efficiently (e.g. using linear classifiers).
Task 1: Learn a good <math>f(x)</math> from ''unlabeled'' samples. 
Task 2: Use <math>f(x)</math> + a few labels to solve the classification problem using linear models.
Note that in semi-supervised learning, you have unlabeled examples and a few labelled examples but you know what the task is. 
In self-supervised learning, we use ''structure'' in unlabeled data to create artificial supervised learning problems, which are then solved with deep models. 
In this process, the learning method ''hopefully'' will create internal representations for the data <math>f(x)</math> useful for downstream tasks.
===Image embedding===
Surprising observation for image embedding: 
[Gidaris ''et al.'' ICLR 2018] + [Zhang ''et al.'' 2019] 
# Rotate images and use the angle of rotation as labels (e.g. <math>\theta = 0^\circ, 90^\circ, 180^\circ, 270^\circ</math>).
# Train a CNN to predict the rotation angle from images.
# Use <math>f(x)</math> with linear classification models for the true labels.
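The first step of the recipe above, building the artificial labels, can be sketched as follows. This is a minimal illustration and not the authors' code: the helper name <code>make_rotation_dataset</code> and the random stand-in images are assumptions, and the images are taken to be square so that every rotation keeps the same shape.

```python
import numpy as np

def make_rotation_dataset(images):
    """Build (rotated_image, label) pairs; label k means a rotation by 90*k degrees.

    images: array of shape (n, H, W, C) with H == W (assumed for this sketch).
    """
    rotated, labels = [], []
    for img in images:
        for k in range(4):  # 0, 90, 180, 270 degrees
            rotated.append(np.rot90(img, k=k, axes=(0, 1)))
            labels.append(k)
    return np.stack(rotated), np.array(labels)

rng = np.random.default_rng(0)
images = rng.random((5, 32, 32, 3))  # hypothetical stand-in for unlabeled images
x_rot, y_rot = make_rotation_dataset(images)
print(x_rot.shape, y_rot.shape)
```

A CNN would then be trained on <code>(x_rot, y_rot)</code> as an ordinary 4-class problem, and its intermediate features reused as <math>f(x)</math>.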
;Why should <math>f(x)</math> be a good representation for images?
This is an open question.
===Contrastive Learning===
[Logeswaran & Lee ICLR 2018] use a text corpus (Wikipedia) to train a deep representation <math>f(x)</math>:  
<math>x, x^{+}</math> are adjacent sentences. 
<math>x, x^{-}</math> are random sentences.
Optimization:
<math>\min_{f} E\left[\log\left(1 + \exp\left(f(x)^T f(x^-) - f(x)^T f(x^+)\right)\right)\right] \approx E\left[f(x)^T f(x^-) - f(x)^T f(x^+)\right]</math>.
This is known as contrastive learning.
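As a rough sketch, the left-hand objective above can be evaluated on a batch of embeddings as shown below. The function name <code>contrastive_loss</code> and the toy vectors are illustrative assumptions; in practice the embeddings would come from the trained network <math>f</math>.

```python
import numpy as np

def contrastive_loss(f_x, f_pos, f_neg):
    """Batch mean of log(1 + exp(f(x)^T f(x^-) - f(x)^T f(x^+))).

    Each argument has shape (batch, dim): anchors, positives, negatives.
    """
    pos = np.sum(f_x * f_pos, axis=1)  # f(x)^T f(x^+) per example
    neg = np.sum(f_x * f_neg, axis=1)  # f(x)^T f(x^-) per example
    # logaddexp(0, u) equals log(1 + exp(u)) and is numerically stable
    return float(np.mean(np.logaddexp(0.0, neg - pos)))

anchor = np.array([[1.0, 0.0]])
aligned = np.array([[1.0, 0.0]])     # toy "adjacent sentence" embedding
unrelated = np.array([[0.0, 1.0]])   # toy "random sentence" embedding

loss_good = contrastive_loss(anchor, aligned, unrelated)
loss_bad = contrastive_loss(anchor, unrelated, aligned)
print(loss_good < loss_bad)
```

The loss is smaller when the anchor is closer to its positive than to its negative, which is the behaviour the optimization rewards.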
Sentence embeddings capture human notions of similarities: 
E.g. "the tiger rules this jungle" is similar to "a lion hunts in a forest".
;Can we use contrastive learning to obtain powerful data representations for images? 
We need pairs of similar images and dissimilar images.
SimCLR [Chen ''et al.'' 2020]
# Create two correlated views of an image <math>x</math>: <math>\tilde{x}_i</math> and <math>\tilde{x}_j</math>.
#* Random cropping + resize
#* Random color distortion
#* Random Gaussian blur
# Use base encoder (ResNet) to map <math>\tilde{x}_i,\tilde{x}_j</math> to embeddings <math>h_i, h_j</math>.
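# Train a projection head <math>g(\cdot)</math> (a one-hidden-layer MLP) that maps the <math>h</math>'s to <math>z</math>'s, so as to maximize the agreement between <math>z_i</math> and <math>z_j</math>.
# Agreement is measured by the cosine similarity, which the loss function is built on: <math>sim(z_i, z_j) = \frac{z_i^T z_j}{\Vert z_i \Vert \Vert z_j \Vert}</math>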
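The last two steps can be sketched in NumPy as below. This is only an illustration under stated assumptions: the random vectors <code>h_i</code>, <code>h_j</code> stand in for ResNet outputs, the weight matrices <code>W1</code>, <code>W2</code> are random rather than trained, and ReLU is assumed as the hidden nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)

def projection_head(h, W1, W2):
    """One-hidden-layer MLP g(h) = relu(h W1) W2 (ReLU is an assumption here)."""
    return np.maximum(h @ W1, 0.0) @ W2

def cosine_sim(a, b):
    """sim(a, b) = a^T b / (||a|| ||b||)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical encoder outputs for two views of the same image
h_i = rng.normal(size=8)
h_j = h_i + 0.05 * rng.normal(size=8)

# Untrained, randomly initialised projection weights (illustration only)
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))

z_i = projection_head(h_i, W1, W2)
z_j = projection_head(h_j, W1, W2)
print(cosine_sim(z_i, z_j))
```

Training adjusts the encoder and <math>g</math> so that this similarity is high for views of the same image and low for views of different images.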
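Randomly select <math>N</math> samples and augment each one twice to obtain <math>2N</math> views, ordered so that views <math>2k-1</math> and <math>2k</math> come from the same image. 
Compute the similarity matrix <math>S \in \mathbb{R}^{2N \times 2N}</math>. 
<math>S_{ij}=\exp(sim(z_i, z_j)) =
\begin{cases}
e & \text{if } i=j \text{ (since } sim(z_i, z_i) = 1 \text{)}\\
\text{high} & \text{if } \{i, j\} = \{2k-1, 2k\} \text{ for some } k\\
\text{low} & \text{otherwise}
\end{cases}
</math>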
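Training minimizes the average loss over the <math>N</math> positive pairs, <math>\min_{f,g} L = \frac{1}{N} \sum_{k=1}^{N} \frac{l(2k-1,2k) + l(2k, 2k-1)}{2}</math>, where <math>l(i,j)</math> denotes the loss for the positive pair <math>(i,j)</math>.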
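The notes do not spell out <math>l(i,j)</math>. The sketch below assumes the form used in the SimCLR paper, <math>l(i,j) = -\log \frac{S_{ij}}{\sum_{k \neq i} S_{ik}}</math>, with the temperature fixed to 1 so that it matches <math>S_{ij} = \exp(sim(z_i, z_j))</math>. The function name <code>simclr_loss</code> is illustrative, and rows are 0-indexed, so rows <code>2k</code> and <code>2k+1</code> hold the two views of the same image.

```python
import numpy as np

def simclr_loss(z, temperature=1.0):
    """Average of l(i, j) over all 2N ordered positive pairs.

    z: array of shape (2N, d); rows 2k and 2k+1 are views of the same image.
    Assumes l(i, j) = -log(S_ij / sum_{k != i} S_ik), S = exp(sim / temperature).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> dot = cosine
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # drop the k = i term from the denominator
    # Row-wise log-softmax: log(S_ij / sum_{k != i} S_ik)
    log_prob = sim - np.logaddexp.reduce(sim, axis=1, keepdims=True)
    idx = np.arange(z.shape[0])
    partner = idx ^ 1  # 0<->1, 2<->3, ...: index of the other view
    return float(-np.mean(log_prob[idx, partner]))

# Toy check: views that agree within each pair vs. views that do not
z_good = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
z_bad = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
loss_good = simclr_loss(z_good)
loss_bad = simclr_loss(z_bad)
print(loss_good < loss_bad)
```

Averaging over all <math>2N</math> rows is the same as the sum over <math>k</math> of <math>\frac{l(2k-1,2k) + l(2k,2k-1)}{2}</math> divided by <math>N</math>.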


==Misc==