In training:
For the source domain, we have labeled samples <math>\{(x_i^S, y_i^S)\}_{i=1}^{m_S} \sim Q_{X,Y}</math>.
For the target domain, we only have unlabeled samples <math>\{x_i^T\}_{i=1}^{m_T} \sim P_{X}</math>.
This is ''unsupervised'' domain adaptation.
 
In ''semi-supervised'' domain adaptation, the target samples are mostly unlabeled but include a few labeled examples.
 
If no target samples are available during training, the scenario is called ''domain generalization'' or ''out-of-distribution'' generalization.
 
===Unsupervised domain adaptation===
Assume <math>m_S = m_T = m</math>:
* <math>m</math> labeled samples from the source domain <math>Q</math>
* <math>m</math> unlabeled samples from the target domain <math>P</math>

* Without further assumptions, this problem is ill-defined: the unlabeled target samples say nothing about the target labeling function.
 
;Practical assumptions.
# Covariate shift: <math>P</math> and <math>Q</math> satisfy the covariate shift assumption if the conditional label distribution does not change between source and target (see the sketch after this list).
#* I.e. <math>P(y|x) = Q(y|x)</math>
# Similarity of source and target marginal distributions.
# If labeled target samples were available, the joint error (over target and source samples together) should be small.
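
A minimal numeric sketch of the covariate shift assumption (assuming NumPy; the labeling rule and distribution parameters are illustrative choices, not from the source): the labeling rule is shared across domains, so <math>P(y|x) = Q(y|x)</math> by construction, while the marginal distribution of <math>x</math> differs.

<syntaxhighlight lang="python">
# Toy covariate-shift setup: shared labeling rule, shifted input marginals.
import numpy as np

rng = np.random.default_rng(0)

def label(x):
    # Same conditional label rule in both domains, so P(y|x) = Q(y|x).
    return (x > 0.5).astype(int)

x_source = rng.normal(loc=0.0, scale=1.0, size=1000)  # draws from Q_X
x_target = rng.normal(loc=1.0, scale=1.0, size=1000)  # draws from P_X: shifted marginal

y_source = label(x_source)  # labeled source data, available in training
y_target = label(x_target)  # target labels exist but are unseen in training
</syntaxhighlight>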
 
[Ben-David et al.] consider binary classification. 
;H-divergence
<math>d_{H}(Q_X, P_X) = 2 \sup_{h \in H} \left| \Pr_{x \sim Q_{X}}[h(x)=1] - \Pr_{x \sim P_{X}}[h(x)=1] \right|</math>
 
;Lemma
<math>d_{H}(Q_X, P_X)</math> can be estimated from <math>m</math> samples from each of the source and target domains. 
If <math>VC(H)=d</math>, then with probability at least <math>1-\delta</math>, 
<math>d_{H}(Q_X, P_X) \leq d_{H}(Q_{X}^{(m)}, P_{X}^{(m)}) + 4 \sqrt{\frac{d \log(2m) + \log(2/\delta)}{m}}</math>
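
The slack term in the bound is easy to evaluate numerically; a small sketch (the values of <math>d</math>, <math>m</math>, and <math>\delta</math> are arbitrary illustrative choices, and the natural logarithm is assumed):

<syntaxhighlight lang="python">
import math

def divergence_slack(d, m, delta):
    # The 4*sqrt((d log(2m) + log(2/delta)) / m) term from the lemma.
    return 4 * math.sqrt((d * math.log(2 * m) + math.log(2 / delta)) / m)

# With VC dimension d = 100 and confidence 0.95, the slack shrinks as m grows:
print(divergence_slack(d=100, m=10_000, delta=0.05))     # ~1.26
print(divergence_slack(d=100, m=1_000_000, delta=0.05))  # ~0.15
</syntaxhighlight>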
 
# Label source examples as 1 and target examples as 0.
# Train a classifier to distinguish source from target.
# If the loss is small, then the divergence is large (see the sketch after this list): <math>\frac{1}{2} d_{H}(Q_X^{(m)}, P_X^{(m)}) = 1 - \text{loss}_{\text{classifier}}</math>
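
A minimal sketch of this domain-classifier estimate, assuming scikit-learn is available; the Gaussian features are placeholders for whatever representation the samples live in, and the estimate follows the formula above with the empirical 0-1 loss.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_source = rng.normal(0.0, 1.0, size=(1000, 2))  # samples from Q_X
x_target = rng.normal(1.0, 1.0, size=(1000, 2))  # samples from P_X

# Step 1: label source examples 1 and target examples 0.
X = np.vstack([x_source, x_target])
y = np.concatenate([np.ones(len(x_source)), np.zeros(len(x_target))])

# Step 2: train a classifier to separate source from target.
clf = LogisticRegression().fit(X, y)
loss = 1.0 - clf.score(X, y)  # empirical 0-1 loss of the domain classifier

# Step 3: small loss means the domains are easy to tell apart, i.e. large
# divergence, via the estimate (1/2) d_H = 1 - loss from the note above.
d_H_estimate = 2.0 * (1.0 - loss)
print(d_H_estimate)
</syntaxhighlight>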


==Misc==