*: The domain classifier penalizes the distance between <math>F(Q_X)</math> and <math>F(P_X)</math>.
Example 1: MMD distance (Maximum mean discrepancy)
Define <math>\tilde{x}_i = F(x_i)</math>.
<math>D_{MMD}(Q^{(m)}_{\tilde{x}}, P^{(m)}_{\tilde{x}}) \stackrel{\triangle}{=} \Vert \frac{1}{m}\sum \phi(\tilde{x}_i^S) - \frac{1}{m}\sum \phi(\tilde{x}_i^T) \Vert</math>
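A minimal sketch of the empirical estimate, assuming the simplest feature map <math>\phi(x) = x</math> (so the MMD reduces to the distance between the two feature means); the random arrays below are stand-ins for the encoded samples <math>\tilde{x}_i</math>:
<syntaxhighlight lang="python">
import numpy as np

def mmd_linear(feat_s, feat_t):
    # Empirical MMD with the identity feature map phi(x) = x:
    # the norm of the difference between the two feature means.
    return np.linalg.norm(feat_s.mean(axis=0) - feat_t.mean(axis=0))

# Toy usage: features F(x^S), F(x^T) as (m, d) arrays.
rng = np.random.default_rng(0)
feat_s = rng.normal(0.0, 1.0, size=(100, 16))
feat_t = rng.normal(0.5, 1.0, size=(100, 16))
print(mmd_linear(feat_s, feat_t))
</syntaxhighlight>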
MMD-based DA (Tzeng ''et al.'' 2014):
<math>\min L_{cls}(C_1 \circ F(x^s), y^s) + \lambda D_{MMD}(F(x^s), F(x^t))</math>
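A hedged sketch of one training step for this objective in PyTorch; the encoder <code>F</code>, classifier <code>C1</code>, optimizer, and weight <code>lam</code> are placeholders, not the exact setup of Tzeng ''et al.'':
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

def mmd(feat_s, feat_t):
    # Linear-kernel MMD: distance between batch feature means.
    return (feat_s.mean(dim=0) - feat_t.mean(dim=0)).norm()

def train_step(F, C1, opt, x_s, y_s, x_t, lam=0.25):
    # One step of: min L_cls(C1(F(x^s)), y^s) + lambda * D_MMD(F(x^s), F(x^t))
    opt.zero_grad()
    feat_s, feat_t = F(x_s), F(x_t)
    loss = nn.functional.cross_entropy(C1(feat_s), y_s) + lam * mmd(feat_s, feat_t)
    loss.backward()
    opt.step()
    return loss.item()
</syntaxhighlight>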
Example 2: Wasserstein distance | |||
<math>\min L_{cls}(C_1 \circ F(x^s), y^s) + \lambda W(F(x^s), F(x^t))</math>
The Wasserstein distance is computed using Kantorovich duality.
It is an example of an IPM (integral probability metric) distance.
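As an illustration, the exact Wasserstein distance between two small batches of features can be computed with the POT library (a sketch; in deep DA the distance is instead estimated with a critic network via the Kantorovich dual, as in WGAN-style training):
<syntaxhighlight lang="python">
import numpy as np
import ot  # POT: Python Optimal Transport

def wasserstein_features(feat_s, feat_t):
    # Solve the primal OT linear program between two empirical
    # feature distributions (feasible for small batches only).
    a = np.full(len(feat_s), 1.0 / len(feat_s))  # uniform source weights
    b = np.full(len(feat_t), 1.0 / len(feat_t))  # uniform target weights
    M = ot.dist(feat_s, feat_t, metric='euclidean')  # pairwise cost matrix
    return ot.emd2(a, b, M)  # optimal transport cost (W_1)
</syntaxhighlight>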
* We can also use improved and robust versions of the Wasserstein distance in DA:
** Robust Wasserstein [Balaji ''et al.'' NeurIPS 2020]
** Normalized Wasserstein [....ICCV] | |||
===CycleGAN=== | |||
[Zhu ''et al.'' 2017]
Another approach to domain adaptation, based on image-to-image translation.
Source: <math>(x^s, y^s)</math> | |||
Target: <math>x^t</math> | |||
Train two functions: <math>G_{S \to T}</math> and <math>G_{T \to S}</math>. | |||
Losses: | |||
* <math>L_{GAN}(x^s, x^t, G_{S\to T}, D^T) = E_{x^t}\left[\log D^T(x^t)\right] + E_{x^s}\left[\log(1-D^T(G_{S\to T}(x^s))) \right]</math>.
* <math>L_{GAN}(x^s, x^t, G_{T \to S}, D^S)</math> is defined analogously.
* Cycle consistency: <math>L_{cyc} = E\left[ \Vert G_{T\to S}(G_{S \to T}(x^s)) - x^s \Vert \right] + E \left[ \Vert G_{S \to T}(G_{T\to S}(x^t)) - x^t \Vert \right]</math>
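A minimal sketch of these two losses in PyTorch; <code>G_st</code>, <code>G_ts</code>, <code>D_t</code> stand in for <math>G_{S \to T}</math>, <math>G_{T \to S}</math>, <math>D^T</math>, the L1 norm follows the original paper, and all architectures are assumptions:
<syntaxhighlight lang="python">
import torch

def gan_loss_s_to_t(D_t, G_st, x_s, x_t):
    # L_GAN for the S -> T direction; D_t is assumed to output probabilities.
    real = torch.log(D_t(x_t)).mean()
    fake = torch.log(1.0 - D_t(G_st(x_s))).mean()
    return real + fake

def cycle_loss(G_st, G_ts, x_s, x_t):
    # L_cyc: each sample should survive a round trip through both generators.
    loss_s = (G_ts(G_st(x_s)) - x_s).abs().mean()  # S -> T -> S
    loss_t = (G_st(G_ts(x_t)) - x_t).abs().mean()  # T -> S -> T
    return loss_s + loss_t
</syntaxhighlight>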
Other tricks: | |||
* Domain-specific batch norms | |||
* Entropy-based regularization
===Are assumptions necessary?=== | |||
Assumptions: | |||
* Covariate shift | |||
* <math>d_H(Q_X, P_X)</math> is small
* <math>\epsilon_{joint}</math> is small
See [Ben-David ''et al.''] for impossibility results:
* The covariate shift assumption alone is not sufficient for DA.
* A small <math>d_H(Q_X, P_X)</math> is necessary for DA.
* A small joint training error <math>\epsilon_{joint}</math> is necessary for DA.
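For reference, these three quantities appear together in the Ben-David ''et al.'' generalization bound (stated schematically, with <math>d_H</math> standing in for the <math>\mathcal{H}\Delta\mathcal{H}</math>-divergence used in the paper): for every hypothesis <math>h</math>,

<math>\epsilon_T(h) \le \epsilon_S(h) + d_H(Q_X, P_X) + \epsilon_{joint}, \qquad \epsilon_{joint} = \min_{h'} \left[ \epsilon_S(h') + \epsilon_T(h') \right].</math>

The impossibility theorems show that none of these conditions can be dropped in general.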
==Misc==