===Practical Domain Adaptation Methods===
;Classical DA methods (pre-deep learning)
* Metric learning
* Subspace alignment
* MMD-based distribution matching
* Sample reweighting & selection
;Modern DA methods (i.e., deep domain adaptation)
[https://arxiv.org/abs/1409.7495 Ganin & Lempitsky]
The idea is to train an embedding function adversarially against a domain classifier, so that the embedding extracts features common to the source and target domains.
* Input <math>x</math> goes to an embedding function <math>F</math> to get features.
* Features go to a classification network <math>C_1</math> to get labels.
* Features also go to a domain classifier <math>C_2</math>, which predicts whether each feature vector came from the source or the target domain.
* Training is a minimax game: <math>\min_{F, C_1} \max_{C_2} \; E_{(x, y) \sim Q_{X,Y}^{(m)}} [\ell(C_1 \circ F(x), y)] - \lambda L(C_2 \circ F)</math>, where <math>L</math> is the domain-classification loss.
* In general, we want to find a mapping (embedding) <math>F</math> such that <math>F(Q_X) \approx F(P_X)</math>.
*: The domain classifier penalizes the distance between <math>F(Q_X)</math> and <math>F(P_X)</math>.
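A minimal PyTorch sketch of this adversarial setup, using a gradient-reversal layer in the spirit of Ganin & Lempitsky. The layer sizes, the names <code>F_net</code>, <code>C1</code>, <code>C2</code>, and the weight <code>lam</code> are illustrative assumptions, not values from the paper.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) the gradient on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

# Placeholder sizes (assumption): 784-dim input, 256-dim features, 10 classes.
F_net = nn.Sequential(nn.Linear(784, 256), nn.ReLU())  # embedding F
C1 = nn.Linear(256, 10)   # label classifier C_1
C2 = nn.Linear(256, 2)    # domain classifier C_2 (source vs. target)

ce = nn.CrossEntropyLoss()
params = [*F_net.parameters(), *C1.parameters(), *C2.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

def train_step(x_s, y_s, x_t, lam=0.1):
    f_s, f_t = F_net(x_s), F_net(x_t)
    label_loss = ce(C1(f_s), y_s)  # supervised loss on source data only
    # Gradient reversal realizes the minimax game: C2 is trained to tell the
    # domains apart, while the reversed gradient pushes F to confuse C2.
    f_all = torch.cat([GradReverse.apply(f_s, lam), GradReverse.apply(f_t, lam)])
    d = torch.cat([torch.zeros(len(x_s)), torch.ones(len(x_t))]).long()
    dom_loss = ce(C2(f_all), d)
    loss = label_loss + dom_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
</syntaxhighlight>

Because of the reversal layer, a single <code>backward()</code> call updates all three networks in the directions the saddle-point objective prescribes.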
Example 1: MMD (maximum mean discrepancy) distance
Define <math>\tilde{x}_i = F(x_i)</math>.
<math>D_{MMD}(Q^{(m)}_{\tilde{x}}, P^{(m)}_{\tilde{x}}) \stackrel{\triangle}{=} \left\Vert \frac{1}{m}\sum_{i=1}^{m} \phi(\tilde{x}_i^s) - \frac{1}{m}\sum_{i=1}^{m} \phi(\tilde{x}_i^t) \right\Vert</math>
Here <math>\phi: \mathbb{R}^r \to \mathbb{R}^D</math> is a fixed feature map whose inner product defines a kernel <math>K(u, v) = \langle \phi(u), \phi(v) \rangle</math>.
We square <math>D</math> to apply the kernel trick:
<math>D^2_{MMD}(Q^{(m)}_{\tilde{x}}, P^{(m)}_{\tilde{x}}) = \frac{1}{m^2}\left( \sum_{i,j=1}^{m}K(\tilde{x}_i^s, \tilde{x}_j^s) + \sum_{i,j=1}^{m}K(\tilde{x}_i^t, \tilde{x}_j^t) - 2 \sum_{i,j=1}^{m}K(\tilde{x}_i^s, \tilde{x}_j^t) \right)</math>
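A small sketch of this computation with an RBF kernel, so <math>\phi</math> is never materialized; the name <code>mmd2_rbf</code> and the bandwidth <code>sigma</code> are assumptions for illustration.

<syntaxhighlight lang="python">
import torch

def mmd2_rbf(xs, xt, sigma=1.0):
    """Squared MMD via the kernel trick with an RBF kernel
    K(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    def K(a, b):
        # torch.cdist gives pairwise Euclidean distances; square for the RBF.
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    # .mean() supplies the 1/m^2 factors from the formula above.
    return K(xs, xs).mean() + K(xt, xt).mean() - 2 * K(xs, xt).mean()

# Two feature samples; shifting one increases the discrepancy.
xs = torch.randn(100, 16)
xt = torch.randn(100, 16) + 0.5
print(mmd2_rbf(xs, xt))   # positive, grows with the shift
print(mmd2_rbf(xs, xs))   # exactly 0 for identical samples
</syntaxhighlight>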
MMD-based DA (Tzeng ''et al.'' 2014):
<math>\min_{F, C_1} \; \ell(C_1 \circ F(x^s), y^s) + \lambda D^2_{MMD}(F(x^s), F(x^t))</math>
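A hedged sketch of one training step for this objective. For brevity it uses the linear-kernel special case of <math>D^2_{MMD}</math> (the squared distance between feature means); the architecture and <code>lam</code> are placeholders, not the CNN-with-adaptation-layer setup of Tzeng ''et al.''

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Placeholder sizes (assumption), matching the earlier DANN sketch.
F_net = nn.Sequential(nn.Linear(784, 256), nn.ReLU())  # embedding F
C1 = nn.Linear(256, 10)                                # label classifier C_1
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam([*F_net.parameters(), *C1.parameters()], lr=1e-3)

def ddc_step(x_s, y_s, x_t, lam=1.0):
    f_s, f_t = F_net(x_s), F_net(x_t)
    # Linear-kernel MMD^2: squared distance between the two feature means.
    mmd2 = ((f_s.mean(0) - f_t.mean(0)) ** 2).sum()
    # Source classification loss plus lambda-weighted distribution matching.
    loss = ce(C1(f_s), y_s) + lam * mmd2
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
</syntaxhighlight>

Unlike the adversarial approach, there is no domain classifier here: the MMD term directly penalizes the discrepancy between <math>F(Q_X)</math> and <math>F(P_X)</math>.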
==Misc==