===Practical Domain Adaptation Methods===
;Classical DA methods (pre-deep learning)
* Metric Learning
* Subspace alignment
* MMD-based distribution matching
* Sample reweighting & selection
;Modern DA methods (i.e., deep domain adaptation)
[https://arxiv.org/abs/1409.7495 Ganin & Lempitsky] 
The idea is to train an embedding function using an adversarial domain classifier to extract common features from the source and target domains.
* Input <math>x</math> goes to an embedding function <math>F</math> to get features. 
* Features go to a classification network <math>C_1</math> to get labels.
* Features also go to a domain classifier <math>C_2</math>.
* Training is a saddle-point problem: <math>\min_{F, C_1} \max_{C_2} \; E_{(x, y) \sim Q_{X,Y}^{(m)}} [\ell(C_1 \circ F(x), y)] - \lambda L(C_2)</math>, where <math>L(C_2)</math> is the domain-classification loss. The domain classifier <math>C_2</math> is trained to distinguish the domains while <math>F</math> is trained to fool it; see the sketch after this list.
* In general, we want to find a mapping (embedding) <math>F</math> such that <math>F(Q_X) \approx F(P_X)</math>.
*: The domain classifier penalizes the distance between <math>F(Q_X)</math> and <math>F(P_X)</math>.
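Below is a minimal PyTorch sketch of this adversarial setup, using the gradient-reversal trick from the Ganin & Lempitsky paper. The layer sizes, the value of <math>\lambda</math>, and the <code>loader</code> yielding paired (labeled source, unlabeled target) batches are hypothetical placeholders.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; scales gradients by -lambda on the
    # backward pass, so C_2 minimizes the domain loss while F maximizes it.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

# Hypothetical sizes: 784-dim inputs, 256-dim features, 10 classes, 2 domains.
F_net = nn.Sequential(nn.Linear(784, 256), nn.ReLU())  # embedding F
C1 = nn.Linear(256, 10)                                # label classifier C_1
C2 = nn.Linear(256, 2)                                 # domain classifier C_2

opt = torch.optim.Adam([*F_net.parameters(), *C1.parameters(), *C2.parameters()])
ce = nn.CrossEntropyLoss()
lam = 0.1  # trade-off weight lambda (hypothetical value)

for (x_s, y_s), x_t in loader:  # 'loader' is a hypothetical paired data source
    feat_s, feat_t = F_net(x_s), F_net(x_t)
    label_loss = ce(C1(feat_s), y_s)          # l(C_1 . F(x), y) on source data
    feats = torch.cat([feat_s, feat_t])
    domains = torch.cat([torch.zeros(len(x_s)), torch.ones(len(x_t))]).long()
    domain_loss = ce(C2(GradReverse.apply(feats, lam)), domains)
    opt.zero_grad()
    (label_loss + domain_loss).backward()     # reversal layer handles the min-max
    opt.step()
</syntaxhighlight>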
Example 1: MMD (maximum mean discrepancy) distance.
Define <math>\tilde{x}_i = F(x_i)</math>.
<math>D_{MMD}(Q^{(m)}_{\tilde{x}}, P^{(m)}_{\tilde{x}}) \stackrel{\triangle}{=} \left\Vert \frac{1}{m}\sum_{i=1}^{m} \phi(\tilde{x}_i^s) - \frac{1}{m}\sum_{i=1}^{m} \phi(\tilde{x}_i^t) \right\Vert</math>
Here <math>\phi: \mathbb{R}^r \to \mathbb{R}^D</math> is a fixed feature map whose inner product defines the kernel <math>K(x, x') = \phi(x)^\top \phi(x')</math>.
We square D to apply the kernel trick: 
<math>D^2_{MMD}(Q^{(m)}_{\tilde{x}}, P^{(m)}_{\tilde{x}}) = \frac{1}{m^2}\left( \sum_{i,j=1}^{m}K(\tilde{x}_i^s, \tilde{x}_j^s) + \sum_{i,j=1}^{m}K(\tilde{x}_i^t, \tilde{x}_j^t) - 2 \sum_{i,j=1}^{m}K(\tilde{x}_i^s, \tilde{x}_j^t) \right)</math>
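A short sketch of this estimator, assuming an RBF kernel <math>K(a, b) = \exp(-\Vert a - b \Vert^2 / (2\sigma^2))</math> and equal-size source/target batches; the bandwidth <code>sigma</code> is a hypothetical choice:
<syntaxhighlight lang="python">
import torch

def mmd2_rbf(feat_s, feat_t, sigma=1.0):
    # Biased estimate of squared MMD between source and target features,
    # with RBF kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    # .mean() supplies the 1/m^2 normalization from the formula above.
    return k(feat_s, feat_s).mean() + k(feat_t, feat_t).mean() - 2 * k(feat_s, feat_t).mean()
</syntaxhighlight>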
MMD-based DA (Tzeng ''et al.'' 2014) adds the squared MMD between source and target features as a penalty on the classification objective:
<math>\min_{F, C_1} \ell(C_1 \circ F(x^s), y^s) + \lambda D^2_{MMD}(F(x^s), F(x^t))</math>
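A minimal training-step sketch of this objective, reusing the hypothetical <code>F_net</code>, <code>C1</code>, <code>ce</code>, <code>loader</code>, and <code>mmd2_rbf</code> from the sketches above; the weight <code>lam</code> is again a hypothetical choice:
<syntaxhighlight lang="python">
lam = 0.25  # hypothetical trade-off weight lambda
opt = torch.optim.Adam([*F_net.parameters(), *C1.parameters()])

for (x_s, y_s), x_t in loader:  # same hypothetical paired loader as above
    feat_s, feat_t = F_net(x_s), F_net(x_t)
    # Classification loss on labeled source data plus an MMD penalty that
    # pulls the source and target feature distributions together.
    loss = ce(C1(feat_s), y_s) + lam * mmd2_rbf(feat_s, feat_t)
    opt.zero_grad()
    loss.backward()
    opt.step()
</syntaxhighlight>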


==Misc==