The idea is to add a ''trigger'' or watermark to the image so that the model misclassifies it.


Gu ''et al.'' (2017) <ref name="gu2017badnets"></ref> randomly select a small portion of the training set, apply a backdoor trigger to those examples, and ''change their labels to the target label''.
However, this is not a clean-label attack because the labels of the poisoned examples have to be changed.
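A minimal sketch of this dirty-label poisoning step, assuming NumPy image arrays of shape <code>(N, H, W, C)</code> with values in [0, 1] and integer labels; the function name, poison fraction, patch size, and patch location are illustrative choices rather than the exact configuration of Gu ''et al.'':

<syntaxhighlight lang="python">
import numpy as np

def badnets_style_poison(images, labels, target_label,
                         poison_frac=0.05, patch_size=3,
                         trigger_value=1.0, seed=0):
    """Dirty-label backdoor poisoning sketch (BadNets-style).

    Stamps a small square trigger patch onto a random subset of the
    training images and flips their labels to the target label.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()

    n = len(images)
    poisoned_idx = rng.choice(n, size=int(poison_frac * n), replace=False)

    # Stamp the trigger in the bottom-right corner of each chosen image.
    images[poisoned_idx, -patch_size:, -patch_size:, :] = trigger_value
    # Dirty-label step: relabel the poisoned examples as the target class.
    labels[poisoned_idx] = target_label
    return images, labels, poisoned_idx
</syntaxhighlight>

At test time, stamping the same patch onto any input is intended to steer a model trained on this data toward the target label.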
 
Turner ''et al.'' craft clean-label backdoor attacks. 
Here they take examples <math>x_j^{(b)}</math> (e.g. airplanes) and apply an adversarial perturbation to get <math>\tilde{x}_j^{(b)}</math>. 
The adversarial perturbation is computed against a separately trained network <math>g</math>. 
By the ''transferability'' of adversarial attacks, a new network <math>f</math> is also likely to misclassify the perturbed examples. 
Then they add a trigger to the perturbed images while keeping their original labels. 
Pros: the same trigger can be used to poison many examples. 
Cons: there is still a trigger in the poisoned images.
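A minimal sketch of this recipe, assuming PyTorch image tensors in [0, 1], an <math>\ell_\infty</math>-bounded PGD perturbation computed against a surrogate network (here called <code>surrogate_g</code>), and a square trigger patch stamped in the corner; the perturbation budget, step size, and patch handling are illustrative rather than the exact settings of Turner ''et al.'':

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def clean_label_backdoor(x_base, y_base, surrogate_g, trigger_patch,
                         eps=8/255, step=2/255, n_steps=20):
    """Clean-label backdoor sketch: adversarially perturb base images
    against a surrogate network, then stamp the trigger.
    The labels are left unchanged (hence "clean-label")."""
    surrogate_g.eval()
    x_adv = x_base.clone().detach()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate_g(x_adv), y_base)
        grad, = torch.autograd.grad(loss, x_adv)
        # Untargeted PGD step: maximize the surrogate's loss on the true label.
        x_adv = x_adv.detach() + step * grad.sign()
        # Project back into the eps-ball around the original images.
        x_adv = x_base + (x_adv - x_base).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    # Stamp the trigger patch in the bottom-right corner; labels stay clean.
    ph, pw = trigger_patch.shape[-2:]
    x_adv[..., -ph:, -pw:] = trigger_patch
    return x_adv, y_base
</syntaxhighlight>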
 
===Triggerless poison attacks===
Shafahi ''et al.'' introduce ''feature collisions''. 
Suppose <math>g</math> is the feature (penultimate) layer of a pretrained network that classifies, say, cats and airplanes, and <math>x_t</math> is a target cat image. 
The idea is to apply an adversarial perturbation to some base image <math>x_j^{(b)}</math> (e.g. an airplane) so that the result is close to the target in feature space while remaining close to the base image in input space. 
If we train the model on the poisoned samples, the decision boundary folds around the poison, so the target <math>x_t</math> ends up being classified as the base class.
<math>
x_j^{(p)} = \operatorname{argmin}_{x} \Vert g(x) - g(x_t) \Vert_{2}^{2} + \beta \Vert x - x_j^{(b)} \Vert_{2}^{2}
</math>
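A minimal sketch of this optimization, assuming <code>g</code> is a PyTorch module returning penultimate-layer features for a batched image tensor in [0, 1]; plain gradient descent on the objective above is used for simplicity instead of the forward-backward splitting procedure of Shafahi ''et al.'', and the step size, iteration count, and <math>\beta</math> are illustrative:

<syntaxhighlight lang="python">
import torch

def feature_collision_poison(g, x_target, x_base, beta=0.1,
                             lr=0.01, n_steps=200):
    """Craft a poison image that collides with the target in feature space
    while staying close to the base image in input space."""
    g.eval()
    with torch.no_grad():
        feat_t = g(x_target)          # target features g(x_t), held fixed
    x_p = x_base.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([x_p], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        # || g(x) - g(x_t) ||_2^2  +  beta * || x - x_j^(b) ||_2^2
        loss = ((g(x_p) - feat_t).pow(2).sum()
                + beta * (x_p - x_base).pow(2).sum())
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_p.clamp_(0, 1)          # keep the poison a valid image
    return x_p.detach()
</syntaxhighlight>

The poison keeps the base image's (correct) label, and training on it is what makes the decision boundary fold around the target.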


==Misc==