The idea is to add a ''trigger'' or watermark to the image so that the model misclassifies it.
Gu ''et al.'' (2017) <ref name="gu2017badnets"></ref> randomly select a small portion of the training set, apply a backdoor trigger, and ''change the label to the target label''.
However, this is not a clean-label attack because the labels must be changed.
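A minimal sketch of this style of dirty-label poisoning (the corner patch, trigger value, and poison fraction below are illustrative assumptions, not the exact BadNets configuration):
<syntaxhighlight lang="python">
import numpy as np

def poison_badnets_style(images, labels, target_label, poison_frac=0.05,
                         trigger_value=1.0, patch=3, seed=0):
    """Dirty-label backdoor: stamp a small trigger patch onto a random
    subset of training images and relabel them as the target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_frac * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Stamp a bright square in the bottom-right corner as the trigger.
    images[idx, -patch:, -patch:, :] = trigger_value
    # The "dirty" part: triggered images get the attacker's target label.
    labels[idx] = target_label
    return images, labels
</syntaxhighlight>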
Turner ''et al.'' craft clean-label backdoor attacks.
Here they take examples <math>x_j^{(b)}</math> (e.g. airplane) and apply an adversarial perturbation to get <math>\tilde{x}_j^{(b)}</math>.
The adversarial perturbation is computed against a separately trained network <math>g</math>.
By the ''transferability'' of adversarial attacks, a new network <math>f</math> is also likely to misclassify the perturbed examples.
Then they add a trigger to the image.
Pros: the same trigger can be used to poison several examples.
Cons: a visible trigger must be applied at test time.
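A hedged sketch of the clean-label recipe, assuming a PyTorch surrogate network <code>g</code>, a PGD-style perturbation, and the same corner trigger as above (function names and hyperparameters are illustrative):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def pgd_perturb(g, x, y, eps=8/255, alpha=2/255, steps=10):
    """Untargeted PGD against the surrogate g: push x toward the
    decision boundary so its natural features are hard to learn."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(g(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into eps-ball
        x_adv = x_adv.clamp(0, 1)                 # keep a valid image
    return x_adv

def clean_label_poison(g, x_base, y_base, patch=3):
    """Perturb base images with the surrogate, then stamp the trigger.
    The labels y_base are left unchanged, hence 'clean-label'."""
    x_tilde = pgd_perturb(g, x_base, y_base)
    x_tilde[..., -patch:, -patch:] = 1.0  # trigger in bottom-right corner
    return x_tilde, y_base
</syntaxhighlight>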
===Triggerless poison attacks===
Shafahi ''et al.'' introduce feature collisions.
Suppose <math>g</math> is the feature layer of a pretrained network <math>f</math>.
Suppose the target <math>x_t</math> is a cat and <math>f</math> classifies cats and airplanes.
The idea is to apply an adversarial perturbation to some base image so that it is close to the target in feature space.
If we train the model on the poisoned samples, the decision boundary folds to accommodate them, so the target <math>x_t</math> is misclassified as the base class.
<math>
x_j^{(p)} = \operatorname{argmin}_{x} \Vert g(x) - g(x_t) \Vert_2^2 + \beta \Vert x - x_j^{(b)} \Vert_2^2
</math>
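The objective can be optimized directly; a minimal sketch assuming <code>g</code> is the feature extractor (e.g. the network with its final layer removed), using plain gradient descent rather than the forward-backward splitting used in the paper:
<syntaxhighlight lang="python">
import torch

def feature_collision_poison(g, x_base, x_target, beta=0.1,
                             steps=200, lr=0.01):
    """Craft a poison that collides with the target in feature space
    while staying visually close to the base image."""
    with torch.no_grad():
        target_feats = g(x_target)
    x = x_base.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # ||g(x) - g(x_t)||^2 pulls x toward the target in feature space;
        # beta * ||x - x_b||^2 keeps it visually close to the base image.
        loss = ((g(x) - target_feats) ** 2).sum() \
               + beta * ((x - x_base) ** 2).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)  # keep a valid image
    return x.detach()
</syntaxhighlight>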
==Misc==