In this case, an adversary can do ''data poisoning'' by perturbing some of the training samples.
;Question: What is the goal of data poisoning?
* To reduce test-time accuracy?
** This is simple to mitigate by monitoring performance on a validation set.
* Targeted misclassification: to cause one or more target samples to be misclassified as another class.
*: '''This is what we focus on.'''
Given clean/base images <math>\{(x_i, y_i)\}_{i=1}^{n}</math>, the adversary creates poison images <math>\{(x_i^{(P)}, y_i^{(P)})\}_{i=1}^{J}</math>.
Our new training set is <math>\{(x_i, y_i)\}_{i=1}^{n} \cup \{(x_i^{(P)}, y_i^{(P)})\}_{i=1}^{J}</math>.
We train some model <math>f</math> on this set using SGD.
The goal is to make <math>f(x_t)</math> produce the wrong label for a target sample <math>x_t</math>.
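As a concrete illustration, here is a minimal PyTorch sketch of this threat model; the model <math>f</math>, the tensor shapes, and the random stand-in data are hypothetical placeholders, not an actual attack.
<syntaxhighlight lang="python">
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Clean training set {(x_i, y_i)}_{i=1}^n (random stand-ins for real data).
clean_x = torch.randn(1000, 3, 32, 32)
clean_y = torch.randint(0, 10, (1000,))
# Poison set {(x_i^(P), y_i^(P))}_{i=1}^J, crafted by the adversary.
poison_x = torch.randn(50, 3, 32, 32)
poison_y = torch.randint(0, 10, (50,))

# The victim trains on the union of the clean and poison samples.
train_set = ConcatDataset([TensorDataset(clean_x, clean_y),
                           TensorDataset(poison_x, poison_y)])
loader = DataLoader(train_set, batch_size=64, shuffle=True)

# f: a small stand-in classifier (the threat model is agnostic to f).
f = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(f.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

for x, y in loader:  # one epoch of SGD on the poisoned training set
    opt.zero_grad()
    loss_fn(f(x), y).backward()
    opt.step()
# The attack succeeds if f(x_t) now outputs the adversary's chosen label.
</syntaxhighlight>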
===Naive attack===
Suppose <math>x_t</math> is a cat image and our goal is for <math>f(x_t)</math> to output ''dog''.
One way is to add many copies of <math>x_t</math> with the wrong label, dog.
A sufficiently large model will memorize these samples and predict dog on <math>x_t</math>.
This is called ''flooding'' the training set.
This is not too concerning because simple filtering or outlier detection can identify the poison samples, as sketched below.
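Here is a minimal sketch of flooding and of the kind of near-duplicate filtering that defeats it; the dog-class index, perturbation scale, and distance threshold are illustrative assumptions.
<syntaxhighlight lang="python">
import torch

x_t = torch.randn(3, 32, 32)              # the target cat image (stand-in)
dog = 5                                   # assumed index of the dog class

# Flooding: inject many near-copies of x_t, all mislabeled as dog.
flood_x = x_t.unsqueeze(0) + 0.01 * torch.randn(100, 3, 32, 32)
flood_y = torch.full((100,), dog)

# Mitigation: the injected samples form a tight cluster around x_t, so a
# nearest-neighbor distance check flags them as near-duplicates.
flat = flood_x.flatten(1)                 # shape (100, 3*32*32)
d = torch.cdist(flat, flat)               # pairwise Euclidean distances
d.fill_diagonal_(float("inf"))            # ignore self-distances
flagged = d.min(dim=1).values < 1.0       # threshold is illustrative
print(flagged.float().mean().item())      # close to 1.0: nearly all flagged
</syntaxhighlight>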
There are two types of attacks:
* backdoor attacks
* triggerless attacks
===Backdoor attacks===
The idea is to add a ''trigger'' or watermark to an image to make the model misclassify it.
Gu ''et al.'' (2017)<ref name="gu2017badnets" /> randomly select a small portion of the training set, apply a backdoor trigger, and ''change the label to the target label''.
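A minimal sketch of this poisoning step, in the spirit of BadNets; the white-patch trigger, poison rate, and target label are illustrative assumptions, not the paper's exact settings.
<syntaxhighlight lang="python">
import torch

def apply_trigger(x):
    """Stamp a 3x3 white patch (the trigger) into the bottom-right corner."""
    x = x.clone()
    x[:, -3:, -3:] = 1.0                  # channels x height x width
    return x

train_x = torch.rand(1000, 3, 32, 32)     # stand-in training images in [0, 1]
train_y = torch.randint(0, 10, (1000,))
target_label, poison_rate = 5, 0.05       # assumed attack parameters

# Randomly select a small portion of the training set, apply the
# backdoor trigger, and change the label to the target label.
idx = torch.randperm(len(train_x))[: int(poison_rate * len(train_x))]
for i in idx:
    train_x[i] = apply_trigger(train_x[i])
    train_y[i] = target_label

# After training on this set, stamping the same trigger onto any test image
# steers the prediction toward target_label, while clean inputs are
# classified normally, which makes the backdoor hard to notice.
</syntaxhighlight>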
==Misc==