Deep Learning

In this case, an adversary can do ''data poisoning'' by perturbing some of the training samples.


;Question: What is the goal of data poisoning?
* To reduce test-time accuracy?
** This is simple to mitigate by monitoring performance on a validation set.
* Targeted misclassification: to cause one or more target samples to be misclassified as another class.
*: '''This is what we focus on.'''
 
Given clean/base images <math>\{(x_i, y_i)\}_{i=1}^{n}</math>, create poison images <math>\{(x_i^{(P)}, y_i^{(P)})\}_{i=1}^{J}</math>.
 
Our new training set is: <math>\{(x_i, y_i)\}_{i=1}^{n} \cup \{(x_i^{(P)}, y_i^{(P)})\}_{i=1}^{J}</math>. 
We train using SGD on some model <math>f</math>. 
The goal is to make <math>f(x_t)</math> produce the wrong label.
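Below is a minimal sketch (not from the original notes) of how such a poisoned training set could be assembled and trained with SGD in PyTorch; the random stand-in data, the toy model <math>f</math>, and the target label are all illustrative placeholders.

<syntaxhighlight lang="python">
# Sketch: train a model f on clean data plus J attacker-supplied poison samples.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# Clean training set {(x_i, y_i)}_{i=1}^{n} (random stand-in data here).
n, J, num_classes = 1000, 50, 10
x_clean = torch.randn(n, 3, 32, 32)
y_clean = torch.randint(0, num_classes, (n,))

# Poison set {(x_i^(P), y_i^(P))}_{i=1}^{J}: perturbed images with attacker-chosen labels.
x_poison = torch.randn(J, 3, 32, 32)   # in a real attack these are crafted by the adversary
y_poison = torch.full((J,), 7)         # e.g. target label "dog"

train_set = ConcatDataset([TensorDataset(x_clean, y_clean),
                           TensorDataset(x_poison, y_poison)])
loader = DataLoader(train_set, batch_size=64, shuffle=True)

# A small stand-in model f, trained with plain SGD as in the notes.
f = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                  nn.Linear(256, num_classes))
opt = torch.optim.SGD(f.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(f(xb), yb).backward()
        opt.step()

# The attack succeeds if f(x_t) now outputs the wrong (attacker-chosen) label.
x_t = torch.randn(1, 3, 32, 32)        # victim/target sample (placeholder)
print(f(x_t).argmax(dim=1))
</syntaxhighlight>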
 
===Naive attack===
<math>x_t</math> is a cat. Our goal is that <math>f(x_t)</math> predicts dog. 
One way is to add multiple copies of <math>x_t</math> to the training set with the wrong label, dog. 
A sufficiently large model will memorize these and predict dog for <math>x_t</math>. 
This is called ''flooding'' the training set. 
This is not too concerning because simple filtering or outlier detection can identify the poison samples.
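A rough sketch of flooding, assuming PyTorch tensors; the stand-in image <math>x_t</math>, the label value, and the number of copies are illustrative, not from the original notes.

<syntaxhighlight lang="python">
# Sketch: flood the training set with mislabeled copies of the target image x_t.
import torch
from torch.utils.data import TensorDataset, ConcatDataset

x_t = torch.randn(1, 3, 32, 32)   # the target "cat" image (placeholder)
dog_label = 7                     # attacker-chosen wrong label
copies = 100                      # flood with many mislabeled copies

flood_x = x_t.repeat(copies, 1, 1, 1)
flood_y = torch.full((copies,), dog_label)

# Appending the flood set to the clean data and training as before pushes the
# model toward f(x_t) = dog. Because the copies are identical (or near-duplicates),
# simple de-duplication or outlier detection on the training set exposes them.
x_clean = torch.randn(1000, 3, 32, 32)
y_clean = torch.randint(0, 10, (1000,))
poisoned = ConcatDataset([TensorDataset(x_clean, y_clean),
                          TensorDataset(flood_x, flood_y)])
</syntaxhighlight>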
 
There are two types of attacks:
* backdoor attacks
* triggerless attacks
 
===Backdoor attacks===
The idea is to add a ''trigger'' or watermark to an image so that the model misclassifies it.
 
Gu ''et al.'' (2017)<ref name="gu2017badnets" /> randomly select a small portion of the training set, apply a backdoor trigger, and ''change the label to the target label''.
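A hedged sketch of this kind of attack follows; the square trigger patch in the bottom-right corner and the helper functions <code>apply_trigger</code> and <code>poison_subset</code> are illustrative assumptions, not Gu ''et al.'''s actual implementation.

<syntaxhighlight lang="python">
# Sketch of a BadNets-style backdoor: stamp a small trigger patch onto a random
# fraction of training images and relabel them as the target class.
import torch

def apply_trigger(x, patch_value=1.0, size=3):
    """Stamp a `size` x `size` white square in the bottom-right corner (assumed trigger)."""
    x = x.clone()
    x[..., -size:, -size:] = patch_value
    return x

def poison_subset(x, y, target_label, fraction=0.05):
    """Randomly pick `fraction` of the samples, add the trigger, flip the label."""
    n = x.shape[0]
    k = max(1, int(fraction * n))
    idx = torch.randperm(n)[:k]
    x_p, y_p = x.clone(), y.clone()
    x_p[idx] = apply_trigger(x[idx])
    y_p[idx] = target_label
    return x_p, y_p

# Usage: poison 5% of a (stand-in) training set toward target class 7.
x = torch.rand(1000, 3, 32, 32)
y = torch.randint(0, 10, (1000,))
x_poisoned, y_poisoned = poison_subset(x, y, target_label=7)

# At test time the attacker stamps the same trigger on any input to force
# the target prediction: f(apply_trigger(x_test)) -> target_label.
</syntaxhighlight>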


==Misc==