==Adversarial Robustness==
Beginning of Lecture 8 (Sept. 24, 2020)
In standard ERM, we have a dataset
<math>\{(x_i, y_i)\}_{i=1}^{n}</math> where <math>x_i \in \mathbb{R}^d</math> and <math>y_i \in [c]</math>.
We optimize <math>\min_{\theta} E \left[ l(f_{\theta}(x), y) \right] \approx \min_{\theta} \frac{1}{n} \sum_{i=1}^{n} l(f_{\theta}(x_i), y_i)</math>, where the empirical average approximates the expectation.
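A minimal PyTorch sketch of this empirical objective (the function name and arguments are illustrative, not from the lecture):
<syntaxhighlight lang="python">
import torch.nn.functional as F

def empirical_risk(model, xs, ys):
    # (1/n) * sum_i l(f_theta(x_i), y_i); cross_entropy averages over the
    # batch by default, so this is exactly the empirical mean above.
    return F.cross_entropy(model(xs), ys)
</syntaxhighlight>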
;Q: What is an ''adversarial example''?
<math>x'</math> is an adversarial example for <math>x</math> under a model <math>f_{\theta}()</math> if
* <math>f_{\theta}(x) \neq f_{\theta}(x')</math> and
* <math>f_{human}(x) = f_{human}(x')</math>
One key challenge is that we don't have a precise definition of human perception.
===Formulation===
Threat models: <math>x' \in T(x, \epsilon)</math>
Example: the <math>L_p</math> threat model <math>\{x' \mid \Vert x' - x \Vert_{p} \leq \epsilon \}</math>
Non-<math>L_p</math> threat models:
One problem with <math>L_p</math> models is that two perceptually identical images can have a large <math>L_2</math> distance; for example, slightly shifting or rotating an image.
An example of a non-<math>L_p</math> threat model is one based on the Wasserstein distance.
===Attacks===
The goal: given <math>x</math> and <math>f_{\theta}()</math>, compute <math>x'</math> such that
<math>c^* = f_{\theta}(x) \neq f_{\theta}(x')</math>.
* Targeted: given a target label <math>t \neq c^*</math>, require <math>f_{\theta}(x') = t</math>.
* Untargeted: no target label is given; any misclassification away from <math>c^*</math> counts.
We mostly focus on targeted attacks.
* White-box: the adversary knows everything (e.g. architecture, weights, gradients)
* Black-box: the adversary has limited knowledge (e.g. query access only)
We mostly focus on white-box attacks.
* Inference-time (evasion) attacks
* Training-time (poisoning) attacks
We focus on inference-time attacks.
Attack optimization:
<math>\max_{\delta} l_{cls}(f_{\theta}(x + \delta), y)</math> such that <math>\Vert \delta \Vert \leq \epsilon</math>, where <math>x' = x + \delta</math>.
How do we solve this optimization problem?
====Fast Gradient Sign Method (FGSM)====
For <math>p=\infty</math>, FGSM takes a single step that maximizes the first-order (linear) approximation of the loss:
<math>
\begin{aligned}
\delta_{FGSM} &= \underset{\Vert \delta \Vert_{\infty} \leq \epsilon}{\arg\max} \langle \nabla_x l(f_{\theta}(x), y), \delta \rangle\\
&= \epsilon \cdot \operatorname{sign}(\nabla_x l(f_{\theta}(x), y))
\end{aligned}
</math>
Pros:
* Fast because it is a one-step attack.
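A minimal PyTorch sketch of FGSM (assuming <code>model</code> returns logits and inputs lie in <math>[0, 1]</math>; the function name and signature are illustrative):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    # One-step L-infinity attack: x' = x + eps * sign(grad_x loss).
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # The sign of the gradient maximizes the linearized loss over the eps-ball.
    x_adv = x + eps * x.grad.sign()
    return x_adv.detach().clamp(0, 1)  # keep pixels in [0, 1] (assumed range)
</syntaxhighlight>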
====Projected Gradient Descent (PGD)====
This is an iterative attack using projected gradient steps.
# Initialize <math>x^{(0)} = x</math>.
# At each iteration, take a gradient step <math>\tilde{x}^{(t+1)} = x^{(t)} + \eta g^{(t)}</math>, where <math>g^{(t)} = \nabla_x l(f_{\theta}(x^{(t)}), y)</math> (or its sign, for <math>l_{\infty}</math>).
# Then project back into the ball: <math>x^{(t+1)} = \pi(\tilde{x}^{(t+1)})</math>.
#* For <math>l_{\infty}</math>, the projection is just an element-wise clamp: <math>x^{(t+1)} = \operatorname{clip}(\tilde{x}^{(t+1)}, x - \epsilon, x + \epsilon)</math>
In general, PGD finds stronger adversarial examples than FGSM; a sketch follows below.
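A minimal PyTorch sketch of <math>l_{\infty}</math> PGD (same assumptions as the FGSM snippet above; <code>pgd_attack</code> and its parameters are illustrative):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, eta, steps):
    # Iterative L-infinity attack with projection onto the eps-ball around x.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step on the loss (sign of the gradient for L-infinity).
        x_adv = (x_adv + eta * grad.sign()).detach()
        # Projection pi: element-wise clamp back into the eps-ball.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
</syntaxhighlight>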
===Alternative formulation===
<math>\min_{\delta} \Vert \delta \Vert</math> such that <math>f_{\theta}(x + \delta) = t \neq c^*</math>
Lagrangian relaxation (with multiplier <math>\lambda > 0</math>):
<math>\min_{\delta} \Vert \delta \Vert - \lambda \, l_{cls}(f_{\theta}(x + \delta), y = c^*)</math>
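A minimal PyTorch sketch of this unconstrained form, running Adam directly on <math>\delta</math> (the function name and hyperparameters are illustrative; as in the relaxation above, the constraint is replaced by pushing up the loss on the true class <math>c^*</math>):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def lagrangian_attack(model, x, y, lam, steps=200, lr=0.01):
    # Minimize ||delta||_2 - lam * loss(f(x + delta), y) over delta.
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        obj = delta.norm(p=2) - lam * F.cross_entropy(model(x + delta), y)
        opt.zero_grad()
        obj.backward()
        opt.step()
    return (x + delta).detach()
</syntaxhighlight>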
===Adversarial Training===
<math>\min_{\theta} E \left[ \max_{x' \in T(x, \epsilon)} l_{cls}(f_{\theta}(x'), y) \right]</math>
How do we solve it?
* Inner max = the attack problem (e.g. PGD)
* Outer min = gradient descent (see the sketch below)
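A minimal sketch of the min-max training loop (reusing the hypothetical <code>pgd_attack</code> above; <code>model</code>, <code>loader</code>, and all hyperparameters are assumptions, not from the lecture):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
for x, y in loader:
    # Inner maximization: find a worst-case x' inside the eps-ball via PGD.
    x_adv = pgd_attack(model, x, y, eps=8/255, eta=2/255, steps=10)
    # Outer minimization: gradient descent on the loss at x'.
    loss = F.cross_entropy(model(x_adv), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
</syntaxhighlight>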
==Misc==