==Adversarial Robustness==
Beginning of Lecture 8 (Sept. 24, 2020)
In standard empirical risk minimization (ERM), we have a dataset 
<math>\{(x_i, y_i)\}_{i=1}^{n}</math> where <math>x_i \in \mathbb{R}^d</math> and <math>y_i \in [c]</math>. 
We optimize <math>\min_{\theta} \mathbb{E} \left[ l(f_{\theta}(x), y) \right] \approx \min_{\theta} \frac{1}{n} \sum_{i=1}^{n} l(f_{\theta}(x_i), y_i)</math>.
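As a concrete illustration (not from the lecture), the empirical risk above can be computed in PyTorch as follows; <code>model</code>, <code>x</code>, and <code>y</code> are placeholder names for a classifier and a batch of data.
<syntaxhighlight lang="python">
# Minimal sketch of the empirical risk; model, x, y are placeholder names.
import torch.nn.functional as F

def empirical_risk(model, x, y):
    """(1/n) * sum_i l(f_theta(x_i), y_i), with cross-entropy as the loss l."""
    logits = model(x)                  # f_theta(x_i) for each example in the batch
    return F.cross_entropy(logits, y)  # averages the per-example losses over the n examples
</syntaxhighlight>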
;Q: What is an ''adversarial example''?
<math>x'</math> is an adversarial example for <math>x</math> under a model <math>f_{\theta}()</math> if
* <math>f_{\theta}(x) \neq f_{\theta}(x')</math> and
* <math>f_{human}(x) = f_{human}(x')</math>
One key challenge is that we don't have a precise definition of human perception.
===Formulation===
Threat Models: <math>x' \in T(x, \epsilon)</math> 
Example: <math>L_p</math> threat model <math>T(x, \epsilon) = \{x' \mid \Vert x' - x \Vert_{p} \leq \epsilon \}</math>
Non-<math>L_p</math> threat models: 
One problem with <math>L_p</math> threat models is that two perceptually identical images can have a large <math>L_2</math> distance, e.g. after shifting or rotating an image. 
An example of a non-<math>L_p</math> threat model is one based on the Wasserstein distance.
===Attacks===
The goal: given <math>x</math> and <math>f_{\theta}</math>, we want to compute <math>x'</math> such that
<math>c^* = f_{\theta}(x) \neq f_{\theta}(x')</math>.
* Targeted: Given a target label <math>t \neq c^*</math>, <math>f_{\theta}(x') = t</math>.
* Untargeted: no target label <math>t</math> is given; any <math>x'</math> with <math>f_{\theta}(x') \neq c^*</math> counts.
We mostly focus on targeted attacks.
* White-box: Adversary knows everything
* Black-box: Adversary has limited knowledge
We mostly focus on white-box attacks.
* Inference-time (evasion) attacks
* Training-time (poisoning) attacks
We focus on inference-time attacks.
Attack Optimization: 
<math>\max_{\delta} l_{cls}(f_{\theta}(x + \delta), y)</math> such that <math>\Vert \delta \Vert \leq \epsilon</math> (i.e., <math>x' = x + \delta</math> stays within the threat model). 
How to solve this optimization?
====Fast Gradient Sign Method (FGSM)====
For <math>p=\infty</math>: 
<math>
\begin{aligned}
\delta_{FGSM} &= \arg\max_{\Vert \delta \Vert_{\infty} \leq \epsilon} \langle \nabla_x l(f_{\theta}(x), y), \delta \rangle \\
&= \epsilon \, \operatorname{sign}(\nabla_x l(f_{\theta}(x), y))
\end{aligned}
</math>
Pros:
* Fast because it is a one-step attack.
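A minimal sketch of FGSM in PyTorch (not from the lecture); <code>model</code>, <code>x</code>, <code>y</code>, and <code>eps</code> are placeholder names, and pixel values are assumed to lie in <math>[0, 1]</math>.
<syntaxhighlight lang="python">
# Minimal FGSM sketch; all argument names are placeholders.
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """One-step L-infinity attack: x' = x + eps * sign(grad_x l(f(x), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()                          # gradient of the loss w.r.t. the input
    x_adv = x + eps * x.grad.sign()          # move each pixel by eps in the sign of the gradient
    return x_adv.clamp(0, 1).detach()        # assumes images in [0, 1]
</syntaxhighlight>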
====Projected Gradient Descent (PGD)====
This is an iterative attack using projected gradient steps (ascent on the loss).
# Initialize <math>x^{(0)} = x</math>. 
# At each iteration, take a step <math>\tilde{x}^{(t+1)} = x^{(t)} + \eta g^{(t)}</math>, where <math>g^{(t)}</math> is the gradient of the loss at <math>x^{(t)}</math> (or its sign for <math>l_{\infty}</math>) and <math>\eta</math> is the step size. 
# Then project back into the <math>\epsilon</math>-ball around <math>x</math>: <math>x^{(t+1)} = \pi(\tilde{x}^{(t+1)})</math>.
#* For <math>l_{\infty}</math>, the projection is an element-wise clamp: <math>x^{(t+1)} = \operatorname{clip}(\tilde{x}^{(t+1)}, x - \epsilon, x + \epsilon)</math>
In general, PGD is a stronger attack than FGSM.
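A minimal sketch of <math>l_{\infty}</math> PGD in PyTorch (not from the lecture); the arguments are placeholders, and pixel values are assumed to lie in <math>[0, 1]</math>.
<syntaxhighlight lang="python">
# Minimal L-infinity PGD sketch; all argument names are placeholders.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, step_size, num_steps):
    """Iteratively ascend the loss, then project back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(num_steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()        # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # projection: element-wise clamp to the eps-ball
        x_adv = x_adv.clamp(0, 1)                               # assumes images in [0, 1]
    return x_adv.detach()
</syntaxhighlight>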
===Alternative formulation===
<math>\min_{\delta} \Vert \delta \Vert</math> such that <math>f_{\theta}(x + \delta) = t \neq c^*</math> 
Lagrangian (penalty) form: 
<math>\min_{\delta} \Vert \delta \Vert - \lambda \, l_{cls}(f_{\theta}(x + \delta), y=c^*)</math>
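A minimal sketch of the penalized objective in PyTorch (not from the lecture); <code>lam</code> stands for the weight <math>\lambda</math> and the other names are placeholders. It can then be minimized over <math>\delta</math> with a standard optimizer, e.g. <code>torch.optim.Adam([delta])</code>.
<syntaxhighlight lang="python">
# Minimal sketch of the penalized attack objective; argument names are placeholders.
import torch.nn.functional as F

def perturbation_objective(model, x, delta, y_true, lam):
    """||delta||_2 - lam * l_cls(f(x + delta), y_true); minimize this over delta."""
    cls_loss = F.cross_entropy(model(x + delta), y_true)
    return delta.norm(p=2) - lam * cls_loss
</syntaxhighlight>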
===Adversarial Training===
<math>\min_{\theta} \mathbb{E} \left[ \max_{x' \in T(x, \epsilon)} l_{cls}(f_{\theta}(x'), y) \right]</math> 
How to solve it?
* Inner max: the attack problem, solved approximately (e.g., with PGD)
* Outer min: gradient descent on <math>\theta</math>, as in standard training (see the sketch below)
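A minimal sketch of the resulting training loop in PyTorch (not from the lecture), reusing the <code>pgd_attack</code> sketch above; <code>loader</code>, <code>optimizer</code>, and the PGD hyperparameters are placeholders.
<syntaxhighlight lang="python">
# Minimal adversarial training sketch; argument names are placeholders.
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps, step_size, num_steps):
    for x, y in loader:
        # Inner maximization: approximate the worst-case x' with PGD (pgd_attack defined above).
        x_adv = pgd_attack(model, x, y, eps, step_size, num_steps)
        # Outer minimization: gradient step on theta using the adversarial examples.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
</syntaxhighlight>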


==Misc==