==Adversarial Robustness==
Beginning of Lecture 8 (Sept. 24, 2020)
In standard empirical risk minimization (ERM), we have a dataset 
<math>\{(x_i, y_i)\}_{i=1}^{n}</math> where <math>x_i \in \mathbb{R}^d</math> and <math>y_i \in [c]</math>. 
We optimize <math>\min_{\theta} \mathbb{E} \left[ l(f_{\theta}(x), y) \right] \approx \min_{\theta} \frac{1}{n} \sum_{i=1}^{n} l(f_{\theta}(x_i), y_i)</math>.
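As a concrete illustration (not from the lecture), the empirical risk above can be computed in PyTorch as follows; <code>model</code>, <code>x</code>, and <code>y</code> are placeholder names for a classifier and a batch of data.
<syntaxhighlight lang="python">
# Minimal sketch of the empirical risk; model, x, y are placeholder names.
import torch.nn.functional as F

def empirical_risk(model, x, y):
    """(1/n) * sum_i l(f_theta(x_i), y_i), with cross-entropy as the loss l."""
    logits = model(x)                  # f_theta(x_i) for each example in the batch
    return F.cross_entropy(logits, y)  # averages the per-example losses over the n examples
</syntaxhighlight>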
;Q: What is an ''adversarial example''?
<math>x'</math> is an adversarial example for <math>x</math> under a model <math>f_{\theta}()</math> if
* <math>f_{\theta}(x) \neq f_{\theta}(x')</math> and
* <math>f_{human}(x) = f_{human}(x')</math>
One key challenge is that we don't have a precise definition of human perception.
===Formulation===
Threat Models: <math>x' \in T(x, \epsilon)</math> 
Example: <math>L_p</math> threat model <math>T(x, \epsilon) = \{x' \mid \Vert x' - x \Vert_{p} \leq \epsilon \}</math>
Non-<math>L_p</math> threat models: 
One problem with <math>L_p</math> threat models is that two perceptually identical images can have a large <math>L_2</math> distance, e.g. after shifting or rotating an image. 
An example of a non-<math>L_p</math> threat model is one based on the Wasserstein distance.
===Attacks===
The goal: given <math>x</math> and <math>f_{\theta}</math>, we want to compute <math>x'</math> such that
<math>c^* = f_{\theta}(x) \neq f_{\theta}(x')</math>.
* Targeted: Given a target label <math>t \neq c^*</math>, <math>f_{\theta}(x') = t</math>.
* Untargeted: no target label <math>t</math> is given; any <math>x'</math> with <math>f_{\theta}(x') \neq c^*</math> counts.
We mostly focus on targeted attacks.
* White-box: Adversary knows everything
* Black-box: Adversary has limited knowledge
We mostly focus on white-box attacks.
* Inference-time (evasion) attacks
* Training-time (poisoning) attacks
We focus on inference-time attacks.
Attack Optimization: 
<math>\max_{\delta} l_{cls}(f_{\theta}(x + \delta), y)</math> such that <math>\Vert \delta \Vert \leq \epsilon</math> (i.e., <math>x' = x + \delta</math> stays within the threat model). 
How to solve this optimization?
====Fast Gradient Sign Method (FGSM)====
For <math>p=\infty</math>: 
<math>
\begin{aligned}
\delta_{FGSM} &= \arg\max_{\Vert \delta \Vert_{\infty} \leq \epsilon} \langle \nabla_x l(f_{\theta}(x), y), \delta \rangle \\
&= \epsilon \, \operatorname{sign}(\nabla_x l(f_{\theta}(x), y))
\end{aligned}
</math>
Pros:
* Fast because it is a one-step attack.
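A minimal sketch of FGSM in PyTorch (not from the lecture); <code>model</code>, <code>x</code>, <code>y</code>, and <code>eps</code> are placeholder names, and pixel values are assumed to lie in <math>[0, 1]</math>.
<syntaxhighlight lang="python">
# Minimal FGSM sketch; all argument names are placeholders.
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """One-step L-infinity attack: x' = x + eps * sign(grad_x l(f(x), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()                          # gradient of the loss w.r.t. the input
    x_adv = x + eps * x.grad.sign()          # move each pixel by eps in the sign of the gradient
    return x_adv.clamp(0, 1).detach()        # assumes images in [0, 1]
</syntaxhighlight>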
====Projected Gradient Descent (PGD)====
This is an iterative attack using projected gradient steps (ascent on the loss).
# Initialize <math>x^{(0)} = x</math>. 
# At each iteration, take a step <math>\tilde{x}^{(t+1)} = x^{(t)} + \eta g^{(t)}</math>, where <math>g^{(t)}</math> is the gradient of the loss at <math>x^{(t)}</math> (or its sign for <math>l_{\infty}</math>) and <math>\eta</math> is the step size. 
# Then project back into the <math>\epsilon</math>-ball around <math>x</math>: <math>x^{(t+1)} = \pi(\tilde{x}^{(t+1)})</math>.
#* For <math>l_{\infty}</math>, the projection is an element-wise clamp: <math>x^{(t+1)} = \operatorname{clip}(\tilde{x}^{(t+1)}, x - \epsilon, x + \epsilon)</math>
In general, PGD is a stronger attack than FGSM.
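A minimal sketch of <math>l_{\infty}</math> PGD in PyTorch (not from the lecture); the arguments are placeholders, and pixel values are assumed to lie in <math>[0, 1]</math>.
<syntaxhighlight lang="python">
# Minimal L-infinity PGD sketch; all argument names are placeholders.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, step_size, num_steps):
    """Iteratively ascend the loss, then project back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(num_steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()        # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # projection: element-wise clamp to the eps-ball
        x_adv = x_adv.clamp(0, 1)                               # assumes images in [0, 1]
    return x_adv.detach()
</syntaxhighlight>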
===Alternative formulation===
<math>\min_{\delta} \Vert \delta \Vert</math> such that <math>f_{\theta}(x + \delta) = t \neq c^*</math> 
Lagrangian (penalty) form: 
<math>\min_{\delta} \Vert \delta \Vert - \lambda \, l_{cls}(f_{\theta}(x + \delta), y=c^*)</math>
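A minimal sketch of the penalized objective in PyTorch (not from the lecture); <code>lam</code> stands for the weight <math>\lambda</math> and the other names are placeholders. It can then be minimized over <math>\delta</math> with a standard optimizer, e.g. <code>torch.optim.Adam([delta])</code>.
<syntaxhighlight lang="python">
# Minimal sketch of the penalized attack objective; argument names are placeholders.
import torch.nn.functional as F

def perturbation_objective(model, x, delta, y_true, lam):
    """||delta||_2 - lam * l_cls(f(x + delta), y_true); minimize this over delta."""
    cls_loss = F.cross_entropy(model(x + delta), y_true)
    return delta.norm(p=2) - lam * cls_loss
</syntaxhighlight>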
===Adversarial Training===
<math>\min_{\theta} \mathbb{E} \left[ \max_{x' \in T(x, \epsilon)} l_{cls}(f_{\theta}(x'), y) \right]</math> 
How to solve it?
* Inner max: the attack problem, solved approximately (e.g., with PGD)
* Outer min: gradient descent on <math>\theta</math>, as in standard training (see the sketch below)
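A minimal sketch of the resulting training loop in PyTorch (not from the lecture), reusing the <code>pgd_attack</code> sketch above; <code>loader</code>, <code>optimizer</code>, and the PGD hyperparameters are placeholders.
<syntaxhighlight lang="python">
# Minimal adversarial training sketch; argument names are placeholders.
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps, step_size, num_steps):
    for x, y in loader:
        # Inner maximization: approximate the worst-case x' with PGD (pgd_attack defined above).
        x_adv = pgd_attack(model, x, y, eps, step_size, num_steps)
        # Outer minimization: gradient step on theta using the adversarial examples.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
</syntaxhighlight>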


==Misc==