===Formulation===
Threat models: <math>x' \in T(x, \epsilon)</math>
Example: <math>\ell_p</math> model <math>\{x' \mid \Vert x' - x \Vert_{p} \leq \epsilon \}</math>
Non-<math>\ell_p</math> threat models also exist.
Adversarial training solves the min-max problem <math>\min_\theta \, \mathbb{E}_{(x,y)} \left[ \max_{x' \in T(x,\epsilon)} L(f_\theta(x'), y) \right]</math>:
* Inner max = attack problem (PGD; see the sketch below)
* Outer min = gradient descent on <math>\theta</math>
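A minimal PGD sketch for the inner maximization under the <math>\ell_\infty</math> threat model (PyTorch; the budget <code>eps</code>, step size <code>alpha</code>, and iteration count are illustrative assumptions, not values from the lecture):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximately solve the inner max over the l_inf ball of radius eps."""
    x = x.detach()
    # random start inside the threat model T(x, eps)
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # ascend the loss, then project back onto the l_inf ball and image range
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

# Outer minimization = one SGD step on the adversarial loss:
#   x_adv = pgd_attack(model, x, y)
#   loss = F.cross_entropy(model(x_adv), y)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
</syntaxhighlight>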
===Other heuristic/empirical defenses===
Beginning of Lecture 9, Sept. 29, 2020
Athalye et al. (ICML 2018) examined the empirical defenses accepted at ICLR 2018 and broke 6-7 of them, showing that obfuscated/masked gradients provide a ''false'' sense of security.
Such defenses can be ''easily'' broken using ''adaptive'' attacks.
Obfuscated gradients arise as shattered gradients (e.g., from a non-differentiable op) or stochastic gradients (e.g., from a randomized network).
Telltale signs of obfuscated gradients (none of these should happen when a defense is evaluated with working gradients):
* A one-step attack outperforms an iterative attack
* A black-box attack outperforms a white-box attack
* An unbounded attack does not reach a 100% success rate
* Increasing the attack budget doesn't increase the success rate
;How to attack defenses that use gradient masking?
* Defense based on adding a ''non-differentiable'' operator:
Example 1:
<math>\hat{f}(x) = f(g(x))</math> with g non-differentiable and non-smooth.
In the attack, keep g(x) in the forward pass but approximate g by the identity on the backward pass (BPDA: Backward Pass Differentiable Approximation).
Example 2:
Defense uses a ''randomized'' classifier or stochastic gradients.
Attack by taking the expectation over the randomization (EOT); a sketch of both follows.
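A sketch of the two adaptive-attack gradient estimators named above (PyTorch; <code>g</code>, <code>model</code>, and <code>randomized_model</code> are hypothetical stand-ins for the defense's components):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

class BPDAIdentity(torch.autograd.Function):
    """BPDA: apply the non-differentiable g in the forward pass, but
    pretend g is the identity (g(x) ~ x) in the backward pass."""
    @staticmethod
    def forward(ctx, x, g):
        return g(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity gradient w.r.t. x; none for g

def bpda_grad(model, g, x, y):
    """Gradient of the loss w.r.t. x, differentiating 'through' g via BPDA."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(BPDAIdentity.apply(x, g)), y)
    return torch.autograd.grad(loss, x)[0]

def eot_grad(randomized_model, x, y, n_samples=30):
    """EOT: average the loss (hence the gradient) over the defense's
    internal randomness instead of attacking a single noisy draw."""
    x = x.clone().detach().requires_grad_(True)
    loss = sum(F.cross_entropy(randomized_model(x), y)
               for _ in range(n_samples)) / n_samples
    return torch.autograd.grad(loss, x)[0]
</syntaxhighlight>
Either gradient estimate can then drive the PGD loop from the earlier sketch.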
;Are adversarial examples inevitable?
On the sphere, <math>d_{\infty}(x, x') \leq d_{2}(x, x') \leq d_{g}(x, x')</math>, where <math>d_g</math> is the geodesic distance; an adversarial example within geodesic distance <math>\epsilon</math> therefore also lies in the <math>\ell_2</math> and <math>\ell_\infty</math> balls of radius <math>\epsilon</math>.
Classification problem: <math>\{1, \ldots, C\} = [C]</math> labels.
Each class c has a density function <math>\rho_{c}</math>, which is bounded.
;Isoperimetric Inequality
Of all closed surfaces that enclose a unit volume, the sphere has the smallest surface area.
Very intuitive but difficult to prove. (Osserman, 1978)
;Lemma (Lévy & Pellegrino, 1951)
Consider a subset <math>A \subset S^{d-1}</math> with normalized measure <math>\mu_1(A) \geq \frac{1}{2}</math>; among all such subsets, the half-sphere has the smallest <math>\epsilon</math>-expansion <math>A(\epsilon)</math> (the set of points within geodesic distance <math>\epsilon</math> of A).
;Lemma (Milman & Schechtman, 1986)
If <math>\mu_1(A) \geq \frac{1}{2}</math>, then <math>\mu_1(A(\epsilon)) \geq 1 - \left(\frac{\pi}{8}\right)^{1/2} \exp\left(-\frac{d-1}{2} \epsilon^2\right)</math>.
As d goes to infinity, the right-hand side goes to 1.
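A quick numerical check of the bound (the choice <math>\epsilon = 0.1</math> and the dimensions are illustrative):
<syntaxhighlight lang="python">
import math

def sphere_expansion_bound(d, eps):
    """Milman-Schechtman lower bound on mu_1(A(eps)) when mu_1(A) >= 1/2."""
    return 1 - math.sqrt(math.pi / 8) * math.exp(-(d - 1) / 2 * eps ** 2)

for d in [10, 100, 1000, 3072]:  # 3072 = dimension of a CIFAR-10 image
    print(d, sphere_expansion_bound(d, eps=0.1))
# The bound is weak at d=10 (~0.40) but essentially 1 by d=3072 (~1 - 1e-7).
</syntaxhighlight>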
;Theorem
C-class classification on the sphere <math>S^{d-1}</math>.
Each class c has density <math>\rho_c</math> with upper bound <math>U_c = \sup_x \rho_c(x)</math>; let <math>V_c = \mathrm{vol}(S^{d-1}) \cdot U_c</math> be this bound relative to the uniform density.
Let <math>f_c = \mu_1 \{x \mid C(x)=c\}</math> be the fraction of the sphere labeled c by the classifier.
Pick a class c with <math>f_c \leq \frac{1}{2}</math>.
Sample x from <math>\rho_c</math>.
Either x is misclassified or x has an <math>\epsilon</math>-adversarial example (in geodesic distance).
One of these happens with probability at least <math>1 - V_c \left(\frac{\pi}{8}\right)^{1/2} \exp\left(-\frac{d-1}{2} \epsilon^2\right)</math>.<br>
Proof:
Consider the region with the correct classification: <math>R=\{x \mid C(x)=c\}</math>. Here <math>\mu_1(R) = f_c \leq 1/2</math>.
Consider the complement <math>R^c</math>, with <math>\mu_1(R^c) \geq \frac{1}{2}</math>.
By the lemma above, <math>\mu_1(R^c(\epsilon)) \geq 1 - (\frac{\pi}{8})^{1/2} \exp(-\frac{d-1}{2} \epsilon^2)</math>. Every point of <math>R^c(\epsilon)</math> is either misclassified (if it lies in <math>R^c</math>) or within geodesic distance <math>\epsilon</math> of a misclassified point. The probability that x sampled from <math>\rho_c</math> misses <math>R^c(\epsilon)</math> is at most <math>V_c</math> times the normalized measure of the complement, which yields the stated bound.
;Lemma
<math>A \subset [0, 1]^d</math> with <math>\mathrm{vol}(A) \geq \frac{1}{2}</math>. Let <math>p^* = \min(p, 2)</math>. Then
<math>\mathrm{vol}(A(\epsilon; d_p)) \geq 1 - \frac{\exp(-2 \pi d^{1 - 2/p^*} \epsilon^2)}{2 \pi \epsilon d^{1/2 - 1/p^*}}</math>
For <math>p \geq 2</math> (so <math>p^* = 2</math> and the powers of d cancel): <math>\mathrm{vol}(A(\epsilon; d_p)) \geq 1 - \frac{\exp(-2 \pi \epsilon^2)}{2 \pi \epsilon}</math>
For p=2, the diameter of the hypercube is <math>\sqrt{d}</math>.
For <math>p=\infty</math>, the diameter of the cube is 1.
This shows that a random sample is, with high probability, either misclassified or within <math>\epsilon</math> of an adversarial example; the sketch below evaluates the bound.
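A sketch evaluating the dimension-free form of the bound for <math>p \geq 2</math> (the <math>\epsilon</math> values are illustrative):
<syntaxhighlight lang="python">
import math

def cube_expansion_bound_p_ge_2(eps):
    """Lower bound on vol(A(eps; d_p)) for p >= 2, where p* = 2 makes
    the powers of d cancel, leaving a dimension-free bound."""
    return 1 - math.exp(-2 * math.pi * eps ** 2) / (2 * math.pi * eps)

for eps in [0.25, 0.5, 1.0]:
    print(eps, cube_expansion_bound_p_ge_2(eps))  # ~0.57, ~0.93, ~0.9997
# The same eps means very different things per norm: for p=infinity it is a
# large fraction of the cube's diameter (1), while for p=2 the diameter
# grows as sqrt(d), so eps is comparatively tiny in high dimension.
</syntaxhighlight>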
;Are adversarial examples inevitable in practice?
This is an ill-posed question.
It depends on the data distribution, threat model, and hypothesis class.
==Misc==