;Theorem (Cohen et al., 2019)
No adversarial example exists within the radius:
<math>\frac{\sigma}{2}\left(\Phi^{-1}(p_1(x))-\Phi^{-1}(p_2(x))\right)</math>
The proof is based on the Neyman-Pearson lemma.
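As a minimal sketch, the certified radius above can be computed directly from the top-two smoothed class probabilities, using <code>scipy.stats.norm.ppf</code> for <math>\Phi^{-1}</math> (the names <code>p1</code>, <code>p2</code>, and <code>certified_radius</code> are illustrative, not from the paper):

```python
from scipy.stats import norm  # norm.ppf is the Gaussian quantile Phi^{-1}

def certified_radius(p1, p2, sigma):
    """Cohen et al. (2019) L2 certificate: no adversarial example
    exists within (sigma/2) * (Phi^{-1}(p1) - Phi^{-1}(p2)),
    where p1 >= p2 are the smoothed probabilities of the top-2 classes."""
    return 0.5 * sigma * (norm.ppf(p1) - norm.ppf(p2))
```

Higher top-class confidence or a larger noise level <math>\sigma</math> yields a larger certified radius; when <math>p_1(x) = p_2(x)</math> the radius is zero.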
;Theorem (Levine, Singla, F. 2019; Salman et al., 2019)
<math>\Phi^{-1}(\bar{f}(x))</math> is Lipschitz with constant <math>1/\sigma</math>.
The worst-case <math>g</math> is a step function, so <math>\Phi^{-1}(\bar{g})</math> is a linear function.
For L2 attacks, you can use Gaussian noise. For L1 attacks, you can use Laplace noise.
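A minimal Monte-Carlo sketch of the smoothing step, assuming a hypothetical base classifier <code>f</code> that maps a batch of inputs to integer labels (Gaussian noise for L2 certificates, Laplace noise for L1):

```python
import numpy as np

def smoothed_probs(f, x, sigma, n=1000, noise="gaussian", seed=0):
    """Estimate the smoothed classifier's class probabilities p_i(x)
    by voting over n noisy copies of the input x."""
    rng = np.random.default_rng(seed)
    if noise == "gaussian":      # certifies L2 robustness
        eps = rng.normal(0.0, sigma, size=(n,) + x.shape)
    elif noise == "laplace":     # certifies L1 robustness
        eps = rng.laplace(0.0, sigma, size=(n,) + x.shape)
    else:
        raise ValueError(noise)
    labels = f(x[None, :] + eps)          # one label per noisy copy
    return np.bincount(labels, minlength=2) / n
```

In practice the estimated probabilities are then lower/upper-bounded with confidence intervals before plugging them into the certificate.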
;Theorem (KLGF, ICML 2020)
Using any symmetric i.i.d. smoothing distribution,
<math>r_{p}^* \leq \frac{\sigma}{2 \sqrt{2} d^{1/2 - 1/p}}\left(\frac{1}{\sqrt{1-p_1(x)}} + \frac{1}{\sqrt{p_2(x)}}\right)</math>
If we use Gaussian smoothing against Lp attacks, we get:
<math>r_p = \frac{\sigma}{2d^{1/2 - 1/p}}\left( \Phi^{-1}(p_1(x)) - \Phi^{-1}(p_2(x)) \right)</math>
This shows that Gaussian smoothing is optimal (up to a constant) among i.i.d. smoothing distributions against Lp attacks.
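A sketch of the Gaussian-smoothing Lp radius (the function name is illustrative): for <math>p = 2</math> the dimension factor <math>d^{1/2 - 1/p}</math> equals 1, recovering the L2 radius, while for <math>p > 2</math> it grows with <math>d</math>:

```python
from scipy.stats import norm

def gaussian_lp_radius(p1, p2, sigma, d, p):
    """Certified Lp radius of Gaussian smoothing in dimension d:
    r_p = sigma / (2 d^{1/2 - 1/p}) * (Phi^{-1}(p1) - Phi^{-1}(p2))."""
    return sigma * (norm.ppf(p1) - norm.ppf(p2)) / (2.0 * d ** (0.5 - 1.0 / p))
```

For <math>p > 2</math> the certificate shrinks like <math>d^{1/2 - 1/p}</math>, which the KLGF bound shows is unavoidable (up to a constant) for any i.i.d. smoothing distribution.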
===Sparse Threat===
Here the adversary can change up to <math>\rho</math> pixels in the image.
The idea is to classify each example based on only <math>k</math> random pixels of the image. This is repeated several times, and a voting scheme determines the final label.
;Theorem (Levine, F. AAAI 2020)
For inputs <math>x</math> and <math>x'</math> with <math>\Vert x - x' \Vert_{0} \leq \rho</math>, for all <math>i</math>,
<math>\vert p_i(x) - p_i(x')\vert \leq \delta</math> where <math>\delta = 1 - \frac{\binom{d-\rho}{k}}{\binom{d}{k}}</math>.
Robustness vs. Accuracy Trade-off:
Increasing <math>k</math> boosts classification accuracy but also increases <math>\delta</math>.
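The bound <math>\delta</math> can be evaluated exactly with <code>math.comb</code>; a small sketch (parameter values are illustrative) of the trade-off, where a larger <math>k</math> yields a larger <math>\delta</math>:

```python
from math import comb

def ablation_bound(d, k, rho):
    """Levine, F. (AAAI 2020): when classifying on k of d random
    pixels, changing at most rho pixels shifts each class
    probability by at most delta = 1 - C(d-rho, k) / C(d, k)."""
    return 1.0 - comb(d - rho, k) / comb(d, k)
```

For example, with <math>d = 1024</math> pixels and <math>\rho = 5</math> changed pixels, raising <math>k</math> from 10 to 50 increases <math>\delta</math>, trading certified robustness for accuracy.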
===Relationship between Threat Models===
Use a neural perceptual threat model to approximate the true perceptual distance.
Use LPIPS as <math>d_{neural}(x, x') = \Vert \phi(x) - \phi(x') \Vert</math> where <math>\phi</math> denotes normalized feature maps.
Our attack optimization is now:
<math>
\begin{aligned}
\max_{x'}\ &l_{cls}(f(x'), y)\\
\text{s.t. }\ &d_{neural}(x, x') \leq \rho
\end{aligned}
</math>
From this, we get Perceptual Projected Gradient Descent (PPGD) and Lagrangian Perceptual Attacks (LPA).
We also get Perceptual Adversarial Training (PAT).
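A simplified NumPy sketch of the LPIPS-style distance, assuming <code>feature_maps</code> is a hypothetical stand-in for a pretrained network's internal activations (the real LPIPS normalizes channel-wise and applies learned per-layer weights):

```python
import numpy as np

def d_neural(x, x_adv, feature_maps):
    """Perceptual distance ||phi(x) - phi(x')|| between normalized
    feature maps, as in the neural perceptual threat model."""
    def phi(z):
        feats = [g(z).ravel() for g in feature_maps]
        feats = [v / (np.linalg.norm(v) + 1e-12) for v in feats]  # normalize each map
        return np.concatenate(feats)
    return float(np.linalg.norm(phi(x) - phi(x_adv)))
```

PPGD and LPA then maximize the classification loss subject to <math>d_{neural}(x, x') \leq \rho</math> instead of an Lp ball constraint.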
==Misc==