;Theorem (Cohen et al., 2019)
No <math>l_2</math> adversarial example exists within the radius:
<math>\frac{\sigma}{2}\left(\Phi^{-1}(p_1(x))-\Phi^{-1}(p_2(x))\right)</math>
The proof is based on the Neyman–Pearson lemma.
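A minimal Monte Carlo sketch of how this certificate is computed (the base classifier <code>f</code>, the sample count, and the use of raw empirical frequencies in place of the confidence bounds used in practice are all simplifications; this is not Cohen et al.'s reference implementation):

<syntaxhighlight lang="python">
# Sketch: estimate the smoothed classifier's top-two class probabilities under
# Gaussian noise, then plug them into (sigma/2) * (Phi^{-1}(p1) - Phi^{-1}(p2)).
import numpy as np
from scipy.stats import norm

def certified_radius(f, x, sigma, num_classes, n_samples=1000, seed=0):
    """f: base classifier mapping a batch of inputs to integer class labels."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    preds = f(x[None, ...] + noise)                    # labels under Gaussian perturbations
    p = np.bincount(preds, minlength=num_classes) / n_samples
    top1, top2 = np.argsort(p)[::-1][:2]
    p1 = np.clip(p[top1], 1e-6, 1 - 1e-6)              # keep Phi^{-1} finite
    p2 = np.clip(p[top2], 1e-6, 1 - 1e-6)
    radius = 0.5 * sigma * (norm.ppf(p1) - norm.ppf(p2))
    return top1, radius                                # no l2 perturbation of norm < radius flips the prediction
</syntaxhighlight>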
 
;Theorem (Levine, Singla, F. 2019; Salman et al., 2019)
<math>\Phi^{-1}(\bar{f}(x))</math> is Lipschitz in <math>x</math> with constant <math>1/\sigma</math>.
 
The worst-case base function <math>g</math> is a step function; for such a <math>g</math>, <math>\Phi^{-1}(\bar{g})</math> is a linear function, so the Lipschitz bound is tight.
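A small numerical illustration of both claims (the halfspace classifier, <math>\sigma</math>, and sample sizes below are arbitrary choices for the sketch): for the step function <math>g(x) = 1\{w^\top x \geq b\}</math>, the smoothed score is <math>\bar{g}(x) = \Phi\left(\frac{w^\top x - b}{\sigma \Vert w \Vert}\right)</math>, so <math>\Phi^{-1}(\bar{g}(x))</math> is linear in <math>x</math> with Lipschitz constant exactly <math>1/\sigma</math>.

<syntaxhighlight lang="python">
# Sketch: Monte Carlo check that Phi^{-1}(g_bar) is 1/sigma-Lipschitz and that the
# step-function base classifier attains the bound (all constants are illustrative).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, d = 0.5, 10
w, b = rng.normal(size=d), 0.3

def g_bar(x, n=200_000):
    noise = rng.normal(scale=sigma, size=(n, d))
    return np.mean((x + noise) @ w >= b)        # Monte Carlo estimate of E[g(x + noise)]

x = np.zeros(d)
x2 = x + 0.1 * w / np.linalg.norm(w)            # step of l2 length 0.1 along the worst-case direction w
lhs = abs(norm.ppf(g_bar(x2)) - norm.ppf(g_bar(x)))
print(lhs, np.linalg.norm(x2 - x) / sigma)      # roughly equal: the 1/sigma bound is attained by the step function
</syntaxhighlight>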
 
For <math>l_2</math> attacks, you can use Gaussian noise; for <math>l_1</math> attacks, you can use Laplace noise.
 
;Theorem (KLGF, ICML 2020)
For any symmetric i.i.d. smoothing distribution, the certified <math>l_p</math> radius satisfies
<math>r_{p}^* \leq \frac{\sigma}{2 \sqrt{2} d^{1/2 - 1/p}}\left(\frac{1}{\sqrt{1-p_1(x)}} + \frac{1}{\sqrt{p_2(x)}}\right)</math>
 
If we use Gaussian smoothing against <math>l_p</math> attacks (for <math>p \geq 2</math>), we get:
<math>r_p = \frac{\sigma}{2d^{1/2 - 1/p}}\left( \Phi^{-1}(p_1(x)) - \Phi^{-1}(p_2(x)) \right)</math>
This shows that Gaussian smoothing is optimal (up to a constant factor) among i.i.d. smoothing distributions against <math>l_p</math> attacks.
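A quick sketch of what the two formulas above say about dimension dependence (the values of <math>\sigma</math>, <math>p_1</math>, <math>p_2</math>, and the input sizes are illustrative, not taken from the paper):

<syntaxhighlight lang="python">
# Sketch: compare the i.i.d.-smoothing upper bound with the Gaussian-smoothing radius
# as the input dimension d grows, for an l_inf threat model.
import numpy as np
from scipy.stats import norm

sigma, p1, p2, p = 0.5, 0.9, 0.05, np.inf   # l_inf threat: 1/2 - 1/p = 1/2
for d in (3 * 32 * 32, 3 * 224 * 224):      # CIFAR-10- and ImageNet-sized inputs
    scale = d ** (0.5 - (0.0 if np.isinf(p) else 1.0 / p))
    upper = sigma / (2 * np.sqrt(2) * scale) * (1 / np.sqrt(1 - p1) + 1 / np.sqrt(p2))
    gauss = sigma / (2 * scale) * (norm.ppf(p1) - norm.ppf(p2))
    print(d, gauss, upper)                  # both shrink like d^{-(1/2 - 1/p)}; Gaussian stays within a constant of the bound
</syntaxhighlight>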
 
===Sparse Threat===
Here the adversary can change up to <math>\rho</math> pixels in the image.
 
The idea is to classify each example based on only <math>k</math> random pixels of the image. This is repeated several times, and a voting scheme determines the final label.
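A sketch of this voting scheme (the base classifier <code>f</code>, the zero-ablation encoding, and the number of rounds are placeholders; the paper's method encodes ablated pixels so they are distinguishable from real pixel values):

<syntaxhighlight lang="python">
# Sketch of the "classify on k random pixels, then vote" scheme.
import numpy as np

def ablate_and_vote(f, x, k, num_classes, n_rounds=100, seed=0):
    rng = np.random.default_rng(seed)
    d = x.size
    votes = np.zeros(num_classes, dtype=int)
    for _ in range(n_rounds):
        keep = rng.choice(d, size=k, replace=False)   # k random pixel positions
        ablated = np.zeros_like(x)
        ablated.flat[keep] = x.flat[keep]             # keep only the k sampled pixels (zero out the rest)
        votes[f(ablated)] += 1
    return np.argmax(votes)                           # majority vote over rounds
</syntaxhighlight>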
 
;Theorem (Levine, F. AAAI 2020)
For inputs <math>x</math> and <math>x'</math> with <math>\Vert x - x' \Vert_{0} \leq \rho</math>, for all <math>i</math>:
<math>\vert p_i(x) - p_i(x')\vert \leq \delta</math>, where <math>\delta = 1 - \frac{\binom{d-\rho}{k}}{\binom{d}{k}}</math>.
 
Robustness vs. accuracy trade-off:
Increasing <math>k</math> boosts classification accuracy but also increases <math>\delta</math>, which weakens the certificate.
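A small sketch of this trade-off using the bound above (the image size and <math>\rho</math> are illustrative):

<syntaxhighlight lang="python">
# Sketch: delta = 1 - C(d - rho, k) / C(d, k) grows with k.
from math import comb

d, rho = 32 * 32, 5                    # single-channel 32x32 image, 5 adversarial pixels (illustrative)
for k in (10, 50, 150):
    delta = 1 - comb(d - rho, k) / comb(d, k)
    print(k, round(delta, 3))          # delta grows with k: accuracy improves, but the certificate weakens
</syntaxhighlight>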
 
===Relationship between Threat Models===
Use a neural perceptual threat model to approximate the true perceptual distance.
Use LPIPS: <math>d_{neural}(x, x') = \Vert \phi(x) - \phi(x') \Vert</math>, where <math>\phi(x)</math> denotes the normalized feature maps of a pretrained network.
Our attack optimization is now:
<math>
\begin{aligned}
\max_{x'}\quad & l_{cls}(f(x'), y)\\
\text{s.t.}\quad & d_{neural}(x, x') \leq \rho
\end{aligned}
</math>
 
From this, we get Perceptual Projected Gradient Descent (PPGD) and Lagrangian Perceptual Attacks (LPA). 
We also get Perceptual Adversarial Training (PAT).
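A sketch of the neural perceptual distance <math>d_{neural}</math> above, assuming a pretrained torchvision AlexNet as the feature extractor <math>\phi</math> (the actual LPIPS metric additionally applies learned per-channel weights, which are omitted here):

<syntaxhighlight lang="python">
# Sketch of an LPIPS-style distance ||phi(x) - phi(x')|| over channel-normalized
# feature maps of a pretrained AlexNet (ImageNet preprocessing omitted for brevity).
import torch
import torchvision.models as models

backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).features.eval()

def d_neural(x, x_adv, layers=(1, 4, 7, 9, 11)):      # ReLU outputs of AlexNet.features
    dist = torch.zeros(())
    h, h_adv = x, x_adv
    for i, layer in enumerate(backbone):
        h, h_adv = layer(h), layer(h_adv)
        if i in layers:
            # normalize each spatial location's feature vector across channels
            phi = h / (h.norm(dim=1, keepdim=True) + 1e-10)
            phi_adv = h_adv / (h_adv.norm(dim=1, keepdim=True) + 1e-10)
            dist = dist + (phi - phi_adv).pow(2).sum()
    return dist.sqrt()

# Example: distance between a clean image and a slightly perturbed copy.
x = torch.rand(1, 3, 224, 224)
print(d_neural(x, (x + 0.01 * torch.randn_like(x)).clamp(0, 1)).item())
</syntaxhighlight>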


==Misc==