x_j^{(p)} = \operatorname{argmin}_{x} \Vert g(x) - g(x_t) \Vert_{2}^{2} + \beta \Vert x - x_j^{(b)} \Vert_{2}^{2}
</math>
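A minimal PyTorch sketch of this poison-crafting objective is shown below. The names <code>feature_extractor</code>, <code>base_image</code>, and <code>target_image</code> are illustrative placeholders, and the Adam-based loop is just one plausible way to optimize the objective, not the exact procedure from the paper.

<syntaxhighlight lang="python">
import torch

def craft_feature_collision_poison(feature_extractor, base_image, target_image,
                                    beta=0.1, steps=200, lr=0.01):
    """Sketch: optimize a poison x that collides with the target in feature
    space while staying close to the base image in input space."""
    feature_extractor.eval()
    with torch.no_grad():
        target_feat = feature_extractor(target_image)    # g(x_t), held fixed

    x = base_image.clone().requires_grad_(True)          # initialize at the base x_j^{(b)}
    optimizer = torch.optim.Adam([x], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        feat_loss = (feature_extractor(x) - target_feat).pow(2).sum()   # ||g(x) - g(x_t)||_2^2
        input_loss = (x - base_image).pow(2).sum()                      # ||x - x_j^{(b)}||_2^2
        loss = feat_loss + beta * input_loss
        loss.backward()
        optimizer.step()

    return x.detach()
</syntaxhighlight>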
;Do these attacks actually work?
Schwarzschild ''et al.'' test these attacks on multiple datasets and find that they do not work reliably; success is heavily dependent on the particular experimental setup:
* Feature Collision attack: the attack assumes the victim model g is trained with Adam, but the success rate is poor when the victim trains with SGD.
* Data augmentation and model choice: the success rate falls when the victim uses data augmentation or a different model.
* For black-box attacks, the success rate drops.
* The success rate also depends on the size of the dataset.
===Provable defenses against general poison attacks===
Levine and Feizi (2020) consider a ''general'' poisoning attack where the attacker can insert or remove samples from the training set.
We measure the '''attack magnitude''' by the size of the symmetric difference between the clean and poisoned training sets, where the symmetric difference is defined as <math>A \ominus B = (A \setminus B) \cup (B \setminus A)</math>.
In the last lecture, we saw provable defenses against ''sparse'' inference-time attacks using randomized ablation; here the goal is an analogous certified defense against training-time poisoning.
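As a quick illustration of this measure, the sketch below computes <math>|A \ominus B|</math> for two toy training sets represented as Python sets of hashable examples; the helper name <code>attack_magnitude</code> is only for illustration.

<syntaxhighlight lang="python">
def attack_magnitude(clean_set, poisoned_set):
    """Attack magnitude |A ⊖ B|: how many samples the attacker
    inserted into or removed from the clean training set."""
    clean, poisoned = set(clean_set), set(poisoned_set)
    sym_diff = (clean - poisoned) | (poisoned - clean)   # A ⊖ B
    return len(sym_diff)

# Toy example: the attacker removes sample 3 and inserts samples 10 and 11.
clean = {1, 2, 3, 4, 5}
poisoned = {1, 2, 4, 5, 10, 11}
print(attack_magnitude(clean, poisoned))   # -> 3
</syntaxhighlight>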


==Misc==