x_j^{(p)} = \operatorname{argmin}_{x} \Vert g(x) - g(x_t) \Vert_{2}^{2} + \beta \Vert x - x_j^{(b)} \Vert_{2}^{2}
</math>
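This objective can be optimized directly with gradient-based updates. Below is a minimal sketch in PyTorch, assuming a frozen feature extractor <code>g</code> (the penultimate-layer embedding), a target example <code>x_target</code>, and a base example <code>x_base</code>; the function name and arguments are illustrative, not taken from the original paper.

<syntaxhighlight lang="python">
import torch

def feature_collision_poison(g, x_target, x_base, beta=0.1, lr=0.01, steps=100):
    """Craft one poison point: collide with the target in feature space
    while staying close to the base image in input space (illustrative sketch)."""
    x = x_base.clone().requires_grad_(True)
    with torch.no_grad():
        feat_t = g(x_target)                      # g(x_t) is fixed during the attack
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((g(x) - feat_t) ** 2).sum() + beta * ((x - x_base) ** 2).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)                    # keep the poison a valid image
    return x.detach()
</syntaxhighlight>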
;Do these attacks actually work? | |||
Schwarzschild ''et al.'' test these attacks on multiple datasets and find that they do not work reliably in practice.
The attacks are heavily dependent on the particular experimental setup:
* Feature collision attack: the attack assumes the victim's feature extractor <math>g</math> is trained with Adam, but the success rate is poor when the victim trains with SGD.
* Data augmentation: the success rate falls when the victim uses data augmentation or a different model architecture.
* Black-box attacks: the success rate is further reduced when the attacker has no access to the victim model.
* Dataset size: the success rate also depends on the size of the training set.
===Provable defenses against general poison attacks=== | |||
Levine and Feizi (2020) | |||
Consider a ''general'' poisoning attack where the attacker can insert or remove samples from the training set. | |||
We measure the '''attack magnitude''' by the size of the symmetric difference between the clean and poisoned training sets.
Symmetric difference is defined as <math>A \ominus B = (A \setminus B) \cup (B \setminus A)</math>. | |||
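As a toy illustration (the sample identifiers are made up), Python's built-in set type computes the symmetric difference directly, and the attack magnitude is just its size:

<syntaxhighlight lang="python">
clean    = {"img_001", "img_002", "img_003", "img_004"}
poisoned = {"img_001", "img_002", "img_004", "poison_a", "poison_b"}  # one removal, two insertions

# A symmetric-difference B = (A \ B) union (B \ A)
sym_diff = clean ^ poisoned
print(sym_diff)       # {'img_003', 'poison_a', 'poison_b'}
print(len(sym_diff))  # attack magnitude = 3
</syntaxhighlight>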
In the last lecture, we saw provable defenses against ''sparse'' inference-time attacks using randomized ablation.
==Misc==