An adversarial example tries to trick a neural network by applying a small worst-case perturbation to a real example.<br>
The first two papers introducing adversarial examples are:
* [https://arxiv.org/abs/1412.6572 Explaining and Harnessing Adversarial Examples] by Ian Goodfellow et al. in 2014
* [https://arxiv.org/abs/1312.6199 Intriguing properties of neural networks] by Szegedy et al. in 2014
==Attacks==
===L-BFGS===
Limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)<br>
This attack is used by Szegedy et al. in their paper: finding an adversarial perturbation is posed as a box-constrained optimization problem and solved with L-BFGS.
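A minimal sketch of this style of attack, using SciPy's L-BFGS-B solver on a toy logistic-regression "network". The weights, input, target label, and trade-off constant <code>c</code> are illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy.optimize import minimize

# Toy logistic model p(y=1|x) = sigmoid(w.x + b); all values are illustrative.
w = np.array([2.0, -1.0, 0.5])
b = 0.1
x = np.array([0.3, 0.2, 0.8])   # clean input, pixels in [0, 1]
target = 0.0                    # label the attacker wants the model to output
c = 0.1                         # trade-off: distortion size vs. target loss

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(x_adv):
    # c * ||r||^2 + cross-entropy toward the *target* label.
    r = x_adv - x
    p = sigmoid(w @ x_adv + b)
    ce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    return c * (r @ r) + ce

# The box constraint keeps the adversarial input a valid image ([0, 1] per pixel).
res = minimize(objective, x, method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * len(x))
x_adv = res.x
```

The box constraint is what motivates the L-BFGS-B variant here; the original paper additionally searches over <code>c</code> to find the smallest distortion that flips the label.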
===Fast Gradient Sign Method===
The fast gradient sign method (FGSM) perturbs the input by a small constant ε times the sign of the gradient of the loss with respect to the input.<br>
This was proposed by Goodfellow et al. in ''Explaining and Harnessing Adversarial Examples''.<br>
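A minimal numpy sketch of FGSM on a toy logistic-regression model; the weights, input, and ε below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Toy "network": p(y=1|x) = sigmoid(w.x + b). Values are illustrative.
w = np.array([2.0, -1.0, 0.5])
b = 0.1
x = np.array([0.3, -0.2, 0.8])
y = 1.0      # true label
eps = 0.1    # perturbation budget (L-infinity)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x_in):
    # Binary cross-entropy against the true label y.
    p = sigmoid(w @ x_in + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_x(x_in):
    # For logistic regression, d loss / d x = (p - y) * w.
    p = sigmoid(w @ x_in + b)
    return (p - y) * w

# FGSM: move every coordinate eps in the direction that increases the loss.
x_adv = x + eps * np.sign(grad_x(x))
```

Because the toy model is linear in its input, this single signed step is exactly the worst-case L∞ perturbation, which is the intuition the paper gives for why FGSM works.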
===Projected Gradient Descent===
Basic idea: Do gradient descent. If you go too far from your example, project it back into your perturbation range.<br>
This was proposed by Madry et al. in their 2017 paper [https://arxiv.org/abs/1706.06083 Towards Deep Learning Models Resistant to Adversarial Attacks].<br>
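The step-then-project loop can be sketched for an L∞ ball, reusing a toy logistic model; the weights, step size, and ε are illustrative assumptions:

```python
import numpy as np

# Toy logistic model p(y=1|x) = sigmoid(w.x + b); values are illustrative.
w = np.array([2.0, -1.0, 0.5])
b = 0.1
x0 = np.array([0.3, -0.2, 0.8])  # clean example
y = 1.0
eps = 0.1      # L-infinity perturbation range
alpha = 0.03   # per-step size
steps = 20

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_x(x_in):
    p = sigmoid(w @ x_in + b)
    return (p - y) * w

x = x0.copy()
for _ in range(steps):
    # Signed gradient step that increases the loss...
    x = x + alpha * np.sign(grad_x(x))
    # ...then project back into the eps-ball around the clean example.
    x = np.clip(x, x0 - eps, x0 + eps)
```

For an L∞ ball the projection is just an elementwise clip; other norms need their own projection operators.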
==Defenses==
Interval Bound Propagation (IBP)<br>
[https://arxiv.org/abs/1810.12715 On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models] by Gowal et al.
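A minimal numpy sketch of one IBP step: an input interval is pushed through an affine layer (center through W, radius through |W|) and a monotone activation. The weights and input box are illustrative assumptions, not values from the paper:

```python
import numpy as np

# One affine layer followed by ReLU; weights and box are illustrative.
W = np.array([[1.0, -2.0],
              [0.5,  1.5]])
bias = np.array([0.1, -0.3])

center = np.array([0.5, 0.5])   # nominal input
radius = np.array([0.1, 0.1])   # per-coordinate perturbation bound

# Affine layer: the center maps through W, the radius through |W|.
mu = W @ center + bias
r = np.abs(W) @ radius
lower, upper = mu - r, mu + r

# A monotone activation (ReLU) is applied elementwise to both bounds.
lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
```

Every true output for an input inside the box is guaranteed to lie in <code>[lower, upper]</code>; chaining this through all layers and checking the logit bounds gives the verified-robustness certificate the paper trains against.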
==NLP==
* [https://arxiv.org/abs/1901.06796 Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey]
===Benchmark Datasets===
====Text classification====
Sentiment analysis, gender identification, grammatical error detection