An adversarial example tries to trick a neural network by applying a small worst-case perturbation to a real example.<br>
Adversarial examples were first described in work co-authored by Ian Goodfellow.<br>
The two papers that introduced adversarial examples are:
* [https://arxiv.org/abs/1312.6199 Intriguing properties of neural networks] by Szegedy et al. in 2013
* [https://arxiv.org/abs/1412.6572 Explaining and Harnessing Adversarial Examples] by Goodfellow et al. in 2014
==Attacks==
===L-BFGS===
Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS)<br>
This is the attack used by Szegedy et al. in their paper: they search for the smallest perturbation that pushes an input into a chosen target class, and solve the resulting box-constrained optimization with L-BFGS.
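A sketch of the formulation (paraphrased from the paper; <math>f</math> is the classifier, <math>x</math> the input, <math>l</math> a target label different from the true one, and the constant <math>c > 0</math> is found by line search):

:<math>\min_{r} \; c\,\|r\|_2 + \operatorname{loss}_f(x + r,\ l) \quad \text{subject to} \quad x + r \in [0, 1]^m</math>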
===Fast Gradient Sign Method===
The fast gradient sign method (FGSM) perturbs the example by a small step of size <math>\epsilon</math> in the direction of the sign of the loss gradient.<br>
This was proposed by Goodfellow et al. in ''Explaining and Harnessing Adversarial Examples''.<br>
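A minimal sketch of FGSM in PyTorch (the model, the value of <code>epsilon</code>, and the [0, 1] input range are illustrative assumptions, not from the paper):

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    # Compute the loss gradient with respect to the input.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step by epsilon in the direction of the gradient's sign:
    # x_adv = x + epsilon * sign(grad_x loss)
    x_adv = x + epsilon * x.grad.sign()
    # Clip back to the valid input range (assumed [0, 1]).
    return x_adv.clamp(0, 1).detach()
</syntaxhighlight>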
===Projected Gradient Descent===
Basic idea: take repeated gradient steps on the loss; whenever an iterate leaves the allowed perturbation range around the original example, project it back in.<br>
This was proposed by Madry et al. in their 2017 paper [https://arxiv.org/abs/1706.06083 Towards Deep Learning Models Resistant to Adversarial Attacks].<br>
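A sketch of PGD under an L-infinity constraint (the step size <code>alpha</code>, the number of <code>steps</code>, and the random start are hyperparameters; the values here are illustrative):

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.03, alpha=0.007, steps=40):
    x_orig = x.clone().detach()
    # Madry et al. start from a random point inside the epsilon-ball.
    x_adv = (x_orig + torch.empty_like(x_orig).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            # Gradient-sign step, then project back into the
            # epsilon-ball around the original example.
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = x_orig + (x_adv - x_orig).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0, 1)  # stay in the valid input range
    return x_adv.detach()
</syntaxhighlight>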
==Defenses==
Most defenses generate adversarial examples at training time and train on them, a technique known as adversarial training.<br>
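For example, one adversarial training step might look like the sketch below, using a one-step FGSM inner attack; the 50/50 clean/adversarial mix and the value of <code>epsilon</code> are assumptions, and practical recipes vary:

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    # Craft adversarial examples against the current weights (one FGSM step).
    x_pert = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_pert), y).backward()
    x_adv = (x_pert + epsilon * x_pert.grad.sign()).clamp(0, 1).detach()

    # Train on a mix of clean and adversarial examples
    # (the 50/50 weighting is an assumed hyperparameter).
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
</syntaxhighlight>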
Below are some alternatives to this approach.
===Interval Bound Propagation===
Interval Bound Propagation (IBP) pushes elementwise lower and upper bounds on the activations through the network, layer by layer, to prove that no perturbation within the allowed range can change the prediction.<br>
[https://arxiv.org/abs/1810.12715 On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models] by Gowal et al. in 2018
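A sketch of the core interval arithmetic for a fully connected network (<code>weight</code> and <code>bias</code> are the parameters of one layer; the function names are illustrative):

<syntaxhighlight lang="python">
import torch

def interval_linear(lower, upper, weight, bias):
    # Write the interval as center +/- radius; a linear layer maps the
    # center through W and the radius through |W| (the worst case).
    center = (lower + upper) / 2
    radius = (upper - lower) / 2
    new_center = center @ weight.t() + bias
    new_radius = radius @ weight.abs().t()
    return new_center - new_radius, new_center + new_radius

def interval_relu(lower, upper):
    # ReLU is monotone, so it maps interval endpoints to endpoints.
    return lower.clamp(min=0), upper.clamp(min=0)
</syntaxhighlight>

Starting from the input interval <code>[x - epsilon, x + epsilon]</code> and chaining these functions through every layer gives bounds on the logits; if the true class's lower bound exceeds every other class's upper bound, no perturbation in the allowed range can flip the prediction.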
==NLP==
* [https://arxiv.org/abs/1901.06796 Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey]
===Benchmark Datasets===
====Text classification====
Sentiment analysis, gender identification, grammar error detection