Adversarial Machine Learning Reading List

by Nicholas Carlini 2018-07-15 [last updated 2019-11-26]

From time to time I receive emails asking how to get started studying adversarial machine learning. Below is the list of papers I recommend reading to become familiar with the specific sub-field of evasion attacks on machine learning systems (i.e., adversarial examples).

Alternatively, you may be interested in seeing an (unfiltered) list of all 1000+ adversarial example papers.

There are three versions of this list:

The just-the-basics list: a collection of five papers that briefly summarize the field. You won't be doing any new research from this, but you'll understand what people mean when they say they study adversarial examples.
The quick-introduction list: the ~10 most important papers to read to get a solid grounding in the field of adversarial examples in machine learning.
The complete-background list: the full list, containing all of the papers that anyone who wants to perform neural network evaluations should read. The papers are split by topic, and each topic indicates which other topics should be read before it.

Just The Basics

Preliminary Papers

Evasion Attacks against Machine Learning at Test Time
Intriguing properties of neural networks
Explaining and Harnessing Adversarial Examples

Attacks and Defenses

Towards Evaluating the Robustness of Neural Networks
Towards Deep Learning Models Resistant to Adversarial Attacks

Quick Introduction

Preliminary Papers

Evasion Attacks against Machine Learning at Test Time
Intriguing properties of neural networks
Explaining and Harnessing Adversarial Examples

Attacks (1)

Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Towards Evaluating the Robustness of Neural Networks

Defenses

Towards Deep Learning Models Resistant to Adversarial Attacks
Certified Robustness to Adversarial Examples with Differential Privacy

Attacks (2)

ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models
Synthesizing Robust Adversarial Examples
Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

Complete Background

Preliminary Papers

Evasion Attacks against Machine Learning at Test Time
Intriguing properties of neural networks
Explaining and Harnessing Adversarial Examples

Attacks [requires Preliminary Papers]

The Limitations of Deep Learning in Adversarial Settings
DeepFool: a simple and accurate method to fool deep neural networks
Towards Evaluating the Robustness of Neural Networks

Transferability [requires Preliminary Papers]

Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Delving into Transferable Adversarial Examples and Black-box Attacks
Universal adversarial perturbations

Detecting Adversarial Examples [requires Attacks, Transferability]

On Detecting Adversarial Perturbations
Detecting Adversarial Samples from Artifacts
Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

Restricted Threat Model Attacks [requires Attacks]

ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models
Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors

Physical-World Attacks [requires Attacks, Transferability]

Adversarial examples in the physical world
Synthesizing Robust Adversarial Examples
Robust Physical-World Attacks on Deep Learning Models

Verification [requires Preliminary Papers]

Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models

Defenses (2) [requires Detecting]

Towards Deep Learning Models Resistant to Adversarial Attacks
Certified Robustness to Adversarial Examples with Differential Privacy

Attacks (2) [requires Defenses (2)]

Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

Defenses (3) [requires Attacks (2)]

Towards the first adversarially robust neural network model on MNIST
On Evaluating Adversarial Robustness

Other Domains [requires Attacks]

Adversarial Attacks on Neural Network Policies
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
Adversarial examples for generative models
