I am a research scientist at Google DeepMind (formerly at Google Brain) working at the intersection of machine learning and computer security. My most recent line of work studies properties of neural networks from an adversarial perspective. I received my Ph.D. from UC Berkeley in 2018, and my B.A. in computer science and mathematics (also from UC Berkeley) in 2013.
Generally, I am interested in developing attacks on machine learning systems; most of my work develops attacks demonstrating security and privacy risks of these systems. I have received best paper awards at USENIX Security, IEEE S&P, and ICML, and my work has been featured in the New York Times, the BBC, Nature Magazine, Science Magazine, Wired, and Popular Science.
Selected Recent Work
Earlier this year I introduced a recent paper of ours developing the first practical poisoning attack on large-scale machine learning models. With our attack I could have poisoned the training dataset for anyone who has used LAION-400M (or other popular datasets) in the last six months. Our attack is trivial: I bought expired domains corresponding to URLs in popular image datasets. This gave us control over 0.01% of each of these datasets. In this talk (given at the Stanford MLSys seminar) I discuss how the attack works, its consequences, and potential defenses. More broadly, we hope machine learning researchers will study other simple but practical attacks on the machine learning pipeline.
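To make the "control over a fraction of the dataset" idea concrete, here is a minimal Python sketch of how one might measure that fraction for a dataset distributed as a list of URLs. It is an illustration under stated assumptions, not the code from the paper: the function name `controlled_fraction`, the toy URL list, and the `.test` domains are all hypothetical.

```python
from urllib.parse import urlsplit


def controlled_fraction(urls, owned_domains):
    """Return the fraction of dataset URLs whose host is in owned_domains.

    Hypothetical helper: a URL-indexed dataset only stores links, so whoever
    controls a link's domain controls what a later download returns.
    """
    owned = {d.lower() for d in owned_domains}
    hits = 0
    for url in urls:
        # hostname is None for malformed URLs; treat those as not controlled.
        host = (urlsplit(url).hostname or "").lower()
        if host in owned:
            hits += 1
    return hits / len(urls) if urls else 0.0


# Toy index (made-up entries); real datasets list hundreds of millions of URLs.
urls = [
    "http://expired-example.test/cat.jpg",
    "https://still-alive.test/dog.png",
    "http://expired-example.test/bird.jpg",
    "https://another-host.test/fish.jpg",
]

fraction = controlled_fraction(urls, {"expired-example.test"})
print(f"{fraction:.2%}")
```

On this toy list the fraction is one half; the 0.01% figure above comes from the real datasets, not from this sketch.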
In 2021, at USENIX Security, I presented a paper that was the result of a massive collaboration with ten co-authors to measure the privacy of large language models. It's been academically known for quite some time that if you train a machine learning model on a sensitive dataset, it's mathematically possible that releasing the model could violate the privacy of the users from the training data. But this has remained mostly something theory people say could happen, because math says so. In this paper we show that large language models actually do leak individual training examples from datasets they were trained on. To do this we show that given query access to GPT-2, it's possible to recover hundreds of training datapoints including PII, random numbers, and URLs from leaked email dumps.
At CRYPTO'20, I presented a paper I wrote with Matthew Jagielski and Ilya Mironov that introduces an improved model stealing attack. Given query access to a remote neural network, we are able to extract an almost identical copy of the parameters, layer by layer, one at a time. For models we extract, we can prove that the stolen copy is identical up to 30 bits of precision with respect to the original model. (If you're an ML person, you might want to skip the background, where I explain to the crypto audience what a fully connected neural network is.)
At CAMLIS 2019 I gave a talk covering what it means to evaluate adversarial robustness. This is a much higher-level talk for an audience that isn't deeply familiar with the area of adversarial machine learning research. (For a more technical version of this talk, see my recent USENIX Security invited talk that discusses these same topics in more depth.) The talk covers what adversarial examples are, how to generate them, how to (try to) defend against them, and finally what the future may hold.
At ICML 2018, I presented a paper I wrote with Anish Athalye and my advisor David Wagner: Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. In this paper, we demonstrate that most of the ICLR'18 adversarial example defenses were, in fact, ineffective at defending against attack and instead just broke existing attack algorithms. We introduce stronger attacks that work in the presence of what we call “obfuscated gradients”. Because we won best paper, we were able to give two talks; the talk linked here is the plenary talk, where I argue that the evaluation methodology widely used in the community today is insufficient and can be improved.