1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
from evaluator import *

DESCRIPTION = "Test if the model can extract structured data from (somewhat) unstructured text."

TAGS = ['data']

question = '''
From the following data extract the best performing defense each year, in the format {year: robust accuracy}

So for example the answer for {"2024": 69.71, "2023": ..., ...}, now fill it in for every other year. Return the answer as a JSON dict.


Rank	Method	Standard
accuracy	AutoAttack
robust
accuracy	Best known
robust
accuracy	AA eval.
potentially
unreliable	Extra
data	Architecture	Venue
1	Robust Principles: Architectural Design Principles for Adversarially Robust CNNs
It uses additional 50M synthetic images in training.	93.27%	71.07%	71.07%	
×
×	RaWideResNet-70-16	BMVC 2023
2	Better Diffusion Models Further Improve Adversarial Training
It uses additional 50M synthetic images in training.	93.25%	70.69%	70.69%	
×
×	WideResNet-70-16	ICML 2023
3	MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers
It uses an ensemble of networks. The robust base classifier uses 50M synthetic images. 69.71% robust accuracy is due to the original evaluation (Adaptive AutoAttack)	95.19%	70.08%	69.71%	
×
☑	ResNet-152 + WideResNet-70-16	arXiv, Feb 2024
4	Improving the Accuracy-Robustness Trade-off of Classifiers via Adaptive Smoothing
It uses an ensemble of networks. The robust base classifier uses 50M synthetic images.	95.23%	68.06%	68.06%	
×
☑	ResNet-152 + WideResNet-70-16 + mixing network	SIMODS 2024
5	Decoupled Kullback-Leibler Divergence Loss
It uses additional 20M synthetic images in training.	92.16%	67.73%	67.73%	
×
×	WideResNet-28-10	arXiv, May 2023
6	Better Diffusion Models Further Improve Adversarial Training
It uses additional 20M synthetic images in training.	92.44%	67.31%	67.31%	
×
×	WideResNet-28-10	ICML 2023
7	Fixing Data Augmentation to Improve Adversarial Robustness
66.56% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	92.23%	66.58%	66.56%	
×
☑	WideResNet-70-16	arXiv, Mar 2021
8	Improving Robustness using Generated Data
It uses additional 100M synthetic images in training. 66.10% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	88.74%	66.11%	66.10%	
×
×	WideResNet-70-16	NeurIPS 2021
9	Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
65.87% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	91.10%	65.88%	65.87%	
×
☑	WideResNet-70-16	arXiv, Oct 2020
10	Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective	91.58%	65.79%	65.79%	
×
☑	WideResNet-A4	arXiv, Dec. 2022
11	Fixing Data Augmentation to Improve Adversarial Robustness
It uses additional 1M synthetic images in training. 64.58% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	88.50%	64.64%	64.58%	
×
×	WideResNet-106-16	arXiv, Mar 2021
12	Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks
Based on the model Rebuffi2021Fixing_70_16_cutmix_extra. 64.20% robust accuracy is due to AutoAttack + transfer APGD from Rebuffi2021Fixing_70_16_cutmix_extra	93.73%	71.28%	64.20%	

☑	WideResNet-70-16, Neural ODE block	NeurIPS 2021
13	Fixing Data Augmentation to Improve Adversarial Robustness
It uses additional 1M synthetic images in training. 64.20% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	88.54%	64.25%	64.20%	
×
×	WideResNet-70-16	arXiv, Mar 2021
14	Exploring and Exploiting Decision Boundary Dynamics for Adversarial Robustness
It uses additional 10M synthetic images in training.	93.69%	63.89%	63.89%	
×
×	WideResNet-28-10	ICLR 2023
15	Improving Robustness using Generated Data
It uses additional 100M synthetic images in training. 63.38% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	87.50%	63.44%	63.38%	
×
×	WideResNet-28-10	NeurIPS 2021
16	Robustness and Accuracy Could Be Reconcilable by (Proper) Definition
It uses additional 1M synthetic images in training.	89.01%	63.35%	63.35%	
×
×	WideResNet-70-16	ICML 2022
17	Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off	91.47%	62.83%	62.83%	
×
☑	WideResNet-34-10	OpenReview, Jun 2021
18	Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?
It uses additional 10M synthetic images in training.	87.30%	62.79%	62.79%	
×
×	ResNest152	ICLR 2022
19	Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
62.76% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	89.48%	62.80%	62.76%	
×
☑	WideResNet-28-10	arXiv, Oct 2020
20	Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks
Uses exponential moving average (EMA)	91.23%	62.54%	62.54%	
×
☑	WideResNet-34-R	NeurIPS 2021
21	Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks	90.56%	61.56%	61.56%	
×
☑	WideResNet-34-R	NeurIPS 2021
22	Parameterizing Activation Functions for Adversarial Robustness
It uses additional ~6M synthetic images in training.	87.02%	61.55%	61.55%	
×
×	WideResNet-28-10-PSSiLU	arXiv, Oct 2021
23	Robustness and Accuracy Could Be Reconcilable by (Proper) Definition
It uses additional 1M synthetic images in training.	88.61%	61.04%	61.04%	
×
×	WideResNet-28-10	ICML 2022
24	Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off
It uses additional 1M synthetic images in training.	88.16%	60.97%	60.97%	
×
×	WideResNet-28-10	OpenReview, Jun 2021
25	Fixing Data Augmentation to Improve Adversarial Robustness
It uses additional 1M synthetic images in training. 60.73% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	87.33%	60.75%	60.73%	
×
×	WideResNet-28-10	arXiv, Mar 2021
26	Do Wider Neural Networks Really Help Adversarial Robustness?
87.67%	60.65%	60.65%	Unknown	☑	WideResNet-34-15	arXiv, Oct 2020
27	Improving Neural Network Robustness via Persistency of Excitation	86.53%	60.41%	60.41%	
×
☑	WideResNet-34-15	ACC 2022
28	Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?
It uses additional 10M synthetic images in training.	86.68%	60.27%	60.27%	
×
×	WideResNet-34-10	ICLR 2022
29	Adversarial Weight Perturbation Helps Robust Generalization	88.25%	60.04%	60.04%	
×
☑	WideResNet-28-10	NeurIPS 2020
30	Improving Neural Network Robustness via Persistency of Excitation	89.46%	59.66%	59.66%	
×
☑	WideResNet-28-10	ACC 2022
31	Geometry-aware Instance-reweighted Adversarial Training
Uses 


 = 0.031 ≈ 7.9/255 instead of 8/255.	89.36%	59.64%	59.64%	
×
☑	WideResNet-28-10	ICLR 2021
32	Unlabeled Data Improves Adversarial Robustness	89.69%	59.53%	59.53%	
×
☑	WideResNet-28-10	NeurIPS 2019
33	Improving Robustness using Generated Data
It uses additional 100M synthetic images in training. 58.50% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	87.35%	58.63%	58.50%	
×
×	PreActResNet-18	NeurIPS 2021
34	Data filtering for efficient adversarial training
86.10%	58.09%	58.09%	
×
×	WideResNet-34-20	Pattern Recognition 2024
35	Scaling Adversarial Training to Large Perturbation Bounds	85.32%	58.04%	58.04%	
×
×	WideResNet-34-10	ECCV 2022
36	Efficient and Effective Augmentation Strategy for Adversarial Training	88.71%	57.81%	57.81%	
×
×	WideResNet-34-10	NeurIPS 2022
37	LTD: Low Temperature Distillation for Robust Adversarial Training
86.03%	57.71%	57.71%	
×
×	WideResNet-34-20	arXiv, Nov 2021
38	Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off	89.02%	57.67%	57.67%	
×
☑	PreActResNet-18	OpenReview, Jun 2021
39	LAS-AT: Adversarial Training with Learnable Attack Strategy
85.66%	57.61%	57.61%	
×
×	WideResNet-70-16	arXiv, Mar 2022
40	A Light Recipe to Train Robust Vision Transformers	91.73%	57.58%	57.58%	
×
☑	XCiT-L12	arXiv, Sep 2022
41	Data filtering for efficient adversarial training
86.54%	57.30%	57.30%	
×
×	WideResNet-34-10	Pattern Recognition 2024
42	A Light Recipe to Train Robust Vision Transformers	91.30%	57.27%	57.27%	
×
☑	XCiT-M12	arXiv, Sep 2022
43	Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
57.14% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	85.29%	57.20%	57.14%	
×
×	WideResNet-70-16	arXiv, Oct 2020
44	HYDRA: Pruning Adversarially Robust Neural Networks
Compressed model	88.98%	57.14%	57.14%	
×
☑	WideResNet-28-10	NeurIPS 2020
45	Decoupled Kullback-Leibler Divergence Loss	85.31%	57.09%	57.09%	
×
×	WideResNet-34-10	arXiv, May 2023
46	Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off
It uses additional 1M synthetic images in training.	86.86%	57.09%	57.09%	
×
×	PreActResNet-18	OpenReview, Jun 2021
47	LTD: Low Temperature Distillation for Robust Adversarial Training
85.21%	56.94%	56.94%	
×
×	WideResNet-34-10	arXiv, Nov 2021
48	Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
56.82% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted)	85.64%	56.86%	56.82%	
×
×	WideResNet-34-20	arXiv, Oct 2020
49	Fixing Data Augmentation to Improve Adversarial Robustness
It uses additional 1M synthetic images in training.	83.53%	56.66%	56.66%	
×
×	PreActResNet-18	arXiv, Mar 2021
50	Improving Adversarial Robustness Requires Revisiting Misclassified Examples	87.50%	56.29%	56.29%	
×
☑	WideResNet-28-10	ICLR 2020
51	LAS-AT: Adversarial Training with Learnable Attack Strategy
84.98%	56.26%	56.26%	
×
×	WideResNet-34-10	arXiv, Mar 2022
52	Adversarial Weight Perturbation Helps Robust Generalization	85.36%	56.17%	56.17%	
×
×	WideResNet-34-10	NeurIPS 2020
53	A Light Recipe to Train Robust Vision Transformers	90.06%	56.14%	56.14%	
×
☑	XCiT-S12	arXiv, Sep 2022
54	Are Labels Required for Improving Adversarial Robustness?	86.46%	56.03%	56.03%	Unknown	☑	WideResNet-28-10	NeurIPS 2019
55	Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?
It uses additional 10M synthetic images in training.	84.59%	55.54%	55.54%	
×
×	ResNet-18	ICLR 2022
56	Using Pre-Training Can Improve Model Robustness and Uncertainty	87.11%	54.92%	54.92%	
×
☑	WideResNet-28-10	ICML 2019
57	Bag of Tricks for Adversarial Training
86.43%	54.39%	54.39%	Unknown	×	WideResNet-34-20	ICLR 2021
58	Boosting Adversarial Training with Hypersphere Embedding	85.14%	53.74%	53.74%	
×
×	WideResNet-34-20	NeurIPS 2020
59	Learnable Boundary Guided Adversarial Training
Uses 


 = 0.031 ≈ 7.9/255 instead of 8/255	88.70%	53.57%	53.57%	
×
×	WideResNet-34-20	ICCV 2021
60	Attacks Which Do Not Kill Training Make Adversarial Learning Stronger	84.52%	53.51%	53.51%	
×
×	WideResNet-34-10	ICML 2020
61	Overfitting in adversarially robust deep learning	85.34%	53.42%	53.42%	
×
×	WideResNet-34-20	ICML 2020
62	Self-Adaptive Training: beyond Empirical Risk Minimization
Uses 


 = 0.031 ≈ 7.9/255 instead of 8/255.	83.48%	53.34%	53.34%	Unknown	×	WideResNet-34-10	NeurIPS 2020
63	Theoretically Principled Trade-off between Robustness and Accuracy
Uses 


 = 0.031 ≈ 7.9/255 instead of 8/255.	84.92%	53.08%	53.08%	Unknown	×	WideResNet-34-10	ICML 2019
64	Learnable Boundary Guided Adversarial Training
Uses 


 = 0.031 ≈ 7.9/255 instead of 8/255	88.22%	52.86%	52.86%	
×
×	WideResNet-34-10	ICCV 2021
65	Adversarial Robustness through Local Linearization	86.28%	52.84%	52.84%	Unknown	×	WideResNet-40-8	NeurIPS 2019
66	Efficient and Effective Augmentation Strategy for Adversarial Training	85.71%	52.48%	52.48%	
×
×	ResNet-18	NeurIPS 2022
67	Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning
Uses ensembles of 3 models.	86.04%	51.56%	51.56%	Unknown	×	ResNet-50	CVPR 2020
68	Efficient Robust Training via Backward Smoothing
85.32%	51.12%	51.12%	Unknown	×	WideResNet-34-10	arXiv, Oct 2020
69	Scaling Adversarial Training to Large Perturbation Bounds	80.24%	51.06%	51.06%	
×
×	ResNet-18	ECCV 2022
70	Improving Adversarial Robustness Through Progressive Hardening
86.84%	50.72%	50.72%	Unknown	×	WideResNet-34-10	arXiv, Mar 2020
71	Robustness library	87.03%	49.25%	49.25%	Unknown	×	ResNet-50	GitHub,
Oct 2019
72	Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models	87.80%	49.12%	49.12%	Unknown	×	WideResNet-34-10	IJCAI 2019
73	Metric Learning for Adversarial Robustness	86.21%	47.41%	47.41%	Unknown	×	WideResNet-34-10	NeurIPS 2019
74	You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle
Focuses on fast adversarial training.	87.20%	44.83%	44.83%	Unknown	×	WideResNet-34-10	NeurIPS 2019
75	Towards Deep Learning Models Resistant to Adversarial Attacks	87.14%	44.04%	44.04%	Unknown	×	WideResNet-34-10	ICLR 2018
76	Understanding and Improving Fast Adversarial Training
Focuses on fast adversarial training.	79.84%	43.93%	43.93%	Unknown	×	PreActResNet-18	NeurIPS 2020
77	Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness	80.89%	43.48%	43.48%	Unknown	×	ResNet-32	ICLR 2020
78	Fast is better than free: Revisiting adversarial training
Focuses on fast adversarial training.	83.34%	43.21%	43.21%	Unknown	×	PreActResNet-18	ICLR 2020
79	Adversarial Training for Free!	86.11%	41.47%	41.47%	Unknown	×	WideResNet-34-10	NeurIPS 2019
80	MMA Training: Direct Input Space Margin Maximization through Adversarial Training	84.36%	41.44%	41.44%	Unknown	×	WideResNet-28-4	ICLR 2020
81	A Tunable Robust Pruning Framework Through Dynamic Network Rewiring of DNNs
Compressed model	87.32%	40.41%	40.41%	
×
×	ResNet-18	ASP-DAC 2021
82	Controlling Neural Level Sets
Uses 


 = 0.031 ≈ 7.9/255 instead of 8/255.	81.30%	40.22%	40.22%	Unknown	×	ResNet-18	NeurIPS 2019
83	Robustness via Curvature Regularization, and Vice Versa	83.11%	38.50%	38.50%	Unknown	×	ResNet-18	CVPR 2019
84	Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training	89.98%	36.64%	36.64%	Unknown	×	WideResNet-28-10	NeurIPS 2019
85	Adversarial Interpolation Training: A Simple Approach for Improving Model Robustness	90.25%	36.45%	36.45%	Unknown	×	WideResNet-28-10	OpenReview, Sep 2019
86	Adversarial Defense via Learning to Generate Diverse Attacks	78.91%	34.95%	34.95%	Unknown	×	ResNet-20	ICCV 2019
87	Sensible adversarial learning	91.51%	34.22%	34.22%	Unknown	×	WideResNet-34-10	OpenReview, Sep 2019
88	Towards Stable and Efficient Training of Verifiably Robust Neural Networks
Verifiably robust model with 32.24% provable robust accuracy	44.73%	32.64%	32.64%	Unknown	×	5-layer-CNN	ICLR 2020
89	Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks	92.80%	29.35%	29.35%	Unknown	×	WideResNet-28-10	ICCV 2019
90	Enhancing Adversarial Defense by k-Winners-Take-All
Uses 


 = 0.031 ≈ 7.9/255 instead of 8/255.
7.40% robust accuracy is due to 1 restart of APGD-CE and 30 restarts of Square Attack
Note: this adaptive evaluation (Section 5) reports 0.16% robust accuracy on a different model (adversarially trained ResNet-18).	79.28%	18.50%	7.40%	

×	DenseNet-121	ICLR 2020
91	Manifold Regularization for Adversarial Robustness	90.84%	1.35%	1.35%	Unknown	×	ResNet-18	arXiv, Mar 2020
92	Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks	89.16%	0.28%	0.28%	Unknown	×	ResNet-110	ICCV 2019
93	Jacobian Adversarially Regularized Networks for Robustness	93.79%	0.26%	0.26%	Unknown	×	WideResNet-34-10	ICLR 2020
94	ClusTR: Clustering Training for Robustness	91.03%	0.00%	0.00%	Unknown	×	WideResNet-28-10	arXiv, Jun 2020
95	Standardly trained model	94.78%	0.0%	0.0%	Unknown	×	WideResNet-28-10	N/A
'''


TestDataYearExtract = question >> LLMRun() >> ExtractJSON() >> JSONSubsetEvaluator({
    "2024": 69.71,
    "2023": 71.07,
    "2022": 65.79,
    "2021": 66.56,
    "2020": 65.87,
    "2019": 59.53,
    "2018": 44.04
})




if __name__ == "__main__":
    print(run_test(TestDataYearExtract))