Traces for Automatic Exploitation of Adversarial Example Defenses
[Paper on arXiv]
[Code on GitHub]
Defense | No Attack | GPT-4o | o1 | Sonnet 3.5 + o3 | Sonnet 3.5 | Sonnet 3.7 (thinking) | Sonnet 3.7 | Attacked?