
Jul 4, 2026 · 5 min read

Autonomous AI Jailbreaking: Reasoning Models Hit 97% Attack Success

A peer-reviewed Nature Communications study shows that reasoning models can autonomously jailbreak other LLMs at a 97.14% success rate with no human intervention. Resistance varies roughly 31-fold across major models (90% / 2.86% ≈ 31): Claude 4 Sonnet holds the attack success rate to 2.86%, while DeepSeek-V3 reaches 90%.