
Jun 30, 2026 · 7 min read

Fine-Tuning as Jailbreak: How Benign Data Strips LLM Safety

Three papers published in 2026 confirm what practitioners suspected: LLM safety alignment is structurally shallow, and fine-tuning APIs are the widest-open way around it.