Privilege Escalation Cascades at 98% While Domain-Aligned Attacks Are Invisible
Domain-aligned prompt injections cascade through multi-agent systems at a 0% detection rate. Privilege escalation payloads hit 97.6%. That’s a 98 percentage-point spread across payload types in the same agent architecture — the single biggest variable determining whether your multi-agent system catches an attack or never sees it. I ran six experiments on real Claude Haiku agents to find out why. Three resistance patterns explain the gap — and each has a quantified bypass condition. ...