Beyond Prompt Injection: RL Attacks on AI Agent Decision-Making

What happens when you attack an AI agent’s learning process instead of its prompts? I built two custom Gymnasium environments (access control decisions, tool selection), trained 40 RL agents (Q-Learning, DQN, Double DQN, PPO across 5 seeds each), then systematically attacked them with 4 attack classes: reward poisoning, observation perturbation, policy extraction, and behavioral backdoors. 150 attack experiments total. The headline finding: observation perturbation degrades agent performance 20-50x more effectively than reward poisoning. And prompt-injection defenses from my earlier agent red-teaming work are 0% effective against RL-specific attacks — they target completely different surfaces. ...

March 16, 2026 · 3 min · Rex Coleman

Antivirus for AI Models: Behavioral Fingerprinting Detects What Static Analysis Misses

How do you know a model downloaded from Hugging Face hasn’t been backdoored? Static analysis tools like ModelScan check for serialization exploits and known payload patterns. They catch the obvious attacks. But a model poisoned through training data – one that behaves normally on 99.9% of inputs and activates a backdoor only on a specific trigger – passes every static check. The weights look fine. The architecture is standard. The malicious behavior is invisible until the trigger fires. ...

March 16, 2026 · 5 min · Rex Coleman

I Red-Teamed AI Agents: Here's How They Break (and How to Fix Them)

I ran 19 attack scenarios against a default-configured LangChain ReAct agent powered by Claude Sonnet. 13 succeeded. I then validated prompt injection on CrewAI — same rate (80%). The most dangerous attack class — reasoning chain hijacking — achieved a 100% success rate against these default-configured agents across 3 seeds and partially evades every defense I built. These results are specific to a Claude backend with default agent configurations; production-hardened agents would likely show different success rates. Here’s what I found, what I built to find it, and what it means for anyone shipping autonomous agents. ...

March 16, 2026 · 5 min · Rex Coleman

One Principle, Six Domains: Adversarial Control Analysis for AI Security

I started with one question: if a network attacker can only control some features of network traffic, shouldn’t our IDS defenses focus on the features they can’t control? That question became a methodology. I called it adversarial control analysis (ACA) — classify every input by who controls it, then build defenses around the uncontrollable parts. It worked on intrusion detection. So I tried it on vulnerability prediction. Same result. Then AI agents. Then cryptography. Then financial fraud. Then software supply chains. ...

March 16, 2026 · 4 min · Rex Coleman

Adversarial ML on Network Intrusion Detection: What Adversarial Control Analysis Reveals

After 15 years at Mandiant watching network intrusion detection systems fail against real adversaries, I built one — then tried to break it. The finding that surprised me: the model architecture barely matters for robustness. What matters is which features the attacker can manipulate. The Setup: I trained Random Forest, XGBoost, and Logistic Regression classifiers on the CICIDS2017 dataset (2.83M network flow records, 78 features, 15 traffic classes). Standard ML-on-IDS — nothing novel yet. ...

March 14, 2026 · 4 min · Rex Coleman

How I Govern AI-Assisted ML Projects

After four ML projects at Georgia Tech, I’d run 14 manual audit cycles with 30+ findings each. The governance wasn’t the problem — the manual enforcement was. So I built govML. The Problem: Every ML project needs governance (reproducible experiments, documented decisions, data integrity checks, fair comparisons). But enforcing governance manually is a workflow killer. My unsupervised learning project had 7 audit cycles with 49+ findings. The RL project had 14 cycles with 30+ findings. I was spending more time auditing than experimenting. ...

March 14, 2026 · 3 min · Rex Coleman

Why CVSS Gets It Wrong: ML-Powered Vulnerability Prioritization

After 15 years of incident response at Mandiant, I watched security teams burn countless hours patching CVSS 9.8 vulnerabilities that never got exploited — while CVSS 7.5s got weaponized and led to breaches. CVSS measures severity. Attackers measure opportunity. I trained an ML model on 338,000 real CVEs to find out what actually predicts which vulnerabilities get exploited in the wild — and the answer is not what CVSS thinks it is. ...

March 14, 2026 · 5 min · Rex Coleman

© 2026 Rex Coleman. Blog content licensed under CC BY 4.0. Code under MIT.