Research

Antivirus for AI Models: Behavioral Fingerprinting Detects What Static Analysis Misses

A model poisoned through training data — one that behaves normally on 99.9% of inputs and activates a backdoor only on a specific trigger — passes every static analysis check. I built a behavioral fingerprinting system that detects these models using unsupervised anomaly detection: zero labeled backdoor examples, no model retraining, AUROC 0.62 on deliberately subtle synthetic backdoors. Static tools like ModelScan catch serialization exploits. Behavioral fingerprinting catches what static misses — and the defender controls the probe inputs, inverting the usual attacker advantage. This is a model supply chain problem analogous to the agent skill supply chain — in both cases, third-party artifacts execute inside your system and static analysis misses behavioral threats. ...

I Red-Teamed AI Agents: Here's How They Break (and How to Fix Them)

Note (2026-03-19): This was an early exploration in my AI security research. The methodology has known limitations documented in the quality assessment. For the current state of this work, see Multi-Agent Security and Verified Delegation Protocol. I sent 19 attack scenarios at a default-configured LangChain ReAct agent powered by Claude Sonnet. 13 succeeded. I then validated prompt injection on CrewAI — same rate (80%). The most dangerous attack class — reasoning chain hijacking — achieved a 100% success rate against these default-configured agents across 3 seeds and partially evades every defense I built. These results are specific to Claude backend with default agent configurations; production-hardened agents would likely show different success rates. Here’s what I found, what I built to find it, and what it means for anyone shipping autonomous agents. ...

One Principle, Six Domains: Adversarial Control Analysis for AI Security

Note (2026-03-19): This was an early exploration in my AI security research. The methodology has known limitations documented in the quality assessment. For the current state of this work, see Multi-Agent Security and Verified Delegation Protocol. I started with one question: if a network attacker can only control some features of network traffic, shouldn’t our IDS defenses focus on the features they can’t control? That question became a methodology. I called it adversarial control analysis (ACA) — classify every input by who controls it, then build defenses around the uncontrollable parts. It worked on intrusion detection. So I tried it on vulnerability prediction. Same result. Then AI agents. Then cryptography. Then financial fraud. Then software supply chains. ...

Adversarial ML on Network Intrusion Detection: What Adversarial Control Analysis Reveals

Note (2026-03-19): This was an early exploration in my AI security research. The methodology has known limitations documented in the quality assessment. For the current state of this work, see Multi-Agent Security and Verified Delegation Protocol. After studying how adversaries evade detection systems, I built one — then tried to break it. The finding that surprised me: the model architecture barely matters for robustness. What matters is which features the attacker can manipulate. ...

Why CVSS Gets It Wrong: ML-Powered Vulnerability Prioritization

I trained an ML model on 338,000 real CVEs to find out what actually predicts exploitation in the wild. The answer: vendor deployment ubiquity and vulnerability age matter more than CVSS score. CVSS measures severity. Attackers measure opportunity. Teams patching CVSS 9.8 vulnerabilities that never get exploited — while CVSS 7.5s get weaponized — are following the wrong signal. The Data Three public data sources, joined by CVE ID: Source Records Purpose NVD (NIST) 337,953 CVEs Features: CVSS scores, CWE types, descriptions, vendor/product, references ExploitDB 24,936 CVEs with known exploits Ground truth label: “was this CVE actually exploited?” EPSS (First.org) 320,502 scores Baseline comparison: an existing ML-based prediction Temporal split: Train on pre-2024 CVEs (234,601), test on 2024+ (103,352). This prevents data leakage from future information — in production, you always predict on CVEs you haven’t seen yet. ...