Adversarial Control Analysis: A Unified Framework for Designing ML Systems That Survive Adversaries Across Six Security Domains

Abstract: Machine learning systems deployed in adversarial environments face a fundamental challenge: attackers manipulate inputs to evade detection, yet most adversarial ML research treats all features as equally perturbable. We introduce Adversarial Control Analysis (ACA), a framework that classifies every input to an ML system by its controller — attacker-controlled, defender-observable, system-determined, or nature-governed — and uses this classification to predict adversarial robustness and guide architectural defense. We apply ACA across six security domains: network intrusion detection (57/78 features attacker-controllable; constraining perturbations to controllable features reduces attack success by 35%), vulnerability prioritization (EPSS, a system-controlled signal, dominates prediction at 2x the SHAP importance of any other feature), AI agent security (attack success correlates inversely with defender observability, from 25% on observable inputs to 100% on internal state), post-quantum cryptography migration (70% of crypto findings are library-controlled, not developer-actionable), financial fraud detection (system-controlled features achieve 81% of full model performance), and AI supply chain security (75% of findings are developer-controlled). In every domain, ACA correctly predicts which features and defenses will survive adversarial pressure. The framework provides a three-step methodology — Enumerate, Classify, Architect — that security practitioners can apply before writing a single line of model code. ACA formalizes the principle that security architecture, not model optimization, determines adversarial robustness. ...

March 19, 2026 · 23 min · Rex Coleman
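The Enumerate, Classify, Architect loop from the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Controller` taxonomy matches the four classes named in the abstract, but the example feature names and their assignments are hypothetical.

```python
from enum import Enum

class Controller(Enum):
    ATTACKER = "attacker-controlled"
    DEFENDER = "defender-observable"
    SYSTEM = "system-determined"
    NATURE = "nature-governed"

def architect(features: dict[str, Controller]) -> dict[str, list[str]]:
    """Partition enumerated features by controller, so decision-critical
    logic can be restricted to signals outside adversary control."""
    plan: dict[str, list[str]] = {c.value: [] for c in Controller}
    for name, controller in features.items():
        plan[controller.value].append(name)
    return plan

# Hypothetical classification of a few intrusion-detection features
features = {
    "packet_size": Controller.ATTACKER,
    "inter_arrival_time": Controller.ATTACKER,
    "epss_percentile": Controller.SYSTEM,
    "server_response_code": Controller.DEFENDER,
}
plan = architect(features)
```

The point of the partition is architectural: anything landing in the attacker-controlled bucket should not be a load-bearing feature in the deployed model.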

Beyond Prompt Injection: Observation Perturbation Dominates Reward Poisoning by 20-50x in RL Agent Attacks

Abstract: Autonomous agents powered by reinforcement learning (RL) are being deployed into production security workflows including access control enforcement, tool selection, and incident response. Current agent security research focuses overwhelmingly on prompt injection attacks targeting the large language model (LLM) reasoning layer, leaving RL-specific attack surfaces largely uncharacterized. We present a systematic evaluation of four attack classes — reward poisoning, observation perturbation, policy extraction, and behavioral backdoors — against 40 RL agents (Q-Learning, DQN, Double DQN, PPO across 5 seeds) trained on two security-relevant custom Gymnasium environments: AccessControl (gridworld-based access policy enforcement) and ToolSelection (resource allocation under constraints). Across 150 attack experiments with 3-seed validation, we find that observation perturbation degrades agent performance 20–50x more effectively than reward poisoning, even at minimal perturbation budgets (epsilon = 0.01). Policy extraction achieves 72% agreement with victim policies using only 500 black-box queries, enabling offline attack rehearsal. We map all attacks to the OWASP Agentic Security Initiative taxonomy, covering 7 of 10 categories, and identify 5 RL-specific attack classes absent from current frameworks. We propose a controllability analysis framework that unifies vulnerability prediction across LLM and RL agent layers. All code and data are open-source. ...

March 19, 2026 · 22 min · Rex Coleman
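The two dominant attack classes compared in the abstract differ in what the adversary touches: observation perturbation corrupts what the agent sees, reward poisoning corrupts the training signal. A minimal sketch of each, with hypothetical parameters (the epsilon = 0.01 budget is from the abstract; the flip-probability model of poisoning is an assumption for illustration):

```python
import numpy as np

def perturb_observation(obs, epsilon=0.01, rng=None):
    """L-infinity bounded observation perturbation: shift what the agent
    perceives by at most epsilon per dimension, clipped to valid range."""
    rng = rng or np.random.default_rng(0)
    delta = rng.uniform(-epsilon, epsilon, size=np.shape(obs))
    return np.clip(np.asarray(obs, dtype=float) + delta, 0.0, 1.0)

def poison_reward(reward, flip_prob=0.05, rng=None):
    """Reward poisoning (one simple model): occasionally invert the
    training signal the environment returns."""
    rng = rng or np.random.default_rng(0)
    return -reward if rng.random() < flip_prob else reward
```

The asymmetry the paper reports follows the controllability logic: observations pass through channels the attacker can often reach at test time, while rewards usually require deeper access to the training loop.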

EPSS Dominates All Other Features in ML-Based Vulnerability Prioritization: An Ablation Study with SHAP Interpretability

Abstract: The Common Vulnerability Scoring System (CVSS) remains the industry standard for vulnerability triage, yet it was designed to measure severity, not exploitability. We evaluate seven machine learning algorithms on 337,953 CVEs from the National Vulnerability Database, using 24,936 confirmed exploits from ExploitDB as ground truth labels. All seven algorithms outperform CVSS-based triage (AUC 0.662), with Logistic Regression achieving AUC 0.903 (+24.1pp) and tuned XGBoost matching the Exploit Prediction Scoring System (EPSS) at AUC 0.912. A five-seed ablation study with SHAP interpretability reveals that EPSS percentile alone contributes +15.5pp AUC — nearly all of the useful signal in the model. Four feature groups (temporal, reference, vendor metadata, and description statistics) actively hurt performance when included. Adversarial evaluation confirms 0% evasion across three text-based attack types, because the model’s decision-critical features are defender-observable and outside adversary control. These findings challenge the assumption that more features improve vulnerability prediction and provide a reproducible, interpretable framework for prioritization that organizations can deploy using only public data. All seven pre-registered hypotheses were supported. Code, data pipeline, and governance artifacts are released as open source. ...

March 19, 2026 · 19 min · Rex Coleman
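The ablation logic behind the EPSS finding is easy to reproduce in miniature: train the same model with and without the dominant feature and compare AUC. The sketch below uses synthetic data (not the paper's 337,953-CVE dataset) in which exploitation probability rises with a simulated EPSS score; the noise features stand in for the metadata groups the paper found unhelpful.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 2000
epss = rng.random(n)                       # simulated system-controlled signal
noise = rng.random((n, 3))                 # stand-ins for weak metadata features
y = (rng.random(n) < epss ** 2).astype(int)  # exploitation tracks EPSS

def auc_with(features):
    """Fit a logistic model on the given feature matrix and score it."""
    model = LogisticRegression(max_iter=1000).fit(features, y)
    return roc_auc_score(y, model.predict_proba(features)[:, 1])

full_auc = auc_with(np.column_stack([epss, noise]))   # all features
ablated_auc = auc_with(noise)                          # EPSS removed
```

On this synthetic setup the full model scores well above the ablated one, which hovers near chance, mirroring the qualitative shape of the paper's +15.5pp result without claiming its numbers.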

State of AI Agent Security Q1 2026: 820 Malicious Skills, $500M in VC, and Zero Dedicated Tooling

Abstract: The AI agent economy is expanding rapidly. Over 100 million developers and builders are now deploying autonomous agents that browse the web, execute code, manage files, and interact with external APIs. Security has not kept pace. This report presents a systematic signal analysis of the AI agent security landscape as of Q1 2026, synthesizing threat intelligence, market data, and community pain signals from across the ecosystem. The findings are stark: 820+ malicious skills have been identified on ClawHub (approximately 20% of the registry), 30 MCP-related CVEs were disclosed in a 60-day window, and VirusTotal remains blind to 6,487 agent-specific malicious tools. On the market side, over $500 million in venture capital has been deployed into agent security startups in Q1 2026 alone, yet only 29% of enterprises report having agent security policies in place. The gap between threat velocity and defense tooling represents both the central risk and the defining market opportunity in AI security today. This report documents the evidence, maps the competitive landscape, and identifies the specific defense categories where no dominant solution exists. ...

March 19, 2026 · 19 min · Rex Coleman

Systematic Red-Teaming of AI Agents: A 7-Class Attack Taxonomy with Controllability-Based Defense Architecture

Abstract: Autonomous AI agents that reason, use tools, and take actions are being deployed into production systems at scale, yet no systematic methodology exists for evaluating their security posture beyond the model layer. We present an open-source red-team framework that executes 19 attack scenarios across 7 attack classes against LangChain ReAct agents with a Claude Sonnet backend. Five of the seven attack classes target agent-specific surfaces not covered by OWASP LLM Top 10 or MITRE ATLAS. Our most significant finding is reasoning chain hijacking, an attack that exploits an agent’s core capability — following structured multi-step plans — as the attack vector, achieving a 100% success rate against default-configured agents across 3 seeds (temperature=0). We introduce adversarial control analysis (ACA) to the agent security domain and demonstrate that attack success correlates inversely with defender observability: the reasoning chain, being internal to the agent’s processing loop, is both the least observable input and the most vulnerable surface. A layered defense architecture (input sanitization, LLM-as-judge, tool permission boundaries) achieves 67% average attack reduction, but reasoning chain hijacking remains the highest-priority unsolved problem, decreasing only from 100% to 33% success. The framework, attack taxonomy, and all scenarios are released as open source to enable reproducible agent security evaluation. ...

March 19, 2026 · 20 min · Rex Coleman
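Two of the three defense layers named in that abstract (input sanitization and tool permission boundaries) are structural rather than model-based, and can be sketched directly. The denylist markers and tool names below are hypothetical placeholders, not the framework's actual configuration:

```python
from dataclasses import dataclass, field

# Hypothetical injection markers; a real sanitizer would be far broader
DENYLIST = ("ignore previous", "new instructions:", "system override")

def sanitize(user_input: str) -> str:
    """Input sanitization layer: reject text containing known
    injection markers before it reaches the agent."""
    lowered = user_input.lower()
    if any(marker in lowered for marker in DENYLIST):
        raise ValueError("possible prompt injection detected")
    return user_input

@dataclass
class ToolGate:
    """Tool permission boundary: default-deny any tool call the
    agent was not explicitly granted."""
    allowed: set = field(default_factory=lambda: {"search", "calculator"})

    def authorize(self, tool: str) -> bool:
        return tool in self.allowed
```

Note what this sketch cannot cover: the reasoning chain itself, which is internal to the agent loop and sits behind both layers. That is exactly why the paper reports it as the least observable and most vulnerable surface.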
© 2026 Rex Coleman. Content under CC BY 4.0. Code under MIT. Singularity · GitHub · LinkedIn