Agent-Security

5 AI Security Gaps That Jensen Huang, Eric Schmidt, and the OpenClaw Creator All Flagged This Month

I spent this week extracting AI security signals from five frontier podcasts — Jensen Huang on Lex Fridman, Eric Schmidt on Moonshots, Peter Steinberger (OpenClaw creator) on Lex Fridman, and two Moonshots panel episodes covering NVIDIA, Anthropic, and Tesla. 68 claims, 30 concepts, 26 signals logged to a structured knowledge base. The finding that surprised me: three independent sources — a $4 trillion CEO, a former Google CEO, and the creator of the fastest-growing open-source project in history — all flagged the same security gaps without coordinating. Here are the five signals that converged. ...

Our Simulation Was Wrong by 37 Percentage Points — What Real LLM Agents Taught Us About Multi-Agent Cascade

I built a multi-agent security simulation, ran 6 experiments, then validated against real Claude Haiku agents. The simulation predicted 97% poison rate. Real agents: 60%. And the biggest surprise: topology matters — something the simulation said was irrelevant. What I Built A simulation-based testbed that models multi-agent systems with configurable trust architectures, network topologies, attacker types, and agent compositions. One agent gets compromised. We measure how poisoned outputs cascade through the system. ...

Privilege Escalation Cascades at 98% While Domain-Aligned Attacks Are Invisible

Domain-aligned prompt injections cascade through multi-agent systems at a 0% detection rate. Privilege escalation payloads hit 97.6%. That’s a 98 percentage-point spread across payload types in the same agent architecture — the single biggest variable determining whether your multi-agent system catches an attack or never sees it. I ran six experiments on real Claude Haiku agents to find out why. Three resistance patterns explain the gap — and each has a quantified bypass condition. ...

Why Third-Party Skills Are the Biggest Agent Attack Vector

Last week I published a 30-minute hardening guide for OpenClaw. The #1 risk on that list was third-party skills. Since then, the numbers have gotten worse. 820+ malicious skills are now on ClawHub — roughly 20% of the entire registry. That’s not a rounding error. That’s one in five skills being actively hostile to the agent that installs them. But the number isn’t what makes this the biggest attack vector. The architecture is. ...

$500M+ VC chasing agent security, but the biggest gap has no product

In Q1 2026, over $500M in venture capital was deployed into agent security startups — Armadin ($190M, Kevin Mandia’s new company), Kai ($125M), 7AI ($166M), Onyx ($40M). Enterprise budgets are increasing 20-40% for agent security add-ons. The market is funded and growing fast. But the biggest pain point has no dominant product. Why this matters The #1 and #2 pain points in agent security — malicious marketplace skills and prompt injection enabling RCE — both score 45/45 on frequency x intensity rankings. But the solution landscape for runtime agent behavior monitoring is empty. 80% of IT professionals report agents performing unauthorized actions. NanoClaw provides container-level isolation but doesn’t monitor behavior inside the container. No widely-adopted tool watches what agents actually do in real-time: which files they access, which APIs they call, which network connections they make. ...

30 MCP CVEs in 60 days

The MCP (Model Context Protocol) ecosystem accumulated 30 CVEs in its first 60 days of widespread adoption. Of 1,808 MCP servers scanned, 66% had security findings. 492 had no authentication or encryption at all. Why this matters MCP is the protocol that lets AI agents connect to external tools and data sources. It is becoming the standard integration layer for the agent economy. When two-thirds of the servers implementing that standard ship with security gaps, it means the entire agent ecosystem is building on a foundation full of holes. This isn’t a theoretical risk — these are real CVEs with real exploit paths. ...

820 malicious skills on ClawHub: 1 in 5 is hostile

820+ malicious skills have been identified on ClawHub, the OpenClaw marketplace. That means roughly 1 in 5 skills listed in the registry is hostile — designed to exfiltrate data, inject commands, or establish persistence in your agent environment. Why this matters ClawHub is where most OpenClaw users discover and install third-party skills. It is the npm/PyPI of the agent economy, and it has the same supply chain poisoning problem those ecosystems faced — except worse. Agent skills don’t just run code at install time. They execute continuously during agent operation, with access to your terminal, filesystem, and API credentials. A malicious skill doesn’t need a clever exploit chain. It just needs you to install it. ...

AI Security Has a Shipping Problem

Thesis: The AI security industry produces frameworks and guidelines but almost no one ships working tools that practitioners can deploy today. The gap between “risk identified” and “risk mitigated” in AI security is wider than any other area of cybersecurity I’ve worked in. We have more frameworks per deployed tool than any domain in the history of information security. And the frameworks keep coming while the tools don’t. The Evidence 1. OWASP published the Agentic Top 10 in late 2025. No tools enforce it. ...

Prompt Injection Is Yesterday's Threat. RL Attacks Are Next.

Thesis: The security community is focused on prompt injection, but RL-specific attacks — reward poisoning, observation perturbation, policy extraction — are more dangerous and less understood. Prompt injection is real. I’ve tested it. In my agent red-teaming research, direct prompt injection achieved 80% success against default-configured LangChain ReAct agents. Reasoning chain hijacking hit 100%. These are serious vulnerabilities. But prompt injection is also becoming yesterday’s threat — it’s well-characterized, actively mitigated, and architecturally bounded. The attacks that should keep agent deployers awake are the ones that don’t touch the prompt at all. ...

Reasoning chain hijacking has 100% success rate on default LangChain

In red-team testing of AI agent frameworks, reasoning chain hijacking attacks achieved a 100% success rate against default LangChain configurations. Every single attempt to inject instructions into the agent’s chain-of-thought reasoning succeeded in altering the agent’s behavior. Why this matters Reasoning chain hijacking is different from basic prompt injection. Instead of injecting a single malicious instruction, the attacker injects a plausible reasoning chain that guides the agent through a series of “logical” steps toward the attacker’s goal. The agent follows the injected chain because it looks like its own reasoning. Default LangChain configurations have no defense against this — no chain validation, no reasoning integrity checks, no anomaly detection on thought patterns. ...