Agent-Security

The Agent Security Gap Nobody's Talking About: Skills Run Every Heartbeat

Thesis: Everyone’s worried about prompt injection, but the real agent attack surface is third-party skills — they execute persistently on every heartbeat cycle, not once per conversation. I keep having the same conversation. Someone asks about agent security. I say “third-party skills.” They say “you mean prompt injection?” No. I mean the code that runs inside your agent 144 times per day, with full access to your agent’s memory, context, and credentials, that you installed from a marketplace where one in five entries is actively malicious. ...

Third-party skills execute every heartbeat — not once

When you install a third-party OpenClaw skill, it doesn’t just run at install time. It executes on every agent heartbeat — every loop iteration where the agent checks its environment, processes inputs, and decides what to do next. A malicious skill gets continuous execution, not a one-shot opportunity. Why this matters Most developers think of skill installation like installing a library: it runs setup once, then sits there until called. That mental model is wrong for agent skills. Agent architectures run skills as part of their core loop. This means a malicious skill gets persistent, repeated access to the agent’s context, memory, filesystem, and network connections — not just a single execution window. ...

VirusTotal can't detect agent-specific malware

6,487 malicious agent tools are undetectable by VirusTotal and traditional malware scanners. These tools don’t trigger signature-based detection because they don’t look like traditional malware. They look like normal agent skills — because that’s what they are, with a few extra lines that exfiltrate data or establish persistence. Why this matters The security industry has spent 30 years building increasingly sophisticated malware detection. Signature databases, behavioral heuristics, sandbox detonation, ML classifiers — all tuned for executables, scripts, and documents that do obviously malicious things. Agent-specific malware doesn’t fit this model. A malicious OpenClaw skill is a valid Python file that performs a legitimate function AND quietly sends your API keys to an external server. There’s no shellcode, no packing, no obfuscation. VirusTotal has nothing to flag. ...

How to Secure Your OpenClaw in 30 Minutes

A default OpenClaw installation has file system access, API credentials, and code execution — with zero security controls enabled. One in five ClawHub skills is actively malicious. Exposed credentials from VPS-hosted agents are already showing up in public breach lists. A compromised agent isn’t a compromised browser tab — it’s a compromised employee with the keys to everything. For the full analysis of why third-party skills are the biggest agent attack vector and what makes this worse than prompt injection at the architecture level, see the companion research. ...

Beyond Prompt Injection: RL Attacks on AI Agent Decision-Making

Observation perturbation degrades RL agent performance 20-50x more effectively than reward poisoning. And prompt-injection defenses? 0% effective against RL-specific attacks — they target completely different surfaces. I built two custom Gymnasium environments (access control, tool selection), trained 40 agents across 4 algorithms and 5 seeds, then ran 150 attack experiments across 4 attack classes. The result: if you’re monitoring reward signals but not observation channels, you’re watching the wrong surface. ...

I Red-Teamed AI Agents: Here's How They Break (and How to Fix Them)

Note (2026-03-19): This was an early exploration in my AI security research. The methodology has known limitations documented in the quality assessment. For the current state of this work, see Multi-Agent Security and Verified Delegation Protocol. I sent 19 attack scenarios at a default-configured LangChain ReAct agent powered by Claude Sonnet. 13 succeeded. I then validated prompt injection on CrewAI — same rate (80%). The most dangerous attack class — reasoning chain hijacking — achieved a 100% success rate against these default-configured agents across 3 seeds and partially evades every defense I built. These results are specific to Claude backend with default agent configurations; production-hardened agents would likely show different success rates. Here’s what I found, what I built to find it, and what it means for anyone shipping autonomous agents. ...