You set up OpenClaw. It’s running. Your agent is browsing the web, installing skills, managing your files. Maybe it’s coding for you while you sleep. Maybe it’s monitoring trending topics and drafting content every two hours.
Here’s the thing: if you haven’t thought about security, your agent is a wide-open door. It has file system access, API credentials, and the ability to execute code. A compromised agent isn’t a compromised browser tab — it’s a compromised employee with the keys to everything.
I’ve been researching agent security as part of a systematic red-team study of AI agent frameworks. I also recently analyzed the security surface of OpenClaw specifically, drawing on disclosed vulnerabilities and operational patterns from power users running multi-agent setups 24/7. Here are the five biggest risks I found, and a 30-minute checklist to fix them.
The 5 Biggest Risks
1. Third-party skills execute on every heartbeat
This is the one that should scare you most. When you install a third-party skill, it doesn’t run once — it runs on every heartbeat cycle. That means persistent, continuous access to your agent’s context, memory, and capabilities.
Think about what that means. A malicious skill isn’t like visiting a bad website (bounded exposure, one page load). It’s like giving a stranger a permanent seat at your desk, watching everything your agent does, with the ability to modify behavior or exfiltrate data on every cycle.
Alex Finn, one of the most experienced OpenClaw builders I’ve studied, calls this “the biggest attack vector out of all the attack vectors.” His rule: never install third-party skills. If you need a capability, show your agent how a skill works and tell it to build its own version. You control the code. You can read it. You know what it does.
2. Websites can hijack your agent through the local gateway
A disclosed OpenClaw vulnerability demonstrated that any website can silently hijack a developer’s agent via malicious JavaScript targeting local gateway endpoints. The attack works because your agent’s local gateway listens on a network port, and a crafted web page can connect to it.
Unlike traditional web attacks that are sandboxed to a browser tab, a hijacked agent has file system access and API credentials. The blast radius is dramatically larger.
This was patched within 24 hours of disclosure, which is good. But the underlying class of vulnerability — agent endpoints accessible from web content — is structural. If your agent browses the web and exposes a local gateway, this attack class will recur in new forms.
3. VPS deployments expose your credentials
If you’re running OpenClaw on a VPS instead of local hardware, you’re starting from a worse security posture. VPS instances are Internet-facing by default. Port scanning is constant. And the evidence is concrete: exposed passwords and API keys from VPS-hosted OpenClaw instances have been found in publicly accessible lists.
Local hardware sits behind your router’s NAT and firewall. No public-facing ports. No shared tenancy. No credential exposure from misconfigured cloud infrastructure. For most solo builders, local is more secure in every measurable dimension.
4. Self-modifying memory can be corrupted
OpenClaw agents modify their own memory files — that’s a feature, not a bug. The self-improvement loop (agent forgets something, you ask why, it fixes its memory system) is one of the most powerful operational patterns.
But self-modifying memory is also an attack surface. If an attacker can influence what gets written to memory — through prompt injection, a compromised skill, or a poisoned web page — they can alter your agent’s behavior persistently. Worse, timing attacks around context compaction events (when the agent consolidates its context window) can exploit the brief period where recent information hasn’t been persisted yet.
Memory corruption is subtle. Unlike a crash or obvious error, a corrupted memory file silently changes how your agent reasons about everything going forward.
5. Prompt injection from browsed websites
If your agent browses the web, every page it visits is an untrusted input. Websites can embed instructions in hidden text, metadata, or even image alt text that your agent will process as part of its reasoning. My red-team research found prompt injection achieving 80% success rates against default-configured agents, and reasoning chain hijacking — where the injected prompt redirects the agent’s entire thought process — hit 100%.
Web browsing is useful. But without defenses, every website your agent visits is a potential attacker.
The 30-Minute Hardening Checklist
Set a timer. Work through these in order. Each one meaningfully reduces your attack surface.
Minutes 0-5: Audit your installed skills
# Check what skills are installed
ls ~/.openclaw/skills/
For each skill you didn’t write yourself:
- Can you read the source code? If not, remove it.
- Do you trust the author personally? If not, remove it.
- Can you rebuild the capability yourself? Ask your agent: “Look at how [skill name] works and build me a local version.”
Target state: Zero third-party skills, or only skills from people you personally trust.
Minutes 5-10: Lock down your network
If running locally:
- Verify your agent ports are not forwarded through your router. Check your router’s port forwarding settings — there should be no rules pointing to your agent’s machine.
- If you use a reverse proxy or tunnel (ngrok, Cloudflare Tunnel) to expose your agent, disable it unless you have a specific, understood reason to keep it.
If running on a VPS:
- Firewall everything. Only allow SSH (key-based, not password) and any ports you explicitly need.
- Run
ss -tlnpto see what’s listening. If your agent gateway is bound to0.0.0.0, rebind it to127.0.0.1. - Rotate every credential (API keys, tokens) that has been on the VPS. Assume they’ve been scanned.
# See what's listening on your machine
ss -tlnp
# If your gateway is on 0.0.0.0:3000, that's exposed to the internet
# Rebind to localhost only
Minutes 10-15: Set up the oversight loop
Don’t let any agent run unsupervised without a quality check. The most effective pattern I’ve seen is the “Ralph loop” — a cheap, always-on local model does the work, and a smarter cloud model audits it periodically.
Practically:
- Configure a review trigger on a timer (every 10-15 minutes)
- The reviewer checks: What did the worker agent do? Did it access anything unexpected? Did its behavior change?
- Log every action to a file your reviewer can inspect
This catches compromised behavior early. If a skill or prompt injection changes your agent’s behavior, the oversight model notices the deviation.
Minutes 15-20: Harden your memory files
# Back up current memory state
cp -r ~/.openclaw/memory/ ~/.openclaw/memory_backup_$(date +%Y%m%d)/
# Set up a simple diff check
diff -r ~/.openclaw/memory/ ~/.openclaw/memory_backup_latest/
More importantly:
- Store critical configuration and identity information in files your agent cannot modify. Read-only reference files.
- Keep your agent’s modifiable memory separate from its core instructions.
- Version your memory directory with git. Every change gets a commit. You can diff, revert, and audit.
cd ~/.openclaw/memory/
git init
git add -A && git commit -m "baseline memory state"
# Add a cron job to auto-commit changes every hour
# crontab -e
# 0 * * * * cd ~/.openclaw/memory && git add -A && git commit -m "hourly snapshot" 2>/dev/null
Minutes 20-25: Add web browsing guardrails
If your agent needs to browse the web:
- Allowlist domains where possible. If your agent only needs to check five websites, restrict it to those five.
- Disable JavaScript execution in your agent’s browser. Most prompt injection via web relies on rendered content that JS creates. Headless browsing without JS eliminates a large class of attacks.
- Separate browsing from acting. Have one agent browse and summarize. Have a different agent (without web access) make decisions based on those summaries. The browser agent is in a blast radius container — if it gets compromised, it can’t touch your files or credentials.
Minutes 25-30: Verify and document
Run through this verification:
- No third-party skills (or only personally trusted ones)
- Agent ports not exposed to the Internet
- Oversight loop running (reviewer agent checking worker agent)
- Memory directory under version control
- Core instructions in read-only files
- Web browsing restricted (allowlist or JS disabled)
- All VPS credentials rotated (if applicable)
- Backup of memory state created
Write down what you changed. When (not if) a new vulnerability drops, you’ll want to know your baseline.
What Isn’t Solved Yet
I want to be honest about the limits. Some things don’t have clean fixes today:
- Compaction-time attacks are acknowledged but unsolved. When your agent compacts its context window, there’s a brief period of vulnerability. The best mitigation is persisting critical information to disk before compaction triggers.
- Prompt injection is not fully solvable at the model layer right now. Defenses reduce success rates but don’t eliminate them. Defense in depth (oversight loops, separation of concerns, input filtering) is the practical answer.
- Agent framework security is immature. There are no CVE standards for agent vulnerabilities, no security certification programs, no standardized audit tools. We’re early.
What We’re Building
At Singularity Cybersecurity, I’m working on tools that automate what this checklist does manually — scanning skills, hardening configurations, and red-teaming agent deployments. My red-team framework is open source and includes 7 attack classes, 5 of which aren’t in any existing taxonomy.
If you’re building on OpenClaw or similar agent frameworks and want to think more rigorously about security, the agent red-team post goes deep on the methodology. The adversarial control analysis covers how to classify which inputs an attacker controls vs. which a defender can observe — useful for designing your own agent threat models.
Agent security is a new domain. The people building agents right now are writing the security playbook. I’d rather we write it proactively than learn it from breaches.
This post draws on research from my agent red-team framework (FP-02), analysis of OpenClaw security surface from Moonshots EP #237 (Alex Finn interview), and disclosed vulnerabilities in agent frameworks. All claims are grounded in documented findings, not speculation.