AI Security

A CFA Charterholder Built an ML Fraud Detector: Here's What the Models Miss

Note (2026-03-19): This was an early exploration in my AI security research. The methodology has known limitations documented in the quality assessment. For the current state of this work, see Multi-Agent Security and Verified Delegation Protocol. I’m a CFA charterholder who builds ML systems. I trained XGBoost on 100K financial transactions to detect fraud — AUC 0.987. But the most interesting finding wasn’t the model performance. It was that CFA-informed rule-based scoring achieves 0.898 AUC on its own, and 8 of the top 20 predictive features come from domain expertise, not raw data. ...

AI Security Has a Shipping Problem

Thesis: The AI security industry produces frameworks and guidelines but almost no one ships working tools that practitioners can deploy today. The gap between “risk identified” and “risk mitigated” in AI security is wider than any other area of cybersecurity I’ve worked in. We have more frameworks per deployed tool than any domain in the history of information security. And the frameworks keep coming while the tools don’t. The Evidence 1. OWASP published the Agentic Top 10 in late 2025. No tools enforce it. ...

Apply Adversarial Control Analysis to Your ML System in 3 Steps

Note (2026-03-19): This was an early exploration in my AI security research. The methodology has known limitations documented in the quality assessment. For the current state of this work, see Multi-Agent Security and Verified Delegation Protocol. Problem Statement You have deployed an ML model and someone asks: “Is it robust to adversarial attack?” You do not have a principled way to answer. You could fuzz every input, but that is expensive and tells you nothing about which attacks are structurally impossible versus which are just untested. You need a method that maps the attack surface before you start testing. ...

Build Your Own ML Vuln Prioritizer

Problem Statement Your security team triages vulnerabilities by CVSS score. A 9.8 gets patched immediately; a 7.5 waits. But CVSS measures severity, not exploitability. In real-world data, CVSS achieves an AUC of just 0.662 at predicting which CVEs actually get exploited – barely better than a coin flip. You need a model that predicts exploitation likelihood, not just theoretical severity. For the full research behind this tutorial, including SHAP analysis and adversarial robustness evaluation, see Why CVSS Gets It Wrong. ...

How to Detect Backdoored ML Models Without Labeled Examples

Problem Statement Pre-trained models from public registries can pass every accuracy benchmark while hiding backdoors that activate only on attacker-chosen trigger inputs. Static analysis tools miss these because the backdoor lives in learned weights, not code. In 150 detection runs across 6 methods, Local Outlier Factor on raw activations achieved 0.622 AUROC at detecting backdoored models with zero labeled examples — modest but above chance, and the best unsupervised result I measured. ...

How to Red-Team Your AI Agent in 1 Hour

Note (2026-03-19): This was an early exploration in my AI security research. The methodology has known limitations documented in the quality assessment. For the current state of this work, see Multi-Agent Security and Verified Delegation Protocol. Problem Statement You are deploying an AI agent that can read files, search the web, or call APIs on behalf of users. Before you ship it, you need to know: what happens when someone tries to make it do something it should not? Existing frameworks like OWASP LLM Top 10 cover the language model layer, but agents have attack surfaces that models do not – tool orchestration, multi-step reasoning, persistent memory, and cross-agent delegation. You need a systematic way to test these surfaces. ...

I Built a PQC Migration Scanner: Here's What Your Codebase Is Hiding

Note (2026-03-19): This was an early exploration in my AI security research. The methodology has known limitations documented in the quality assessment. For the current state of this work, see Multi-Agent Security and Verified Delegation Protocol. I scanned Python’s standard library for quantum-vulnerable cryptography. Found 39 findings — 19 critical, all Shor-vulnerable. Then I trained ML models on 21,142 crypto-related CVEs to score migration priority. The surprise: classical exploit risk matters more than quantum vulnerability for deciding what to fix first. And 70% of the crypto in your codebase isn’t yours to change. ...

Prompt Injection Is Yesterday's Threat. RL Attacks Are Next.

Thesis: The security community is focused on prompt injection, but RL-specific attacks — reward poisoning, observation perturbation, policy extraction — are more dangerous and less understood. Prompt injection is real. I’ve tested it. In my agent red-teaming research, direct prompt injection achieved 80% success against default-configured LangChain ReAct agents. Reasoning chain hijacking hit 100%. These are serious vulnerabilities. But prompt injection is also becoming yesterday’s threat — it’s well-characterized, actively mitigated, and architecturally bounded. The attacks that should keep agent deployers awake are the ones that don’t touch the prompt at all. ...

The Agent Security Gap Nobody's Talking About: Skills Run Every Heartbeat

Thesis: Everyone’s worried about prompt injection, but the real agent attack surface is third-party skills — they execute persistently on every heartbeat cycle, not once per conversation. I keep having the same conversation. Someone asks about agent security. I say “third-party skills.” They say “you mean prompt injection?” No. I mean the code that runs inside your agent 144 times per day, with full access to your agent’s memory, context, and credentials, that you installed from a marketplace where one in five entries is actively malicious. ...

Why AI-Powered Attacks Need Architecture-Level Defense

Thesis: Point solutions — WAFs, signature-based antivirus, rule-based SIEMs — fail against AI-powered attacks because AI attacks adapt faster than signatures update. The defense must be architectural. I’ve spent the last four months building and attacking ML-based security systems across six domains. The consistent finding is that the model you choose matters far less than the architecture you deploy it in. A well-architected defense with a mediocre model beats an unstructured defense with a state-of-the-art model — across all six domains I tested. ...