A CFA Charterholder Built an ML Fraud Detector: Here's What the Models Miss

Note (2026-03-19): This was an early exploration in my AI security research. The methodology has known limitations documented in the quality assessment. For the current state of this work, see Multi-Agent Security and Verified Delegation Protocol. I’m a CFA charterholder who builds ML systems. I trained XGBoost on 100K financial transactions to detect fraud — AUC 0.987. But the most interesting finding wasn’t the model performance. It was that CFA-informed rule-based scoring achieves 0.898 AUC on its own, and 8 of the top 20 predictive features come from domain expertise, not raw data. ...
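The core of the rule-based finding can be sketched in a few lines. This is an illustration, not the post's actual pipeline: the transaction fields, red-flag thresholds, and toy data below are hypothetical, but the pattern (hand-written domain rules producing a suspicion score, then AUC measuring how well that score alone ranks fraud above legitimate activity) is the one described.

```python
def rule_score(txn):
    """CFA-style red flags: each triggered rule adds to a suspicion score."""
    score = 0.0
    if txn["amount"] > 10_000:                 # large-transaction flag
        score += 1.0
    if txn["hour"] < 6:                        # off-hours activity
        score += 0.5
    if txn["amount"] % 1000 == 0:              # suspiciously round amount
        score += 0.5
    if txn["country"] != txn["home_country"]:  # cross-border transfer
        score += 1.0
    return score

def auc(scores, labels):
    """Probability a random fraud outranks a random non-fraud (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

txns = [
    {"amount": 12_000, "hour": 3,  "country": "RU", "home_country": "US"},  # fraud
    {"amount": 5_000,  "hour": 2,  "country": "US", "home_country": "US"},  # fraud
    {"amount": 80,     "hour": 14, "country": "US", "home_country": "US"},  # legit
    {"amount": 230,    "hour": 11, "country": "US", "home_country": "US"},  # legit
]
labels = [1, 1, 0, 0]
scores = [rule_score(t) for t in txns]
print(auc(scores, labels))  # 1.0 on this toy data
```

In the real system, these rule scores would be fed to XGBoost as engineered features alongside the raw transaction data.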

March 19, 2026 · 4 min · Rex Coleman

Build Your Own ML Vuln Prioritizer

Problem Statement Your security team triages vulnerabilities by CVSS score. A 9.8 gets patched immediately; a 7.5 waits. But CVSS measures severity, not exploitability. In real-world data, CVSS achieves an AUC of just 0.662 at predicting which CVEs actually get exploited – only modestly better than a coin flip (AUC 0.5). You need a model that predicts exploitation likelihood, not just theoretical severity. For the full research behind this tutorial, including SHAP analysis and adversarial robustness evaluation, see Why CVSS Gets It Wrong. ...
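The triage change the tutorial builds toward can be shown in miniature. The CVE IDs and numbers here are invented; the point is that ranking by a predicted exploitation probability reorders the patch queue relative to CVSS severity:

```python
# Hypothetical CVEs: `p_exploit` stands in for the model's predicted
# exploitation probability (the quantity a prioritizer should rank by).
cves = [
    {"id": "CVE-A", "cvss": 9.8, "p_exploit": 0.02},
    {"id": "CVE-B", "cvss": 7.5, "p_exploit": 0.61},
    {"id": "CVE-C", "cvss": 8.1, "p_exploit": 0.09},
]

by_cvss = sorted(cves, key=lambda c: c["cvss"], reverse=True)
by_risk = sorted(cves, key=lambda c: c["p_exploit"], reverse=True)

print([c["id"] for c in by_cvss])  # ['CVE-A', 'CVE-C', 'CVE-B']
print([c["id"] for c in by_risk])  # ['CVE-B', 'CVE-C', 'CVE-A']
```

The CVSS queue patches CVE-A first even though, in this toy data, it is the least likely to be exploited.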

March 19, 2026 · 8 min · Rex Coleman

EPSS alone outperforms all other vuln prediction features combined

In ablation testing of an ML vulnerability prioritization model, removing EPSS (Exploit Prediction Scoring System) dropped performance by 15.5 percentage points. No other single feature — not CVSS, not vendor, not CWE, not exploit availability — came close. EPSS alone carries more predictive signal than every other feature combined. Why this matters Most vulnerability management programs still use CVSS as their primary prioritization input. CVSS measures theoretical severity. EPSS measures observed exploitation probability. When you build an ML model that can use both (plus dozens of other features), EPSS dominates. This isn’t a marginal improvement — it’s a structural finding about where the real signal lives. ...
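The ablation loop itself is simple; the work is in the retraining. A minimal sketch, where `train_and_score` is a hypothetical stand-in for the real train-and-evaluate pipeline (here it just weights feature groups so the loop runs end to end):

```python
FEATURES = ["epss", "cvss", "vendor", "cwe", "exploit_available"]

def train_and_score(feature_subset):
    # Stand-in for retraining the model on a feature subset and
    # returning test AUC; weights are illustrative, with EPSS dominant.
    signal = {"epss": 0.155, "cvss": 0.02, "vendor": 0.03,
              "cwe": 0.01, "exploit_available": 0.025}
    return 0.60 + sum(signal[f] for f in feature_subset)

baseline = train_and_score(FEATURES)
for held_out in FEATURES:
    subset = [f for f in FEATURES if f != held_out]
    drop = baseline - train_and_score(subset)
    print(f"without {held_out:<18} AUC drop = {drop:.3f}")
```

In the real experiment, each iteration retrains the model from scratch on the reduced feature set; reusing a model trained on all features would understate the drop.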

March 19, 2026 · 2 min · Rex Coleman

I Built a PQC Migration Scanner: Here's What Your Codebase Is Hiding

I scanned Python’s standard library for quantum-vulnerable cryptography. Found 39 findings — 19 critical, all Shor-vulnerable. Then I trained ML models on 21,142 crypto-related CVEs to score migration priority. The surprise: classical exploit risk matters more than quantum vulnerability for deciding what to fix first. And 70% of the crypto in your codebase isn’t yours to change. ...
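The scanning step reduces to pattern-matching source for Shor-breakable public-key primitives. A minimal sketch, not the original tool: the pattern list below is illustrative and would miss indirect uses, but it shows the shape of a findings pass.

```python
import re

# Public-key primitives broken by Shor's algorithm (hashes and
# symmetric ciphers are only Grover-weakened, so they're excluded).
SHOR_VULNERABLE = re.compile(
    r"\b(rsa|dsa|ec(?:dsa|dh)?|elgamal|x25519|ed25519)\b", re.IGNORECASE
)

def scan(source: str):
    """Return (line_number, matched_primitive) findings for one file."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for m in SHOR_VULNERABLE.finditer(line):
            findings.append((lineno, m.group(0)))
    return findings

code = """\
from cryptography.hazmat.primitives.asymmetric import rsa, ed25519
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
digest = hashlib.sha256(b"data")  # hash: not Shor-vulnerable
"""
print(scan(code))
```

The 70% figure is why a scanner like this matters less than it seems: most hits land in third-party dependencies, where the fix is an upgrade or vendor escalation, not an edit.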

March 19, 2026 · 4 min · Rex Coleman

Model choice matters less than feature controllability

Across adversarial ML experiments on network intrusion detection, the performance gap between the most and least robust models was less than 8%. The gap between high-controllability and low-controllability feature sets was over 40%. Model selection is a rounding error compared to feature architecture. Why this matters When teams build ML systems that face adversarial inputs — intrusion detection, fraud detection, spam filtering, malware classification — the default question is “which model is most robust?” That’s the wrong first question. The right first question is “which features does the attacker control?” ...
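A toy version of the controllability argument, with invented feature names and a deliberately crude detector: if the attacker can freely perturb a feature, any detector leaning on it can be evaded, while features outside the attacker's control keep their signal regardless of model choice.

```python
ATTACKER_CONTROLLED = {"packet_size", "inter_arrival_time"}

def evade(sample):
    """Attacker zeroes out only the features they control."""
    return {k: (0.0 if k in ATTACKER_CONTROLLED else v)
            for k, v in sample.items()}

def detector(sample, features):
    """Flags the sample if the mean of the chosen features exceeds 0.5."""
    return sum(sample[f] for f in features) / len(features) > 0.5

malicious = {"packet_size": 0.9, "inter_arrival_time": 0.8,
             "server_port": 0.9, "protocol_anomaly": 0.7}

high_ctrl = ["packet_size", "inter_arrival_time"]  # attacker controls these
low_ctrl = ["server_port", "protocol_anomaly"]     # attacker can't change these

evaded = evade(malicious)
print(detector(evaded, high_ctrl))  # False: evaded
print(detector(evaded, low_ctrl))   # True: still caught
```

Swapping in a more robust model behind `detector` changes nothing here: the high-controllability feature set is evadable by construction.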

March 19, 2026 · 2 min · Rex Coleman

Antivirus for AI Models: Behavioral Fingerprinting Detects What Static Analysis Misses

A model poisoned through training data — one that behaves normally on 99.9% of inputs and activates a backdoor only on a specific trigger — passes every static analysis check. I built a behavioral fingerprinting system that detects these models using unsupervised anomaly detection: zero labeled backdoor examples, no model retraining, AUROC 0.62 on deliberately subtle synthetic backdoors. Static tools like ModelScan catch serialization exploits. Behavioral fingerprinting catches what static misses — and the defender controls the probe inputs, inverting the usual attacker advantage. This is a model supply chain problem analogous to the agent skill supply chain — in both cases, third-party artifacts execute inside your system and static analysis misses behavioral threats. ...
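The core loop of behavioral fingerprinting can be sketched with stand-in models. This is illustrative, not the actual system: real probes are inputs to a neural network and real deviation metrics are statistical, but the structure (defender-chosen probe set, reference fingerprint, deviation threshold) is the same.

```python
def fingerprint(model, probes):
    """Record the model's outputs on a fixed, defender-chosen probe set."""
    return [model(p) for p in probes]

def max_deviation(fp_a, fp_b):
    return max(abs(a - b) for a, b in zip(fp_a, fp_b))

def reference_model(x):
    return 0.1 * x                               # benign behavior

def backdoored_model(x):
    return 0.1 * x + (5.0 if x == 7 else 0.0)    # fires only on trigger x == 7

probes = list(range(10))   # the defender picks these; the attacker can't
ref_fp = fingerprint(reference_model, probes)

THRESHOLD = 0.5
for name, model in [("clean", reference_model), ("poisoned", backdoored_model)]:
    dev = max_deviation(ref_fp, fingerprint(model, probes))
    print(name, "anomalous" if dev > THRESHOLD else "ok", round(dev, 2))
```

The inversion of advantage is visible here: detection succeeds only because the probe set happens to cover the trigger region, and since the defender controls the probes, the attacker can't know which regions are covered.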

March 16, 2026 · 6 min · Rex Coleman

Adversarial ML on Network Intrusion Detection: What Adversarial Control Analysis Reveals

After studying how adversaries evade detection systems, I built one — then tried to break it. The finding that surprised me: the model architecture barely matters for robustness. What matters is which features the attacker can manipulate. ...

March 14, 2026 · 6 min · Rex Coleman

How I Govern AI-Assisted ML Projects

After four ML projects at Georgia Tech, I’d run 14 manual audit cycles with 30+ findings each. The governance wasn’t the problem — the manual enforcement was. So I built govML. The Problem Every ML project needs governance: reproducible experiments, documented decisions, data integrity checks, fair comparisons. But enforcing governance manually is a workflow killer. My unsupervised learning project had 7 audit cycles with 49+ findings. The RL project had 14 cycles with 30+ findings. I was spending more time auditing than experimenting. ...
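The automation idea is that each governance rule becomes a check function over project metadata, and the audit is just running them all. A minimal sketch: the field names and rules below are illustrative, not govML's actual schema.

```python
def check_seed_recorded(meta):
    if "random_seed" not in meta:
        yield "experiment missing recorded random seed"

def check_data_hash(meta):
    if not meta.get("data_sha256"):
        yield "training data has no integrity hash"

def check_baseline_present(meta):
    if not meta.get("baselines"):
        yield "no baseline model for fair comparison"

CHECKS = [check_seed_recorded, check_data_hash, check_baseline_present]

def audit(meta):
    """Run every governance check; return the list of findings."""
    return [finding for check in CHECKS for finding in check(meta)]

run = {"random_seed": 42, "data_sha256": "", "baselines": ["logreg"]}
print(audit(run))  # ['training data has no integrity hash']
```

Wired into CI, an audit like this turns a multi-day manual cycle into a failing build.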

March 14, 2026 · 5 min · Rex Coleman

Why CVSS Gets It Wrong: ML-Powered Vulnerability Prioritization

I trained an ML model on 338,000 real CVEs to find out what actually predicts exploitation in the wild. The answer: vendor deployment ubiquity and vulnerability age matter more than CVSS score. CVSS measures severity. Attackers measure opportunity. Teams patching CVSS 9.8 vulnerabilities that never get exploited — while CVSS 7.5s get weaponized — are following the wrong signal.

The Data
Three public data sources, joined by CVE ID:
- NVD (NIST), 337,953 CVEs: features (CVSS scores, CWE types, descriptions, vendor/product, references)
- ExploitDB, 24,936 CVEs with known exploits: ground-truth label ("was this CVE actually exploited?")
- EPSS (First.org), 320,502 scores: baseline comparison (an existing ML-based prediction)

Temporal split: train on pre-2024 CVEs (234,601), test on 2024+ (103,352). This prevents data leakage from future information — in production, you always predict on CVEs you haven’t seen yet. ...
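The temporal split is the one methodological detail worth spelling out, since a random split would leak future information into training. A minimal sketch with invented records:

```python
def temporal_split(cves, cutoff_year=2024):
    """Train on CVEs published before the cutoff, test on the rest,
    so the model never sees data from the future at training time."""
    train = [c for c in cves if c["year"] < cutoff_year]
    test = [c for c in cves if c["year"] >= cutoff_year]
    return train, test

cves = [
    {"id": "CVE-2021-0001", "year": 2021},
    {"id": "CVE-2023-0009", "year": 2023},
    {"id": "CVE-2024-0002", "year": 2024},
]
train, test = temporal_split(cves)
print(len(train), len(test))  # 2 1
```

With the real data this yields the 234,601 / 103,352 split quoted above; a shuffled split would instead let the model train on 2024 CVEs while being tested on 2023 ones.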

March 14, 2026 · 6 min · Rex Coleman
© 2026 Rex Coleman. Content under CC BY 4.0. Code under MIT. GitHub · LinkedIn · Email