Your AI Can't Beat EPSS at Vulnerability Triage (But the Ensemble Might)

Can an LLM agent prioritize vulnerabilities better than EPSS? Every security team drowning in CVEs wants to know whether AI can help them triage faster. We tested this empirically: Claude Haiku as a vulnerability triage agent, ranked against EPSS, CVSS, and random baselines, with CISA KEV as ground truth for “actually exploited.” The short answer: no, the agent doesn’t beat EPSS. But the longer answer is more interesting. ...

March 20, 2026 · 5 min · Rex Coleman

Build Your Own ML Vuln Prioritizer

Problem Statement

Your security team triages vulnerabilities by CVSS score. A 9.8 gets patched immediately; a 7.5 waits. But CVSS measures severity, not exploitability. In real-world data, CVSS achieves an AUC of just 0.662 at predicting which CVEs actually get exploited – barely better than a coin flip. You need a model that predicts exploitation likelihood, not just theoretical severity. For the full research behind this tutorial, including SHAP analysis and adversarial robustness evaluation, see Why CVSS Gets It Wrong. ...
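The AUC figure above is easy to make concrete: rank CVEs by score and ask how often an exploited CVE outranks an unexploited one (0.5 is a coin flip). A minimal sketch on synthetic data, where the scores, base rate, and resulting AUC are all illustrative rather than the post's real results:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for real CVE data: a CVSS-like severity score and a
# binary "exploited in the wild" label that is only weakly related
# to severity, mimicking the weak signal described above.
n = 10_000
exploited = rng.random(n) < 0.07                  # ~7% base rate
severity = rng.normal(6.5, 1.8, n) + 0.6 * exploited

def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen exploited CVE outscores an unexploited one."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels.astype(bool)
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(f"severity-as-ranker AUC: {auc(exploited, severity):.3f}")
```

The same function works on any (label, score) pair, so CVSS, EPSS, and a trained model can all be compared on one scale.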

March 19, 2026 · 8 min · Rex Coleman

EPSS alone outperforms all other vuln prediction features combined

In ablation testing of an ML vulnerability prioritization model, removing EPSS (Exploit Prediction Scoring System) dropped performance by 15.5 percentage points. No other single feature — not CVSS, not vendor, not CWE, not exploit availability — came close. EPSS alone carries more predictive signal than every other feature combined.

Why this matters

Most vulnerability management programs still use CVSS as their primary prioritization input. CVSS measures theoretical severity. EPSS measures observed exploitation probability. When you build an ML model that can use both (plus dozens of other features), EPSS dominates. This isn’t a marginal improvement — it’s a structural finding about where the real signal lives. ...
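The ablation procedure itself is simple: train with all features, retrain with one feature removed, and compare held-out AUC. A minimal sketch on synthetic data, where the feature names and the dominant-feature effect are baked in to illustrate the method, not to reproduce the post's 15.5-point result:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic features: "epss" carries most of the signal, the others
# little, loosely mimicking the feature mix described above.
n = 20_000
epss, cvss, age = rng.random(n), rng.random(n), rng.random(n)
logit = 4 * epss + 0.5 * cvss + 0.3 * age - 2.5
y = rng.random(n) < 1 / (1 + np.exp(-logit))
X = np.column_stack([epss, cvss, age])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fit_auc(cols):
    """Train on a feature subset, return held-out AUC."""
    model = LogisticRegression().fit(X_tr[:, cols], y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te[:, cols])[:, 1])

full = fit_auc([0, 1, 2])
for name, kept in [("epss", [1, 2]), ("cvss", [0, 2]), ("age", [0, 1])]:
    print(f"drop {name}: AUC {full:.3f} -> {fit_auc(kept):.3f}")
```

Ranking features by the size of the AUC drop gives the same kind of leaderboard the ablation study reports.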

March 19, 2026 · 2 min · Rex Coleman

Why CVSS Gets It Wrong: ML-Powered Vulnerability Prioritization

I trained an ML model on 338,000 real CVEs to find out what actually predicts exploitation in the wild. The answer: vendor deployment ubiquity and vulnerability age matter more than CVSS score. CVSS measures severity. Attackers measure opportunity. Teams patching CVSS 9.8 vulnerabilities that never get exploited — while CVSS 7.5s get weaponized — are following the wrong signal.

The Data

Three public data sources, joined by CVE ID:

- NVD (NIST): 337,953 CVEs. Features: CVSS scores, CWE types, descriptions, vendor/product, references
- ExploitDB: 24,936 CVEs with known exploits. Ground truth label: “was this CVE actually exploited?”
- EPSS (First.org): 320,502 scores. Baseline comparison: an existing ML-based prediction

Temporal split: Train on pre-2024 CVEs (234,601), test on 2024+ (103,352). This prevents data leakage from future information — in production, you always predict on CVEs you haven’t seen yet. ...
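The temporal split is the step a naive random split gets wrong. A minimal pandas sketch of the idea, using an illustrative `published` column and toy rows rather than the post's actual schema:

```python
import pandas as pd

# Toy frame standing in for the joined NVD/ExploitDB/EPSS dataset;
# the column names and rows here are illustrative only.
cves = pd.DataFrame({
    "cve_id":    ["CVE-2021-0001", "CVE-2022-0002", "CVE-2024-0003", "CVE-2025-0004"],
    "published": pd.to_datetime(["2021-03-01", "2022-07-15", "2024-02-10", "2025-01-05"]),
    "exploited": [1, 0, 1, 0],
})

# Everything published before 2024 trains the model; everything from
# 2024 on is held out. A random split would let post-2024 exploitation
# patterns leak into training.
cutoff = pd.Timestamp("2024-01-01")
train = cves[cves["published"] < cutoff]
test = cves[cves["published"] >= cutoff]
print(len(train), len(test))
```

Because the cutoff is a single timestamp comparison, no CVE can land in both sets, which is exactly the leakage guarantee described above.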

March 14, 2026 · 6 min · Rex Coleman
© 2026 Rex Coleman. Content under CC BY 4.0. Code under MIT. GitHub · LinkedIn · Email