Build Your Own ML Vuln Prioritizer

Problem Statement: Your security team triages vulnerabilities by CVSS score. A 9.8 gets patched immediately; a 7.5 waits. But CVSS measures severity, not exploitability. In real-world data, CVSS achieves an AUC of just 0.662 at predicting which CVEs actually get exploited, barely better than a coin flip. You need a model that predicts exploitation likelihood, not just theoretical severity. For the full research behind this tutorial, including SHAP analysis and adversarial robustness evaluation, see Why CVSS Gets It Wrong. ...
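To make the AUC claim concrete, here is a minimal sketch of what that number measures: the probability that a randomly chosen exploited CVE gets a higher score than a randomly chosen unexploited one. The labels and scores below are synthetic stand-ins, not the article's data; the 0.662 figure comes from the real CVE corpus.

```python
# Hedged sketch of what "AUC at predicting exploitation" means when the
# score being evaluated is the CVSS base score alone.
# All values here are synthetic illustrations, not real CVE data.

exploited = [1, 0, 1, 0, 0, 1, 0, 0]                   # 1 = exploited in the wild
cvss = [7.5, 9.8, 6.1, 9.1, 4.3, 8.8, 7.2, 5.0]        # CVSS base scores

pos = [s for s, y in zip(cvss, exploited) if y == 1]   # scores of exploited CVEs
neg = [s for s, y in zip(cvss, exploited) if y == 0]   # scores of unexploited CVEs

# Pairwise ranking statistic: count exploited-vs-unexploited pairs where
# CVSS ranks the exploited CVE higher; ties count half.
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))
print(round(auc, 3))  # 0.5 would be a coin flip; 1.0 is perfect ranking
```

An AUC near 0.66 means CVSS ranks an exploited CVE above an unexploited one only about two times in three, which is why the tutorial builds a model trained directly on exploitation labels instead.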

March 19, 2026 · 8 min · Rex Coleman

Why CVSS Gets It Wrong: ML-Powered Vulnerability Prioritization

I trained an ML model on 338,000 real CVEs to find out what actually predicts exploitation in the wild. The answer: vendor deployment ubiquity and vulnerability age matter more than CVSS score. CVSS measures severity. Attackers measure opportunity. Teams patching CVSS 9.8 vulnerabilities that never get exploited, while CVSS 7.5s get weaponized, are following the wrong signal.

The Data

Three public data sources, joined by CVE ID:

| Source | Records | Purpose |
| --- | --- | --- |
| NVD (NIST) | 337,953 CVEs | Features: CVSS scores, CWE types, descriptions, vendor/product, references |
| ExploitDB | 24,936 CVEs with known exploits | Ground truth label: "was this CVE actually exploited?" |
| EPSS (First.org) | 320,502 scores | Baseline comparison: an existing ML-based prediction |

Temporal split: train on pre-2024 CVEs (234,601), test on 2024+ (103,352). This prevents data leakage from future information: in production, you always predict on CVEs you haven't seen yet. ...
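The join and temporal split described above can be sketched in a few lines. The records below are toy stand-ins and the field names (`cve_id`, `year`, `cvss`) are illustrative; the real pipeline operates on the full NVD, ExploitDB, and EPSS dumps.

```python
# Hedged sketch: join CVE feature records with an exploit list by CVE ID,
# then split by year so no future information leaks into training.
# All records and field names here are illustrative, not the real datasets.

# NVD-style feature records, keyed by CVE ID.
nvd = [
    {"cve_id": "CVE-2021-0001", "year": 2021, "cvss": 9.8},
    {"cve_id": "CVE-2023-0002", "year": 2023, "cvss": 7.5},
    {"cve_id": "CVE-2024-0003", "year": 2024, "cvss": 8.1},
    {"cve_id": "CVE-2024-0004", "year": 2024, "cvss": 5.3},
]
# ExploitDB-style set of CVEs with known exploits (the ground-truth label).
exploitdb = {"CVE-2023-0002", "CVE-2024-0003"}

# Join by CVE ID: label each record 1 if it has a known exploit.
labeled = [{**r, "exploited": int(r["cve_id"] in exploitdb)} for r in nvd]

# Temporal split: train on pre-2024 CVEs, test on 2024+, matching how the
# model is used in production (predicting on CVEs it has never seen).
train = [r for r in labeled if r["year"] < 2024]
test = [r for r in labeled if r["year"] >= 2024]
print(len(train), len(test))  # 2 2
```

Splitting by publication date rather than randomly is the key design choice: a random split would let the model peek at post-2024 exploitation patterns while "predicting" pre-2024 CVEs, inflating its measured performance.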

March 14, 2026 · 6 min · Rex Coleman
© 2026 Rex Coleman. Content under CC BY 4.0. Code under MIT. GitHub · LinkedIn · Email