TY - CHAP
T1 - V2W-BERT: A Framework for Effective Hierarchical Multiclass Classification of Software Vulnerabilities
T2 - 8th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2021
AU - Das, Siddhartha Shankar
AU - Serra, Edoardo
AU - Halappanavar, Mahantesh
AU - Pothen, Alex
AU - Al-Shaer, Ehab
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - We consider the problem of automating the mapping of observed vulnerabilities in software listed in Common Vulnerabilities and Exposures (CVE) reports to weaknesses listed in Common Weakness Enumerations (CWE) reports, a hierarchically designed dictionary of software weaknesses. Mapping of CVEs to CWEs provides a means to understand how they might be exploited for malicious purposes, and to mitigate their impact. Since manual mapping of CVEs to CWEs is not a viable approach due to their ever-increasing sizes, automated approaches need to be devised, but obtaining highly accurate mapping is a challenging problem. We present a novel Transformer-based learning framework (V2W-BERT) in this paper to solve this problem by bringing together ideas from natural language processing, link prediction and transfer learning. Our method outperforms previous approaches not only for CWE instances with abundant data to train, but also for rare CWE classes with little or no data. Using vulnerability and weakness reports from MITRE and the National Vulnerability Database, we achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% prediction accuracy in temporally partitioned data. We demonstrate significant improvements in using historical data to predict weaknesses for future instances of CVEs. We believe that our work will influence the design of better automated mapping approaches, and also that this technology could be deployed for more effective cybersecurity.
AB - We consider the problem of automating the mapping of observed vulnerabilities in software listed in Common Vulnerabilities and Exposures (CVE) reports to weaknesses listed in Common Weakness Enumerations (CWE) reports, a hierarchically designed dictionary of software weaknesses. Mapping of CVEs to CWEs provides a means to understand how they might be exploited for malicious purposes, and to mitigate their impact. Since manual mapping of CVEs to CWEs is not a viable approach due to their ever-increasing sizes, automated approaches need to be devised, but obtaining highly accurate mapping is a challenging problem. We present a novel Transformer-based learning framework (V2W-BERT) in this paper to solve this problem by bringing together ideas from natural language processing, link prediction and transfer learning. Our method outperforms previous approaches not only for CWE instances with abundant data to train, but also for rare CWE classes with little or no data. Using vulnerability and weakness reports from MITRE and the National Vulnerability Database, we achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% prediction accuracy in temporally partitioned data. We demonstrate significant improvements in using historical data to predict weaknesses for future instances of CVEs. We believe that our work will influence the design of better automated mapping approaches, and also that this technology could be deployed for more effective cybersecurity.
KW - cyber-security
KW - databases
KW - dictionaries
KW - link prediction
KW - transformer
UR - https://scholarworks.boisestate.edu/cs_facpubs/311
UR - https://doi.org/10.1109/DSAA53316.2021.9564227
UR - http://www.scopus.com/inward/record.url?scp=85126096845&partnerID=8YFLogxK
U2 - 10.1109/DSAA53316.2021.9564227
DO - 10.1109/DSAA53316.2021.9564227
M3 - Chapter
T3 - 2021 IEEE 8th International Conference on Data Science and Advanced Analytics, DSAA 2021
BT - 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 October 2021 through 9 October 2021
ER -