Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities about software or source code. Although the descriptions include various entities, there is not a uniform entity structure, making security analysis difficult using individual entities. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an f1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has potential to automatically label select entities from CVE descriptions.
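As a rough illustration of the two ideas in the abstract, the Python sketch below labels one entity type with a regular expression and computes span-level precision, recall, and F1. It is a minimal sketch, not the authors' method: the `VERSION` label, the version-number pattern, the sample sentence, and every function name are hypothetical and do not come from the paper. The last helper only checks that the reported precision (about 0.91) and recall (about 0.95) agree with the reported F1 of about 0.93.

```python
import re

# Hypothetical pattern (an assumption, not from the paper): dotted version
# numbers such as "2.4.1" or "10.0".
VERSION_RE = re.compile(r"\b\d+(?:\.\d+)+\b")


def label_versions(text):
    """Return a set of (start, end, label) spans for version-like matches."""
    return {(m.start(), m.end(), "VERSION") for m in VERSION_RE.finditer(text)}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(predicted, gold):
    """Span-level scores: a prediction counts only if it matches a gold span exactly."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # Made-up sentence in the style of a CVE description (not a real CVE entry).
    text = "Buffer overflow in ExampleApp 2.4.1 and earlier allows remote code execution."
    start = text.index("2.4.1")
    gold = {(start, start + len("2.4.1"), "VERSION")}

    predicted = label_versions(text)
    print(precision_recall_f1(predicted, gold))

    # Consistency check on the figures reported in the abstract.
    print(round(f1_from(0.91, 0.95), 2))  # 0.93
```

A pattern like this covers only entities with a rigid surface form, which is one plausible reason a learned tagger such as BERT could outperform a regular-expression baseline on free-text descriptions.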
Kensuke SUMOTO
Waseda University
Kenta KANAKOGI
Waseda University
Hironori WASHIZAKI
Waseda University
Naohiko TSUDA
Waseda University
Nobukazu YOSHIOKA
Waseda University
Yoshiaki FUKAZAWA
Waseda University
Hideyuki KANUKA
Hitachi, Ltd.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Kensuke SUMOTO, Kenta KANAKOGI, Hironori WASHIZAKI, Naohiko TSUDA, Nobukazu YOSHIOKA, Yoshiaki FUKAZAWA, Hideyuki KANUKA, "Automated Labeling of Entities in CVE Vulnerability Descriptions with Natural Language Processing," in IEICE TRANSACTIONS on Information, vol. E107-D, no. 5, pp. 674-682, May 2024, doi: 10.1587/transinf.2023DAP0013.
Abstract: Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities about software or source code. Although the descriptions include various entities, there is not a uniform entity structure, making security analysis difficult using individual entities. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an f1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has potential to automatically label select entities from CVE descriptions.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023DAP0013/_f
@ARTICLE{e107-d_5_674,
author={Kensuke SUMOTO and Kenta KANAKOGI and Hironori WASHIZAKI and Naohiko TSUDA and Nobukazu YOSHIOKA and Yoshiaki FUKAZAWA and Hideyuki KANUKA},
journal={IEICE TRANSACTIONS on Information},
title={Automated Labeling of Entities in CVE Vulnerability Descriptions with Natural Language Processing},
year={2024},
volume={E107-D},
number={5},
pages={674-682},
abstract={Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities about software or source code. Although the descriptions include various entities, there is not a uniform entity structure, making security analysis difficult using individual entities. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an f1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has potential to automatically label select entities from CVE descriptions.},
keywords={},
doi={10.1587/transinf.2023DAP0013},
ISSN={1745-1361},
month={May}
}
TY - JOUR
TI - Automated Labeling of Entities in CVE Vulnerability Descriptions with Natural Language Processing
T2 - IEICE TRANSACTIONS on Information
SP - 674
EP - 682
AU - Kensuke SUMOTO
AU - Kenta KANAKOGI
AU - Hironori WASHIZAKI
AU - Naohiko TSUDA
AU - Nobukazu YOSHIOKA
AU - Yoshiaki FUKAZAWA
AU - Hideyuki KANUKA
PY - 2024
DO - 10.1587/transinf.2023DAP0013
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2024
AB - Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities about software or source code. Although the descriptions include various entities, there is not a uniform entity structure, making security analysis difficult using individual entities. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an f1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has potential to automatically label select entities from CVE descriptions.
ER -