Automated Labeling of Entities in CVE Vulnerability Descriptions with Natural Language Processing

IEICE TRANSACTIONS on Information

Open Access
Kensuke SUMOTO, Kenta KANAKOGI, Hironori WASHIZAKI, Naohiko TSUDA, Nobukazu YOSHIOKA, Yoshiaki FUKAZAWA, Hideyuki KANUKA

Summary:

Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities in software or source code. Although the descriptions include various entities, there is no uniform entity structure, which makes security analysis using individual entities difficult. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an F1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has the potential to automatically label select entities from CVE descriptions.
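The three scores reported in the summary are related: the F1 score is the harmonic mean of precision and recall. The sketch below is a minimal illustration of that arithmetic, not the authors' evaluation code. The function names, the exact-match span scoring, and the entity labels `VERSION` and `PRODUCT` are assumptions made for this example; the summary does not say which entity types were labeled or how matches were counted.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_metrics(gold: set, predicted: set) -> tuple:
    """Exact-match precision, recall, and F1 over (start, end, label) spans.

    Assumption: a predicted entity counts as correct only if its
    boundaries and label both match a gold entity exactly.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Rounded precision and recall quoted in the summary.
print(round(f1_score(0.91, 0.95), 2))  # 0.93

# Hypothetical spans for one description: one of two predictions is correct.
gold = {(0, 5, "VERSION"), (10, 20, "PRODUCT")}
predicted = {(0, 5, "VERSION"), (30, 35, "PRODUCT")}
print(span_metrics(gold, predicted))  # (0.5, 0.5, 0.5)
```

With the rounded values 0.91 and 0.95, the harmonic mean comes to roughly 0.93, which agrees with the F1 score stated in the summary.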

Publication: IEICE TRANSACTIONS on Information Vol.E107-D No.5 pp.674-682

Publication Date: 2024/05/01

Publicized: 2024/02/09

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2023DAP0013

Type of Manuscript: Special Section PAPER (Special Section on Data Engineering and Information Management)

Authors

Kensuke SUMOTO
  Waseda University
Kenta KANAKOGI
  Waseda University
Hironori WASHIZAKI
  Waseda University
Naohiko TSUDA
  Waseda University
Nobukazu YOSHIOKA
  Waseda University
Yoshiaki FUKAZAWA
  Waseda University
Hideyuki KANUKA
  Hitachi, Ltd.

Keyword

technology, security knowledge repository, CVE, BERT, natural language processing, named entity recognition

Cite this

Kensuke SUMOTO, Kenta KANAKOGI, Hironori WASHIZAKI, Naohiko TSUDA, Nobukazu YOSHIOKA, Yoshiaki FUKAZAWA, Hideyuki KANUKA, "Automated Labeling of Entities in CVE Vulnerability Descriptions with Natural Language Processing" in IEICE TRANSACTIONS on Information, vol. E107-D, no. 5, pp. 674-682, May 2024, doi: 10.1587/transinf.2023DAP0013.
Abstract: Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities about software or source code. Although the descriptions include various entities, there is not a uniform entity structure, making security analysis difficult using individual entities. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an f1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has potential to automatically label select entities from CVE descriptions.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023DAP0013/_f

@ARTICLE{e107-d_5_674,
author={Kensuke SUMOTO and Kenta KANAKOGI and Hironori WASHIZAKI and Naohiko TSUDA and Nobukazu YOSHIOKA and Yoshiaki FUKAZAWA and Hideyuki KANUKA},
journal={IEICE TRANSACTIONS on Information},
title={Automated Labeling of Entities in CVE Vulnerability Descriptions with Natural Language Processing},
year={2024},
volume={E107-D},
number={5},
pages={674--682},
abstract={Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities about software or source code. Although the descriptions include various entities, there is not a uniform entity structure, making security analysis difficult using individual entities. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an f1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has potential to automatically label select entities from CVE descriptions.},
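keywords={technology, security knowledge repository, CVE, BERT, natural language processing, named entity recognition},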
doi={10.1587/transinf.2023DAP0013},
ISSN={1745-1361},
month={May},}

TY - JOUR
TI - Automated Labeling of Entities in CVE Vulnerability Descriptions with Natural Language Processing
T2 - IEICE TRANSACTIONS on Information
SP - 674
EP - 682
AU - Kensuke SUMOTO
AU - Kenta KANAKOGI
AU - Hironori WASHIZAKI
AU - Naohiko TSUDA
AU - Nobukazu YOSHIOKA
AU - Yoshiaki FUKAZAWA
AU - Hideyuki KANUKA
PY - 2024
DO - 10.1587/transinf.2023DAP0013
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2024
AB - Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities about software or source code. Although the descriptions include various entities, there is not a uniform entity structure, making security analysis difficult using individual entities. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an f1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has potential to automatically label select entities from CVE descriptions.
ER -
