Performance Comparison of Training Datasets for System Call-Based Malware Detection with Thread Information

Yuki KAJIWARA; Junjun ZHENG; Koichi MOURI

doi:10.1587/transinf.2021EDP7067

Summary:

The volume of malware, including variants and new types, has increased dramatically over the years and now poses one of the greatest cybersecurity threats. To counter such threats, it is crucial to detect malware accurately and early. Recent advances in machine learning have drawn increasing interest to malware detection, and a number of studies have been conducted in the field. It is well known that detection accuracy depends largely on the training dataset used, so creating a suitable training dataset is crucial for efficient malware detection. However, different works usually use their own datasets; as a result, a dataset tends to be effective for only one detection method, and strictly comparing several methods on a common training dataset is difficult. In this paper, we focus on how to create a training dataset for efficiently detecting malware. The first step toward this goal is to clarify what information can accurately characterize malware. This paper concentrates on threads, treating them as important information for characterizing malware. Specifically, from the dynamic analysis logs of Alkanet, a system call tracer, we obtain thread information and classify the processing of that information into four patterns. Malware detection is then performed using, as a feature, the number of transitions between system calls appearing in a thread. Our comparative experiments showed that primary thread information is important and useful for detecting malware with high accuracy.
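The feature named in the summary, the number of transitions of system calls appearing in a thread, can be illustrated with a minimal sketch. It assumes a simplified, hypothetical log of `(thread_id, system_call_name)` records; this is not Alkanet's actual log format, and the function name, the sample system call names, and the per-thread grouping are illustrative assumptions rather than the authors' implementation. The four thread-processing patterns are not specified in the summary and are not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical, simplified log record: (thread_id, system_call_name).
# This is an assumption for illustration, not Alkanet's real log format.
Event = Tuple[int, str]


def transition_counts_per_thread(events: Iterable[Event]) -> Dict[int, Counter]:
    """Count consecutive system-call pairs separately for each thread."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    last_call: Dict[int, str] = {}
    for thread_id, syscall in events:
        # A transition exists only once the thread has a previous call.
        if thread_id in last_call:
            counts[thread_id][(last_call[thread_id], syscall)] += 1
        last_call[thread_id] = syscall
    return dict(counts)


# Interleaved calls from two threads (sample names are illustrative).
log = [
    (1, "NtOpenFile"),
    (2, "NtCreateKey"),
    (1, "NtReadFile"),
    (1, "NtReadFile"),
    (2, "NtSetValueKey"),
    (1, "NtClose"),
]
features = transition_counts_per_thread(log)
```

Keeping one counter per thread means that interleaved calls from different threads are never counted as a transition, which is the reason for grouping by thread before counting.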

Publication: IEICE TRANSACTIONS on Information Vol.E104-D No.12 pp.2173-2183

Publication Date: 2021/12/01

Publicized: 2021/09/21

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2021EDP7067

Type of Manuscript: PAPER

Category: Artificial Intelligence, Data Mining

Authors

Yuki KAJIWARA
  Ritsumeikan University
Junjun ZHENG
  Ritsumeikan University
Koichi MOURI
  Ritsumeikan University

Keywords

malware detection, machine learning, system calls, thread

Cite this

Yuki KAJIWARA, Junjun ZHENG, Koichi MOURI, "Performance Comparison of Training Datasets for System Call-Based Malware Detection with Thread Information" in IEICE TRANSACTIONS on Information, vol. E104-D, no. 12, pp. 2173-2183, December 2021, doi: 10.1587/transinf.2021EDP7067.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDP7067/_p

@ARTICLE{e104-d_12_2173,
author={Yuki KAJIWARA and Junjun ZHENG and Koichi MOURI},
journal={IEICE TRANSACTIONS on Information},
title={Performance Comparison of Training Datasets for System Call-Based Malware Detection with Thread Information},
year={2021},
volume={E104-D},
number={12},
pages={2173-2183},
keywords={malware detection, machine learning, system calls, thread},
doi={10.1587/transinf.2021EDP7067},
ISSN={1745-1361},
month={December},}

TY - JOUR
TI - Performance Comparison of Training Datasets for System Call-Based Malware Detection with Thread Information
T2 - IEICE TRANSACTIONS on Information
SP - 2173
EP - 2183
AU - Yuki KAJIWARA
AU - Junjun ZHENG
AU - Koichi MOURI
PY - 2021
DO - 10.1587/transinf.2021EDP7067
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2021
ER -