
Exploration into Gray Area: Toward Efficient Labeling for Detecting Malicious Domain Names

Naoki FUKUSHI, Daiki CHIBA, Mitsuaki AKIYAMA, Masato UCHIDA

Summary:

In this paper, we propose a method to reduce the cost of labeling when acquiring training data for a malicious domain name detection system based on supervised machine learning. In conventional systems, training a classifier with high classification accuracy requires large quantities of benign and malicious domain names as training data. In general, malicious domain names are observed less frequently than benign ones, so it is difficult to acquire a large number of them without a dedicated labeling method. We propose a method based on active learning that labels data around the decision boundary of classification, i.e., in the gray area, and we show that the classification accuracy can be improved by using approximately 1% of the training data used by conventional systems. Another disadvantage of conventional systems is that if the classifier is trained with a small amount of training data, its generalization ability cannot be guaranteed. To address this, we propose a method based on ensemble learning that integrates multiple classifiers, and we show that the classification accuracy can be stabilized and improved. Combining the two proposed methods allows us to develop a new system for malicious domain name detection that achieves high classification accuracy and generalization ability while labeling only a small amount of training data.
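
The two ideas in the summary, labeling samples near the decision boundary and integrating multiple classifiers, can be illustrated with a minimal sketch. The abstract does not specify the paper's features, classifiers, or selection criterion, so everything below is an assumption made for illustration: the toy scorers `scorer_a` and `scorer_b`, the use of 0.5 as the decision boundary, simple averaging as the ensemble rule, and the helper names `ensemble_score` and `select_gray_area`.

```python
from statistics import mean
from typing import Callable, List, Sequence

# A scorer maps a feature vector to an estimated probability of "malicious".
Scorer = Callable[[Sequence[float]], float]


def ensemble_score(scorers: Sequence[Scorer], x: Sequence[float]) -> float:
    """Average the scores of several classifiers (assumed ensemble rule)."""
    return mean(s(x) for s in scorers)


def select_gray_area(
    scorers: Sequence[Scorer],
    pool: Sequence[Sequence[float]],
    budget: int,
) -> List[int]:
    """Return indices of the `budget` unlabeled samples whose ensemble score
    lies closest to 0.5, i.e. closest to the assumed decision boundary."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(ensemble_score(scorers, pool[i]) - 0.5),
    )
    return ranked[:budget]


# Hypothetical toy scorers: each one looks at a single feature.
def scorer_a(x: Sequence[float]) -> float:
    return min(1.0, max(0.0, x[0]))


def scorer_b(x: Sequence[float]) -> float:
    return min(1.0, max(0.0, x[1]))


# Unlabeled pool of made-up feature vectors.
pool = [
    [0.90, 0.95],  # confidently malicious
    [0.50, 0.40],  # near the boundary
    [0.10, 0.05],  # confidently benign
    [0.70, 0.50],  # somewhat uncertain
]

# Only these samples would be sent to a human or oracle for labeling.
picked = select_gray_area([scorer_a, scorer_b], pool, budget=2)
print(picked)
```

In a full active-learning loop, the selected samples would be labeled, added to the training set, and the classifiers retrained before the next selection round; that loop is omitted here because the abstract gives no details about it.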

Publication
IEICE TRANSACTIONS on Communications Vol.E103-B No.4 pp.375-388
Publication Date
2020/04/01
Publicized
2019/10/08
Online ISSN
1745-1345
DOI
10.1587/transcom.2019NRP0005
Type of Manuscript
Special Section PAPER (Special Section on Network Resource Control and Management Technologies for Sustainable Social Information Infrastructure)

Authors

Naoki FUKUSHI
  Waseda University
Daiki CHIBA
  NTT Corporation
Mitsuaki AKIYAMA
  NTT Corporation
Masato UCHIDA
  Waseda University

Keyword