Evasive Malicious Website Detection by Leveraging Redirection Subgraph Similarities

Toshiki SHIBAHARA; Yuta TAKATA; Mitsuaki AKIYAMA; Takeshi YAGI; Kunio HATO; Masayuki MURATA

doi:10.1587/transinf.2018FCP0007

Summary:

Many users are exposed to threats of drive-by download attacks through the Web. Attackers compromise vulnerable websites discovered by search engines and redirect clients to malicious websites created with exploit kits. Security researchers and vendors have tried to prevent the attacks by detecting malicious data, i.e., malicious URLs, web content, and redirections. However, attackers conceal parts of malicious data with evasion techniques to circumvent detection systems. In this paper, we propose a system for detecting malicious websites without collecting all malicious data. Even if we cannot observe parts of malicious data, we can always observe compromised websites. Since vulnerable websites are discovered by search engines, compromised websites have similar traits. Therefore, we built a classifier by leveraging not only malicious but also compromised websites. More precisely, we convert all websites observed at the time of access into a redirection graph and classify it by integrating similarities between its subgraphs and redirection subgraphs shared across malicious, benign, and compromised websites. As a result of evaluating our system with crawling data of 455,860 websites, we found that the system achieved a 91.7% true positive rate for malicious websites containing exploit URLs at a low false positive rate of 0.1%. Moreover, it detected 143 more evasive malicious websites than the conventional content-based system.
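
The graph-based classification step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the host-level graph, the Jaccard similarity over edge sets, and every function name, template, and URL below are assumptions chosen only for illustration. The sketch turns observed redirections into a graph and scores it against a few template subgraphs, producing a feature vector that a classifier could consume.

```python
from urllib.parse import urlparse


def build_redirection_graph(redirections):
    """Map each source host to the set of hosts it redirects to.

    `redirections` is a list of (source_url, destination_url) pairs
    observed while accessing a website (hypothetical input format).
    """
    graph = {}
    for src, dst in redirections:
        src_host = urlparse(src).netloc
        dst_host = urlparse(dst).netloc
        graph.setdefault(src_host, set()).add(dst_host)
        graph.setdefault(dst_host, set())
    return graph


def edge_set(graph):
    """Flatten a graph into a set of (source, destination) edges."""
    return {(src, dst) for src, dsts in graph.items() for dst in dsts}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_features(graph, templates):
    """Return one similarity score per template subgraph, in order.

    Jaccard over edge sets is an assumed stand-in for the similarity
    measure used in the paper, which the summary does not specify.
    """
    edges = edge_set(graph)
    return [jaccard(edges, edge_set(t)) for t in templates]


# Hypothetical observation: a landing page redirecting through a
# compromised host to an exploit host.
observed = build_redirection_graph([
    ("http://landing.example/index.html", "http://compromised.example/ad.js"),
    ("http://compromised.example/ad.js", "http://exploit.example/gate.php"),
])

# Hypothetical template subgraphs; in the paper these would be
# subgraphs shared across malicious, benign, and compromised websites.
templates = [
    {"compromised.example": {"exploit.example"}, "exploit.example": set()},
    {"landing.example": {"cdn.example"}, "cdn.example": set()},
]

print(similarity_features(observed, templates))  # [0.5, 0.0]
```

The resulting vector has one entry per template, so a graph in which part of the malicious chain is hidden by evasion can still score against templates that cover only the compromised portion.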

Publication: IEICE TRANSACTIONS on Information, Vol. E102-D, No. 3, pp. 430-443

Publication Date: 2019/03/01

Publicized: 2018/10/30

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2018FCP0007

Type of Manuscript: Special Section PAPER (Special Section on Foundations of Computer Science — Algorithm, Theory of Computation, and their Applications —)

Authors

Toshiki SHIBAHARA
  NTT Secure Platform Laboratories, Osaka University
Yuta TAKATA
  NTT Secure Platform Laboratories
Mitsuaki AKIYAMA
  NTT Secure Platform Laboratories
Takeshi YAGI
  NTT Secure Platform Laboratories
Kunio HATO
  NTT Secure Platform Laboratories
Masayuki MURATA
  Osaka University

Keyword

drive-by download attack, browser fingerprinting, graph mining, clustering

Cite this

Toshiki SHIBAHARA, Yuta TAKATA, Mitsuaki AKIYAMA, Takeshi YAGI, Kunio HATO, Masayuki MURATA, "Evasive Malicious Website Detection by Leveraging Redirection Subgraph Similarities" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 3, pp. 430-443, March 2019, doi: 10.1587/transinf.2018FCP0007.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018FCP0007/_p

@ARTICLE{e102-d_3_430,
author={Toshiki SHIBAHARA and Yuta TAKATA and Mitsuaki AKIYAMA and Takeshi YAGI and Kunio HATO and Masayuki MURATA},
journal={IEICE TRANSACTIONS on Information},
title={Evasive Malicious Website Detection by Leveraging Redirection Subgraph Similarities},
year={2019},
volume={E102-D},
number={3},
pages={430--443},
abstract={Many users are exposed to threats of drive-by download attacks through the Web. Attackers compromise vulnerable websites discovered by search engines and redirect clients to malicious websites created with exploit kits. Security researchers and vendors have tried to prevent the attacks by detecting malicious data, i.e., malicious URLs, web content, and redirections. However, attackers conceal parts of malicious data with evasion techniques to circumvent detection systems. In this paper, we propose a system for detecting malicious websites without collecting all malicious data. Even if we cannot observe parts of malicious data, we can always observe compromised websites. Since vulnerable websites are discovered by search engines, compromised websites have similar traits. Therefore, we built a classifier by leveraging not only malicious but also compromised websites. More precisely, we convert all websites observed at the time of access into a redirection graph and classify it by integrating similarities between its subgraphs and redirection subgraphs shared across malicious, benign, and compromised websites. As a result of evaluating our system with crawling data of 455,860 websites, we found that the system achieved a 91.7% true positive rate for malicious websites containing exploit URLs at a low false positive rate of 0.1%. Moreover, it detected 143 more evasive malicious websites than the conventional content-based system.},
keywords={drive-by download attack, browser fingerprinting, graph mining, clustering},
doi={10.1587/transinf.2018FCP0007},
ISSN={1745-1361},
month={March}
}

TY - JOUR
TI - Evasive Malicious Website Detection by Leveraging Redirection Subgraph Similarities
T2 - IEICE TRANSACTIONS on Information
SP - 430
EP - 443
AU - Toshiki SHIBAHARA
AU - Yuta TAKATA
AU - Mitsuaki AKIYAMA
AU - Takeshi YAGI
AU - Kunio HATO
AU - Masayuki MURATA
PY - 2019
DO - 10.1587/transinf.2018FCP0007
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2019
AB - Many users are exposed to threats of drive-by download attacks through the Web. Attackers compromise vulnerable websites discovered by search engines and redirect clients to malicious websites created with exploit kits. Security researchers and vendors have tried to prevent the attacks by detecting malicious data, i.e., malicious URLs, web content, and redirections. However, attackers conceal parts of malicious data with evasion techniques to circumvent detection systems. In this paper, we propose a system for detecting malicious websites without collecting all malicious data. Even if we cannot observe parts of malicious data, we can always observe compromised websites. Since vulnerable websites are discovered by search engines, compromised websites have similar traits. Therefore, we built a classifier by leveraging not only malicious but also compromised websites. More precisely, we convert all websites observed at the time of access into a redirection graph and classify it by integrating similarities between its subgraphs and redirection subgraphs shared across malicious, benign, and compromised websites. As a result of evaluating our system with crawling data of 455,860 websites, we found that the system achieved a 91.7% true positive rate for malicious websites containing exploit URLs at a low false positive rate of 0.1%. Moreover, it detected 143 more evasive malicious websites than the conventional content-based system.
ER -