The search functionality is under construction.

Author Search Result

[Author] Daiki CHIBA(9hit)

1-9hit
  • Efficient Dynamic Malware Analysis for Collecting HTTP Requests using Deep Learning

    Toshiki SHIBAHARA  Takeshi YAGI  Mitsuaki AKIYAMA  Daiki CHIBA  Kunio HATO  

     
    PAPER

      Pubricized:
    2019/02/01
      Vol:
    E102-D No:4
      Page(s):
    725-736

    Malware-infected hosts have typically been detected using network-based Intrusion Detection Systems on the basis of characteristic patterns of HTTP requests collected with dynamic malware analysis. Since attackers continuously modify malicious HTTP requests to evade detection, novel HTTP requests sent from new malware samples need to be exhaustively collected in order to maintain a high detection rate. However, analyzing all new malware samples for a long period is infeasible in a limited amount of time. Therefore, we propose a system for efficiently collecting HTTP requests with dynamic malware analysis. Specifically, our system analyzes a malware sample for a short period and then determines whether the analysis should be continued or suspended. Our system identifies malware samples whose analyses should be continued on the basis of the network behavior in their short-period analyses. To make an accurate determination, we focus on the fact that malware communications resemble natural language from the viewpoint of data structure. We apply the recursive neural network, which has recently exhibited high classification performance in the field of natural language processing, to our proposed system. In the evaluation with 42,856 malware samples, our proposed system collected 94% of novel HTTP requests and reduced analysis time by 82% in comparison with the system that continues all analyses.

  • Event De-Noising Convolutional Neural Network for Detecting Malicious URL Sequences from Proxy Logs

    Toshiki SHIBAHARA  Kohei YAMANISHI  Yuta TAKATA  Daiki CHIBA  Taiga HOKAGUCHI  Mitsuaki AKIYAMA  Takeshi YAGI  Yuichi OHSITA  Masayuki MURATA  

     
    PAPER-Cryptography and Information Security

      Vol:
    E101-A No:12
      Page(s):
    2149-2161

    The number of infected hosts on enterprise networks has been increased by drive-by download attacks. In these attacks, users of compromised popular websites are redirected toward websites that exploit vulnerabilities of a browser and its plugins. To prevent damage, detection of infected hosts on the basis of proxy logs rather than blacklist-based filtering has started to be researched. This is because blacklists have become difficult to create due to the short lifetime of malicious domains and concealment of exploit code. To detect accesses to malicious websites from proxy logs, we propose a system for detecting malicious URL sequences on the basis of three key ideas: focusing on sequences of URLs that include artifacts of malicious redirections, designing new features related to software other than browsers, and generating new training data with data augmentation. To find an effective approach for classifying URL sequences, we compared three approaches: an individual-based approach, a convolutional neural network (CNN), and our new event de-noising CNN (EDCNN). Our EDCNN reduces the negative effects of benign URLs redirected from compromised websites included in malicious URL sequences. Evaluation results show that only our EDCNN with proposed features and data augmentation achieved a practical classification performance: a true positive rate of 99.1%, and a false positive rate of 3.4%.

  • Detecting and Understanding Online Advertising Fraud in the Wild

    Fumihiro KANEI  Daiki CHIBA  Kunio HATO  Katsunari YOSHIOKA  Tsutomu MATSUMOTO  Mitsuaki AKIYAMA  

     
    PAPER-Network and System Security

      Pubricized:
    2020/03/24
      Vol:
    E103-D No:7
      Page(s):
    1512-1523

    While the online advertisement is widely used on the web and on mobile applications, the monetary damages by advertising frauds (ad frauds) have become a severe problem. Countermeasures against ad frauds are evaded since they rely on noticeable features (e.g., burstiness of ad requests) that attackers can easily change. We propose an ad-fraud-detection method that leverages robust features against attacker evasion. We designed novel features on the basis of the statistics observed in an ad network calculated from a large amount of ad requests from legitimate users, such as the popularity of publisher websites and the tendencies of client environments. We assume that attackers cannot know of or manipulate these statistics and that features extracted from fraudulent ad requests tend to be outliers. These features are used to construct a machine-learning model for detecting fraudulent ad requests. We evaluated our proposed method by using ad-request logs observed within an actual ad network. The results revealed that our designed features improved the recall rate by 10% and had about 100,000-160,000 fewer false negatives per day than conventional features based on the burstiness of ad requests. In addition, by evaluating detection performance with long-term dataset, we confirmed that the proposed method is robust against performance degradation over time. Finally, we applied our proposed method to a large dataset constructed on an ad network and found several characteristics of the latest ad frauds in the wild, for example, a large amount of fraudulent ad requests is sent from cloud servers.

  • Time-Series Measurement of Parked Domain Names and Their Malicious Uses

    Takayuki TOMATSURI  Daiki CHIBA  Mitsuaki AKIYAMA  Masato UCHIDA  

     
    PAPER

      Pubricized:
    2021/01/08
      Vol:
    E104-B No:7
      Page(s):
    770-780

    On the Internet, there are lots of unused domain names that are not used for any actual services. Domain parking is a monetization mechanism for displaying online advertisements in such unused domain names. Some domain names used in cyber attacks are known to leverage domain parking services after the attack. However, the temporal relationships between domain parking services and malicious domain names have not been studied well. In this study, we investigated how malicious domain names using domain parking services change over time. We conducted a large-scale measurement study of more than 66.8 million domain names that have used domain parking services in the past 19 months. We reveal the existence of 3,964 domain names that have been malicious after using domain parking. We further identify what types of malicious activities (e.g., phishing and malware) such malicious domain names tend to be used for. We also reveal the existence of 3.02 million domain names that utilized multiple parking services simultaneously or while switching between them. Our study can contribute to the efficient analysis of malicious domain names using domain parking services.

  • To Get Lost is to Learn the Way: An Analysis of Multi-Step Social Engineering Attacks on the Web Open Access

    Takashi KOIDE  Daiki CHIBA  Mitsuaki AKIYAMA  Katsunari YOSHIOKA  Tsutomu MATSUMOTO  

     
    PAPER

      Vol:
    E104-A No:1
      Page(s):
    162-181

    Web-based social engineering (SE) attacks manipulate users to perform specific actions, such as downloading malware and exposing personal information. Aiming to effectively lure users, some SE attacks, which we call multi-step SE attacks, constitute a sequence of web pages starting from a landing page and require browser interactions at each web page. Also, different browser interactions executed on a web page often branch to multiple sequences to redirect users to different SE attacks. Although common systems analyze only landing pages or conduct browser interactions limited to a specific attack, little effort has been made to follow such sequences of web pages to collect multi-step SE attacks. We propose STRAYSHEEP, a system to automatically crawl a sequence of web pages and detect diverse multi-step SE attacks. We evaluate the effectiveness of STRAYSHEEP's three modules (landing-page-collection, web-crawling, and SE-detection) in terms of the rate of collected landing pages leading to SE attacks, efficiency of web crawling to reach more SE attacks, and accuracy in detecting the attacks. Our experimental results indicate that STRAYSHEEP can lead to 20% more SE attacks than Alexa top sites and search results of trend words, crawl five times more efficiently than a simple crawling module, and detect SE attacks with 95.5% accuracy. We demonstrate that STRAYSHEEP can collect various SE attacks, not limited to a specific attack. We also clarify attackers' techniques for tricking users and browser interactions, redirecting users to attacks.

  • Exploration into Gray Area: Toward Efficient Labeling for Detecting Malicious Domain Names

    Naoki FUKUSHI  Daiki CHIBA  Mitsuaki AKIYAMA  Masato UCHIDA  

     
    PAPER

      Pubricized:
    2019/10/08
      Vol:
    E103-B No:4
      Page(s):
    375-388

    In this paper, we propose a method to reduce the labeling cost while acquiring training data for a malicious domain name detection system using supervised machine learning. In the conventional systems, to train a classifier with high classification accuracy, large quantities of benign and malicious domain names need to be prepared as training data. In general, malicious domain names are observed less frequently than benign domain names. Therefore, it is difficult to acquire a large number of malicious domain names without a dedicated labeling method. We propose a method based on active learning that labels data around the decision boundary of classification, i.e., in the gray area, and we show that the classification accuracy can be improved by using approximately 1% of the training data used by the conventional systems. Another disadvantage of the conventional system is that if the classifier is trained with a small amount of training data, its generalization ability cannot be guaranteed. We propose a method based on ensemble learning that integrates multiple classifiers, and we show that the classification accuracy can be stabilized and improved. The combination of the two methods proposed here allows us to develop a new system for malicious domain name detection with high classification accuracy and generalization ability by labeling a small amount of training data.

  • Understanding Characteristics of Phishing Reports from Experts and Non-Experts on Twitter Open Access

    Hiroki NAKANO  Daiki CHIBA  Takashi KOIDE  Naoki FUKUSHI  Takeshi YAGI  Takeo HARIU  Katsunari YOSHIOKA  Tsutomu MATSUMOTO  

     
    PAPER-Information Network

      Pubricized:
    2024/03/01
      Vol:
    E107-D No:7
      Page(s):
    807-824

    The increase in phishing attacks through email and short message service (SMS) has shown no signs of deceleration. The first thing we need to do to combat the ever-increasing number of phishing attacks is to collect and characterize more phishing cases that reach end users. Without understanding these characteristics, anti-phishing countermeasures cannot evolve. In this study, we propose an approach using Twitter as a new observation point to immediately collect and characterize phishing cases via e-mail and SMS that evade countermeasures and reach users. Specifically, we propose CrowdCanary, a system capable of structurally and accurately extracting phishing information (e.g., URLs and domains) from tweets about phishing by users who have actually discovered or encountered it. In our three months of live operation, CrowdCanary identified 35,432 phishing URLs out of 38,935 phishing reports. We confirmed that 31,960 (90.2%) of these phishing URLs were later detected by the anti-virus engine, demonstrating that CrowdCanary is superior to existing systems in both accuracy and volume of threat extraction. We also analyzed users who shared phishing threats by utilizing the extracted phishing URLs and categorized them into two distinct groups - namely, experts and non-experts. As a result, we found that CrowdCanary could collect information that is specifically included in non-expert reports, such as information shared only by the company brand name in the tweet, information about phishing attacks that we find only in the image of the tweet, and information about the landing page before the redirect. Furthermore, we conducted a detailed analysis of the collected information on phishing sites and discovered that certain biases exist in the domain names and hosting servers of phishing sites, revealing new characteristics useful for unknown phishing site detection.

  • BotProfiler: Detecting Malware-Infected Hosts by Profiling Variability of Malicious Infrastructure Open Access

    Daiki CHIBA  Takeshi YAGI  Mitsuaki AKIYAMA  Kazufumi AOKI  Takeo HARIU  Shigeki GOTO  

     
    PAPER

      Vol:
    E99-B No:5
      Page(s):
    1012-1023

    Ever-evolving malware makes it difficult to prevent it from infecting hosts. Botnets in particular are one of the most serious threats to cyber security, since they consist of a lot of malware-infected hosts. Many countermeasures against malware infection, such as generating network-based signatures or templates, have been investigated. Such templates are designed to introduce regular expressions to detect polymorphic attacks conducted by attackers. A potential problem with such templates, however, is that they sometimes falsely regard benign communications as malicious, resulting in false positives, due to an inherent aspect of regular expressions. Since the cost of responding to malware infection is quite high, the number of false positives should be kept to a minimum. Therefore, we propose a system to generate templates that cause fewer false positives than a conventional system in order to achieve more accurate detection of malware-infected hosts. We focused on the key idea that malicious infrastructures, such as malware samples or command and control, tend to be reused instead of created from scratch. Our research verifies this idea and proposes here a new system to profile the variability of substrings in HTTP requests, which makes it possible to identify invariable keywords based on the same malicious infrastructures and to generate more accurate templates. The results of implementing our system and validating it using real traffic data indicate that it reduced false positives by up to two-thirds compared to the conventional system and even increased the detection rate of infected hosts.

  • DomainScouter: Analyzing the Risks of Deceptive Internationalized Domain Names

    Daiki CHIBA  Ayako AKIYAMA HASEGAWA  Takashi KOIDE  Yuta SAWABE  Shigeki GOTO  Mitsuaki AKIYAMA  

     
    PAPER-Network and System Security

      Pubricized:
    2020/03/19
      Vol:
    E103-D No:7
      Page(s):
    1493-1511

    Internationalized domain names (IDNs) are abused to create domain names that are visually similar to those of legitimate/popular brands. In this work, we systematize such domain names, which we call deceptive IDNs, and analyze the risks associated with them. In particular, we propose a new system called DomainScouter to detect various deceptive IDNs and calculate a deceptive IDN score, a new metric indicating the number of users that are likely to be misled by a deceptive IDN. We perform a comprehensive measurement study on the identified deceptive IDNs using over 4.4 million registered IDNs under 570 top-level domains (TLDs). The measurement results demonstrate that there are many previously unexplored deceptive IDNs targeting non-English brands or combining other domain squatting methods. Furthermore, we conduct online surveys to examine and highlight vulnerabilities in user perceptions when encountering such IDNs. Finally, we discuss the practical countermeasures that stakeholders can take against deceptive IDNs.