Toshiki SHIBAHARA Takeshi YAGI Mitsuaki AKIYAMA Daiki CHIBA Kunio HATO
Malware-infected hosts have typically been detected using network-based Intrusion Detection Systems on the basis of characteristic patterns of HTTP requests collected with dynamic malware analysis. Since attackers continuously modify malicious HTTP requests to evade detection, novel HTTP requests sent from new malware samples need to be exhaustively collected in order to maintain a high detection rate. However, analyzing every new malware sample for a long period is infeasible when analysis time is limited. Therefore, we propose a system for efficiently collecting HTTP requests with dynamic malware analysis. Specifically, our system analyzes a malware sample for a short period and then determines whether the analysis should be continued or suspended. It identifies the samples whose analyses should be continued on the basis of the network behavior observed in their short-period analyses. To make an accurate determination, we focus on the fact that malware communications resemble natural language from the viewpoint of data structure. We therefore apply a recursive neural network, which has recently exhibited high classification performance in the field of natural language processing, to our proposed system. In an evaluation with 42,856 malware samples, our proposed system collected 94% of novel HTTP requests and reduced analysis time by 82% in comparison with a system that continues all analyses.
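The continue-or-suspend decision described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ShortAnalysis`, `triage`, and `toy_score`, and the threshold of 0.5, are hypothetical, and `toy_score` is only a placeholder for the recursive neural network classifier mentioned in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ShortAnalysis:
    """Result of analyzing one malware sample for a short period (hypothetical type)."""
    sample_id: str
    http_requests: List[str]  # HTTP requests observed during the short analysis


def toy_score(analysis: ShortAnalysis) -> float:
    """Placeholder score: fraction of observed requests that are distinct.

    This stands in for a trained classifier; it is an assumption, not the
    model used in the paper.
    """
    reqs = analysis.http_requests
    return len(set(reqs)) / len(reqs) if reqs else 0.0


def triage(
    analyses: List[ShortAnalysis],
    score: Callable[[ShortAnalysis], float],
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Split sample IDs into (continued, suspended) according to the score."""
    continued, suspended = [], []
    for a in analyses:
        (continued if score(a) >= threshold else suspended).append(a.sample_id)
    return continued, suspended


analyses = [
    ShortAnalysis("a", ["GET /x", "GET /y"]),
    ShortAnalysis("b", ["GET /x", "GET /x", "GET /x", "GET /x"]),
    ShortAnalysis("c", []),
]
continued, suspended = triage(analyses, toy_score)
```

Only samples in `continued` would receive a long-period analysis, which is where a reduction in total analysis time would come from.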
Fumihiro KANEI Daiki CHIBA Kunio HATO Katsunari YOSHIOKA Tsutomu MATSUMOTO Mitsuaki AKIYAMA
While online advertising is widely used on the web and in mobile applications, the monetary damage caused by advertising fraud (ad fraud) has become a severe problem. Existing countermeasures against ad fraud are evaded because they rely on noticeable features (e.g., burstiness of ad requests) that attackers can easily change. We propose an ad-fraud-detection method that leverages features that are robust against attacker evasion. We designed novel features on the basis of statistics observed in an ad network and calculated from a large number of ad requests from legitimate users, such as the popularity of publisher websites and the tendencies of client environments. We assume that attackers cannot know of or manipulate these statistics and that features extracted from fraudulent ad requests tend to be outliers. These features are used to construct a machine-learning model for detecting fraudulent ad requests. We evaluated our proposed method by using ad-request logs observed within an actual ad network. The results revealed that our designed features improved the recall rate by 10% and had about 100,000-160,000 fewer false negatives per day than conventional features based on the burstiness of ad requests. In addition, by evaluating detection performance with a long-term dataset, we confirmed that the proposed method is robust against performance degradation over time. Finally, we applied our proposed method to a large dataset constructed on an ad network and found several characteristics of the latest ad fraud in the wild; for example, a large number of fraudulent ad requests are sent from cloud servers.
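The idea of deriving features from statistics of legitimate traffic can be sketched as follows. This is only an illustration under stated assumptions: the field names `publisher` and `user_agent`, the function names, and the use of a smoothed negative log frequency as the "rarity" measure are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def build_stats(legit_requests):
    """Count publishers and client environments seen in legitimate ad requests."""
    pubs = Counter(r["publisher"] for r in legit_requests)
    envs = Counter(r["user_agent"] for r in legit_requests)
    return pubs, envs, len(legit_requests)


def rarity(count, total, n_categories):
    """Negative log of an add-one smoothed relative frequency.

    Values that are rare or unseen among legitimate requests score higher.
    """
    return -math.log((count + 1) / (total + n_categories + 1))


def features(request, stats):
    """Return (publisher rarity, client-environment rarity) for one ad request."""
    pubs, envs, total = stats
    return (
        rarity(pubs[request["publisher"]], total, len(pubs)),
        rarity(envs[request["user_agent"]], total, len(envs)),
    )


legit = (
    [{"publisher": "news.example", "user_agent": "UA-1"}] * 8
    + [{"publisher": "blog.example", "user_agent": "UA-2"}] * 2
)
stats = build_stats(legit)
f_common = features({"publisher": "news.example", "user_agent": "UA-1"}, stats)
f_rare = features({"publisher": "unknown.example", "user_agent": "UA-9"}, stats)
```

In this sketch, a request whose publisher and client environment are unusual relative to legitimate traffic gets larger feature values, which a downstream classifier could treat as evidence of an outlier.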
Toshiki SHIBAHARA Yuta TAKATA Mitsuaki AKIYAMA Takeshi YAGI Kunio HATO Masayuki MURATA
Many users are exposed to threats of drive-by download attacks through the Web. Attackers compromise vulnerable websites discovered by search engines and redirect clients to malicious websites created with exploit kits. Security researchers and vendors have tried to prevent the attacks by detecting malicious data, i.e., malicious URLs, web content, and redirections. However, attackers conceal parts of the malicious data with evasion techniques to circumvent detection systems. In this paper, we propose a system for detecting malicious websites without collecting all malicious data. Even if we cannot observe parts of the malicious data, we can always observe compromised websites. Since vulnerable websites are discovered by search engines, compromised websites have similar traits. Therefore, we built a classifier by leveraging not only malicious but also compromised websites. More precisely, we convert all websites observed at the time of access into a redirection graph and classify it by integrating similarities between its subgraphs and redirection subgraphs shared across malicious, benign, and compromised websites. In an evaluation with crawling data of 455,860 websites, the system achieved a 91.7% true positive rate for malicious websites containing exploit URLs at a low false positive rate of 0.1%. Moreover, it detected 143 more evasive malicious websites than the conventional content-based system.
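The subgraph-similarity step in this abstract can be illustrated with a simplified Python sketch. It assumes, hypothetically, that a redirection subgraph is represented as a set of labeled edges and that similarity is measured with the Jaccard index; the paper's actual graph representation and similarity measure may differ, and all names below are illustrative.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]          # (source label, destination label)
Subgraph = FrozenSet[Edge]      # a redirection subgraph as a set of edges


def jaccard(a: Subgraph, b: Subgraph) -> float:
    """Jaccard index of two edge sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_features(subgraphs: List[Subgraph], templates: List[Subgraph]) -> List[float]:
    """For each template subgraph, the best similarity over the site's subgraphs."""
    return [
        max((jaccard(s, t) for s in subgraphs), default=0.0)
        for t in templates
    ]


# Subgraphs extracted from one observed redirection graph (toy data).
site_subgraphs = [frozenset({("landing", "iframe"), ("iframe", "exploit")})]

# Template subgraphs assumed to be shared across previously labeled websites.
templates = [
    frozenset({("landing", "iframe")}),
    frozenset({("landing", "script")}),
]

feature_vector = similarity_features(site_subgraphs, templates)
```

The resulting vector has one entry per template and could be passed to any standard classifier; the choice of classifier is left open here.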