Yuta TAKATA Hiroshi KUMAGAI Masaki KAMIZONO
While websites are becoming more and more complex, the difficulty of managing them is also increasing. It is important to conduct regular maintenance of these complex websites to strengthen their security and improve their cyber resilience. However, misconfigurations and vulnerabilities are still being discovered on some pages of websites, and cyberattacks against them are never-ending. In this paper, we take the novel approach of applying the concept of security governance to websites and, as part of this, measure the consistency of software settings and versions used on these websites. More precisely, we analyze multiple web pages under the same domain name and identify differences in the security settings of HTTP headers and in the versions of software among them. After analyzing over 8,000 websites of popular global organizations, our measurement results show that over half of the tested websites exhibit such differences. For example, we found websites running on a web server whose reported version changes from access to access, and websites using a JavaScript library whose version differs across more than half of the tested pages. We identify the causes of such governance failures and propose improvement plans.
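As a rough illustration of the kind of consistency check described above (not the authors' actual measurement tooling), the following sketch fetches several pages under one domain and compares their security-related HTTP response headers and the reported server version; the page list and header set are hypothetical.

```python
# Minimal sketch: compare security-related HTTP headers across pages
# of the same domain (hypothetical page list; not the paper's tool).
import urllib.request

PAGES = ["https://example.com/", "https://example.com/contact", "https://example.com/news"]
HEADERS = ["Strict-Transport-Security", "X-Frame-Options",
           "Content-Security-Policy", "X-Content-Type-Options", "Server"]

def collect(url):
    """Fetch one page and record the headers of interest (None if absent)."""
    with urllib.request.urlopen(url, timeout=10) as res:
        return {h: res.headers.get(h) for h in HEADERS}

observed = {url: collect(url) for url in PAGES}

# Report headers whose values differ across pages of the same site.
for h in HEADERS:
    values = {observed[url][h] for url in PAGES}
    if len(values) > 1:
        print(f"Inconsistent {h}: {values}")
```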
Toshiki SHIBAHARA Kohei YAMANISHI Yuta TAKATA Daiki CHIBA Taiga HOKAGUCHI Mitsuaki AKIYAMA Takeshi YAGI Yuichi OHSITA Masayuki MURATA
Drive-by download attacks have increased the number of infected hosts on enterprise networks. In these attacks, users of compromised popular websites are redirected to websites that exploit vulnerabilities in a browser and its plugins. To prevent damage, detection of infected hosts on the basis of proxy logs, rather than blacklist-based filtering, has started to be researched; blacklists have become difficult to create because of the short lifetime of malicious domains and the concealment of exploit code. To detect accesses to malicious websites from proxy logs, we propose a system for detecting malicious URL sequences on the basis of three key ideas: focusing on sequences of URLs that include artifacts of malicious redirections, designing new features related to software other than browsers, and generating new training data with data augmentation. To find an effective approach for classifying URL sequences, we compared three approaches: an individual-based approach, a convolutional neural network (CNN), and our new event de-noising CNN (EDCNN). Our EDCNN reduces the negative effects of benign URLs redirected from compromised websites that are included in malicious URL sequences. Evaluation results show that only the EDCNN with the proposed features and data augmentation achieved practical classification performance: a true positive rate of 99.1% and a false positive rate of 3.4%.
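To make the first key idea concrete, the sketch below groups proxy-log entries into per-client URL sequences using a fixed time window. The log field layout and the 30-second window are assumptions for illustration; the EDCNN classifier itself is not shown.

```python
# Minimal sketch: group proxy-log entries into per-client URL sequences
# using a fixed time window (field layout and window length are assumptions).
from collections import defaultdict

WINDOW = 30.0  # max seconds between consecutive requests in one sequence

def build_sequences(log_entries):
    """log_entries: iterable of (timestamp, client_ip, url), sorted by time."""
    sequences = defaultdict(list)   # client_ip -> list of URL sequences
    last_seen = {}                  # client_ip -> timestamp of previous request
    for ts, client, url in log_entries:
        if client not in last_seen or ts - last_seen[client] > WINDOW:
            sequences[client].append([])        # start a new sequence
        sequences[client][-1].append(url)
        last_seen[client] = ts
    return sequences

logs = [(0.0, "10.0.0.1", "http://compromised.example/"),
        (1.2, "10.0.0.1", "http://redirect.example/gate"),
        (2.5, "10.0.0.1", "http://exploit.example/kit")]
print(build_sequences(logs))
```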
Yumehisa HAGA Yuta TAKATA Mitsuaki AKIYAMA Tatsuya MORI
Web tracking is widely used as a means to track users' behavior on websites. While web tracking creates new opportunities for e-commerce, it also poses risks such as privacy infringement. Therefore, analyzing such risks in the wild is meaningful for making threats to user privacy transparent. This work aims to understand how web tracking has been adopted by prominent websites and how resilient it is to ad-blocking techniques. Web tracking-enabled websites collect information called web browser fingerprints, which can be used to identify users. We develop a scalable system that can detect fingerprinting by using both dynamic and static analyses. If a tracking site makes use of many strong fingerprints, the site is likely resilient to ad-blocking techniques. We also analyze the connectivity of third-party tracking sites, which are linked from multiple websites. The link analysis allows us to extract groups of associated tracking sites and understand how influential they are. Based on the analyses of 100,000 websites, we quantify the potential risks of web tracking-enabled websites. We reveal that 226 websites adopt fingerprints that cannot be detected by most off-the-shelf anti-tracking tools. We also reveal that a major, resilient third-party tracking site is linked from 50.0% of the top 100,000 popular websites.
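As a toy stand-in for the static-analysis side of such a detector (the pattern list is illustrative and much smaller than any real feature set), the sketch below flags JavaScript that references browser APIs commonly used for fingerprinting.

```python
# Minimal sketch: flag JavaScript that references browser APIs commonly
# used for fingerprinting (illustrative patterns; not the paper's detector).
import re

FINGERPRINT_PATTERNS = [
    r"navigator\.plugins", r"navigator\.userAgent", r"navigator\.language",
    r"screen\.(width|height|colorDepth)",
    r"toDataURL\(",            # canvas fingerprinting
    r"getTimezoneOffset\(",
]

def fingerprint_score(js_source):
    """Return the number of matched patterns and the patterns themselves."""
    hits = [p for p in FINGERPRINT_PATTERNS if re.search(p, js_source)]
    return len(hits), hits

sample = ("var c = document.createElement('canvas'); var d = c.toDataURL();"
          "send(navigator.userAgent + screen.width);")
print(fingerprint_score(sample))  # more (and stronger) hits -> more identifying power
```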
Takuya WATANABE Mitsuaki AKIYAMA Fumihiro KANEI Eitaro SHIOJI Yuta TAKATA Bo SUN Yuta ISHII Toshiki SHIBAHARA Takeshi YAGI Tatsuya MORI
This paper reports a large-scale study that aims to understand how mobile application (app) vulnerabilities are associated with software libraries. We analyze both free and paid apps. Studying paid apps was particularly meaningful because it helped us understand how differences in app development and maintenance affect the vulnerabilities associated with libraries. We analyzed 30k free and paid apps collected from the official Android marketplace. Our extensive analyses revealed that approximately 70%/50% of the vulnerabilities in free/paid apps stem from software libraries, particularly third-party libraries. Somewhat paradoxically, we found that more expensive/popular paid apps tend to have more vulnerabilities. This is because more expensive/popular paid apps tend to have more functionality, i.e., more code and libraries, which increases the probability of vulnerabilities. Based on our findings, we provide suggestions to stakeholders of mobile app distribution ecosystems.
Yuta TAKATA Mitsuaki AKIYAMA Takeshi YAGI Takeshi YADA Shigeki GOTO
An incident response organization such as a CSIRT contributes to preventing the spread of malware infection by analyzing compromised websites and sending abuse reports with detected URLs to webmasters. However, abuse reports that contain only URLs are not sufficient for cleaning up the websites. In addition, it is difficult to analyze malicious websites across different client environments because these websites change their behavior depending on the client environment. To expedite compromised-website clean-up, it is important to provide fine-grained information such as malicious URL relations, the precise position of compromised web content, and the targeted range of client environments. In this paper, we propose a new method of constructing a redirection graph with context, such as which web content redirects to malicious websites. The proposed method analyzes a website in a multi-client environment to identify which client environments are exposed to threats. We evaluated our system using crawling datasets of approximately 2,000 compromised websites. The results show that our system successfully identified malicious URL relations and compromised web content, reducing the URLs and web content that incident responders need to analyze to 15.0% and 0.8%, respectively. Furthermore, by leveraging target information, it also identified the targeted range of client environments for 30.4% of the websites, as well as a vulnerability exploited by the malicious websites. This fine-grained analysis by our system would contribute to improving the daily work of incident responders.
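The sketch below reconstructs a bare-bones redirection graph from crawled HTTP transactions using only Referer and Location headers. This is a simplified heuristic for illustration; the proposed method also attaches web-content context and multi-client information, which are omitted here.

```python
# Minimal sketch: reconstruct a redirection graph from crawled HTTP
# transactions using Referer and Location headers (simplified heuristic;
# the paper's method additionally attaches web-content context).
def build_redirection_graph(transactions):
    """transactions: list of dicts with 'url', 'referer', 'location' keys."""
    edges = set()
    for t in transactions:
        if t.get("referer"):
            edges.add((t["referer"], t["url"]))    # content-driven request
        if t.get("location"):
            edges.add((t["url"], t["location"]))   # HTTP 3xx redirect
    return edges

crawl = [
    {"url": "http://compromised.example/", "referer": None, "location": None},
    {"url": "http://gate.example/x",
     "referer": "http://compromised.example/",
     "location": "http://exploit.example/kit"},
]
for src, dst in sorted(build_redirection_graph(crawl)):
    print(src, "->", dst)
```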
Hiroki NAKANO Fumihiro KANEI Yuta TAKATA Mitsuaki AKIYAMA Katsunari YOSHIOKA
Android app developers sometimes copy code snippets posted on a question-and-answer (Q&A) website and use them in their apps. However, if a code snippet contains vulnerabilities, Android apps containing that snippet could have the same vulnerabilities. Despite this, the effect of such vulnerable snippets on Android apps has not been investigated in depth. In this paper, we investigate the correspondence between vulnerable code snippets and vulnerable apps. We collect code snippets from a Q&A website, extract possibly vulnerable snippets, and calculate the similarity between those snippets and the bytecode of vulnerable apps. Our experimental results show that 15.8% of all evaluated apps with SSL implementation vulnerabilities (improper hostname verification), 31.7% of those with SSL certificate verification vulnerabilities, and 3.8% of those with WebView remote code execution vulnerabilities contain possibly vulnerable code snippets from Stack Overflow. In the worst case, a single problematic snippet caused 4,844 apps to contain a vulnerability, accounting for 31.2% of all collected apps with that vulnerability.
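One simple way to picture the similarity step is a token n-gram overlap between a snippet and (decompiled) app code, as sketched below. The tokenizer, n-gram size, and the comparison against source-like text rather than bytecode are simplifying assumptions, not the paper's actual matching procedure.

```python
# Minimal sketch: token 3-gram Jaccard similarity between a code snippet
# and decompiled app code (simplified stand-in for the paper's
# snippet-to-bytecode matching).
import re

def ngrams(code, n=3):
    tokens = re.findall(r"[A-Za-z_]\w+", code)   # crude identifier/keyword tokenizer
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(snippet, app_code, n=3):
    a, b = ngrams(snippet, n), ngrams(app_code, n)
    return len(a & b) / len(a | b) if a | b else 0.0

# Classic vulnerable pattern: a hostname verifier that accepts everything.
snippet  = "public boolean verify(String hostname, SSLSession session) { return true; }"
app_code = "public boolean verify(String hostname, SSLSession s) { return true; }"
print(similarity(snippet, app_code))  # higher scores suggest possible snippet reuse
```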
Yuta TAKATA Mitsuaki AKIYAMA Takeshi YAGI Takeo HARIU Kazuhiko OHKUBO Shigeki GOTO
Security researchers and vendors detect malicious websites based on several website features extracted by honeyclient analysis. However, web-based attacks continue to become more sophisticated as countermeasure techniques develop. Attackers detect honeyclients and evade analysis using sophisticated JavaScript code. The evasive code indirectly identifies vulnerable clients by abusing the differences among JavaScript implementations. Attackers deliver malware only to targeted clients on the basis of the evasion results, thereby avoiding honeyclient analysis. We are therefore faced with the problem that honeyclients cannot analyze these malicious websites. Nevertheless, we can observe the nature of the evasion: the results of accessing malicious websites with targeted clients differ from those obtained with honeyclients. In this paper, we propose a method of extracting evasive code by leveraging these differences in order to investigate current evasion techniques. Our method analyzes HTTP transactions of the same website obtained using two types of clients: a real browser as a targeted client and a browser emulator as a honeyclient. As a result of evaluating our method with 8,467 JavaScript samples executed on 20,272 malicious websites, we discovered previously unknown evasion techniques that abuse the differences among JavaScript implementations. These findings will contribute to improving the analysis capabilities of conventional honeyclients.
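The core observation can be illustrated by simply diffing the URLs fetched in the two traces, as in the sketch below. This is only the coarse signal; the proposed method goes further and pinpoints the JavaScript code responsible for the divergence, which is not reproduced here.

```python
# Minimal sketch: compare HTTP transactions of the same website recorded
# with a real browser and with a browser emulator, and flag URLs fetched
# only by the real browser (a rough signal of evasive branching).
def diff_transactions(browser_urls, emulator_urls):
    browser_only = set(browser_urls) - set(emulator_urls)
    emulator_only = set(emulator_urls) - set(browser_urls)
    return browser_only, emulator_only

real = ["http://site.example/", "http://site.example/lib.js",
        "http://exploit.example/payload"]
emu  = ["http://site.example/", "http://site.example/lib.js"]

browser_only, _ = diff_transactions(real, emu)
print("URLs likely gated by evasive code:", browser_only)
```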
Kenta NOMURA Yuta TAKATA Hiroshi KUMAGAI Masaki KAMIZONO Yoshiaki SHIRAISHI Masami MOHRI Masakatu MORII
The proliferation of coronavirus disease (COVID-19) has prompted changes in business models. To ensure a successful transition to non-face-to-face, electronic communication, the authenticity of data and the trustworthiness of communication partners are essential. Trust services provide a mechanism for preventing data falsification and spoofing. To develop a trust service, the characteristics of the service and the scope of its use need to be determined, and the relevant legal systems must be investigated. Preparing documents that meet trust service provider requirements may incur significant expenses. This study focuses on electronic signatures, proposes classification criteria, classifies actual documents based on these criteria, and opens a discussion. A case study illustrates how trust service providers can search a document and highlight the areas that require approval. The classification table in this paper may prove advantageous at the outset, when business decisions are uncertain and there is no clear starting point.
Toshiki SHIBAHARA Yuta TAKATA Mitsuaki AKIYAMA Takeshi YAGI Kunio HATO Masayuki MURATA
Many users are exposed to the threat of drive-by download attacks through the Web. Attackers compromise vulnerable websites discovered by search engines and redirect clients to malicious websites created with exploit kits. Security researchers and vendors have tried to prevent these attacks by detecting malicious data, i.e., malicious URLs, web content, and redirections. However, attackers conceal parts of the malicious data with evasion techniques to circumvent detection systems. In this paper, we propose a system for detecting malicious websites without collecting all of the malicious data. Even if we cannot observe parts of the malicious data, we can always observe compromised websites. Since vulnerable websites are discovered by search engines, compromised websites have similar traits. We therefore built a classifier that leverages not only malicious but also compromised websites. More precisely, we convert all websites observed at the time of access into a redirection graph and classify it by integrating similarities between its subgraphs and redirection subgraphs shared across malicious, benign, and compromised websites. As a result of evaluating our system with crawling data of 455,860 websites, we found that it achieved a 91.7% true positive rate for malicious websites containing exploit URLs at a low false positive rate of 0.1%. Moreover, it detected 143 more evasive malicious websites than a conventional content-based system.
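As a toy illustration of comparing redirection graphs by their shared structure, the sketch below measures the overlap of labeled edges between two graphs. The node labels and the Jaccard score are assumptions standing in for the paper's subgraph mining and classifier integration.

```python
# Minimal sketch: compare two redirection graphs by the overlap of their
# labeled edges (a toy stand-in for the paper's subgraph-similarity scheme;
# node labels such as "compromised"/"exploit" are assumptions).
def graph_similarity(g1, g2):
    """g1, g2: lists of (src_label, dst_label) redirection edges."""
    a, b = set(g1), set(g2)
    return len(a & b) / len(a | b) if a | b else 0.0

known_malicious = [("compromised", "gate"), ("gate", "exploit")]
unknown         = [("compromised", "gate"), ("gate", "ad")]
print(graph_similarity(known_malicious, unknown))  # ~0.33: partial structural overlap
```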
Yuta TAKATA Mitsuaki AKIYAMA Takeshi YAGI Takeo HARIU Shigeki GOTO
Drive-by download attacks force users to automatically download and install malware by redirecting them to malicious URLs that exploit vulnerabilities in the user's web browser. In addition, several evasion techniques, such as code obfuscation and environment-dependent redirection, are used in combination with drive-by download attacks to prevent detection. In environment-dependent redirection, attackers profile information about the user's environment, such as the names and versions of the browser and its plugins, and launch a drive-by download attack against only certain targets by changing the destination URL. When malicious content detection and collection techniques such as honeyclients do not match the specific environment of the attack target, they are not redirected and thus cannot detect the attack. Therefore, it is necessary to improve analysis coverage while countering these adversarial evasion techniques. We propose a method for exhaustively analyzing JavaScript code relevant to redirections and extracting the destination URLs in that code. Our method facilitates the detection of attacks by extracting a large number of URLs while controlling the analysis overhead by excluding code that is not relevant to redirections. We implemented our method in a browser emulator called MINESPIDER that automatically extracts potential URLs from websites. We validated it using communication data with malicious websites captured over a three-year period. The experimental results demonstrated that MINESPIDER extracted, in a few seconds, 30,000 new URLs from malicious websites that conventional methods had missed.
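The final extraction step can be pictured as pulling candidate destination URLs out of already unpacked JavaScript, as in the sketch below. MINESPIDER itself executes the redirection-relevant code paths in a browser emulator so that environment-dependent branches are covered; this regular-expression pass is only an illustrative simplification.

```python
# Minimal sketch: pull candidate destination URLs out of (already unpacked)
# JavaScript with a regular expression. MINESPIDER instead executes the
# redirection-relevant code paths in a browser emulator, which this omits.
import re

URL_RE = re.compile(r"https?://[\w.\-/%?=&#]+")

def extract_urls(js_code):
    return sorted(set(URL_RE.findall(js_code)))

# Environment-dependent redirection: different URLs for different browsers.
js = ('if (navigator.userAgent.indexOf("MSIE") > 0) {'
      ' location.href = "http://exploit.example/ie";'
      ' } else { location.href = "http://exploit.example/other"; }')
print(extract_urls(js))
```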