
Author Search Result

[Author] Takeshi YAGI (19 hits)

1-19 of 19 hits
  • Design of Provider-Provisioned Website Protection Scheme against Malware Distribution

    Takeshi YAGI  Naoto TANIMOTO  Takeo HARIU  Mitsutaka ITOH  

     
    PAPER

      Vol:
    E93-B No:5
      Page(s):
    1122-1130

    Vulnerabilities in web applications expose computer networks to security threats, and many websites are used by attackers as hopping sites to attack other websites and user terminals. These incidents prevent service providers from constructing secure networking environments. To protect websites from attacks exploiting vulnerabilities in web applications, service providers use web application firewalls (WAFs). WAFs filter accesses from attackers by using signatures, which are generated based on the exploit codes of previous attacks. However, WAFs cannot filter unknown attacks because the signatures cannot reflect new types of attacks. In service provider environments, the number of exploit codes has recently increased rapidly because of the spread of vulnerable web applications that have been developed through cloud computing. Thus, generating signatures for all exploit codes is difficult. To solve these problems, our proposed scheme detects and filters malware downloads that are sent from websites which have already received exploit codes. In addition, to collect information for detecting malware downloads, web honeypots, which automatically extract the communication records of exploit codes, are used. According to the results of experiments using a prototype, our scheme can filter attacks automatically so that service providers can provide secure and cost-effective network environments.
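
    The following is a minimal, hypothetical sketch of the filtering idea described in this abstract: web-honeypot records indicate which websites have already received exploit codes, and subsequent file downloads served from those sites are treated as malware downloads and blocked. The data structures, field names, and the download heuristic are illustrative assumptions, not details taken from the paper.

      # Hypothetical sketch: block downloads served by websites that web
      # honeypots have already observed receiving exploit codes.

      # Hosts recorded by the honeypots as targets of exploit codes (assumed input).
      exploited_sites = {"203.0.113.10", "vulnerable.example.org"}

      def is_download_response(headers):
          """Illustrative heuristic: treat executable content types or attachment
          dispositions as file downloads (not the paper's exact criterion)."""
          ctype = headers.get("Content-Type", "").lower()
          disp = headers.get("Content-Disposition", "").lower()
          return "application/octet-stream" in ctype or "attachment" in disp

      def should_block(src_host, response_headers):
          """Block a download if it originates from a site that previously
          received exploit codes according to the honeypot records."""
          return src_host in exploited_sites and is_download_response(response_headers)

      # A provider-side filter would consult this decision for each response.
      print(should_block("vulnerable.example.org",
                         {"Content-Type": "application/octet-stream"}))  # True
      print(should_block("clean.example.net", {"Content-Type": "text/html"}))  # False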

  • Study on the Vulnerabilities of Free and Paid Mobile Apps Associated with Software Library

    Takuya WATANABE  Mitsuaki AKIYAMA  Fumihiro KANEI  Eitaro SHIOJI  Yuta TAKATA  Bo SUN  Yuta ISHII  Toshiki SHIBAHARA  Takeshi YAGI  Tatsuya MORI  

     
    PAPER-Network Security

      Publicized:
    2019/11/22
      Vol:
    E103-D No:2
      Page(s):
    276-291

    This paper reports a large-scale study that aims to understand how mobile application (app) vulnerabilities are associated with software libraries. We analyze both free and paid apps. Studying paid apps was quite meaningful because it helped us understand how differences in app development/maintenance affect the vulnerabilities associated with libraries. We analyzed 30k free and paid apps collected from the official Android marketplace. Our extensive analyses revealed that approximately 70%/50% of vulnerabilities of free/paid apps stem from software libraries, particularly from third-party libraries. Somewhat paradoxically, we found that more expensive/popular paid apps tend to have more vulnerabilities. This comes from the fact that more expensive/popular paid apps tend to have more functionality, i.e., more code and libraries, which increases the probability of vulnerabilities. Based on our findings, we provide suggestions to stakeholders of mobile app distribution ecosystems.

  • Follow Your Silhouette: Identifying the Social Account of Website Visitors through User-Blocking Side Channel

    Takuya WATANABE  Eitaro SHIOJI  Mitsuaki AKIYAMA  Keito SASAOKA  Takeshi YAGI  Tatsuya MORI  

     
    PAPER-Network Security

      Publicized:
    2019/11/11
      Vol:
    E103-D No:2
      Page(s):
    239-255

    This paper presents a practical side-channel attack that identifies the social web service account of a visitor to an attacker's website. Our attack leverages the widely adopted user-blocking mechanism, abusing its inherent property that certain pages return different web content depending on whether one user has blocked another. Our key insight is that an account prepared by an attacker can hold an attacker-controllable binary state of blocking/non-blocking with respect to an arbitrary user on the same service; provided that the user is logged in to the service, this state can be retrieved as one-bit data through the conventional cross-site timing attack when the user visits the attacker's website. We generalize and refer to such a property as visibility control, which we consider the fundamental assumption of our attack. Building on this primitive, we show that an attacker with a set of controlled accounts can gain complete and flexible control over the data leaked through the side channel. Using this mechanism, we show that it is possible to design and implement a robust, large-scale user identification attack on a wide variety of social web services. To verify the feasibility of our attack, we perform an extensive empirical study using 16 popular social web services and demonstrate that at least 12 of these are vulnerable to our attack. Vulnerable services include not only popular social networking sites such as Twitter and Facebook, but also other types of web services that provide social features, e.g., eBay and Xbox Live. We also demonstrate that the attack can achieve nearly 100% accuracy and can finish within a sufficiently short time in a practical setting. We discuss the fundamental principles, practical aspects, and limitations of the attack as well as possible defenses. We have successfully addressed this attack by working collaboratively with service providers and browser vendors.
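
    The core primitive here is that each attacker-controlled account contributes one attacker-chosen bit (blocked / not blocked) per candidate user, and that bit can be read through a cross-site timing measurement when the victim visits the attacker's page. The minimal sketch below shows only the encoding/decoding arithmetic under that assumption; the candidate list and account sizing are illustrative, and the timing probe itself is abstracted into a stub.

      # Hypothetical sketch of the bit-encoding logic behind the user-blocking
      # side channel: with k attacker accounts, each candidate user is assigned
      # a unique k-bit block/non-block pattern, so one timing probe per account
      # recovers the visitor's identity.
      from math import ceil, log2

      candidates = ["alice", "bob", "carol", "dave", "erin"]
      k = max(1, ceil(log2(len(candidates))))  # number of attacker accounts needed

      def blocking_pattern(index, k):
          """Bit i is 1 if attacker account i blocks this candidate user."""
          return [(index >> i) & 1 for i in range(k)]

      # Attacker-side setup: which accounts block which candidate (precomputed).
      patterns = {user: blocking_pattern(i, k) for i, user in enumerate(candidates)}

      def probe(account_i, visitor):
          """Stub for the cross-site timing probe: returns 1 if attacker account i
          has blocked the visitor (in reality inferred from response timing)."""
          return patterns[visitor][account_i]

      def identify(visitor):
          """Read one bit per account and decode the visitor's index."""
          bits = [probe(i, visitor) for i in range(k)]
          index = sum(b << i for i, b in enumerate(bits))
          return candidates[index] if index < len(candidates) else None

      print(identify("carol"))  # 'carol'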

  • Evaluations and Analysis of Malware Prevention Methods on Websites

    Takeshi YAGI  Junichi MURAYAMA  Takeo HARIU  Hiroyuki OHSAKI  

     
    PAPER-Internet

      Vol:
    E96-B No:12
      Page(s):
    3091-3100

    With the diffusion of web services caused by the appearance of a new architecture known as cloud computing, a large number of websites have been used by attackers as hopping sites to attack other websites and user terminals because many vulnerable websites are constructed and managed by unskilled users. To construct hopping sites, many attackers force victims to download malware by using vulnerabilities in web applications. To protect websites from these malware infection attacks, conventional methods, such as anti-virus software, filter files from attackers by using pattern files generated from the malware files previously collected by security vendors. In addition, certain anti-virus software uses a behavior blocking approach, which monitors malicious file activities and modifications. These methods can detect malware files that are already known. However, it is difficult to detect malware that differs from known malware. It is also difficult to define malware, since legitimate software files can become malicious depending on the situation. We previously proposed an access filtering method based on the communication opponents of attacks, i.e., the other servers or terminals that connect with our web honeypots, which collect malware infection attacks against websites by using actual vulnerable web applications. In this blacklist-based method, URLs or IP addresses used in malware infection attacks collected by web honeypots are listed in a blacklist, and accesses to and from websites are filtered based on the blacklist. To reveal the effects in an actual attack situation on the Internet, we evaluated the detection ratio of anti-virus software, our method, and a composite of both methods. Our evaluation revealed that anti-virus software detected approximately 50% of malware files, our method detected approximately 98% of attacks, and the composite of the two methods could detect approximately 99% of attacks.
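
    A minimal sketch of the blacklist-based filtering step, assuming the web honeypots have already produced lists of attacker URLs and IP addresses; the access-record fields are hypothetical.

      # Hypothetical sketch: filter accesses to and from a website against a
      # blacklist of URLs and IP addresses observed by web honeypots.
      from urllib.parse import urlparse

      blacklisted_urls = {"http://malware.example.com/dl/a.exe"}
      blacklisted_ips = {"198.51.100.23"}

      def is_blocked(access):
          """access: dict with 'src_ip' and 'url' (illustrative fields)."""
          url = access.get("url", "")
          return (access.get("src_ip") in blacklisted_ips
                  or url in blacklisted_urls
                  or urlparse(url).hostname in
                     {urlparse(u).hostname for u in blacklisted_urls})

      accesses = [
          {"src_ip": "198.51.100.23", "url": "http://victim.example.org/upload.php"},
          {"src_ip": "192.0.2.8", "url": "http://malware.example.com/dl/a.exe"},
          {"src_ip": "192.0.2.9", "url": "http://benign.example.net/"},
      ]
      for a in accesses:
          print(a["src_ip"], "blocked" if is_blocked(a) else "allowed")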

  • Fine-Grained Analysis of Compromised Websites with Redirection Graphs and JavaScript Traces

    Yuta TAKATA  Mitsuaki AKIYAMA  Takeshi YAGI  Takeshi YADA  Shigeki GOTO  

     
    PAPER-Internet Security

      Publicized:
    2017/05/18
      Vol:
    E100-D No:8
      Page(s):
    1714-1728

    An incident response organization such as a CSIRT contributes to preventing the spread of malware infection by analyzing compromised websites and sending abuse reports with detected URLs to webmasters. However, abuse reports that contain only URLs are not sufficient for cleaning up the websites. In addition, it is difficult to analyze malicious websites across different client environments because these websites change their behavior depending on the client environment. To expedite compromised website clean-up, it is important to provide fine-grained information such as malicious URL relations, the precise position of compromised web content, and the targeted range of client environments. In this paper, we propose a new method of constructing a redirection graph with context, such as which web content redirects to malicious websites. The proposed method analyzes a website in a multi-client environment to identify which client environments are exposed to threats. We evaluated our system using crawling datasets of approximately 2,000 compromised websites. The results show that our system successfully identified malicious URL relations and compromised web content, narrowing down what incident responders need to analyze to 15.0% of the URLs and 0.8% of the web content, respectively. Furthermore, by leveraging target information, it identified the targeted range of client environments for 30.4% of the websites, as well as a vulnerability that had been exploited on malicious websites. This fine-grained analysis by our system would contribute to improving the daily work of incident responders.
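
    A minimal sketch of building a redirection graph with context from crawl records; the record format (source URL, destination URL, and the piece of web content that triggered the redirection) is an assumption, and a real system would also record detailed JavaScript traces.

      # Hypothetical sketch: build a redirection graph whose edges carry context,
      # i.e., which web content caused each redirection.
      from collections import defaultdict

      # Assumed crawl records: (source URL, destination URL, triggering content).
      crawl_log = [
          ("http://compromised.example.org/", "http://cdn.example.org/lib.js", "<script src>"),
          ("http://cdn.example.org/lib.js", "http://evil.example.com/gate.php", "document.write iframe"),
          ("http://evil.example.com/gate.php", "http://evil.example.com/exploit.swf", "HTTP 302"),
      ]

      graph = defaultdict(list)
      for src, dst, context in crawl_log:
          graph[src].append((dst, context))

      def paths_to(target, start):
          """Enumerate redirection paths from start to target (simple DFS)."""
          stack = [(start, [start])]
          while stack:
              node, path = stack.pop()
              if node == target:
                  yield path
              for nxt, _ in graph.get(node, []):
                  if nxt not in path:          # avoid cycles
                      stack.append((nxt, path + [nxt]))

      for p in paths_to("http://evil.example.com/exploit.swf",
                        "http://compromised.example.org/"):
          print(" -> ".join(p))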

  • Identifying Evasive Code in Malicious Websites by Analyzing Redirection Differences

    Yuta TAKATA  Mitsuaki AKIYAMA  Takeshi YAGI  Takeo HARIU  Kazuhiko OHKUBO  Shigeki GOTO  

     
    PAPER-Mobile Application and Web Security

      Publicized:
    2018/08/22
      Vol:
    E101-D No:11
      Page(s):
    2600-2611

    Security researchers/vendors detect malicious websites based on several website features extracted by honeyclient analysis. However, web-based attacks continue to grow more sophisticated as countermeasure techniques develop. Attackers detect the honeyclient and evade analysis using sophisticated JavaScript code. The evasive code indirectly identifies vulnerable clients by abusing the differences among JavaScript implementations. Attackers deliver malware only to targeted clients on the basis of the evasion results while avoiding honeyclient analysis. Therefore, we face the problem that honeyclients cannot analyze such malicious websites. Nevertheless, we can observe the nature of the evasion: the results of accessing malicious websites with targeted clients differ from those obtained with honeyclients. In this paper, we propose a method of extracting evasive code by leveraging these differences to investigate current evasion techniques. Our method analyzes HTTP transactions of the same website obtained using two types of clients, a real browser as a targeted client and a browser emulator as a honeyclient. As a result of evaluating our method with 8,467 JavaScript samples executed on 20,272 malicious websites, we discovered previously unknown evasion techniques that abuse the differences among JavaScript implementations. These findings will contribute to improving the analysis capabilities of conventional honeyclients.
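
    A minimal sketch of the comparison idea, assuming we already hold the sets of HTTP requests observed when the same website is accessed with a real browser and with a browser emulator; requests that appear only on the real-browser side point to code paths the honeyclient failed to reach, i.e., candidates for evasive code. The URL normalization is deliberately naive.

      # Hypothetical sketch: contrast HTTP transactions of the same website
      # obtained with a real browser and a browser emulator to locate requests
      # (and thus code paths) that only the real browser triggers.
      from urllib.parse import urlparse

      def normalize(url):
          """Naive normalization: ignore query strings so volatile parameters
          do not hide the structural difference."""
          p = urlparse(url)
          return (p.scheme, p.hostname, p.path)

      real_browser = {
          "http://site.example/index.html",
          "http://site.example/ad.js?cb=123",
          "http://exploit.example/land.php?id=9",   # reached only by the real browser
      }
      emulator = {
          "http://site.example/index.html",
          "http://site.example/ad.js?cb=456",
      }

      only_real = {normalize(u) for u in real_browser} - {normalize(u) for u in emulator}
      print("candidate evasive redirections:", sorted(only_real))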

  • Automating URL Blacklist Generation with Similarity Search Approach

    Bo SUN  Mitsuaki AKIYAMA  Takeshi YAGI  Mitsuhiro HATADA  Tatsuya MORI  

     
    PAPER-Web security

      Publicized:
    2016/01/13
      Vol:
    E99-D No:4
      Page(s):
    873-882

    Modern web users may encounter a browser security threat called drive-by-download attacks when surfing the Internet. Drive-by-download attacks make use of exploit codes to take control of a user's web browser. Many web users do not take such underlying threats into account when clicking URLs. URL blacklisting is one of the practical approaches to thwarting browser-targeted attacks. However, a URL blacklist cannot cope with previously unseen malicious URLs. Therefore, to make a URL blacklist effective, it is crucial to keep the URLs updated. Given these observations, we propose a framework called automatic blacklist generator (AutoBLG) that automates the collection of new malicious URLs by starting from a given existing URL blacklist. The primary mechanism of AutoBLG is expanding the search space of web pages while reducing the number of URLs to be analyzed by applying several pre-filters, such as similarity search, to accelerate the process of generating blacklists. AutoBLG consists of three primary components: URL expansion, URL filtration, and URL verification. Through extensive analysis using a high-performance web client honeypot, we demonstrate that AutoBLG can successfully discover new and previously unknown drive-by-download URLs from the vast web space.
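
    A minimal sketch of the similarity-search pre-filter stage: candidate URLs harvested by URL expansion are ranked by structural similarity to URLs already on the blacklist, and only the closest candidates are passed to the expensive honeypot verification. The character n-gram Jaccard similarity used here is an illustrative stand-in for the paper's actual feature set.

      # Hypothetical sketch: rank newly discovered URLs by similarity to known
      # blacklisted URLs so that only promising candidates go to honeypot analysis.
      def ngrams(s, n=3):
          return {s[i:i + n] for i in range(len(s) - n + 1)}

      def jaccard(a, b):
          return len(a & b) / len(a | b) if a | b else 0.0

      blacklist = [
          "http://bad.example.com/exploit/kit/gate.php?id=1",
          "http://bad2.example.net/ek/load.php?x=7",
      ]
      candidates = [
          "http://new-host.example.org/exploit/kit/gate.php?id=99",
          "http://benign.example.org/blog/2024/article.html",
      ]

      bl_grams = [ngrams(u) for u in blacklist]
      scored = [(max(jaccard(ngrams(c), g) for g in bl_grams), c) for c in candidates]
      for score, url in sorted(scored, reverse=True):
          print(f"{score:.2f}  {url}")
      # URLs above a similarity threshold would then be verified with a
      # high-interaction web client honeypot before being blacklisted.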

  • Wide-Band High-Bit-Rate WDM Transmission Line with Medial Dispersion Fiber (MDF)

    Kazunori MUKASA  Takeshi YAGI  Kunio KOKURA  

     
    LETTER

      Vol:
    E85-B No:2
      Page(s):
    484-486

    A novel optical transmission line consisting of fibers characterized by positive and negative medial dispersion, between those of NZ-DSF and SMF, was designed and fabricated. Both the P-MDF and N-MDF achieve medial dispersion and low nonlinearity simultaneously. The total characteristics were confirmed to be suitable for future high-bit-rate transmission.

  • Client Honeypot Multiplication with High Performance and Precise Detection

    Mitsuaki AKIYAMA  Takeshi YAGI  Youki KADOBAYASHI  Takeo HARIU  Suguru YAMAGUCHI  

     
    PAPER-Attack Monitoring & Detection

      Vol:
    E98-D No:4
      Page(s):
    775-787

    We investigated client honeypots for detecting and circumstantially analyzing drive-by download attacks. A client honeypot requires both improved inspection performance and in-depth analysis for inspecting and discovering malicious websites. However, OS overhead in recent client honeypot operation cannot be ignored when improving honeypot multiplication performance. We propose a client honeypot system that combines multi-OS and multi-process honeypot approaches, and we implemented this system to evaluate its performance. The process sandbox mechanism, a security measure for our multi-process approach, provides a virtually isolated environment for each web browser. It prevents system alteration from a compromised browser process by I/O redirection of file/registry access. To solve the inconsistency of the file/registry view caused by I/O redirection, our process sandbox mechanism enables the web browser and corresponding plug-ins to share a virtual system view. Therefore, it enables multiple processes to run simultaneously on a single OS without interfering with one another. In a field trial, we confirmed that our multi-process approach was three or more times faster than a single-process approach, and our multi-OS approach linearly improved system performance according to the number of honeypot instances. In addition, our long-term investigation indicated that 72.3% of exploitations target browser-helper processes. If a honeypot restricts all process creation events, it cannot identify an exploitation targeting a browser-helper process. In contrast, our process sandbox mechanism permits the creation of browser-helper processes, so it can identify these types of exploitations without resulting in false negatives. Thus, our proposed system with these multiplication approaches improves performance efficiency and enables in-depth analysis on high-interaction systems.
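
    A minimal sketch of the copy-on-write path-redirection idea behind the process sandbox: writes from each browser instance are diverted into a per-instance shadow directory, while reads prefer the shadow copy and otherwise fall back to the shared system view, so many browser processes can run on one OS without altering it. Paths and policy are illustrative only, and registry redirection is omitted.

      # Hypothetical sketch of I/O redirection for a multi-process honeypot:
      # each browser instance gets a shadow directory; writes go there, and reads
      # prefer the shadow copy, falling back to the real (shared) file system.
      import os

      def shadow_path(instance_id, real_path, shadow_root="/tmp/honeypot_shadow"):
          return os.path.join(shadow_root, str(instance_id), real_path.lstrip("/"))

      def open_redirected(instance_id, path, mode="r"):
          sp = shadow_path(instance_id, path)
          if "w" in mode or "a" in mode:          # writes never touch the real system
              os.makedirs(os.path.dirname(sp), exist_ok=True)
              return open(sp, mode)
          if os.path.exists(sp):                   # read the instance's own view first
              return open(sp, mode)
          return open(path, mode)                  # otherwise share the system view

      # Instance 1 "modifies" /etc/hosts without affecting instance 2 or the host.
      with open_redirected(1, "/etc/hosts", "w") as f:
          f.write("127.0.0.1 injected.example\n")
      with open_redirected(1, "/etc/hosts") as f:
          print(f.read())   # instance 1 sees its own modification
      with open_redirected(2, "/etc/hosts") as f:
          print(f.read())   # instance 2 still sees the real file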

  • Evasive Malicious Website Detection by Leveraging Redirection Subgraph Similarities

    Toshiki SHIBAHARA  Yuta TAKATA  Mitsuaki AKIYAMA  Takeshi YAGI  Kunio HATO  Masayuki MURATA  

     
    PAPER

      Publicized:
    2018/10/30
      Vol:
    E102-D No:3
      Page(s):
    430-443

    Many users are exposed to threats of drive-by download attacks through the Web. Attackers compromise vulnerable websites discovered by search engines and redirect clients to malicious websites created with exploit kits. Security researchers and vendors have tried to prevent the attacks by detecting malicious data, i.e., malicious URLs, web content, and redirections. However, attackers conceal parts of malicious data with evasion techniques to circumvent detection systems. In this paper, we propose a system for detecting malicious websites without collecting all malicious data. Even if we cannot observe parts of malicious data, we can always observe compromised websites. Since vulnerable websites are discovered by search engines, compromised websites have similar traits. Therefore, we built a classifier by leveraging not only malicious but also compromised websites. More precisely, we convert all websites observed at the time of access into a redirection graph and classify it by integrating similarities between its subgraphs and redirection subgraphs shared across malicious, benign, and compromised websites. As a result of evaluating our system with crawling data of 455,860 websites, we found that the system achieved a 91.7% true positive rate for malicious websites containing exploit URLs at a low false positive rate of 0.1%. Moreover, it detected 143 more evasive malicious websites than the conventional content-based system.
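
    A minimal sketch of the classification idea: a website's redirection graph is reduced to a set of small subgraph signatures (here, simply labeled edges), and its similarity to signature sets shared by known malicious, compromised, and benign websites is used for classification. The signature and similarity choices, as well as the node labels, are illustrative, not the paper's design.

      # Hypothetical sketch: classify a website's redirection graph by its
      # similarity to subgraph signatures of known malicious / compromised /
      # benign websites.
      def edge_signatures(edges):
          """Reduce a redirection graph to a set of labeled-edge signatures,
          e.g., (source node type, redirection method, destination node type)."""
          return set(edges)

      known = {
          "malicious":   edge_signatures([("compromised", "iframe", "exploit_kit"),
                                          ("exploit_kit", "302", "malware_host")]),
          "compromised": edge_signatures([("landing", "script", "ad_network"),
                                          ("compromised", "iframe", "exploit_kit")]),
          "benign":      edge_signatures([("landing", "script", "ad_network"),
                                          ("landing", "302", "cdn")]),
      }

      def jaccard(a, b):
          return len(a & b) / len(a | b) if a | b else 0.0

      def classify(edges):
          sigs = edge_signatures(edges)
          scores = {label: jaccard(sigs, ref) for label, ref in known.items()}
          return max(scores, key=scores.get), scores

      observed = [("compromised", "iframe", "exploit_kit"),
                  ("exploit_kit", "302", "malware_host")]
      print(classify(observed))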

  • Understanding Characteristics of Phishing Reports from Experts and Non-Experts on Twitter Open Access

    Hiroki NAKANO  Daiki CHIBA  Takashi KOIDE  Naoki FUKUSHI  Takeshi YAGI  Takeo HARIU  Katsunari YOSHIOKA  Tsutomu MATSUMOTO  

     
    PAPER-Information Network

      Publicized:
    2024/03/01
      Vol:
    E107-D No:7
      Page(s):
    807-824

    The increase in phishing attacks through email and short message service (SMS) has shown no signs of deceleration. The first thing we need to do to combat the ever-increasing number of phishing attacks is to collect and characterize more phishing cases that reach end users. Without understanding these characteristics, anti-phishing countermeasures cannot evolve. In this study, we propose an approach that uses Twitter as a new observation point to immediately collect and characterize phishing cases delivered via e-mail and SMS that evade countermeasures and reach users. Specifically, we propose CrowdCanary, a system capable of structurally and accurately extracting phishing information (e.g., URLs and domains) from tweets about phishing posted by users who have actually discovered or encountered it. In our three months of live operation, CrowdCanary identified 35,432 phishing URLs out of 38,935 phishing reports. We confirmed that 31,960 (90.2%) of these phishing URLs were later detected by the anti-virus engine, demonstrating that CrowdCanary is superior to existing systems in both accuracy and volume of threat extraction. We also analyzed users who shared phishing threats by utilizing the extracted phishing URLs and categorized them into two distinct groups, namely experts and non-experts. As a result, we found that CrowdCanary can collect information specific to non-expert reports, such as reports that identify the phishing target only by the company's brand name in the tweet, phishing information that appears only in the image attached to the tweet, and information about the landing page before redirection. Furthermore, we conducted a detailed analysis of the collected information on phishing sites and discovered that certain biases exist in the domain names and hosting servers of phishing sites, revealing new characteristics useful for detecting unknown phishing sites.
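
    A minimal sketch of the URL-extraction step: phishing reports on Twitter commonly "defang" indicators (hxxp, [.]), so a collector has to refang them before extracting URLs. The defanging conventions handled below are common ones and an assumption, not the full set CrowdCanary supports.

      # Hypothetical sketch: refang defanged indicators in a tweet and extract
      # phishing URLs from the text.
      import re

      def refang(text):
          text = re.sub(r"hxxps?", lambda m: m.group(0).replace("xx", "tt"), text,
                        flags=re.IGNORECASE)
          return text.replace("[.]", ".").replace("(.)", ".").replace("[://]", "://")

      def extract_urls(text):
          return re.findall(r"https?://[^\s\"'<>]+", refang(text))

      tweet = ("Phishing alert! Fake login page impersonating Example Bank: "
               "hxxps://example-bank-login[.]top/secure #phishing")
      print(extract_urls(tweet))   # ['https://example-bank-login.top/secure']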

  • BotProfiler: Detecting Malware-Infected Hosts by Profiling Variability of Malicious Infrastructure Open Access

    Daiki CHIBA  Takeshi YAGI  Mitsuaki AKIYAMA  Kazufumi AOKI  Takeo HARIU  Shigeki GOTO  

     
    PAPER

      Vol:
    E99-B No:5
      Page(s):
    1012-1023

    Ever-evolving malware makes it difficult to prevent infection of hosts. Botnets in particular are one of the most serious threats to cyber security, since they consist of many malware-infected hosts. Many countermeasures against malware infection, such as generating network-based signatures or templates, have been investigated. Such templates introduce regular expressions to detect polymorphic attacks conducted by attackers. A potential problem with such templates, however, is that they sometimes falsely regard benign communications as malicious, resulting in false positives, due to an inherent aspect of regular expressions. Since the cost of responding to malware infection is quite high, the number of false positives should be kept to a minimum. Therefore, we propose a system that generates templates causing fewer false positives than a conventional system in order to achieve more accurate detection of malware-infected hosts. Our key idea is that malicious infrastructures, such as malware samples or command and control servers, tend to be reused instead of created from scratch. We verify this idea and propose a new system that profiles the variability of substrings in HTTP requests, which makes it possible to identify invariant keywords originating from the same malicious infrastructure and to generate more accurate templates. The results of implementing our system and validating it using real traffic data indicate that it reduced false positives by up to two-thirds compared to the conventional system and even increased the detection rate of infected hosts.
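
    A minimal sketch of the variability-profiling idea: align HTTP request strings attributed to the same malicious infrastructure, keep substrings that never vary as literal keywords, and generalize the variable positions into character classes to obtain a template. The naive token-wise alignment and the sample requests are illustrative, far simpler than the actual system.

      # Hypothetical sketch: profile which parts of HTTP requests from the same
      # malicious infrastructure are invariant and turn them into a template.
      import re

      requests = [
          "GET /gate.php?id=ab12cd&ver=2.1&cmd=ping",
          "GET /gate.php?id=ff90aa&ver=2.1&cmd=ping",
          "GET /gate.php?id=0c77de&ver=2.1&cmd=ping",
      ]

      def tokenize(request):
          # Keep delimiters so the template preserves the request structure.
          return re.split(r"([ /?&=])", request)

      def build_template(samples):
          token_lists = [tokenize(s) for s in samples]
          template = []
          for column in zip(*token_lists):
              if len(set(column)) == 1:                 # invariant -> literal keyword
                  template.append(re.escape(column[0]))
              else:                                     # variable -> generalize
                  template.append(r"[0-9a-f]+" if all(
                      re.fullmatch(r"[0-9a-f]+", t) for t in column) else r"[^ /?&=]+")
          return "^" + "".join(template) + "$"

      template = build_template(requests)
      print(template)
      print(bool(re.match(template, "GET /gate.php?id=12beef&ver=2.1&cmd=ping")))  # True
      print(bool(re.match(template, "GET /index.php?id=12beef&ver=2.1&cmd=ping"))) # False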

  • MineSpider: Extracting Hidden URLs Behind Evasive Drive-by Download Attacks

    Yuta TAKATA  Mitsuaki AKIYAMA  Takeshi YAGI  Takeo HARIU  Shigeki GOTO  

     
    PAPER-Web security

      Publicized:
    2016/01/13
      Vol:
    E99-D No:4
      Page(s):
    860-872

    Drive-by download attacks force users to automatically download and install malware by redirecting them to malicious URLs that exploit vulnerabilities in the user's web browser. In addition, several evasion techniques, such as code obfuscation and environment-dependent redirection, are used in combination with drive-by download attacks to prevent detection. In environment-dependent redirection, attackers profile information on the user's environment, such as the name and version of the browser and browser plugins, and launch a drive-by download attack against only certain targets by changing the destination URL. When malicious content detection and collection techniques such as honeyclients do not match the specific environment of the attack target, they are not redirected and therefore cannot detect the attack. Thus, it is necessary to improve analysis coverage while countering these adversarial evasion techniques. We propose a method for exhaustively analyzing JavaScript code relevant to redirections and extracting the destination URLs in the code. Our method facilitates the detection of attacks by extracting a large number of URLs while controlling the analysis overhead by excluding code not relevant to redirections. We implemented our method in a browser emulator called MINESPIDER, which automatically extracts potential URLs from websites. We validated it using communication data with malicious websites captured over a three-year period. The experimental results demonstrated that MINESPIDER extracted, within a few seconds, 30,000 new URLs from malicious websites that conventional methods missed.
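
    A minimal sketch of the final extraction step: once redirection-relevant JavaScript has been executed or deobfuscated, destination URLs still have to be pulled out of string literals and of values flowing into navigation sinks. The sketch below performs only a static scan for URL-shaped literals and common sinks; MineSpider itself performs program analysis well beyond this.

      # Hypothetical sketch: statically pull candidate redirection URLs out of
      # JavaScript code by looking at string literals and navigation sinks.
      import re

      js = """
      var t = "http://tracking.example.com/c?id=42";
      if (navigator.userAgent.indexOf("MSIE") > 0) {
          window.location = "http://exploit.example.net/land" + ".php";
      } else {
          document.write('<iframe src="http://ads.example.org/show"></iframe>');
      }
      """

      URL_RE = re.compile(r"""["'](https?://[^"']+)["']""")
      SINK_RE = re.compile(r"(window\.location|location\.href|document\.write|window\.open)")

      urls = URL_RE.findall(js)
      sink_lines = [line for line in js.splitlines() if SINK_RE.search(line)]

      print("all URL literals:", urls)
      print("lines with redirection sinks:")
      for line in sink_lines:
          print(" ", line.strip())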

  • Efficient FWM Based Broadband Wavelength Conversion Using a Short High-Nonlinearity Fiber

    Osamu ASO  Shin-ichi ARAI  Takeshi YAGI  Masateru TADAKUMA  Yoshihisa SUZUKI  Shu NAMIKI  

     
    PAPER-Fibers

      Vol:
    E83-C No:6
      Page(s):
    816-823

    A fiber four-wave mixing (FWM) based parametric wavelength conversion experiment is demonstrated. Simultaneous multi-channel conversion over 91 nm is achieved; to our knowledge, this bandwidth is the broadest among published results. We discuss the method for realizing such broadband wavelength conversion. The efficiency and/or bandwidth of the wavelength conversion is degraded mainly by the following obstacles: (a) inhomogeneity of the chromatic dispersion distribution along the fiber, (b) mismatch of the states of polarization (SOP) between pump and signals, and (c) bandwidth limitation from the coherence length. We argue that an extremely short highly nonlinear fiber overcomes all three obstacles. Furthermore, we comment on higher-order dispersion and on the influence of stimulated Brillouin scattering (SBS). High-nonlinearity dispersion-shifted fiber (HNL-DSF) is a promising solution for generating FWM efficiently despite the short length used. We developed and fabricated HNL-DSF by the vapor-phase axial deposition method; the nonlinear coefficient of the fiber is 13.8 W⁻¹km⁻¹. We measured the conversion efficiency spectra of four HNL-DSFs with lengths of 24.5 km, 1.2 km, 200 m, and 100 m, and show that the conversion bandwidth increases monotonically as the fiber length decreases. The result clearly demonstrates the advantage of an extremely short fiber.
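
    For readers outside the field, the reported length dependence of the conversion bandwidth follows from the standard low-gain FWM relation below (a textbook result, not an equation quoted from the letter): the converted power carries a phase-matching factor, so conversion remains efficient only while the linear phase mismatch accumulated over the fiber is small, and a shorter fiber therefore tolerates a larger mismatch, i.e., a wider signal-pump detuning.

      P_c \propto (\gamma P_p L)^2 \,\mathrm{sinc}^2\!\left(\frac{\Delta\beta L}{2}\right),
      \qquad
      \Delta\beta = \beta(\omega_s) + \beta(\omega_c) - 2\beta(\omega_p)
                  \approx \beta_2(\omega_p)\,(\omega_s - \omega_p)^2

    The phase-matching factor vanishes at |\Delta\beta| L = 2\pi, so halving the fiber length roughly doubles the tolerable detuning, which is consistent with the monotonic bandwidth increase observed as the fiber is shortened from 24.5 km down to 100 m.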

  • Analysis of Blacklist Update Frequency for Countering Malware Attacks on Websites

    Takeshi YAGI  Junichi MURAYAMA  Takeo HARIU  Sho TSUGAWA  Hiroyuki OHSAKI  Masayuki MURATA  

     
    PAPER-Internet

      Vol:
    E97-B No:1
      Page(s):
    76-86

    We propose a method for determining how frequently to monitor the activities of a malware download site used in malware attacks on websites. In recent years, there has been an increase in attacks that exploit vulnerabilities in web applications to infect websites with malware and maliciously use those websites as attack platforms. One scheme for countering such attacks is to blacklist malware download sites and filter out access to them from user websites. However, a malware download site is often constructed by maliciously manipulating an ordinary website. Once the malware has been deleted from the malware download site, this scheme must be able to unblacklist that site to prevent normal user websites from being falsely detected as malware download sites. However, if a malware download site is frequently monitored for the presence of malware, the attacker may sense this monitoring and relocate the malware to a different site. This means that an attack will not be detected until the newly generated malware download site is discovered. In response to these problems, we clarify how attacker behavior changes attack-detection accuracy. This is done by modeling attacker behavior, specifying a state-transition model with respect to the blacklisting of a malware download site, and analyzing these models with synthetically generated attack patterns and attack patterns measured in an operational network. From this analysis, we derive the optimal monitoring frequency that maximizes the true detection rate while minimizing the false detection rate.

  • Efficient Dynamic Malware Analysis for Collecting HTTP Requests using Deep Learning

    Toshiki SHIBAHARA  Takeshi YAGI  Mitsuaki AKIYAMA  Daiki CHIBA  Kunio HATO  

     
    PAPER

      Publicized:
    2019/02/01
      Vol:
    E102-D No:4
      Page(s):
    725-736

    Malware-infected hosts have typically been detected using network-based Intrusion Detection Systems on the basis of characteristic patterns of HTTP requests collected with dynamic malware analysis. Since attackers continuously modify malicious HTTP requests to evade detection, novel HTTP requests sent from new malware samples need to be exhaustively collected in order to maintain a high detection rate. However, analyzing all new malware samples for a long period is infeasible in a limited amount of time. Therefore, we propose a system for efficiently collecting HTTP requests with dynamic malware analysis. Specifically, our system analyzes a malware sample for a short period and then determines whether the analysis should be continued or suspended. Our system identifies malware samples whose analyses should be continued on the basis of the network behavior in their short-period analyses. To make an accurate determination, we focus on the fact that malware communications resemble natural language from the viewpoint of data structure. We apply the recursive neural network, which has recently exhibited high classification performance in the field of natural language processing, to our proposed system. In the evaluation with 42,856 malware samples, our proposed system collected 94% of novel HTTP requests and reduced analysis time by 82% in comparison with the system that continues all analyses.
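
    A minimal sketch of the analysis-scheduling loop: each sample runs for a short period, simple features of the observed network behavior are extracted, and a decision function chooses whether to continue or suspend the analysis. The features and the placeholder scoring rule are illustrative; the paper's system instead applies a recursive neural network to the structure of the short-period communications.

      # Hypothetical sketch: decide after a short-period analysis whether a
      # malware sample's dynamic analysis should be continued or suspended.
      def behavior_features(http_requests):
          """Very simple features of the short-period network behavior."""
          hosts = {r["host"] for r in http_requests}
          paths = {r["path"] for r in http_requests}
          return {
              "n_requests": len(http_requests),
              "n_hosts": len(hosts),
              "n_unique_paths": len(paths),
          }

      def should_continue(features, threshold=0.5):
          """Placeholder scoring rule standing in for the trained classifier:
          samples that already show diverse C&C-like activity are worth
          analyzing longer."""
          score = min(1.0, 0.2 * features["n_hosts"] + 0.1 * features["n_unique_paths"])
          return score >= threshold

      short_run = [
          {"host": "c2-a.example.net", "path": "/gate.php"},
          {"host": "c2-b.example.org", "path": "/cfg.bin"},
          {"host": "c2-b.example.org", "path": "/report"},
      ]
      feats = behavior_features(short_run)
      print(feats, "->", "continue" if should_continue(feats) else "suspend")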

  • Development of Low Loss Ultra-High Δ ZrO2-SiO2 PLC for Next Generation Compact and High-Density Integrated Devices Open Access

    Masanori TAKAHASHI  Yasuyoshi UCHIDA  Shintaro YAMASAKI  Junichi HASEGAWA  Takeshi YAGI  

     
    INVITED PAPER

      Vol:
    E97-C No:7
      Page(s):
    725-730

    For next-generation planar lightwave circuit (PLC) devices, high functionality and high-density integration are required, as well as downsizing and cost reduction. To meet these needs, a high refractive index difference between the core and the cladding (Δ) is required. For practical applications, silica-based PLC is one of the most attractive candidates. However, the optical properties and productivity degrade when the Δ of the core becomes high. Thus, the Δ of most conventional PLCs with a GeO2-SiO2 core is designed to be less than 2.5%. In this paper, we report a silica-based ultra-high Δ PLC with a ZrO2-SiO2 core. A 5.5%-Δ ZrO2-SiO2 PLC has been realized with low propagation loss, and its basic characteristics have been confirmed. The potential for chip size reduction of the ZrO2-SiO2 PLC is shown.

  • Intelligent High-Interaction Web Honeypots Based on URL Conversion Scheme

    Takeshi YAGI  Naoto TANIMOTO  Takeo HARIU  Mitsutaka ITOH  

     
    PAPER-Internet

      Vol:
    E94-B No:5
      Page(s):
    1339-1347

    Vulnerabilities in web applications expose computer networks to security threats. For example, attackers use a large number of ordinary user websites as hopping sites for attacking other websites and user terminals; these hopping sites are illegally operated using malware distributed by abusing vulnerabilities in the web applications on those websites. Thus, security threats resulting from vulnerabilities in web applications prevent service providers from constructing secure networking environments. To protect websites from attacks based on the vulnerabilities of web applications, security vendors and service providers collect attack information using web honeypots, which masquerade as vulnerable systems. To collect all accesses resulting from attacks, including further network attacks by malware such as downloaders, vendors and providers use high-interaction web honeypots, which are composed of vulnerable systems with surveillance functions. However, conventional high-interaction web honeypots can collect only limited information and malware from attacks whose destination URL paths do not match the path structure of the web honeypot, since such attacks fail. To solve this problem, we propose a scheme in which the destination URLs of these attacks are corrected by determining the correct path from the path structure of the web honeypot. Our Internet investigation revealed that 97% of attacks are failures. However, we confirmed that approximately 50% of these attacks will succeed with our proposed scheme. With this scheme, we can use much more information to protect websites than with conventional high-interaction web honeypots because we can collect complete information and malware from these attacks.
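
    A minimal sketch of the URL conversion idea: an incoming attack request whose destination path does not exist on the honeypot is rewritten to the honeypot path that plausibly matches it (here, the one ending in the same script filename), so the attack can succeed and its follow-on malware activity can be observed. The matching rule and paths are illustrative simplifications.

      # Hypothetical sketch: convert the destination URL of a failed attack so
      # that its path matches the web honeypot's actual path structure.
      from urllib.parse import urlparse, urlunparse

      # Paths that actually exist on the high-interaction web honeypot (assumed).
      honeypot_paths = [
          "/wordpress/wp-content/plugins/vuln/upload.php",
          "/app/admin/config.php",
      ]

      def convert(url):
          parsed = urlparse(url)
          if parsed.path in honeypot_paths:
              return url                              # already a valid path
          filename = parsed.path.rsplit("/", 1)[-1]
          for hp in honeypot_paths:                   # pick the honeypot path with
              if hp.endswith("/" + filename):         # the same target script
                  return urlunparse(parsed._replace(path=hp))
          return None                                 # no plausible correction

      attack = "http://honeypot.example.org/blog/wp-content/plugins/vuln/upload.php?f=sh"
      print(convert(attack))
      # http://honeypot.example.org/wordpress/wp-content/plugins/vuln/upload.php?f=sh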

  • Event De-Noising Convolutional Neural Network for Detecting Malicious URL Sequences from Proxy Logs

    Toshiki SHIBAHARA  Kohei YAMANISHI  Yuta TAKATA  Daiki CHIBA  Taiga HOKAGUCHI  Mitsuaki AKIYAMA  Takeshi YAGI  Yuichi OHSITA  Masayuki MURATA  

     
    PAPER-Cryptography and Information Security

      Vol:
    E101-A No:12
      Page(s):
    2149-2161

    The number of infected hosts on enterprise networks has increased due to drive-by download attacks. In these attacks, users of compromised popular websites are redirected toward websites that exploit vulnerabilities in a browser and its plugins. To prevent damage, detecting infected hosts on the basis of proxy logs, rather than relying on blacklist-based filtering, has begun to be studied. This is because blacklists have become difficult to create due to the short lifetime of malicious domains and the concealment of exploit code. To detect accesses to malicious websites from proxy logs, we propose a system for detecting malicious URL sequences on the basis of three key ideas: focusing on sequences of URLs that include artifacts of malicious redirections, designing new features related to software other than browsers, and generating new training data with data augmentation. To find an effective approach for classifying URL sequences, we compared three approaches: an individual-based approach, a convolutional neural network (CNN), and our new event de-noising CNN (EDCNN). Our EDCNN reduces the negative effects of benign URLs redirected from compromised websites included in malicious URL sequences. Evaluation results show that only our EDCNN with the proposed features and data augmentation achieved a practical classification performance: a true positive rate of 99.1% and a false positive rate of 3.4%.