Daiki CHIBA Takeshi YAGI Mitsuaki AKIYAMA Kazufumi AOKI Takeo HARIU Shigeki GOTO
Ever-evolving malware makes it difficult to prevent it from infecting hosts. Botnets in particular are one of the most serious threats to cyber security, since they consist of a lot of malware-infected hosts. Many countermeasures against malware infection, such as generating network-based signatures or templates, have been investigated. Such templates are designed to introduce regular expressions to detect polymorphic attacks conducted by attackers. A potential problem with such templates, however, is that they sometimes falsely regard benign communications as malicious, resulting in false positives, due to an inherent aspect of regular expressions. Since the cost of responding to malware infection is quite high, the number of false positives should be kept to a minimum. Therefore, we propose a system to generate templates that cause fewer false positives than a conventional system in order to achieve more accurate detection of malware-infected hosts. We focused on the key idea that malicious infrastructures, such as malware samples or command and control, tend to be reused instead of created from scratch. Our research verifies this idea and proposes here a new system to profile the variability of substrings in HTTP requests, which makes it possible to identify invariable keywords based on the same malicious infrastructures and to generate more accurate templates. The results of implementing our system and validating it using real traffic data indicate that it reduced false positives by up to two-thirds compared to the conventional system and even increased the detection rate of infected hosts.
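The idea of separating invariable keywords from variable substrings can be illustrated with a deliberately simplified sketch. It assumes that a cluster of HTTP request targets from related malware has already been collected and that all of them split into the same number of tokens. The delimiter set, the `tokenize` and `build_template` helpers, and the sample requests in the usage note are hypothetical; this is not the profiling algorithm proposed in the paper.

```python
import re

# Assumed delimiters for splitting a request target into tokens.
DELIMS = r"([/?&=])"

def tokenize(request_target):
    """Split a request target into tokens, keeping the delimiters."""
    return [t for t in re.split(DELIMS, request_target) if t != ""]

def build_template(requests):
    """Build a regular expression from requests in one cluster.

    Token positions whose value never changes are kept as literal
    keywords; positions that vary are replaced by a wildcard.
    """
    if not requests:
        raise ValueError("at least one request is required")
    token_lists = [tokenize(r) for r in requests]
    length = len(token_lists[0])
    if any(len(tokens) != length for tokens in token_lists):
        raise ValueError("this sketch only handles equal-length token lists")
    parts = []
    for column in zip(*token_lists):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))   # invariable keyword
        else:
            parts.append(r"[^/?&=]+")            # variable substring
    return "^" + "".join(parts) + "$"
```

For example, calling `build_template` on `/gate.php?id=123&ver=2` and `/gate.php?id=987&ver=2` keeps `gate.php`, `id`, `ver`, and `2` as literals and generalizes only the `id` value. Keeping more positions literal is what makes a template less likely to match benign traffic.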
Heshmatollah KHOSRAVI Masaki FUKUSHIMA Shigeki GOTO
On the Internet, flow analysis and network monitoring have been studied using various methods. Some methods try to make TCP (Transmission Control Protocol) traces more readable by showing them graphically. Others, such as MRTG, NetScope, and NetFlow, read the traffic counters of routers and record the data for traffic engineering. Although all of these methods are useful, each is designed to perform only a single task. This paper describes an improved TCP Protocol Machine, a multipurpose tool that can be used for flow analysis, intrusion detection, and link congestion monitoring. It is built on a finite state machine (automaton). The machine separates flows into two main groups: if a flow can be mapped to a set of input symbols of the automaton, it is valid; otherwise it is invalid. Intruders' attacks are easily detected by the use of the protocol machine. Link congestion can also be monitored by measuring the ratio of valid flows to the total number of flows. We demonstrate the capability of this tool through measurements and working examples.
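As a rough illustration of the valid/invalid classification described above, the sketch below runs the packets of a flow through a small finite state machine. The state names, the coarse flag symbols, the heavily simplified transition table, and the `classify_flow` and `valid_ratio` helpers are assumptions made for this example; they are not the automaton defined in the paper.

```python
# Hypothetical, simplified transition table: (state, symbol) -> next state.
# Each symbol is a coarse label for the TCP flags seen on one packet.
TRANSITIONS = {
    ("CLOSED", "SYN"): "SYN_SEEN",
    ("SYN_SEEN", "SYN+ACK"): "SYN_ACKED",
    ("SYN_ACKED", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "FIN_SEEN",
    ("FIN_SEEN", "ACK"): "FIN_SEEN",
    ("FIN_SEEN", "FIN"): "CLOSED_OK",
    ("CLOSED_OK", "ACK"): "CLOSED_OK",
}
ACCEPTING = {"CLOSED_OK"}

def classify_flow(symbols):
    """Return "valid" if the automaton accepts the symbol sequence."""
    state = "CLOSED"
    for symbol in symbols:
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:
            return "invalid"          # no transition: flow is rejected
        state = next_state
    return "valid" if state in ACCEPTING else "invalid"

def valid_ratio(flows):
    """Fraction of flows that the automaton accepts."""
    if not flows:
        return 0.0
    valid = sum(1 for flow in flows if classify_flow(flow) == "valid")
    return valid / len(flows)
```

In this sketch a repeated `SYN` with no reply (as in a scan or SYN flood) has no matching transition and is classified as invalid, and a falling `valid_ratio` is the kind of signal the abstract associates with congestion.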
Yuta TAKATA Mitsuaki AKIYAMA Takeshi YAGI Takeo HARIU Shigeki GOTO
Drive-by download attacks force users to automatically download and install malware by redirecting them to malicious URLs that exploit vulnerabilities of the user's web browser. In addition, several evasion techniques, such as code obfuscation and environment-dependent redirection, are used in combination with drive-by download attacks to prevent detection. In environment-dependent redirection, attackers profile the information on the user's environment, such as the name and version of the browser and browser plugins, and launch a drive-by download attack on only certain targets by changing the destination URL. When malicious content detection and collection techniques, such as honeyclients, are used that do not match the specific environment of the attack target, they cannot detect the attack because they are not redirected. Therefore, it is necessary to improve analysis coverage while countering these adversarial evasion techniques. We propose a method for exhaustively analyzing JavaScript code relevant to redirections and extracting the destination URLs in the code. Our method facilitates the detection of attacks by extracting a large number of URLs while controlling the analysis overhead by excluding code not relevant to redirections. We implemented our method in a browser emulator called MINESPIDER that automatically extracts potential URLs from websites. We validated it by using communication data with malicious websites captured during a three-year period. The experimental results demonstrated that MINESPIDER extracted 30,000 new URLs from malicious websites in a few seconds that conventional methods missed.
Daiki CHIBA Ayako AKIYAMA HASEGAWA Takashi KOIDE Yuta SAWABE Shigeki GOTO Mitsuaki AKIYAMA
Internationalized domain names (IDNs) are abused to create domain names that are visually similar to those of legitimate/popular brands. In this work, we systematize such domain names, which we call deceptive IDNs, and analyze the risks associated with them. In particular, we propose a new system called DomainScouter to detect various deceptive IDNs and calculate a deceptive IDN score, a new metric indicating the number of users that are likely to be misled by a deceptive IDN. We perform a comprehensive measurement study on the identified deceptive IDNs using over 4.4 million registered IDNs under 570 top-level domains (TLDs). The measurement results demonstrate that there are many previously unexplored deceptive IDNs targeting non-English brands or combining other domain squatting methods. Furthermore, we conduct online surveys to examine and highlight vulnerabilities in user perceptions when encountering such IDNs. Finally, we discuss the practical countermeasures that stakeholders can take against deceptive IDNs.
This paper proposes a new, simple method for network measurement. It extracts the 6-bit control flags of TCP (Transmission Control Protocol) packets. The idea is based on a unique feature of flag ratios, which was discovered through our exhaustive search for new indexes of network traffic. By using flag ratios, one can tell whether the network is really congested. The method is much simpler than conventional network monitoring with a network analyzer. The well-known monitoring method is based on the utilization parameter of a communication circuit, which ranges from 0% to 100%. However, one cannot tell whether the line is congested even if the utilization is 100%: 100% means full utilization and gives no further information. To calculate the real performance of the network, one must estimate the throughput or effective speed of each user, and this estimation requires much calculation. Our new method tries to correlate ratios of TCP control flags with network congestion. The results show the usefulness of this new method. This paper also analyzes why the flag ratios show this unique feature.
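Counting the six control flags and forming ratios between them takes only a few lines, as the sketch below suggests. The bit positions follow the TCP header definition; the `flag_counts` and `flag_ratio` helpers are hypothetical names, and the abstract does not say which ratios correlate with congestion, so the FIN/SYN pairing in the usage note is an arbitrary illustration.

```python
from collections import Counter

# Bit masks of the six TCP control flags in the header's flags field.
FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}

def flag_counts(flag_bytes):
    """Count how many packets carry each control flag."""
    counts = Counter()
    for value in flag_bytes:
        for name, bit in FLAG_BITS.items():
            if value & bit:
                counts[name] += 1
    return counts

def flag_ratio(flag_bytes, numerator, denominator):
    """Ratio of two flag counts, or None if the denominator flag is absent."""
    counts = flag_counts(flag_bytes)
    if counts[denominator] == 0:
        return None
    return counts[numerator] / counts[denominator]
```

Given the flags bytes of captured packets, `flag_ratio(flags, "FIN", "SYN")` yields one candidate index. No payload inspection or per-user throughput estimation is needed, which is the simplicity the abstract emphasizes.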
Midori ASAKA Masahiko TSUCHIYA Takefumi ONABUTA Shunji OKAZAWA Shigeki GOTO
At the Information-technology Promotion Agency (IPA), we have been developing a network intrusion detection system called IDA (Intrusion Detection Agent system). The IDA system has two distinctive features that most conventional intrusion detection systems lack. First, it has a mechanism for tracing the origin of a break-in by means of mobile agents. Second, it has a new and efficient method of detecting intrusions: rather than continuously monitoring the user's activities, it watches for an event that meets the criteria of an MLSI (Mark Left by Suspected Intruders) and may relate to an intrusion. With this method, IDA can reduce the processing overhead of systems and networks. At present, IDA can detect local attacks, that is, attacks initiated against a machine to which the attacker already has access and on which he or she attempts to exceed his or her authority. This paper mainly describes how IDA detects local attacks and traces intrusions.
This paper proposes a name-based routing mechanism called Routing Guidance Name (RGN) that offers new routing management functionalities within the basic characteristics of CCN. The proposed mechanism names each CCN router. Each router becomes a Data Provider for its name. When a CCN Interest specifies a router's name, it is forwarded to the target router according to the standard mechanism of CCN. Upon receiving an Interest, each router reacts to it according to RGN. This paper introduces a new type of node called a Scheduler which calculates the best routes based on link state information collected from routers. The scheduler performs its functions based on RGN. This paper discusses how the proposed system builds CCN FIB (Forwarding Information Base) in routers. The results of experiments reveal that RGN is more efficient than the standard CCN scheme. It is also shown that the proposal provides mobility support with short delay time. We explain a practical mobile scenario to illustrate the advantages of the proposal.
Tatsuya MORI Tetsuya TAKINE Jianping PAN Ryoichi KAWAHARA Masato UCHIDA Shigeki GOTO
With the rapid increase of link speed in recent years, packet sampling has become a very attractive and scalable means of collecting flow statistics; however, it also makes inferring original flow characteristics much more difficult. In this paper, we develop techniques and schemes to identify flows with a very large number of packets (also known as heavy-hitter flows) from sampled flow statistics. Our approach follows a two-stage strategy: We first parametrically estimate the original flow length distribution from sampled flows. We then identify heavy-hitter flows with Bayes' theorem, where the flow length distribution estimated at the first stage is used as an a priori distribution. Our approach is validated and evaluated with publicly available packet traces. We show that our approach provides a very flexible framework for striking an appropriate balance between false positives and false negatives when the sampling frequency is given.
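The second stage can be sketched as a direct application of Bayes' theorem. The example assumes that each packet is sampled independently with probability `f`, so the sampled count follows a binomial distribution, and it uses a truncated power law as a stand-in for the estimated prior. The function names, the power-law form, and all parameter values are assumptions for illustration, not the estimator from the paper.

```python
from math import comb

def power_law_prior(max_len, alpha):
    """Assumed prior P(X = x) proportional to x**(-alpha), x = 1..max_len."""
    weights = [x ** -alpha for x in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]   # index i holds P(X = i + 1)

def heavy_hitter_posterior(y, f, prior, threshold):
    """P(X >= threshold | Y = y) for a flow with y sampled packets.

    X is the original flow length, Y ~ Binomial(X, f) the sampled count.
    """
    numerator = 0.0
    denominator = 0.0
    for i, p_x in enumerate(prior):
        x = i + 1
        if x < y:
            continue                      # cannot sample more than was sent
        likelihood = comb(x, y) * f ** y * (1 - f) ** (x - y)
        joint = likelihood * p_x
        denominator += joint
        if x >= threshold:
            numerator += joint
    return numerator / denominator if denominator > 0 else 0.0
```

A flow would then be reported as a heavy hitter when this posterior exceeds a chosen cut-off. Raising or lowering that cut-off trades false positives against false negatives, which is the balance the abstract refers to.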
Susumu SHIMIZU Kensuke FUKUDA Ken-ichiro MURAKAMI Shigeki GOTO
This paper proposes a new method of estimating real-time traffic matrices that only incurs small errors in estimation. A traffic matrix represents flows of traffic in a network. It is an essential tool for capacity planning and traffic engineering. However, the high costs involved in measurement make it difficult to assemble an accurate traffic matrix. It is therefore important to estimate a traffic matrix using limited information that only incurs small errors. Existing approaches have used IP-related information to reduce the estimation errors and computational complexity. In contrast, our method, called spike flow measurement (SFM), reduces errors and complexity by focusing on spikes. A spike is transient excessive usage of a communications link. Spikes are easily monitored through an SNMP framework. This reduces the measurement costs compared to those of other approaches. SFM identifies spike flows from traffic byte counts by detecting pairs of incoming and outgoing spikes in a network. A matrix is then constructed from collected spike flows as an approximation of the real traffic matrix. Our experimental evaluation reveals that the average error in estimation is 28%, which is sufficiently small for the method to be applied to a wide range of network nodes, including Ethernet switches and IP routers.
Midori ASAKA Takefumi ONABUTA Tadashi INOUE Shunji OKAZAWA Shigeki GOTO
Many methods have been proposed to detect intrusions; for example, pattern matching against known intrusion patterns and the statistical approach of detecting deviation from normal activities. We investigated a new method for detecting intrusions based on the number of system calls during a user's network activity on a host machine. This method attempts to separate intrusions from normal activities by using discriminant analysis, a kind of multivariate analysis. We can detect intrusions by analyzing only 11 system calls occurring on a host machine by discriminant analysis with the Mahalanobis distance, and can also tell whether an unknown sample is an intrusion. Our approach is a lightweight intrusion detection method, given that it requires only 11 system calls for analysis. Moreover, our approach does not require user profiles or a user activity database in order to detect intrusions. This paper explains our new method for the separation of intrusions and normal behavior by discriminant analysis, and describes the classification method by which to identify an unknown behavior.
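Discriminant analysis with the Mahalanobis distance can be sketched in plain Python: each group is summarized by its mean vector and covariance matrix, and an unknown sample is assigned to the group whose squared distance is smallest. The helper names are hypothetical, and any toy data with two features per sample would only stand in for the counts of the 11 system calls, which the abstract does not list.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def covariance(samples, mu):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(samples), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        diff = [a - b for a, b in zip(s, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (n - 1) for c in row] for row in cov]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    diff = [a - b for a, b in zip(x, mu)]
    return sum(d * v for d, v in zip(diff, solve(cov, diff)))

def classify(x, groups):
    """groups maps a label to (mean, covariance); pick the nearest group."""
    return min(groups, key=lambda g: mahalanobis_sq(x, *groups[g]))
```

Because only a mean vector and a covariance matrix per group are stored, no per-user profile or activity database is required, which matches the lightweight character claimed in the abstract.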
Midori ASAKA Takefumi ONABUTA Shigeki GOTO
The number of computer break-ins from outside an organization has increased with the rapid growth of the Internet. Since many outside intruders employ stepping stones, it is difficult to trace back to the real origin of an attack. Some research projects have proposed methods for tracing DoS attacks and for detecting stepping stones. However, it is still difficult to locate the origin of an attack that uses stepping stones. We have developed IDA (Intrusion Detection Agent system), which has an intrusion tracing mechanism for a LAN environment. In this paper, we improve the tracing mechanism so that it can trace back stepping-stone attacks on the Internet. In our method, the information needed to trace stepping stones is collected effectively from hosts in a LAN and is made available at a public information server. A pursuer of a stepping-stone attack can trace back the intrusion based on the information on the intrusion route available at the public information server.
Hongbo SHI Izuru SATO Shigeki GOTO
This paper proposes a new method of realizing internationalized domain names (iDN). iDN has been under discussion at the IETF (Internet Engineering Task Force). iDN allows a user to specify domain names in languages such as Japanese, Chinese, and Korean. iDN is a proper extension of the current domain name system. We have already developed an iDN implementation, named Global Domain Name System (GDNS). GDNS extends the usage of alias records, and gives reverse mapping information for multi-lingual domain names. This paper presents yet another method, which introduces two new Resource Record (RR) types, INAME and IPTR, to cover multi-lingual domain names. This paper also discusses the efficiency of DNS. Since DNS is a distributed database system, its performance depends on the method of retrieving data. This paper suggests a new retrieval method that can improve the performance of DNS remarkably.
Norihiro FUKUMOTO Shigehiro ANO Shigeki GOTO
Video traffic occupies a major part of current mobile traffic. The characteristics of video traffic are dominated by the behavior of video application users. This paper uses a state transition diagram to analyze the behavior of video application users on smartphones. Video application users are divided into two categories: keyword search users and initial screen users. The two groups take different first actions in video viewing. The results of our analysis show that the patience of video application users depends on whether they have a specific purpose when they launch a video application. Mobile network operators can improve the QoE of video application users by utilizing the results of this study.
Jun-ya KATO Atsuo SHIMIZU Shigeki GOTO
This paper proposes a new model which can approximate the delay time distribution in the Internet. It is well known that the delay time in communication links follows the exponential distribution. However, the earlier models cannot explain the distribution when a communication link is heavily overloaded. This paper proposes to use the M/M/S(m) model for the Internet. We have applied our model to the measurement results. This paper deals with one-way delay because it reflects the actual characteristics of communication links. Most measurement statistics in the Internet have been based on round-trip time delay between two end nodes. These characteristics are easily measured by sending sample packets from one node to the other. The receiver side echoes back the packets. However, the results are not always useful. A long distance communication link, such as a leased line, has two different fibers or wires for each direction: an incoming link, and an outgoing link. When the link is overloaded, the traffic in each link is quite different. The measurement of one-way delay is especially important for multimedia communications, because audio and video transmissions are essentially one-way traffic.
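To make the queueing model concrete, the sketch below computes steady-state quantities for a finite multi-server Markovian queue. It assumes that M/M/S(m) denotes Poisson arrivals, exponential service times, S servers, and a waiting room of m positions, so that at most S + m packets are in the system. That reading of the notation, the function names, and the parameters are assumptions; the paper's fitting procedure for measured one-way delay is not reproduced.

```python
from math import factorial

def state_probabilities(lam, mu, servers, waiting_room):
    """Steady-state P(n in system) for n = 0 .. servers + waiting_room."""
    a = lam / mu                          # offered load in erlangs
    weights = []
    for n in range(servers + waiting_room + 1):
        if n <= servers:
            w = a ** n / factorial(n)
        else:
            w = a ** n / (factorial(servers) * servers ** (n - servers))
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def blocking_probability(lam, mu, servers, waiting_room):
    """Probability that an arrival finds the system full and is lost."""
    return state_probabilities(lam, mu, servers, waiting_room)[-1]

def mean_waiting_time(lam, mu, servers, waiting_room):
    """Mean queueing delay of accepted arrivals, via Little's law."""
    probs = state_probabilities(lam, mu, servers, waiting_room)
    queue_len = sum((n - servers) * p
                    for n, p in enumerate(probs) if n > servers)
    accepted_rate = lam * (1 - probs[-1])
    return queue_len / accepted_rate
```

Unlike an infinite-buffer model, this one remains well defined when the arrival rate exceeds the total service rate, which corresponds to the heavily overloaded case the abstract says earlier models cannot explain.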
Yuta TAKATA Mitsuaki AKIYAMA Takeshi YAGI Takeshi YADA Shigeki GOTO
An incident response organization such as a CSIRT contributes to preventing the spread of malware infection by analyzing compromised websites and sending abuse reports with detected URLs to webmasters. However, these abuse reports with only URLs are not sufficient to clean up the websites. In addition, it is difficult to analyze malicious websites across different client environments because these websites change behavior depending on a client environment. To expedite compromised website clean-up, it is important to provide fine-grained information such as malicious URL relations, the precise position of compromised web content, and the target range of client environments. In this paper, we propose a new method of constructing a redirection graph with context, such as which web content redirects to malicious websites. The proposed method analyzes a website in a multi-client environment to identify which client environment is exposed to threats. We evaluated our system using crawling datasets of approximately 2,000 compromised websites. The result shows that our system successfully identified malicious URL relations and compromised web content, and the number of URLs and the amount of web content to be analyzed were sufficient for incident responders by 15.0% and 0.8%, respectively. Furthermore, it can also identify the target range of client environments in 30.4% of websites and a vulnerability that has been used in malicious websites by leveraging target information. This fine-grained analysis by our system would contribute to improving the daily work of incident responders.
The auction is a popular way of trading. Despite the popularity of auctions, only a small number of papers have addressed protocols that realize the double auction. In this paper, we propose a new double auction method that improves the algorithm of the existing double auction protocol. Our new method is based on the idea of number comparison realized by homomorphic encryption. It solves the problem of the privacy of losing bids found in the existing algorithm. Buyers and sellers can embed a random number in their bidding information by using homomorphic encryption. The players in an auction cannot obtain anyone else's bidding information. The new method is more efficient than the existing ones, and it satisfies the criteria for an auction protocol.
This paper proposes a new method for realizing a web page recommendation system by sharing users' web browsing histories on an anonymous P2P network. Our scheme creates a user profile, a summary of the user's web browsing trends, by analyzing the contents of the web pages browsed. The scheme then provides a P2P network for exchanging web browsing histories so as to create mutual web page recommendations. The novelty of our method lies in how the P2P network is formed: users having similar user profiles are automatically connected, yet their user profiles are protected from being disclosed to other users. The proposed method intentionally distributes bogus user profiles on the P2P network without harming the efficiency of the web browsing history sharing process.
Yuta TAKATA Mitsuaki AKIYAMA Takeshi YAGI Takeo HARIU Kazuhiko OHKUBO Shigeki GOTO
Security researchers and vendors detect malicious websites based on several website features extracted by honeyclient analysis. However, web-based attacks continue to become more sophisticated along with the development of countermeasure techniques. Attackers detect the honeyclient and evade analysis using sophisticated JavaScript code. The evasive code indirectly identifies vulnerable clients by abusing the differences among JavaScript implementations. Attackers deliver malware only to targeted clients on the basis of the evasion results while avoiding honeyclient analysis. Therefore, we are faced with the problem that honeyclients cannot analyze malicious websites. Nevertheless, we can observe the nature of the evasion, i.e., the results of accessing malicious websites with targeted clients differ from those obtained with honeyclients. In this paper, we propose a method of extracting evasive code by leveraging the above differences to investigate current evasion techniques. Our method analyzes HTTP transactions of the same website obtained using two types of clients, a real browser as a targeted client and a browser emulator as a honeyclient. As a result of evaluating our method with 8,467 JavaScript samples executed in 20,272 malicious websites, we discovered previously unknown evasion techniques that abuse the differences among JavaScript implementations. These findings will contribute to improving the analysis capabilities of conventional honeyclients.
This paper presents a network surveillance technique for detecting malicious activities. Based on the hypothesis that unusual conduct such as system exploitation will trigger an abnormal network pattern, we try to detect this anomalous network traffic pattern as a sign of malicious, or at least suspicious, activities. The capture and analysis of network traffic patterns is implemented with the concept of port profiling, where measures representing various characteristics of connections are monitored and recorded for each port. Although generating the port profiles requires only minimal computation and memory, the profiles exhibit high stability and robustness. Each port profile retains the patterns of the corresponding connections precisely, even if the connections demonstrate multi-modal characteristics. By comparing the pattern exhibited by live traffic with the expected behavior recorded in the profile, intrusive activities such as compromising backdoors or invoking trojan programs are successfully detected.
Tatsuya MORI Ryoichi KAWAHARA Shozo NAITO Shigeki GOTO
Traffic analysis and modeling play a vital role in designing and controlling networks effectively. To construct a practical traffic model that can be used for various networks, it is necessary to characterize aggregated traffic and user traffic. This paper investigates these characteristics and their relationship. Our analyses are based on a huge number of packet traces from five different networks on the Internet. We found that: (1) marginal distributions of aggregated traffic fluctuations follow positively skewed (non-Gaussian) distributions, which leads to the existence of "spikes", where a spike corresponds to an extremely large value of momentary throughput, (2) the amount of user traffic in a unit of time has a wide range of variability, and (3) flows within spikes are more likely to be "elephant flows", where an elephant flow is an IP flow with a high volume of traffic. These findings are useful in constructing a practical and realistic Internet traffic model.