
Author Search Result

[Author] Masato UCHIDA(16hit)

  • Access Load Balancing with Analogy to Thermal Diffusion for Dynamic P2P File-Sharing Environments

    Masanori TAKAOKA  Masato UCHIDA  Kei OHNISHI  Yuji OIE  

     
    PAPER

      Vol:
    E93-B No:5
      Page(s):
    1140-1150

    In this paper, we propose a file replication method to achieve load balancing in terms of write access to storage devices ("write storage access load balancing" for short) in unstructured peer-to-peer (P2P) file-sharing networks in which the popularity trend of queried files varies dynamically. The proposed method uses a write storage access ratio as a load balance index value in order to stabilize dynamic P2P file-sharing environments adaptively. In the proposed method, each peer autonomously controls the file replication ratio, which is defined as the probability of creating a replica of the file, in order to equalize write storage access loads in a way similar to thermal diffusion phenomena. Theoretical analysis results show that the behavior of the proposed method is indeed analogous to a thermal diffusion equation. In addition, simulation results reveal that the proposed method is able to realize write storage access load balancing in dynamic P2P file-sharing environments.

  • Identifying Heavy-Hitter Flows from Sampled Flow Statistics Open Access

    Tatsuya MORI  Tetsuya TAKINE  Jianping PAN  Ryoichi KAWAHARA  Masato UCHIDA  Shigeki GOTO  

     
    PAPER

      Vol:
    E90-B No:11
      Page(s):
    3061-3072

    With the rapid increase of link speed in recent years, packet sampling has become a very attractive and scalable means in collecting flow statistics; however, it also makes inferring original flow characteristics much more difficult. In this paper, we develop techniques and schemes to identify flows with a very large number of packets (also known as heavy-hitter flows) from sampled flow statistics. Our approach follows a two-stage strategy: We first parametrically estimate the original flow length distribution from sampled flows. We then identify heavy-hitter flows with Bayes' theorem, where the flow length distribution estimated at the first stage is used as an a priori distribution. Our approach is validated and evaluated with publicly available packet traces. We show that our approach provides a very flexible framework in striking an appropriate balance between false positives and false negatives when sampling frequency is given.
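The second stage described in this abstract can be sketched as a direct application of Bayes' theorem. This is a minimal illustration only, under assumptions that are not taken from the paper: packets are sampled independently with probability `f` (so the sampled count is binomial), and the prior is a truncated discrete power law. The names `power_law_prior` and `heavy_hitter_posterior` are hypothetical.

```python
from math import comb


def power_law_prior(alpha, max_len):
    """Truncated discrete power law, P(X = x) proportional to x^-(alpha + 1).

    Hypothetical stand-in for the flow length distribution that the first
    stage would estimate from sampled flows.
    """
    weights = {x: x ** -(alpha + 1.0) for x in range(1, max_len + 1)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def heavy_hitter_posterior(y, f, threshold, prior):
    """P(X >= threshold | Y = y), where Y is the number of sampled packets.

    Assumes independent per-packet sampling with probability f, so that
    Y given X = x follows a Binomial(x, f) distribution.
    """
    joint = {}
    for x, p in prior.items():
        if x >= y:  # a flow cannot yield more samples than it has packets
            joint[x] = comb(x, y) * f ** y * (1.0 - f) ** (x - y) * p
    total = sum(joint.values())
    if total == 0.0:
        return 0.0
    return sum(v for x, v in joint.items() if x >= threshold) / total


prior = power_law_prior(alpha=1.0, max_len=5000)
low = heavy_hitter_posterior(y=1, f=0.01, threshold=1000, prior=prior)
high = heavy_hitter_posterior(y=15, f=0.01, threshold=1000, prior=prior)
print(f"P(heavy | y=1)  = {low:.4f}")
print(f"P(heavy | y=15) = {high:.4f}")
```

Declaring a flow a heavy hitter when this posterior exceeds a chosen cutoff is one way to trade false positives against false negatives; the cutoff itself is a free parameter in this sketch.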

  • Modeling User Behavior in P2P Data Storage System

    Masato UCHIDA  Hideaki IIDUKA  Isao SUGINO  

     
    PAPER

      Vol:
    E98-B No:1
      Page(s):
    33-41

    In recent years, there has been growing interest in systems for sharing resources, which were originally used for personal purposes by individual users, among many unspecified users via a network. An example of such systems is a peer-to-peer (P2P) data storage system that enables users to share a portion of unused space in their own storage devices among themselves. In a recent paper on a P2P data storage system, the user behavior model was defined based on supply and demand functions that depend only on the storage space unit price in a virtual marketplace. However, it was implicitly assumed that other factors, such as unused space of storage devices possessed by users and additional storage space asked by users, did not affect the characteristics of the supply and demand functions. In addition, it was not clear how the values of parameters used in the user behavior model were determined. Therefore, in this paper, we modify the supply and demand functions and determine the values of their parameters by taking the above mentioned factors as well as the price structure of storage devices in a real marketplace into account. Moreover, we provide a numerical example to evaluate the social welfare realized by the P2P data storage system as a typical application of the modified supply and demand functions.

  • Node Degree Based Routing Metric for Traffic Load Distribution in the Internet

    Jun'ichi SHIMADA  Hitomi TAMURA  Masato UCHIDA  Yuji OIE  

     
    PAPER

      Vol:
    E96-D No:2
      Page(s):
    202-212

    Congestion inherently occurs on the Internet due to traffic concentration on certain nodes or links of networks. The traffic concentration is caused by inefficient use of topological information of networks in existing routing protocols, which results in an inefficient mapping between traffic demands and network resources. Actually, the route with minimum cost, i.e., number of hops, selected as a transmission route by existing routing protocols would pass through specific nodes with common topological characteristics that could contribute to a large improvement in minimizing the cost. However, this would result in traffic concentration on such specific nodes. Therefore, we propose a measure of the distance between two nodes that is suitable for reducing traffic concentration on specific nodes. To consider the topological characteristics of the congestion points of networks, we define node-to-node distance by using a generalized norm, p-norm, of a vector whose elements are the degrees of the intermediate nodes of the route. Simulation results show that both the maximum Stress Centrality (SC) and the coefficient of variation of the SC are minimized in some network topologies by selecting transmission routes based on the proposed measure of node-to-node distance.
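The p-norm distance mentioned in this abstract can be illustrated with a few lines of code. This is a sketch based only on the abstract's description; the function name `route_distance` and the example degree values are hypothetical.

```python
def route_distance(intermediate_degrees, p):
    """p-norm of the vector of degrees of a route's intermediate nodes."""
    return sum(d ** p for d in intermediate_degrees) ** (1.0 / p)


# Two hypothetical routes between the same pair of nodes:
# one short route through a single hub, one longer route through small nodes.
via_hub = [10]
via_small_nodes = [2, 2, 2]

for p in (1, 2, 4):
    print(p, route_distance(via_hub, p), route_distance(via_small_nodes, p))
```

In this example, hop count alone would favor the route through the hub, while the degree-based distance favors the longer route for each value of p shown, which is the kind of shift away from high-degree nodes the abstract describes.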

  • QoS-Aware Overlay Routing with Limited Number of Alternative Route Candidates and Its Evaluation

    Masato UCHIDA  Satoshi KAMEI  Ryoichi KAWAHARA  Takeo ABE  

     
    PAPER

      Vol:
    E89-B No:9
      Page(s):
    2361-2374

    A recent trend in routing research is the use of overlay routing to improve end-to-end QoS without changing the network-level architecture. The key of this technology is to find an alternative route that can avoid congested routes, using an overlay network. Developing cost-efficient overlay routing in terms of calculation cost and information distribution cost needed to find an alternative route is important for deploying QoS-aware overlay routing. Thus, this paper evaluates how effective overlay routing can be when the number of alternative route candidates is limited to reduce costs. Evaluation results using actual measurement data indicate that overlay routing is still effective even if alternative route candidates are limited to 1/4 of all possible alternative routes. We also discuss an overlay routing algorithm to enable us to find an appropriate route under the constraint that the number of alternative route candidates is limited.

  • Time-Series Measurement of Parked Domain Names and Their Malicious Uses

    Takayuki TOMATSURI  Daiki CHIBA  Mitsuaki AKIYAMA  Masato UCHIDA  

     
    PAPER

      Publicized:
    2021/01/08
      Vol:
    E104-B No:7
      Page(s):
    770-780

    On the Internet, there are lots of unused domain names that are not used for any actual services. Domain parking is a monetization mechanism for displaying online advertisements in such unused domain names. Some domain names used in cyber attacks are known to leverage domain parking services after the attack. However, the temporal relationships between domain parking services and malicious domain names have not been studied well. In this study, we investigated how malicious domain names using domain parking services change over time. We conducted a large-scale measurement study of more than 66.8 million domain names that have used domain parking services in the past 19 months. We reveal the existence of 3,964 domain names that have been malicious after using domain parking. We further identify what types of malicious activities (e.g., phishing and malware) such malicious domain names tend to be used for. We also reveal the existence of 3.02 million domain names that utilized multiple parking services simultaneously or while switching between them. Our study can contribute to the efficient analysis of malicious domain names using domain parking services.

  • TCP Flow Level Performance Evaluation on Error Rate Aware Scheduling Algorithms in Evolved UTRA and UTRAN Networks

    Yan ZHANG  Masato UCHIDA  Masato TSURU  Yuji OIE  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E91-B No:3
      Page(s):
    761-771

    We present a TCP flow level performance evaluation on error rate aware scheduling algorithms in Evolved UTRA and UTRAN networks. With the introduction of the error rate, which is the probability of transmission failure under a given wireless condition and the instantaneous transmission rate, the transmission efficiency can be improved without sacrificing the balance between system performance and user fairness. The performance comparison with and without error rate awareness is carried out for various TCP traffic models, user channel conditions, schedulers with different fairness constraints, and automatic repeat request (ARQ) types. The results indicate that error rate awareness can make the resource allocation more reasonable and effectively improve the system and individual performance, especially for poor channel condition users.

  • Unsupervised Weight Parameter Estimation for Exponential Mixture Distribution Based on Symmetric Kullback-Leibler Divergence

    Masato UCHIDA  

     
    LETTER-Information Theory

      Vol:
    E98-A No:11
      Page(s):
    2349-2353

    When there are multiple component predictors, it is promising to integrate them into one predictor for advanced reasoning. If each component predictor is given as a stochastic model in the form of a probability distribution, an exponential mixture of the component probability distributions provides a good way to integrate them. However, weight parameters used in the exponential mixture model are difficult to estimate if there are no training samples for performance evaluation. As a suboptimal way to solve this problem, weight parameters may be estimated so that the exponential mixture model should be a balance point that is defined as an equilibrium point with respect to the distance from/to all component probability distributions. In this paper, we propose a weight parameter estimation method that represents this concept using a symmetric Kullback-Leibler divergence and generalize this method.
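The two building blocks named in this abstract, an exponential mixture and the symmetric Kullback-Leibler divergence, can be sketched for discrete distributions as follows. This is only an illustration of the standard definitions, not the paper's estimation method; the function names and the example distributions are hypothetical, and all probabilities are assumed to be strictly positive.

```python
from math import log


def exponential_mixture(components, weights):
    """Normalized weighted geometric mean: q(x) proportional to prod_i p_i(x)^w_i."""
    n = len(components[0])
    unnormalized = []
    for x in range(n):
        value = 1.0
        for dist, w in zip(components, weights):
            value *= dist[x] ** w
        unnormalized.append(value)
    z = sum(unnormalized)
    return [v / z for v in unnormalized]


def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p), which equals sum_x (p(x) - q(x)) * log(p(x) / q(x))."""
    return sum((pi - qi) * log(pi / qi) for pi, qi in zip(p, q))


p1 = [0.7, 0.2, 0.1]
p2 = [0.1, 0.3, 0.6]
mix = exponential_mixture([p1, p2], [0.5, 0.5])
print(mix, symmetric_kl(mix, p1), symmetric_kl(mix, p2))
```

Comparing the two printed divergences for different weight vectors gives a rough feel for the "balance point" idea, in which the mixture sits at a comparable distance from every component.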

  • Hop-Value-Based Query-Packet Forwarding for Pure P2P

    Masato UCHIDA  Shinya NOGAMI  

     
    LETTER

      Vol:
    E88-B No:12
      Page(s):
    4517-4522

    In pure peer-to-peer (P2P) file sharing applications and protocols using a flooding-based query algorithm, a large number of control packets (query packets) are transmitted on the network to search for target files. This clearly leads to a degradation of communication quality on the network and terminals as the number of users of the application increases. To solve such problems, this paper proposes: (1) a unified framework to describe a wide variety of query algorithms for pure P2P and (2) a new query algorithm based on this framework. Our framework determines the number of destinations for query packets based on the hop value recorded in received query packets. Simulation results revealed that the proposed query algorithm can reduce the overhead in the flooding-based query algorithm and k-random walks without decreasing the success rate of retrieval regardless of the density of target files in the network.
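One way to read the framework in this abstract is as a function that maps the hop value of a received query packet to the number of neighbors the packet is forwarded to. The sketch below expresses flooding and k-random walks in that form. It is an interpretation of the abstract rather than the paper's formulation, and the names `flooding`, `k_random_walks`, and `choose_destinations` are hypothetical.

```python
import random


def flooding(hop, n_neighbors):
    """Flooding: forward to every neighbor at every hop."""
    return n_neighbors


def k_random_walks(k):
    """k-random walks: k walkers leave the source, then each takes one step per hop."""
    def fanout(hop, n_neighbors):
        wanted = k if hop == 0 else 1
        return min(wanted, n_neighbors)
    return fanout


def choose_destinations(neighbors, hop, fanout, rng=random):
    """Pick the neighbors that receive the query, given the packet's hop value."""
    return rng.sample(neighbors, fanout(hop, len(neighbors)))


neighbors = ["a", "b", "c", "d", "e"]
print(choose_destinations(neighbors, 0, flooding))
print(choose_destinations(neighbors, 0, k_random_walks(3)))
print(choose_destinations(neighbors, 2, k_random_walks(3)))
```

Other query algorithms can be described by supplying a different fan-out function, for example one that decreases gradually as the hop value grows.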

  • Query-Trail-Mediated Cooperative Behaviors of Peers in Unstructured P2P File Sharing Networks

    Kei OHNISHI  Hiroshi YAMAMOTO  Masato UCHIDA  Yuji OIE  

     
    PAPER-Information Network

      Vol:
    E94-D No:10
      Page(s):
    1966-1980

    We propose two types of autonomic and distributed cooperative behaviors of peers for peer-to-peer (P2P) file-sharing networks. Cooperative behaviors of peers are mediated by query trails, and allow the exploration of better trade-off points between file search and storage load balancing performance. Query trails represent previous successful search paths and indicate which peers contributed to previous file searches and were at the same time exposed to the storage load. The first type of cooperative behavior is to determine the locations of replicas of files through the medium of query trails. Placement of replicas of files on strong query trails contributes to improvement of search performance, but a heavy load is generated due to writing files in storage to peers on the strong query trails. Therefore, we attempt to achieve storage load balancing between peers, while avoiding significant degradation of the search performance by creating replicas of files in peers adjacent to peers on strong query trails. The second type of cooperative behavior is to determine whether peers provide requested files through the medium of query trails. Provision of files by peers holding requested files on strong query trails contributes to better search performance, but such provision of files generates a heavy load for reading files from storage to peers on the strong query trails. Therefore, we attempt to achieve storage load balancing while making only small sacrifices in search performance by having peers on strong query trails refuse to provide files. Simulation results show that the first type of cooperative behavior provides equal or improved ability to explore trade-off points between storage load balancing and search performance in a static and nearly homogeneous P2P environment, without the need for fine tuning parameter values, compared to replication methods that require fine tuning of their parameter values. In addition, the combination of the second type and the first type of cooperative behavior yields better storage load balancing performance with little degradation of search performance. Moreover, even in a dynamic and heterogeneous P2P environment, the two types of cooperative behaviors yield good ability to explore trade-off points between storage load balancing and search performance.

  • Dynamic and Decentralized Storage Load Balancing with Analogy to Thermal Diffusion for P2P File Sharing

    Masato UCHIDA  Kei OHNISHI  Kento ICHIKAWA  Masato TSURU  Yuji OIE  

     
    PAPER

      Vol:
    E93-B No:3
      Page(s):
    525-535

    In this paper we propose a file replication scheme inspired by a thermal diffusion phenomenon for storage load balancing in unstructured peer-to-peer (P2P) file sharing networks. The proposed scheme is designed such that the storage utilization ratios of peers will be uniform, in the same way that the temperature in a field becomes uniform in a thermal diffusion phenomenon. The proposed scheme creates replicas of files in peers probabilistically, where the probability is controlled by using parameters that can be used to find the trade-off between storage load balancing and search performance in unstructured P2P file sharing networks. First, we show through theoretical analysis that the statistical behavior of the storage load balancing controlled by the proposed scheme has an analogy with the thermal diffusion phenomenon. We then show through simulation that the proposed scheme not only has superior performance with respect to balancing the storage load among peers (the primary objective of the present proposal) but also allows the performance trade-off to be widely found. Finally, we qualitatively discuss a guideline for setting the parameter values in order to widely find the performance trade-off from the simulation results.
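The thermal diffusion analogy in this abstract can be pictured with a discrete diffusion step on a small peer graph. The sketch below is the textbook discretized heat equation, not the paper's replication scheme; the function `diffusion_step`, the ring topology, and the coefficient `k` are hypothetical choices for illustration.

```python
def diffusion_step(load, neighbors, k):
    """One explicit diffusion step: each node moves toward its neighbors' values."""
    return [
        u + k * sum(load[j] - u for j in neighbors[i])
        for i, u in enumerate(load)
    ]


# Four peers on a ring; all of the load starts on peer 0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
load = [1.0, 0.0, 0.0, 0.0]
for _ in range(50):
    load = diffusion_step(load, ring, k=0.25)
print(load)
```

The values converge toward a uniform level while their sum is preserved, which mirrors the abstract's description of storage utilization ratios becoming uniform in the way temperature does.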

  • Unsupervised Ensemble Anomaly Detection Using Time-Periodic Packet Sampling

    Masato UCHIDA  Shuichi NAWATA  Yu GU  Masato TSURU  Yuji OIE  

     
    PAPER-Network Management/Operation

      Vol:
    E95-B No:7
      Page(s):
    2358-2367

    We propose an anomaly detection method for finding patterns in network traffic that do not conform to legitimate (i.e., normal) behavior. The proposed method trains a baseline model describing the normal behavior of network traffic without using manually labeled traffic data. The trained baseline model is used as the basis for comparison with the audit network traffic. This anomaly detection works in an unsupervised manner through the use of time-periodic packet sampling, which is used in a manner that differs from its intended purpose – the lossy nature of packet sampling is used to extract normal packets from the unlabeled original traffic data. Evaluation using actual traffic traces showed that the proposed method has false positive and false negative rates in the detection of anomalies regarding TCP SYN packets comparable to those of a conventional method that uses manually labeled traffic data to train the baseline model. Performance variation due to the probabilistic nature of sampled traffic data is mitigated by using ensemble anomaly detection that collectively exploits multiple baseline models in parallel. Alarm sensitivity is adjusted for the intended use by using maximum- and minimum-based anomaly detection that effectively take advantage of the performance variations among the multiple baseline models. Testing using actual traffic traces showed that the proposed anomaly detection method performs as well as one using manually labeled traffic data and better than one using randomly sampled (unlabeled) traffic data.

  • Exploration into Gray Area: Toward Efficient Labeling for Detecting Malicious Domain Names

    Naoki FUKUSHI  Daiki CHIBA  Mitsuaki AKIYAMA  Masato UCHIDA  

     
    PAPER

      Publicized:
    2019/10/08
      Vol:
    E103-B No:4
      Page(s):
    375-388

    In this paper, we propose a method to reduce the labeling cost while acquiring training data for a malicious domain name detection system using supervised machine learning. In the conventional systems, to train a classifier with high classification accuracy, large quantities of benign and malicious domain names need to be prepared as training data. In general, malicious domain names are observed less frequently than benign domain names. Therefore, it is difficult to acquire a large number of malicious domain names without a dedicated labeling method. We propose a method based on active learning that labels data around the decision boundary of classification, i.e., in the gray area, and we show that the classification accuracy can be improved by using approximately 1% of the training data used by the conventional systems. Another disadvantage of the conventional system is that if the classifier is trained with a small amount of training data, its generalization ability cannot be guaranteed. We propose a method based on ensemble learning that integrates multiple classifiers, and we show that the classification accuracy can be stabilized and improved. The combination of the two methods proposed here allows us to develop a new system for malicious domain name detection with high classification accuracy and generalization ability by labeling a small amount of training data.

  • Traffic Data Analysis Based on Extreme Value Theory and Its Applications to Predicting Unknown Serious Deterioration

    Masato UCHIDA  

     
    PAPER-Traffic Measurement and Analysis

      Vol:
    E87-D No:12
      Page(s):
    2654-2664

    It is important to predict serious deterioration of telecommunication quality. This paper investigates predicting such serious events by analyzing only a "short" period (i.e., a "small" amount) of teletraffic data. To achieve this end, this paper presents a method for analyzing the tail distributions of teletraffic state variables, because tail distributions are suitable for representing serious events. This method is based on Extreme Value Theory (EVT), which provides a firm theoretical foundation for the analysis. To be more precise, in this paper, we use throughput data measured on an actual network during daily busy hours for 15 minutes, and use its first 10 seconds (known data) to analyze the tail distribution. Then, we evaluate how well the obtained tail distribution can predict the tail distribution of the remaining 890 seconds (unknown data). The results indicate that the obtained tail distribution based on EVT by analyzing the small amount of known data can predict the tail distribution of unknown data much better than methods based on empirical or log-normal distributions. Furthermore, we apply the obtained tail distribution to predict the peak throughput in unknown data. The results of this paper enable us to predict serious deterioration events with lower measurement cost.

  • Impact of Censoring on Estimation of Flow Duration Distribution and Its Mitigation Using Kaplan-Meier-Based Method

    Yuki SAKAI  Masato UCHIDA  Masato TSURU  Yuji OIE  

     
    LETTER-QoS and Quality Management

      Vol:
    E92-D No:10
      Page(s):
    1949-1952

    A basic and inevitable problem in estimating flow duration distribution arises from "censoring" (i.e., cutting off) the observed flow duration because of a finite measurement period. We extended the Kaplan-Meier method, which is used in the survival analysis field, and applied it to recover information on the flow duration distribution that was lost due to censoring. We show that the flow duration distribution from a short period of actual traffic data with censoring that was estimated using a Kaplan-Meier-based method can approximate well the flow duration distribution calculated from a sufficiently long period of actual traffic data.
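For reference, the standard Kaplan-Meier estimator that this letter builds on can be sketched as follows. This is the classical estimator from survival analysis, not the extended method of the paper; the function name `kaplan_meier` and the sample data are hypothetical. A flow is marked as censored when the measurement period ended before the flow did.

```python
def kaplan_meier(durations, censored):
    """Return [(t, S(t))] at each duration where at least one flow completed.

    S(t) estimates the probability that a flow lasts longer than t.
    """
    observations = sorted(zip(durations, censored))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        completed = 0
        leaving = 0
        # Group all flows that share the same observed duration t.
        while i < len(observations) and observations[i][0] == t:
            if not observations[i][1]:
                completed += 1
            leaving += 1
            i += 1
        if completed:
            survival *= 1.0 - completed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored flows leave the risk set without an event
    return curve


curve = kaplan_meier([1, 2, 2, 3], [False, False, True, False])
print(curve)
```

Censored flows still count in the risk set up to their observed duration, which is how the estimator recovers information that a naive empirical distribution over completed flows would discard.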

  • Discrete Modeling of the Worm Spread with Random Scanning

    Masato UCHIDA  

     
    LETTER

      Vol:
    E95-B No:5
      Page(s):
    1575-1579

    In this paper, we derive a set of discrete time difference equations that models the spreading process of computer worms such as Code-Red and Slammer, which use a common strategy called “random scanning” to spread through the Internet. We show that the derived set of discrete time difference equations has an exact relationship with the Kermack and McKendrick susceptible-infectious-removed (SIR) model, which is known as a standard continuous time model for worm spreading.
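A discrete time model of random scanning can be sketched as below. The equations here are a common textbook-style formulation chosen for illustration and are not claimed to be the ones derived in the letter. Assumptions: each infected host sends `scans_per_step` probes per time step to addresses chosen uniformly from `address_space`, and a fixed fraction `removal_rate` of infected hosts is removed per step. All names and parameter values are hypothetical.

```python
def simulate_worm(n_hosts, address_space, scans_per_step, removal_rate,
                  steps, infected0=1.0):
    """Iterate expected susceptible, infected, and removed host counts."""
    s, i, r = n_hosts - infected0, infected0, 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        # Probability that a given susceptible host is hit by at least one scan.
        p_hit = 1.0 - (1.0 - 1.0 / address_space) ** (scans_per_step * i)
        newly_infected = s * p_hit
        newly_removed = removal_rate * i
        s -= newly_infected
        i += newly_infected - newly_removed
        r += newly_removed
        history.append((s, i, r))
    return history


history = simulate_worm(n_hosts=1000, address_space=100000,
                        scans_per_step=10, removal_rate=0.01, steps=200)
peak_step = max(range(len(history)), key=lambda t: history[t][1])
print("infected count peaks at step", peak_step)
```

When the per-step hit probability is small, `p_hit` is approximately `scans_per_step * i / address_space`, so the update resembles an Euler step of the continuous SIR equations; the letter's exact relationship may differ from this approximation.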