
Author Search Result

[Author] Kyoji UMEMURA (6 hits)

  • Factor Controlled Hierarchical SOM Visualization for Large Set of Data

    Junan CHAKMA  Kyoji UMEMURA  

     
    PAPER
    Vol: E86-D No: 9 Page(s): 1796-1803

    The self-organizing map (SOM) is a widely used tool for high-dimensional data visualization. However, despite the benefit of plotting very high-dimensional data on a low-dimensional grid, browsing and understanding the meaning of a trained map turns out to be a difficult task, especially when the number of nodes or the size of the data increases. Although there are some well-known techniques for visualizing SOMs, they mainly deal with cluster boundaries and fail to consider the raw information available in the original data when browsing SOMs. In this paper, we propose our Factor Controlled Hierarchical SOM, which enables us to select the number of data to train and label a particular map based on a pre-defined factor, and which provides consistent hierarchical SOM browsing.

  • Analytical Modeling of Network Throughput Prediction on the Internet

    Chunghan LEE  Hirotake ABE  Toshio HIROTSU  Kyoji UMEMURA  

     
    PAPER-Network and Communication
    Vol: E95-D No: 12 Page(s): 2870-2878

    Predicting network throughput is important for network-aware applications. Network throughput depends on a number of factors, and many throughput prediction methods have been proposed. However, many of these methods suffer because the distribution of traffic fluctuation is unclear and the scale and bandwidth of networks are rapidly increasing. Furthermore, virtual machines are used as platforms in many network research and service fields, and they can affect network measurement. A prediction method that uses pairs of differently sized connections has been proposed. This method, which we call connection pair, features a small probe transfer over TCP that can be used to predict the throughput of a large data transfer. We focus on measurements, analyses, and modeling for precise prediction results. We first show that the actual throughput for the connection pair changes non-linearly and monotonically, with noise. Second, we built a previously proposed predictor using the same training data sets as for our proposed method, and found that it was unsuited to these characteristics. We propose a throughput prediction method based on the connection pair that uses ν-support vector regression and the polynomial kernel to deal with prediction models represented as a non-linear and continuous monotonic function. The prediction results of our method are more accurate than those of the previous predictor. Moreover, under an unstable network state, the drop in accuracy is also smaller than that of the previous predictor.

  • Optimal Local Dimension Analysis of Latent Semantic Indexing on Query Neighbor Space

    Yinghui XU  Kyoji UMEMURA  

     
    PAPER
    Vol: E86-D No: 9 Page(s): 1762-1772

    In this paper, we present our investigation of Latent Semantic Indexing (LSI) on local query regions as a way to overcome the computational restrictions of LSI on the global information space. Experiments with different SVD dimensionalities on the local query regions show that low-dimensional LSI can achieve much better precision than VSM and precision similar to that of global LSI. Such small SVD factors indicate that there is an almost linear surface in the local query regions. The largest or the two largest singular vectors are able to capture such a linear surface and benefit the particular query. Although Local LSI analysis needs to perform the Singular Value Decomposition (SVD) computation for each query, the surprisingly small SVD dimension required resolves the computational restrictions of LSI for large-scale IR tasks. Moreover, when several relevant sample documents are available, applying low-dimensional LSI to these documents can obtain precision comparable to that of Local RF, in a different manner.

  • Determining Indexing Strings with Statistical Analysis

    Yoshiyuki TAKEDA  Kyoji UMEMURA  Eiko YAMAMOTO  

     
    PAPER
    Vol: E86-D No: 9 Page(s): 1781-1787

    Determining indexing strings is an important factor in information retrieval. Ideally, the strings should be words that represent documents or queries. Although any single word may be the first candidate for indexing strings for an English corpus, it may not be ideal due to the existence of compound nouns, which are often good indexing strings, and which often depend on the genre of the corpus used. The situation is even worse in Japanese or Chinese where the words are not separated by spaces. In this paper, we propose a method of determining indexing strings based on statistical analysis. The novel features of our method are to make the most of the statistical measure called "adaptation" and not to use language-dependent resources such as dictionaries and stop word lists. In evaluating our method using a Japanese test collection, we found that it actually improves the precision of information retrieval systems.
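The abstract above names the "adaptation" statistic but does not give its formula. A minimal sketch, assuming the commonly used estimate df2/df1 (the number of documents containing a string at least twice, divided by the number containing it at least once), might look like the following. The function names `count_occurrences` and `adaptation` are hypothetical, and this is not claimed to be the authors' implementation.

```python
def count_occurrences(text, s):
    """Count occurrences of s in text, allowing overlaps."""
    count, start = 0, 0
    while True:
        idx = text.find(s, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def adaptation(docs, s):
    """Estimate adaptation as df2 / df1 (an assumed definition).

    df1: documents containing s at least once.
    df2: documents containing s at least twice.
    Works on raw character strings, so no dictionary or
    word segmentation is needed.
    """
    df1 = sum(1 for d in docs if count_occurrences(d, s) >= 1)
    df2 = sum(1 for d in docs if count_occurrences(d, s) >= 2)
    return df2 / df1 if df1 else 0.0


docs = ["abcabc", "abc", "xyz"]
print(adaptation(docs, "abc"))
```

Because the statistic is computed over arbitrary substrings, the same code applies to text without spaces between words, which is the situation the abstract describes for Japanese and Chinese.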

  • Unified Likelihood Ratio Estimation for High- to Zero-Frequency N-Grams

    Masato KIKUCHI  Kento KAWAKAMI  Kazuho WATANABE  Mitsuo YOSHIDA  Kyoji UMEMURA  

     
    PAPER-Mathematical Systems Science
    Publicized: 2021/02/08
    Vol: E104-A No: 8 Page(s): 1059-1074

    Likelihood ratios (LRs), which are commonly used for probabilistic data processing, are often estimated based on the frequency counts of individual elements obtained from samples. In natural language processing, an element can be a continuous sequence of N items, called an N-gram, in which each item is a word, letter, etc. In this paper, we attempt to estimate LRs based on N-gram frequency information. A naive estimation approach that uses only N-gram frequencies is sensitive to low-frequency (rare) N-grams and not applicable to zero-frequency (unobserved) N-grams; these are known as the low- and zero-frequency problems, respectively. To address these problems, we propose a method for decomposing N-grams into item units and then applying their frequencies along with the original N-gram frequencies. Our method can obtain the estimates of unobserved N-grams by using the unit frequencies. Although using only unit frequencies ignores dependencies between items, our method takes advantage of the fact that certain items often co-occur in practice and therefore maintains their dependencies by using the relevant N-gram frequencies. We also introduce a regularization to achieve robust estimation for rare N-grams. Our experimental results demonstrate that our method is effective at solving both problems and can effectively control dependencies.
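To make the zero-frequency problem concrete, here is a rough sketch that contrasts a naive N-gram frequency ratio with an estimate built only from item (unit) frequencies. It is an illustration under stated assumptions, not the method proposed in the paper: the paper combines unit and N-gram frequencies, whereas this sketch shows only the two extremes, and the additive smoothing constant `alpha` is a stand-in for the paper's regularization. All function names are hypothetical.

```python
from collections import Counter


def unit_counts(ngram_counts):
    """Decompose N-gram counts into per-item counts."""
    units = Counter()
    for ngram, count in ngram_counts.items():
        for item in ngram:
            units[item] += count
    return units


def naive_lr(ngram, num_counts, den_counts):
    """Ratio of relative N-gram frequencies.

    Expects Counter objects. Returns None when the N-gram is
    unobserved in the denominator sample (ratio undefined).
    """
    if den_counts[ngram] == 0:
        return None
    n_num = sum(num_counts.values())
    n_den = sum(den_counts.values())
    return (num_counts[ngram] / n_num) / (den_counts[ngram] / n_den)


def unit_lr(ngram, num_counts, den_counts, alpha=1.0):
    """Product of per-item ratios with additive smoothing.

    Treats items as independent, so dependencies between
    items are ignored (the limitation the abstract mentions).
    """
    num_units = unit_counts(num_counts)
    den_units = unit_counts(den_counts)
    vocab = set(num_units) | set(den_units) | set(ngram)
    n_num = sum(num_units.values()) + alpha * len(vocab)
    n_den = sum(den_units.values()) + alpha * len(vocab)
    ratio = 1.0
    for item in ngram:
        p_num = (num_units[item] + alpha) / n_num
        p_den = (den_units[item] + alpha) / n_den
        ratio *= p_num / p_den
    return ratio


num = Counter([("a", "b"), ("a", "b"), ("a", "c")])
den = Counter([("a", "c"), ("b", "c")])
print(naive_lr(("a", "b"), num, den))  # undefined: unseen in den
print(unit_lr(("a", "b"), num, den))   # still yields an estimate
```

The unit-based estimate stays defined for the unobserved bigram because each of its items was seen somewhere in both samples.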

  • Traffic Anomaly Analysis and Characteristics on a Virtualized Network Testbed

    Chunghan LEE  Hirotake ABE  Toshio HIROTSU  Kyoji UMEMURA  

     
    PAPER
    Vol: E94-D No: 12 Page(s): 2353-2361

    Network testbeds have been used for network measurement and experiments. In such testbeds, resources such as CPU, memory, and I/O interfaces are shared and virtualized to maximize node utility for many users. Few studies have investigated the impact of virtualization on precise network measurement or characterized Internet traffic on virtualized testbeds. Although scheduling latency and heavy loads reportedly affect precise network measurement, no clear conditions or criteria have been established. Moreover, empirical-statistical criteria and methods that pick out anomalous cases for precise network experiments are required in userland because the virtualization technology used in the provided testbeds can hardly be replaced. In this paper, we show that 'oversize packet spacing', which can be caused by CPU scheduling latency, is a major cause of throughput instability on a virtualized network testbed even when no significant changes occur in well-known network metrics. These are unusual anomalies in a virtualized network environment. The results of our empirical-statistical analysis agree with the results of previous work. If network throughput is decreased by the anomalies, we should carefully review measurement results. Our empirical approach enables anomalous cases to be identified. We present CPU availability as an important criterion for estimating the anomalies.
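The abstract does not say how 'oversize packet spacing' is detected. As a hedged illustration only, one simple userland check flags inter-packet gaps that are far larger than the typical gap in a trace. The function name `oversize_gaps`, the use of the median as the baseline, and the default `factor` of 10 are all assumptions made for this sketch, not criteria taken from the paper.

```python
from statistics import median


def oversize_gaps(timestamps, factor=10.0):
    """Return (index, gap) pairs for unusually large packet spacing.

    timestamps: packet arrival times in ascending order.
    A gap is flagged when it exceeds factor * median gap;
    both the baseline and the factor are illustrative choices.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        return []
    limit = factor * median(gaps)
    return [(i, g) for i, g in enumerate(gaps) if g > limit]


# Arrival times with one long stall between the 4th and 5th packets.
arrivals = [0.0, 1.0, 2.0, 3.0, 53.0, 54.0]
print(oversize_gaps(arrivals))
```

A check of this kind needs only packet timestamps, so it can run without access to the hypervisor, which matches the abstract's requirement that criteria be usable from userland.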