The search functionality is under construction.

Author Search Result

[Author] Hiroyuki TAKIZAWA(15hit)

1-15hit
  • Vector Quantization Codebook Design Using the Law-of-the-Jungle Algorithm

    Hiroyuki TAKIZAWA  Taira NAKAJIMA  Kentaro SANO  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E86-D No:6
      Page(s):
    1068-1077

    The equidistortion principle[1] has recently been proposed as a basic principle for design of an optimal vector quantization (VQ) codebook. The equidistortion principle adjusts all codebook vectors such that they have the same contribution to quantization error. This paper introduces a novel VQ codebook design algorithm based on the equidistortion principle. The proposed algorithm is a variant of the law-of-the-jungle algorithm (LOJ), which duplicates useful codebook vectors and removes useless vectors. Due to the LOJ mechanism, the proposed algorithm can establish the equidistortion condition without wasting learning steps. This is significantly effective in preventing performance degradation caused when initial states of codebook vectors are improper to find an optimal codebook. Therefore, even in the case of improper initialization, the proposed algorithm can achieve minimization of quantization error based on the equidistortion principle. Performance of the proposed algorithm is discussed through experimental results.

  • A Conflict-Aware Capacity Control Mechanism for Deep Cache Hierarchy

    Jiaheng LIU  Ryusuke EGAWA  Hiroyuki TAKIZAWA  

     
    PAPER-Computer System

      Pubricized:
    2022/03/09
      Vol:
    E105-D No:6
      Page(s):
    1150-1163

    As the number of cores on a processor increases, cache hierarchies contain more cache levels and a larger last level cache (LLC). Thus, the power and energy consumption of the cache hierarchy becomes non-negligible. Meanwhile, because the cache usage behaviors of individual applications can be different, it is possible to achieve higher energy efficiency of the computing system by determining the appropriate cache configurations for individual applications. This paper proposes a cache control mechanism to improve energy efficiency by adjusting a cache hierarchy to each application. Our mechanism first bypasses and disables a less-significant cache level, then partially disables the LLC, and finally adjusts the associativity if it suffers from a large number of conflict misses. The mechanism can achieve significant energy saving at the sacrifice of small performance degradation. The evaluation results show that our mechanism improves energy efficiency by 23.9% and 7.0% on average over the baseline and the cache-level bypassing mechanisms, respectively. In addition, even if the LLC resource contention occurs, the proposed mechanism is still effective for improving energy efficiency.

  • A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats

    Hang CUI  Shoichi HIRASAWA  Hiroaki KOBAYASHI  Hiroyuki TAKIZAWA  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2018/06/13
      Vol:
    E101-D No:9
      Page(s):
    2307-2314

    Sparse Matrix-Vector multiplication (SpMV) is a computational kernel widely used in many applications. Because of the importance, many different implementations have been proposed to accelerate this computational kernel. The performance characteristics of those SpMV implementations are quite different, and it is basically difficult to select the implementation that has the best performance for a given sparse matrix without performance profiling. One existing approach to the SpMV best-code selection problem is by using manually-predefined features and a machine learning model for the selection. However, it is generally hard to manually define features that can perfectly express the characteristics of the original sparse matrix necessary for the code selection. Besides, some information loss would happen by using this approach. This paper hence presents an effective deep learning mechanism for SpMV code selection best suited for a given sparse matrix. Instead of using manually-predefined features of a sparse matrix, a feature image and a deep learning network are used to map each sparse matrix to the implementation, which is expected to have the best performance, in advance of the execution. The benefits of using the proposed mechanism are discussed by calculating the prediction accuracy and the performance. According to the evaluation, the proposed mechanism can select an optimal or suboptimal implementation for an unseen sparse matrix in the test data set in most cases. These results demonstrate that, by using deep learning, a whole sparse matrix can be used to do the best implementation prediction, and the prediction accuracy achieved by the proposed mechanism is higher than that of using predefined features.

  • Kohonen Learning with a Mechanism, the Law of the Jungle, Capable of Dealing with Nonstationary Probability Distribution Functions

    Taira NAKAJIMA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    PAPER-Bio-Cybernetics and Neurocomputing

      Vol:
    E81-D No:6
      Page(s):
    584-591

    We present a mechanism, named the law of the jungle (LOJ), to improve the Kohonen learning. The LOJ is used to be an adaptive vector quantizer for approximating nonstationary probability distribution functions. In the LOJ mechanism, the probability that each node wins in a competition is dynamically estimated during the learning. By using the estimated win probability, "strong" nodes are increased through creating new nodes near the nodes, and "weak" nodes are decreased through deleting themselves. A pair of creation and deletion is treated as an atomic operation. Therefore, the nodes which cannot win the competition are transferred directly from the region where inputs almost never occur to the region where inputs often occur. This direct "jump" of weak nodes provides rapid convergence. Moreover, the LOJ requires neither time-decaying parameters nor a special periodic adaptation. From the above reasons, the LOJ is suitable for quick approximation of nonstationary probability distribution functions. In comparison with some other Kohonen learning networks through experiments, only the LOJ can follow nonstationary probability distributions except for under high-noise environments.

  • Acceleration Techniques for the Network Inversion Algorithm

    Hiroyuki TAKIZAWA  Taira NAKAJIMA  Masaaki NISHI  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    LETTER-Bio-Cybernetics and Neurocomputing

      Vol:
    E82-D No:2
      Page(s):
    508-511

    We apply two acceleration techniques for the backpropagation algorithm to an iterative gradient descent algorithm called the network inversion algorithm. Experimental results show that these techniques are also quite effective to decrease the number of iterations required for the detection of input vectors on the classification boundary of a multilayer perceptron.

  • A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts

    Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Computer System

      Vol:
    E96-D No:9
      Page(s):
    2047-2054

    Chip multiprocessors (CMPs) improve performance by simultaneously executing multiple threads using integrated multiple cores. However, since these cores commonly share one cache, inter-thread cache conflicts often limit the performance improvement by multi-threading. This paper focuses on two causes of inter-thread cache conflicts. In shared caches of CMPs, cached data fetched by one thread are frequently evicted by another thread. Such an eviction, called inter-thread kickout (ITKO), is one of the major causes of inter-thread cache conflicts. The other cause is capacity shortage that occurs when one cache is shared by threads demanding large cache capacities. If the total capacity demanded by the threads exceeds the actual cache capacity, the threads compete to use the limited cache capacity, resulting in capacity shortage. To address inter-thread cache conflicts, we must take into account both ITKOs and capacity shortage. Therefore, this paper proposes a capacity-aware thread scheduling method combined with cache partitioning. In the proposed method, inter-thread cache conflicts due to ITKOs and capacity shortage are decreased by cache partitioning and thread scheduling, respectively. The proposed scheduling method estimates the capacity demand of each thread with an estimation method used in the cache partitioning mechanism. Based on the estimation used for cache partitioning, the thread scheduler decides thread combinations sharing one cache so as to avoid capacity shortage. Evaluation results suggest that the proposed method can improve overall performance by up to 8.1%, and the performance of individual threads by up to 12%. The results also show that both cache partitioning and thread scheduling are indispensable to avoid both ITKOs and capacity shortage simultaneously. Accordingly, the proposed method can significantly reduce the inter-thread cache conflicts and hence improve performance.

  • Scalability Analysis of Deeply Pipelined Tsunami Simulation with Multiple FPGAs Open Access

    Antoniette MONDIGO  Tomohiro UENO  Kentaro SANO  Hiroyuki TAKIZAWA  

     
    PAPER-Applications

      Pubricized:
    2019/02/05
      Vol:
    E102-D No:5
      Page(s):
    1029-1036

    Since the hardware resource of a single FPGA is limited, one idea to scale the performance of FPGA-based HPC applications is to expand the design space with multiple FPGAs. This paper presents a scalable architecture of a deeply pipelined stream computing platform, where available parallelism and inter-FPGA link characteristics are investigated to achieve a scaled performance. For a practical exploration of this vast design space, a performance model is presented and verified with the evaluation of a tsunami simulation application implemented on Intel Arria 10 FPGAs. Finally, scalability analysis is performed, where speedup is achieved when increasing the computing pipeline over multiple FPGAs while maintaining the problem size of computation. Performance is scaled with multiple FPGAs; however, performance degradation occurs with insufficient available bandwidth and large pipeline overhead brought by inadequate data stream size. Tsunami simulation results show that the highest scaled performance for 8 cascaded Arria 10 FPGAs is achieved with a single pipeline of 5 stream processing elements (SPEs), which obtained a scaled performance of 2.5 TFlops and a parallel efficiency of 98%, indicating the strong scalability of the multi-FPGA stream computing platform.

  • FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms

    Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER

      Vol:
    E98-C No:7
      Page(s):
    550-558

    As energy consumption of cache memories increases, an energy-efficient cache management mechanism is required. While a dynamic cache resizing mechanism is one promising approach to the energy reduction of microprocessors, one problem is that its effect is limited by the existence of dead-on-fill blocks, which are not used until their evictions from the cache memory. To solve this problem, this paper proposes a cache management policy named FLEXII, which can reduce the number of dead-on-fill blocks and help dynamic cache resizing mechanisms further reduce the energy consumption of the cache memories.

  • A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning

    Shoichi HIRASAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Software

      Pubricized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2178-2186

    Automatic performance tuning of a practical application could be time-consuming and sometimes infeasible, because it often needs to evaluate the performances of a large number of code variants to find the best one. In this paper, hence, a light-weight rollback mechanism is proposed to evaluate each of code variants at a low cost. In the proposed mechanism, once one code variant of a target code block is executed, the execution state is rolled back to the previous state of not yet executing the block so as to repeatedly execute only the block to find the best code variant. It also has a feature of terminating a code variant whose execution time is longer than the shortest execution time so far. As a result, it can prevent executing the whole application many times and thus reduces the timing overhead of an auto-tuning process required for finding the best code variant.

  • Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems

    Muhammad ALFIAN AMRIZAL  Atsuya UNO  Yukinori SATO  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-High performance computing

      Pubricized:
    2017/07/14
      Vol:
    E100-D No:12
      Page(s):
    2749-2760

    Coordinated checkpointing is a widely-used checkpoint/restart protocol for fault-tolerance in large-scale HPC systems. However, this protocol will involve massive amounts of I/O concentration, resulting in considerably high checkpoint overhead and high energy consumption. This paper focuses on speculative checkpointing, a CPR mechanism that allows for temporal distribution of checkpointings to avoid I/O concentration. We propose execution time and energy models for speculative checkpointing, and investigate energy-performance characteristics when speculative checkpointing is adopted in exascale systems. Using these models, we study the benefit of speculative checkpointing over coordinated checkpointing under various realistic scenarios for exascale HPC systems. We show that, compared to coordinated checkpointing, speculative checkpointing can achieve up to a 11% energy reduction at the cost of a relatively-small increase in the execution time. In addition, a significant energy-performance trade-off is expected when the system scale exceeds 1.2 million nodes.

  • MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications

    Ye GAO  Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Computer System

      Pubricized:
    2014/08/22
      Vol:
    E97-D No:11
      Page(s):
    2835-2843

    Vector processors have significant advantages for next generation multimedia applications (MMAs). One of the advantages is that vector processors can achieve high data transfer performance by using a high bandwidth memory sub-system, resulting in a high sustained computing performance. However, the high bandwidth memory sub-system usually leads to enormous costs in terms of chip area, power and energy consumption. These costs are too expensive for commodity computer systems, which are the main execution platform of MMAs. This paper proposes a new multi-banked cache memory for commodity computer systems called MVP-cache in order to expand the potential of vector architectures on MMAs. Unlike conventional multi-banked cache memories, which employ one tag array and one data array in a sub-cache, MVP-cache associates one tag array with multiple independent data arrays of small-sized cache lines. In this way, MVP-cache realizes less static power consumption on its tag arrays. MVP-cache can also achieve high efficiency on short vector data transfers because the flexibility of data transfers can be improved by independently controlling the data transfers of each data array.

  • Balanced Ternary Quantum Voltage Generator Based on Zero Crossing Shapiro Steps in Asymmetric Two-Junction SQUIDs

    Masataka MORIYA  Hiroyuki TAKIZAWA  Yoshinao MIZUGAKI  

     
    BRIEF PAPER

      Vol:
    E96-C No:3
      Page(s):
    334-337

    The three-bit balanced ternary quantum voltage generator was designed and tested. This voltage generator is based on zero-crossing Shapiro steps (ZCSSs) in asymmetric two-junction SQUID. ZCSSs were observed on the current-voltage curves, and maximum and minimum current of ZCSSs were almost same, respectively for the three bits. 27-step quantum voltages from -13Φ0f to +13 Φ0f were observed by combinations of inputs of bit1, bit2 and bit3.

  • A Topology Preserving Neural Network for Nonstationary Distributions

    Taira NAKAJIMA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    LETTER-Bio-Cybernetics and Neurocomputing

      Vol:
    E82-D No:7
      Page(s):
    1131-1135

    We propose a learning algorithm for self-organizing neural networks to form a topology preserving map from an input manifold whose topology may dynamically change. Experimental results show that the network using the proposed algorithm can rapidly adjust itself to represent the topology of nonstationary input distributions.

  • A Network Clustering Algorithm for Sybil-Attack Resisting

    Ling XU  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER

      Vol:
    E94-D No:12
      Page(s):
    2345-2352

    The social network model has been regarded as a promising mechanism to defend against Sybil attack. This model assumes that honest peers and Sybil peers are connected by only a small number of attack edges. Detection of the attack edges plays a key role in restraining the power of Sybil peers. In this paper, an attack-resisting, distributed algorithm, named Random walk and Social network model-based clustering (RSC), is proposed to detect the attack edges. In RSC, peers disseminate random walk packets to each other. For each edge, the number of times that the packets pass this edge reflects the betweenness of this edge. RSC observes that the betweennesses of attack edges are higher than those of the non-attack edges. In this way, the attack edges can be identified. To show the effectiveness of RSC, RSC is integrated into an existing social network model-based algorithm called SOHL. The results of simulations with real world social network datasets show that RSC remarkably improves the performance of SOHL.

  • An Active Learning Algorithm Based on Existing Training Data

    Hiroyuki TAKIZAWA  Taira NAKAJIMA  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    PAPER-Biocybernetics, Neurocomputing

      Vol:
    E83-D No:1
      Page(s):
    90-99

    A multilayer perceptron is usually considered a passive learner that only receives given training data. However, if a multilayer perceptron actively gathers training data that resolve its uncertainty about a problem being learnt, sufficiently accurate classification is attained with fewer training data. Recently, such active learning has been receiving an increasing interest. In this paper, we propose a novel active learning strategy. The strategy attempts to produce only useful training data for multilayer perceptrons to achieve accurate classification, and avoids generating redundant training data. Furthermore, the strategy attempts to avoid generating temporarily useful training data that will become redundant in the future. As a result, the strategy can allow multilayer perceptrons to achieve accurate classification with fewer training data. To demonstrate the performance of the strategy in comparison with other active learning strategies, we also propose an empirical active learning algorithm as an implementation of the strategy, which does not require expensive computations. Experimental results show that the proposed algorithm improves the classification accuracy of a multilayer perceptron with fewer training data than that for a conventional random selection algorithm that constructs a training data set without explicit strategies. Moreover, the algorithm outperforms typical active learning algorithms in the experiments. Those results show that the algorithm can construct an appropriate training data set at lower computational cost, because training data generation is usually costly. Accordingly, the algorithm proves the effectiveness of the strategy through the experiments. We also discuss some drawbacks of the algorithm.