IEICE global.ieice.org Site

Author Search Result

[Author] Hiroyuki TAKIZAWA(15hit)

1-15hit

Vector Quantization Codebook Design Using the Law-of-the-Jungle Algorithm
Hiroyuki TAKIZAWA Taira NAKAJIMA Kentaro SANO Hiroaki KOBAYASHI Tadao NAKAMURA

PAPER-Image Processing, Image Pattern Recognition

Vol:
E86-D No:6
Page(s):
1068-1077
The equidistortion principle[1] has recently been proposed as a basic principle for design of an optimal vector quantization (VQ) codebook. The equidistortion principle adjusts all codebook vectors such that they have the same contribution to quantization error. This paper introduces a novel VQ codebook design algorithm based on the equidistortion principle. The proposed algorithm is a variant of the law-of-the-jungle algorithm (LOJ), which duplicates useful codebook vectors and removes useless vectors. Due to the LOJ mechanism, the proposed algorithm can establish the equidistortion condition without wasting learning steps. This is significantly effective in preventing performance degradation caused when initial states of codebook vectors are improper to find an optimal codebook. Therefore, even in the case of improper initialization, the proposed algorithm can achieve minimization of quantization error based on the equidistortion principle. Performance of the proposed algorithm is discussed through experimental results.
A Conflict-Aware Capacity Control Mechanism for Deep Cache Hierarchy
Jiaheng LIU Ryusuke EGAWA Hiroyuki TAKIZAWA

PAPER-Computer System

Pubricized:
2022/03/09
Vol:
E105-D No:6
Page(s):
1150-1163
As the number of cores on a processor increases, cache hierarchies contain more cache levels and a larger last level cache (LLC). Thus, the power and energy consumption of the cache hierarchy becomes non-negligible. Meanwhile, because the cache usage behaviors of individual applications can be different, it is possible to achieve higher energy efficiency of the computing system by determining the appropriate cache configurations for individual applications. This paper proposes a cache control mechanism to improve energy efficiency by adjusting a cache hierarchy to each application. Our mechanism first bypasses and disables a less-significant cache level, then partially disables the LLC, and finally adjusts the associativity if it suffers from a large number of conflict misses. The mechanism can achieve significant energy saving at the sacrifice of small performance degradation. The evaluation results show that our mechanism improves energy efficiency by 23.9% and 7.0% on average over the baseline and the cache-level bypassing mechanisms, respectively. In addition, even if the LLC resource contention occurs, the proposed mechanism is still effective for improving energy efficiency.
A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats
Hang CUI Shoichi HIRASAWA Hiroaki KOBAYASHI Hiroyuki TAKIZAWA

PAPER-Artificial Intelligence, Data Mining

Pubricized:
2018/06/13
Vol:
E101-D No:9
Page(s):
2307-2314
Sparse Matrix-Vector multiplication (SpMV) is a computational kernel widely used in many applications. Because of the importance, many different implementations have been proposed to accelerate this computational kernel. The performance characteristics of those SpMV implementations are quite different, and it is basically difficult to select the implementation that has the best performance for a given sparse matrix without performance profiling. One existing approach to the SpMV best-code selection problem is by using manually-predefined features and a machine learning model for the selection. However, it is generally hard to manually define features that can perfectly express the characteristics of the original sparse matrix necessary for the code selection. Besides, some information loss would happen by using this approach. This paper hence presents an effective deep learning mechanism for SpMV code selection best suited for a given sparse matrix. Instead of using manually-predefined features of a sparse matrix, a feature image and a deep learning network are used to map each sparse matrix to the implementation, which is expected to have the best performance, in advance of the execution. The benefits of using the proposed mechanism are discussed by calculating the prediction accuracy and the performance. According to the evaluation, the proposed mechanism can select an optimal or suboptimal implementation for an unseen sparse matrix in the test data set in most cases. These results demonstrate that, by using deep learning, a whole sparse matrix can be used to do the best implementation prediction, and the prediction accuracy achieved by the proposed mechanism is higher than that of using predefined features.
Kohonen Learning with a Mechanism, the Law of the Jungle, Capable of Dealing with Nonstationary Probability Distribution Functions
Taira NAKAJIMA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI Tadao NAKAMURA

PAPER-Bio-Cybernetics and Neurocomputing

Vol:
E81-D No:6
Page(s):
584-591
We present a mechanism, named the law of the jungle (LOJ), to improve the Kohonen learning. The LOJ is used to be an adaptive vector quantizer for approximating nonstationary probability distribution functions. In the LOJ mechanism, the probability that each node wins in a competition is dynamically estimated during the learning. By using the estimated win probability, "strong" nodes are increased through creating new nodes near the nodes, and "weak" nodes are decreased through deleting themselves. A pair of creation and deletion is treated as an atomic operation. Therefore, the nodes which cannot win the competition are transferred directly from the region where inputs almost never occur to the region where inputs often occur. This direct "jump" of weak nodes provides rapid convergence. Moreover, the LOJ requires neither time-decaying parameters nor a special periodic adaptation. From the above reasons, the LOJ is suitable for quick approximation of nonstationary probability distribution functions. In comparison with some other Kohonen learning networks through experiments, only the LOJ can follow nonstationary probability distributions except for under high-noise environments.
Acceleration Techniques for the Network Inversion Algorithm
Hiroyuki TAKIZAWA Taira NAKAJIMA Masaaki NISHI Hiroaki KOBAYASHI Tadao NAKAMURA

LETTER-Bio-Cybernetics and Neurocomputing

Vol:
E82-D No:2
Page(s):
508-511
We apply two acceleration techniques for the backpropagation algorithm to an iterative gradient descent algorithm called the network inversion algorithm. Experimental results show that these techniques are also quite effective to decrease the number of iterations required for the detection of input vectors on the classification boundary of a multilayer perceptron.
A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts
Masayuki SATO Ryusuke EGAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER-Computer System

Vol:
E96-D No:9
Page(s):
2047-2054
Chip multiprocessors (CMPs) improve performance by simultaneously executing multiple threads using integrated multiple cores. However, since these cores commonly share one cache, inter-thread cache conflicts often limit the performance improvement by multi-threading. This paper focuses on two causes of inter-thread cache conflicts. In shared caches of CMPs, cached data fetched by one thread are frequently evicted by another thread. Such an eviction, called inter-thread kickout (ITKO), is one of the major causes of inter-thread cache conflicts. The other cause is capacity shortage that occurs when one cache is shared by threads demanding large cache capacities. If the total capacity demanded by the threads exceeds the actual cache capacity, the threads compete to use the limited cache capacity, resulting in capacity shortage. To address inter-thread cache conflicts, we must take into account both ITKOs and capacity shortage. Therefore, this paper proposes a capacity-aware thread scheduling method combined with cache partitioning. In the proposed method, inter-thread cache conflicts due to ITKOs and capacity shortage are decreased by cache partitioning and thread scheduling, respectively. The proposed scheduling method estimates the capacity demand of each thread with an estimation method used in the cache partitioning mechanism. Based on the estimation used for cache partitioning, the thread scheduler decides thread combinations sharing one cache so as to avoid capacity shortage. Evaluation results suggest that the proposed method can improve overall performance by up to 8.1%, and the performance of individual threads by up to 12%. The results also show that both cache partitioning and thread scheduling are indispensable to avoid both ITKOs and capacity shortage simultaneously. Accordingly, the proposed method can significantly reduce the inter-thread cache conflicts and hence improve performance.
Scalability Analysis of Deeply Pipelined Tsunami Simulation with Multiple FPGAs Open Access
Antoniette MONDIGO Tomohiro UENO Kentaro SANO Hiroyuki TAKIZAWA

PAPER-Applications

Pubricized:
2019/02/05
Vol:
E102-D No:5
Page(s):
1029-1036
Since the hardware resource of a single FPGA is limited, one idea to scale the performance of FPGA-based HPC applications is to expand the design space with multiple FPGAs. This paper presents a scalable architecture of a deeply pipelined stream computing platform, where available parallelism and inter-FPGA link characteristics are investigated to achieve a scaled performance. For a practical exploration of this vast design space, a performance model is presented and verified with the evaluation of a tsunami simulation application implemented on Intel Arria 10 FPGAs. Finally, scalability analysis is performed, where speedup is achieved when increasing the computing pipeline over multiple FPGAs while maintaining the problem size of computation. Performance is scaled with multiple FPGAs; however, performance degradation occurs with insufficient available bandwidth and large pipeline overhead brought by inadequate data stream size. Tsunami simulation results show that the highest scaled performance for 8 cascaded Arria 10 FPGAs is achieved with a single pipeline of 5 stream processing elements (SPEs), which obtained a scaled performance of 2.5 TFlops and a parallel efficiency of 98%, indicating the strong scalability of the multi-FPGA stream computing platform.
FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms
Masayuki SATO Ryusuke EGAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER

Vol:
E98-C No:7
Page(s):
550-558
As energy consumption of cache memories increases, an energy-efficient cache management mechanism is required. While a dynamic cache resizing mechanism is one promising approach to the energy reduction of microprocessors, one problem is that its effect is limited by the existence of dead-on-fill blocks, which are not used until their evictions from the cache memory. To solve this problem, this paper proposes a cache management policy named FLEXII, which can reduce the number of dead-on-fill blocks and help dynamic cache resizing mechanisms further reduce the energy consumption of the cache memories.
A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning
Shoichi HIRASAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER-Software

Pubricized:
2015/09/15
Vol:
E98-D No:12
Page(s):
2178-2186
Automatic performance tuning of a practical application could be time-consuming and sometimes infeasible, because it often needs to evaluate the performances of a large number of code variants to find the best one. In this paper, hence, a light-weight rollback mechanism is proposed to evaluate each of code variants at a low cost. In the proposed mechanism, once one code variant of a target code block is executed, the execution state is rolled back to the previous state of not yet executing the block so as to repeatedly execute only the block to find the best code variant. It also has a feature of terminating a code variant whose execution time is longer than the shortest execution time so far. As a result, it can prevent executing the whole application many times and thus reduces the timing overhead of an auto-tuning process required for finding the best code variant.
Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems
Muhammad ALFIAN AMRIZAL Atsuya UNO Yukinori SATO Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER-High performance computing

Pubricized:
2017/07/14
Vol:
E100-D No:12
Page(s):
2749-2760
Coordinated checkpointing is a widely-used checkpoint/restart protocol for fault-tolerance in large-scale HPC systems. However, this protocol will involve massive amounts of I/O concentration, resulting in considerably high checkpoint overhead and high energy consumption. This paper focuses on speculative checkpointing, a CPR mechanism that allows for temporal distribution of checkpointings to avoid I/O concentration. We propose execution time and energy models for speculative checkpointing, and investigate energy-performance characteristics when speculative checkpointing is adopted in exascale systems. Using these models, we study the benefit of speculative checkpointing over coordinated checkpointing under various realistic scenarios for exascale HPC systems. We show that, compared to coordinated checkpointing, speculative checkpointing can achieve up to a 11% energy reduction at the cost of a relatively-small increase in the execution time. In addition, a significant energy-performance trade-off is expected when the system scale exceeds 1.2 million nodes.
MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications
Ye GAO Masayuki SATO Ryusuke EGAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER-Computer System

Pubricized:
2014/08/22
Vol:
E97-D No:11
Page(s):
2835-2843
Vector processors have significant advantages for next generation multimedia applications (MMAs). One of the advantages is that vector processors can achieve high data transfer performance by using a high bandwidth memory sub-system, resulting in a high sustained computing performance. However, the high bandwidth memory sub-system usually leads to enormous costs in terms of chip area, power and energy consumption. These costs are too expensive for commodity computer systems, which are the main execution platform of MMAs. This paper proposes a new multi-banked cache memory for commodity computer systems called MVP-cache in order to expand the potential of vector architectures on MMAs. Unlike conventional multi-banked cache memories, which employ one tag array and one data array in a sub-cache, MVP-cache associates one tag array with multiple independent data arrays of small-sized cache lines. In this way, MVP-cache realizes less static power consumption on its tag arrays. MVP-cache can also achieve high efficiency on short vector data transfers because the flexibility of data transfers can be improved by independently controlling the data transfers of each data array.
Balanced Ternary Quantum Voltage Generator Based on Zero Crossing Shapiro Steps in Asymmetric Two-Junction SQUIDs
Masataka MORIYA Hiroyuki TAKIZAWA Yoshinao MIZUGAKI

BRIEF PAPER

Vol:
E96-C No:3
Page(s):
334-337
The three-bit balanced ternary quantum voltage generator was designed and tested. This voltage generator is based on zero-crossing Shapiro steps (ZCSSs) in asymmetric two-junction SQUID. ZCSSs were observed on the current-voltage curves, and maximum and minimum current of ZCSSs were almost same, respectively for the three bits. 27-step quantum voltages from -13Φ0f to +13 Φ0f were observed by combinations of inputs of bit1, bit2 and bit3.
A Topology Preserving Neural Network for Nonstationary Distributions
Taira NAKAJIMA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI Tadao NAKAMURA

LETTER-Bio-Cybernetics and Neurocomputing

Vol:
E82-D No:7
Page(s):
1131-1135
We propose a learning algorithm for self-organizing neural networks to form a topology preserving map from an input manifold whose topology may dynamically change. Experimental results show that the network using the proposed algorithm can rapidly adjust itself to represent the topology of nonstationary input distributions.
A Network Clustering Algorithm for Sybil-Attack Resisting
Ling XU Ryusuke EGAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER

Vol:
E94-D No:12
Page(s):
2345-2352
The social network model has been regarded as a promising mechanism to defend against Sybil attack. This model assumes that honest peers and Sybil peers are connected by only a small number of attack edges. Detection of the attack edges plays a key role in restraining the power of Sybil peers. In this paper, an attack-resisting, distributed algorithm, named Random walk and Social network model-based clustering (RSC), is proposed to detect the attack edges. In RSC, peers disseminate random walk packets to each other. For each edge, the number of times that the packets pass this edge reflects the betweenness of this edge. RSC observes that the betweennesses of attack edges are higher than those of the non-attack edges. In this way, the attack edges can be identified. To show the effectiveness of RSC, RSC is integrated into an existing social network model-based algorithm called SOHL. The results of simulations with real world social network datasets show that RSC remarkably improves the performance of SOHL.
An Active Learning Algorithm Based on Existing Training Data
Hiroyuki TAKIZAWA Taira NAKAJIMA Hiroaki KOBAYASHI Tadao NAKAMURA

PAPER-Biocybernetics, Neurocomputing

Vol:
E83-D No:1
Page(s):
90-99
A multilayer perceptron is usually considered a passive learner that only receives given training data. However, if a multilayer perceptron actively gathers training data that resolve its uncertainty about a problem being learnt, sufficiently accurate classification is attained with fewer training data. Recently, such active learning has been receiving an increasing interest. In this paper, we propose a novel active learning strategy. The strategy attempts to produce only useful training data for multilayer perceptrons to achieve accurate classification, and avoids generating redundant training data. Furthermore, the strategy attempts to avoid generating temporarily useful training data that will become redundant in the future. As a result, the strategy can allow multilayer perceptrons to achieve accurate classification with fewer training data. To demonstrate the performance of the strategy in comparison with other active learning strategies, we also propose an empirical active learning algorithm as an implementation of the strategy, which does not require expensive computations. Experimental results show that the proposed algorithm improves the classification accuracy of a multilayer perceptron with fewer training data than that for a conventional random selection algorithm that constructs a training data set without explicit strategies. Moreover, the algorithm outperforms typical active learning algorithms in the experiments. Those results show that the algorithm can construct an appropriate training data set at lower computational cost, because training data generation is usually costly. Accordingly, the algorithm proves the effectiveness of the strategy through the experiments. We also discuss some drawbacks of the algorithm.

Author Search Result

[Author] Hiroyuki TAKIZAWA(15hit)

Vector Quantization Codebook Design Using the Law-of-the-Jungle Algorithm

A Conflict-Aware Capacity Control Mechanism for Deep Cache Hierarchy

A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats

Kohonen Learning with a Mechanism, the Law of the Jungle, Capable of Dealing with Nonstationary Probability Distribution Functions

Acceleration Techniques for the Network Inversion Algorithm

A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts

Scalability Analysis of Deeply Pipelined Tsunami Simulation with Multiple FPGAs Open Access

FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms

A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning

Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems

MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications

Balanced Ternary Quantum Voltage Generator Based on Zero Crossing Shapiro Steps in Asymmetric Two-Junction SQUIDs

A Topology Preserving Neural Network for Nonstationary Distributions

A Network Clustering Algorithm for Sybil-Attack Resisting

An Active Learning Algorithm Based on Existing Training Data

Latest Issue

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles