The search functionality is under construction.

Author Search Result

[Author] Aki KOBAYASHI(60hit)

1-20hit(60hit)

  • A Low Power Multimedia Processor Implementing Dynamic Voltage and Frequency Scaling Technique and Fast Motion Estimation Algorithm Called “Adaptively Assigned Breaking-Off Condition (A2BC)”

    Tadayoshi ENOMOTO  Nobuaki KOBAYASHI  

     
    PAPER

      Vol:
    E96-C No:4
      Page(s):
    424-432

    A motion estimation (ME) multimedia processor was developed using a dynamic voltage and frequency scaling (DVFS) technique to greatly reduce power dissipation. To make full use of the advantages of DVFS, a fast ME algorithm was also developed. It adaptively predicts the optimum supply voltage and the optimum clock frequency for each macro-block before the ME process for that macro-block starts. The 90-nm CMOS DVFS-controlled multimedia processor contained an absolute difference accumulator, a small on-chip DC/DC level converter, a minimum value detector and a DVFS controller. Its power dissipation was reduced to 38.48 µW, only 3.261% of that of a conventional multimedia processor.

  • Complex-Valued Bipartite Auto-Associative Memory

    Yozo SUZUKI  Masaki KOBAYASHI  

     
    PAPER-Nonlinear Problems

      Vol:
    E97-A No:8
      Page(s):
    1680-1687

    Complex-valued Hopfield associative memory (CHAM) is one of the most promising neural network models for dealing with multilevel information. CHAM has an inherent property of rotational invariance. Rotational invariance reduces a network's robustness to noise, which is a critical problem. Here, we propose complex-valued bipartite auto-associative memory (CBAAM) to solve this reduction in noise robustness. CBAAM consists of two layers, a visible complex-valued layer and an invisible real-valued layer. The invisible real-valued layer prevents rotational invariance and the resulting reduction in noise robustness. In addition, CBAAM has high parallelism, unlike CHAM. Using computer simulations, we show that CBAAM is superior to CHAM in noise robustness. The noise robustness of CHAM decreased as the resolution factor increased, whereas CBAAM provided high noise robustness independent of the resolution factor.

  • Uniqueness Theorem of Complex-Valued Neural Networks with Polar-Represented Activation Function

    Masaki KOBAYASHI  

     
    PAPER-Nonlinear Problems

      Vol:
    E98-A No:9
      Page(s):
    1937-1943

    Several models of feed-forward complex-valued neural networks have been proposed, and those with split and polar-represented activation functions have been mainly studied. Neural networks with split activation functions are relatively easy to analyze, whereas those with polar-represented activation functions have many applications but are difficult to analyze. In previous research, Nitta proved the uniqueness theorem of complex-valued neural networks with split activation functions. Subsequently, he studied their critical points, which cause plateaus and local minima in their learning processes. Thus, the uniqueness theorem is closely related to the learning process. In the present work, we first define three types of reducibility for feed-forward complex-valued neural networks with polar-represented activation functions and prove that we can easily transform reducible complex-valued neural networks into irreducible ones. We then prove the uniqueness theorem of complex-valued neural networks with polar-represented activation functions.

  • Hybrid Quaternionic Hopfield Neural Network

    Masaki KOBAYASHI  

     
    PAPER-Nonlinear Problems

      Vol:
    E98-A No:7
      Page(s):
    1512-1518

    In recent years, applications of complex-valued neural networks have become widespread. Quaternions are an extension of complex numbers, and neural networks with quaternions have been proposed. Because quaternion algebra is non-commutative, we can consider two orders of multiplication to calculate the weighted input. However, both orders provide almost the same performance. We propose hybrid quaternionic Hopfield neural networks, which have both orders of multiplication. Using computer simulations, we show that these networks outperform conventional quaternionic Hopfield neural networks in noise tolerance. We discuss why hybrid quaternionic Hopfield neural networks improve noise tolerance from the standpoint of rotational invariance.

  • FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms

    Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER

      Vol:
    E98-C No:7
      Page(s):
    550-558

    As energy consumption of cache memories increases, an energy-efficient cache management mechanism is required. While a dynamic cache resizing mechanism is one promising approach to the energy reduction of microprocessors, one problem is that its effect is limited by the existence of dead-on-fill blocks, which are not used until their evictions from the cache memory. To solve this problem, this paper proposes a cache management policy named FLEXII, which can reduce the number of dead-on-fill blocks and help dynamic cache resizing mechanisms further reduce the energy consumption of the cache memories.

  • An Adaptive Algorithm for Cascaded Notch Filter with Reduced Bias

    James OKELLO  Shin'ichi ARITA  Yoshio ITOH  Yutaka FUKUI  Masaki KOBAYASHI  

     
    PAPER-Digital Signal Processing

      Vol:
    E84-A No:2
      Page(s):
    589-596

    In this paper we propose a new simplified algorithm for cascaded second-order adaptive notch filters implemented using an allpass filter, for elimination of multiple sinusoids. Each stage of the notch filter is implemented using a direct-form second-order allpass filter. We also present an analysis that compares the proposed algorithm with the conventional simplified algorithm and indicates that the proposed algorithm has a reduced bias in the estimation of the multiple input sinusoids. Simulation results confirm this analysis.

  • A Metadata Prefetching Mechanism for Hybrid Memory Architectures (Open Access)

    Shunsuke TSUKADA  Hikaru TAKAYASHIKI  Masayuki SATO  Kazuhiko KOMATSU  Hiroaki KOBAYASHI  

     
    PAPER

      Publicized:
    2021/12/03
      Vol:
    E105-C No:6
      Page(s):
    232-243

    A hybrid memory architecture (HMA) that consists of several distinct memory devices is expected to achieve a good balance between high performance and large capacity. Unlike conventional memory architectures, the HMA needs metadata for data management, since the data are migrated between the memory devices during the execution of an application. The memory controller caches the metadata to avoid accessing the memory devices for metadata references. However, as the amount of metadata increases in proportion to the size of the HMA, the memory controller needs to handle a large amount of metadata. As a result, the memory controller cannot cache all the metadata, and the number of metadata references increases. This increases the access latency to reach the target data and degrades the performance. To solve this problem, this paper proposes a metadata prefetching mechanism for HMAs. The proposed mechanism loads the metadata needed in the near future by prefetching. Moreover, to increase the effect of the metadata prefetching, the proposed mechanism predicts the metadata used in the near future based on the address difference, that is, the difference between two consecutive access addresses. The evaluation results show that the proposed metadata prefetching mechanism can improve the instructions per cycle by up to 44%, and by 9% on average.

  • A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning

    Shoichi HIRASAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Software

      Publicized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2178-2186

    Automatic performance tuning of a practical application can be time-consuming and sometimes infeasible, because it often needs to evaluate the performance of a large number of code variants to find the best one. This paper therefore proposes a light-weight rollback mechanism to evaluate each of the code variants at a low cost. In the proposed mechanism, once one code variant of a target code block is executed, the execution state is rolled back to the state before the block was executed, so that only the block is executed repeatedly to find the best code variant. The mechanism also terminates any code variant whose execution time exceeds the shortest execution time so far. As a result, it avoids executing the whole application many times and thus reduces the timing overhead of the auto-tuning process required for finding the best code variant.

  • Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems

    Muhammad ALFIAN AMRIZAL  Atsuya UNO  Yukinori SATO  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-High performance computing

      Publicized:
    2017/07/14
      Vol:
    E100-D No:12
      Page(s):
    2749-2760

    Coordinated checkpointing is a widely used checkpoint/restart protocol for fault tolerance in large-scale HPC systems. However, this protocol involves massive I/O concentration, resulting in considerably high checkpoint overhead and high energy consumption. This paper focuses on speculative checkpointing, a CPR mechanism that allows for temporal distribution of checkpointings to avoid I/O concentration. We propose execution time and energy models for speculative checkpointing, and investigate energy-performance characteristics when speculative checkpointing is adopted in exascale systems. Using these models, we study the benefit of speculative checkpointing over coordinated checkpointing under various realistic scenarios for exascale HPC systems. We show that, compared to coordinated checkpointing, speculative checkpointing can achieve up to an 11% energy reduction at the cost of a relatively small increase in the execution time. In addition, a significant energy-performance trade-off is expected when the system scale exceeds 1.2 million nodes.

  • Simplification of Liquid Dielectric Property Evaluation Based on Comparison with Reference Materials and Electromagnetic Analysis Using the Cut-Off Waveguide Reflection Method

    Kouji SHIBATA  Masaki KOBAYASHI  

     
    PAPER

      Vol:
    E100-C No:10
      Page(s):
    908-917

    In this study, expressions were compared with reference material using the coaxial feed-type open-ended cut-off circular waveguide reflection method to support simple and instantaneous evaluation of dielectric constants in small amounts of scarce liquids over a broad frequency range. S11 values were determined via electromagnetic analysis for individual jig structure conditions and dielectric property values, without actual S11 measurement, for the tip of the measurement jig under open-ended and short-ended conditions and with the test material inserted. Next, information on the relationships linking jig structure, dielectric properties and S11 properties was stored in a database to simplify the procedure and improve accuracy in reference material evaluation. To verify the effectiveness of the proposed method, the accuracy of the estimation formula was first theoretically verified for cases in which the dielectric property values of the reference material and the actual material differed significantly. The results indicated that dielectric property values for various liquids measured at 0.5 and 1.0 GHz using the proposed method corresponded closely to those obtained using the method previously proposed by the authors. The effectiveness of the proposed method was then evaluated by determining the dielectric properties of certain liquids at octave-range continuous frequencies between 0.5 and 1.0 GHz based on interpolation from limited data at several frequencies. The results indicated that the approach enables quicker and easier measurement of the complex permittivity of liquids over a broad frequency range than the previous method.

  • Quantized Decoder Adaptively Predicting both Optimum Clock Frequency and Optimum Supply Voltage for a Dynamic Voltage and Frequency Scaling Controlled Multimedia Processor

    Nobuaki KOBAYASHI  Tadayoshi ENOMOTO  

     
    PAPER-Electronic Circuits

      Vol:
    E101-C No:8
      Page(s):
    671-679

    To fully utilize the advantages of dynamic voltage and frequency scaling (DVFS) techniques, a quantized decoder (QNT-D) was developed. The QNT-D generates a quantized signal processing quantity (Q) from a predicted signal processing quantity (M). Q is used to produce the optimum frequency (opt.fc) and the optimum supply voltage (opt.VD), both of which are proportional to Q. To develop a DVFS-controlled motion estimation (ME) processor, we used both the QNT-D and a fast ME algorithm called A2BC (Adaptively Assigned Breaking-off Condition) to predict M for each macro-block (MB). A DVFS-controlled ME processor was fabricated using 90-nm CMOS technology. The total power dissipation (PT) of the processor was significantly reduced and varied from 38.65 to 99.5 µW, only 3.27 to 8.41% of the PT of a conventional ME processor, depending on the test video picture.

  • Design of Digital Filters Simulating an LCR Filter with Node Equation

    Hazaoud AHMED  Etsuro HAYAHARA  Masaki KOBAYASHI  Yoshio ITOH  

     
    PAPER-Circuit Theory

      Vol:
    E68-E No:8
      Page(s):
    535-539

    This paper describes a digital filter realization method based on simulating an LCR filter. Given the node equation of the original LCR filter, the frequency variable s is transformed into the variable z using the bilinear transformation. The resulting network equation can be digitally realized with the same transfer function as the original LCR filter. With such a method, the circuit either has a large error in the transfer response near zero frequency or oscillates. A technique to avoid this problem by a simple modification of the multiplier coefficients is shown. A fifth-order elliptic filter is presented, with an illustrative comparison to the classical cascade structure.

  • Bias Free Adaptive Notch Filter Based on Fourier Sine Series

    Kazuki SHIOGAI  Naoto SASAOKA  Masaki KOBAYASHI  Isao NAKANISHI  James OKELLO  Yoshio ITOH  

     
    PAPER-Digital Signal Processing

      Vol:
    E97-A No:2
      Page(s):
    557-564

    The conventional adaptive notch filter based on an infinite impulse response (IIR) filter is well known. However, this kind of adaptive notch filter has a stability problem due to its adaptive IIR filter. In addition, the tap coefficients of this notch filter converge to solutions with bias error. To solve these problems, an adaptive notch filter using Fourier sine series (ANFF) is proposed. The ANFF is stable because an adaptive IIR filter is not used as an all-pass filter. Further, the proposed adaptive notch filter is robust to the effects of a disturbance signal, owing to a notch filter structure based on an exponential filter and the line symmetry of the autocorrelation.

  • MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications

    Ye GAO  Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Computer System

      Publicized:
    2014/08/22
      Vol:
    E97-D No:11
      Page(s):
    2835-2843

    Vector processors have significant advantages for next generation multimedia applications (MMAs). One of the advantages is that vector processors can achieve high data transfer performance by using a high bandwidth memory sub-system, resulting in a high sustained computing performance. However, the high bandwidth memory sub-system usually leads to enormous costs in terms of chip area, power and energy consumption. These costs are too expensive for commodity computer systems, which are the main execution platform of MMAs. This paper proposes a new multi-banked cache memory for commodity computer systems called MVP-cache in order to expand the potential of vector architectures on MMAs. Unlike conventional multi-banked cache memories, which employ one tag array and one data array in a sub-cache, MVP-cache associates one tag array with multiple independent data arrays of small-sized cache lines. In this way, MVP-cache realizes less static power consumption on its tag arrays. MVP-cache can also achieve high efficiency on short vector data transfers because the flexibility of data transfers can be improved by independently controlling the data transfers of each data array.

  • A Simple Algorithm for Adaptive Allpass-FIR Digital Filter Using Lattice Allpass Filter with Minimum Multipliers

    James OKELLO  Yoshio ITOH  Yutaka FUKUI  Masaki KOBAYASHI  

     
    PAPER-Digital Signal Processing

      Vol:
    E82-A No:1
      Page(s):
    138-144

    An adaptive infinite impulse response (IIR) digital filter implemented using a cascade of second-order direct-form allpass filters and a finite impulse response (FIR) filter has the property that its poles converge to those of the unknown system. In this paper we implement the adaptive allpass-FIR digital filter using a lattice allpass filter with a minimum number of multipliers. We then derive a simple adaptive algorithm, which does not increase the overall number of multipliers of the proposed adaptive digital filter (ADF) in comparison to the ADF that uses the direct-form allpass filter. The proposed structure and algorithm exhibit a kind of orthogonality, which ensures convergence of the poles of the ADF to those of the unknown system. Simulation results confirm this convergence.

  • A Topology Preserving Neural Network for Nonstationary Distributions

    Taira NAKAJIMA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    LETTER-Bio-Cybernetics and Neurocomputing

      Vol:
    E82-D No:7
      Page(s):
    1131-1135

    We propose a learning algorithm for self-organizing neural networks to form a topology preserving map from an input manifold whose topology may dynamically change. Experimental results show that the network using the proposed algorithm can rapidly adjust itself to represent the topology of nonstationary input distributions.

  • An Efficient Reference Image Sharing Method for the Image-Division Parallel Video Encoding Architecture

    Ken NAKAMURA  Yuya OMORI  Daisuke KOBAYASHI  Koyo NITTA  Kimikazu SANO  Masayuki SATO  Hiroe IWASAKI  Hiroaki KOBAYASHI  

     
    PAPER

      Publicized:
    2022/11/29
      Vol:
    E106-C No:6
      Page(s):
    312-320

    This paper proposes an efficient reference image sharing method for the image-division parallel video encoding architecture. This method efficiently reduces the amount of data transfer by using pre-transfer with area prediction and on-demand transfer with a transfer management table. Experimental results show that the data transfer can be reduced to 19.8-35.3% of the conventional method on average without major degradation of coding performance. This makes it possible to reduce the required bandwidth of the inter-chip transfer interface by saving the amount of data transfer.

  • Single-Power-Supply Six-Transistor CMOS SRAM Enabling Low-Voltage Writing, Low-Voltage Reading, and Low Standby Power Consumption (Open Access)

    Tadayoshi ENOMOTO  Nobuaki KOBAYASHI  

     
    PAPER-Electronic Circuits

      Publicized:
    2023/03/16
      Vol:
    E106-C No:9
      Page(s):
    466-476

    We developed a self-controllable voltage level (SVL) circuit and applied this circuit to a single-power-supply, six-transistor complementary metal-oxide-semiconductor static random-access memory (SRAM) to not only improve both write and read performances but also to achieve low standby power and data retention (holding) capability. The SVL circuit comprises only three MOSFETs (i.e., pull-up, pull-down and bypass MOSFETs). The SVL circuit is able to adaptively generate both optimal memory cell voltages and word line voltages depending on which mode of operation (i.e., write, read or hold operation) was used. The write margin (VWM) and read margin (VRM) of the developed (dvlp) SRAM at a supply voltage (VDD) of 1V were 0.470 and 0.1923V, respectively. These values were 1.309 and 2.093 times VWM and VRM of the conventional (conv) SRAM, respectively. At a large threshold voltage (Vt) variability (=+6σ), the minimum power supply voltage (VMin) for the write operation of the conv SRAM was 0.37V, whereas it decreased to 0.22V for the dvlp SRAM. VMin for the read operation of the conv SRAM was 1.05V when the Vt variability (=-6σ) was large, but the dvlp SRAM lowered it to 0.41V. These results show that the SVL circuit expands the operating voltage range for both write and read operations to lower voltages. The dvlp SRAM reduces the standby power consumption (PST) while retaining data. The measured PST of the 2k-bit, 90-nm dvlp SRAM was only 0.957µW at VDD=1.0V, which was 9.46% of PST of the conv SRAM (10.12µW). The Si area overhead of the SVL circuits was only 1.383% of the dvlp SRAM.

  • A Study of Lower Sideband Enhancement on Analog Recording VCR

    Masaaki KOBAYASHI  Yasutoshi YAMAMOTO  Shinichi AKI  Yoshitomi NAGAOKA  

     
    LETTER-Magnetic Recording

      Vol:
    E72-E No:4
      Page(s):
    326-327

    To study lower sideband enhancement, three kinds of modulation methods are applied to recording. Compared with results of experiments using a VCR and simulations, it is concluded that the order of magnitude of lower sideband enhancement is AM, Dual-frequency, FM with lower modulation index.

  • The Object-Space Parallel Processing of the Multipass Rendering Method on the (Mπ)² with a Distributed-Frame Buffer System

    Hitoshi YAMAUCHI  Takayuki MAEDA  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    PAPER-Computer Architecture

      Vol:
    E80-D No:9
      Page(s):
    909-918

    The multipass rendering method based on the global illumination model can generate the most photo-realistic images. However, since the multipass rendering method is very time consuming, it is impractical in the industrial world. This paper discusses a massively parallel processing approach to fast image synthesis by the multipass rendering method. In particular, we focus on the performance evaluation of the view-dependent object-space parallel processing on the (Mπ)², which was proposed in our previous paper. We also propose two kinds of distributed frame buffer systems, named the cached frame buffer and the multistage-interconnected frame buffer. These frame buffer systems can solve the access conflict problem on the frame buffer. The simulation results show that the (Mπ)² has a scalable performance. For example, the (Mπ)² with more than 4000 processing elements can achieve an efficiency of over 50%. We also show that both of the proposed distributed frame buffer systems can relieve the overhead due to frame buffer access in the (Mπ)² when a large number of high-performance processing elements are adopted in the system.
