The search functionality is under construction.

Author Search Result

[Author] Yutaka MASUDA(10hit)

1-10hit
  • Activation-Aware Slack Assignment Based Mode-Wise Voltage Scaling for Energy Minimization

    TaiYu CHENG  Yutaka MASUDA  Jun NAGAYAMA  Yoichi MOMIYAMA  Jun CHEN  Masanori HASHIMOTO  

     
    PAPER

      Pubricized:
    2021/08/31
      Vol:
    E105-A No:3
      Page(s):
    497-508

    Reducing power consumption is a crucial factor making industrial designs, such as mobile SoCs, competitive. Voltage scaling (VS) is the classical yet most effective technique that contributes to quadratic power reduction. A recent design technique called activation-aware slack assignment (ASA) enhances the voltage-scaling by allocating the timing margin of critical paths with a stochastic mean-time-to-failure (MTTF) analysis. Meanwhile, such stochastic treatment of timing errors is accepted in limited application domains, such as image processing. This paper proposes a design optimization methodology that achieves a mode-wise voltage-scalable (MWVS) design guaranteeing no timing errors in each mode operation. This work formulates the MWVS design as an optimization problem that minimizes the overall power consumption considering each mode duration, achievable voltage lowering and accompanied circuit overhead explicitly, and explores the solution space with the downhill simplex algorithm that does not require numerical derivation and frequent objective function evaluations. For obtaining a solution, i.e., a design, in the optimization process, we exploit the multi-corner multi-mode design flow in a commercial tool for performing mode-wise ASA with sets of false paths dedicated to individual modes. We applied the proposed design methodology to RISC-V design. Experimental results show that the proposed methodology saves 13% to 20% more power compared to the conventional VS approach and attains 8% to 15% gain from the conventional single-mode ASA. We also found that cycle-by-cycle fine-grained false path identification reduced leakage power by 31% to 42%.

  • Approximate Minimum Energy Point Tracking and Task Scheduling for Energy-Efficient Real-Time Computing

    Takumi KOMORI  Yutaka MASUDA  Jun SHIOMI  Tohru ISHIHARA  

     
    PAPER

      Pubricized:
    2021/09/06
      Vol:
    E105-A No:3
      Page(s):
    518-529

    In the upcoming Internet of Things era, reducing energy consumption of embedded processors is highly desired. Minimum Energy Point Tracking (MEPT) is one of the most efficient methods to reduce both dynamic and static energy consumption of a processor. Previous works proposed a variety of MEPT methods over the past years. However, none of them incorporate their algorithms with practical real-time operating systems, although edge computing applications often require low energy task execution with guaranteeing real-time properties. The difficulty comes from the time complexity for identifying an MEP and changing voltages, which often prevents real-time task scheduling. The conventional Dynamic Voltage and Frequency Scaling (DVFS) only scales the supply voltage. On the other hand, MEPT needs to adjust the body bias voltage in addition. This additional tuning knob makes MEPT much more complicated. This paper proposes an approximate MEPT algorithm, which reduces the complexity of identifying an MEP down to that of DVFS. The key idea is to linearly approximate the relationship between the processor frequency, supply voltage, and body bias voltage. Thanks to the approximation, optimal voltages for a specified clock frequency can be derived immediately. We also propose a task scheduling algorithm, which adjusts processor performance to the workload and then provides a soft real-time capability to the system. The operating system stochastically adjusts the average response time of the processor to be equal to a specified deadline. MEPT will be performed as a general task, and its overhead is considered in the calculation of the frequency. The experiments using a fabricated test chip and on-chip sensors show that the proposed algorithm is a maximum of 16 times more energy-efficient than DVFS. Also, the energy loss induced by the approximation is only 3% at most, and the algorithm does not sacrifice the fundamental real-time properties.

  • MTTF-Aware Design Methodology of Adaptively Voltage Scaled Circuit with Timing Error Predictive Flip-Flop

    Yutaka MASUDA  Masanori HASHIMOTO  

     
    PAPER

      Vol:
    E102-A No:7
      Page(s):
    867-877

    Adaptive voltage scaling is a promising approach to overcome manufacturing variability, dynamic environmental fluctuation, and aging. This paper focuses on error prediction based adaptive voltage scaling (EP-AVS) and proposes a mean time to failure (MTTF) aware design methodology for EP-AVS circuits. Main contributions of this work include (1) optimization of both voltage-scaled circuit and voltage control logic, and (2) quantitative evaluation of power saving for practically long MTTF. Experimental results show that the proposed EP-AVS design methodology achieves 38.0% power saving while satisfying given target MTTF.

  • Low-Power Design Methodology of Voltage Over-Scalable Circuit with Critical Path Isolation and Bit-Width Scaling Open Access

    Yutaka MASUDA  Jun NAGAYAMA  TaiYu CHENG  Tohru ISHIHARA  Yoichi MOMIYAMA  Masanori HASHIMOTO  

     
    PAPER

      Pubricized:
    2021/08/31
      Vol:
    E105-A No:3
      Page(s):
    509-517

    This work proposes a design methodology that saves the power dissipation under voltage over-scaling (VOS) operation. The key idea of the proposed design methodology is to combine critical path isolation (CPI) and bit-width scaling (BWS) under the constraint of computational quality, e.g., Peak Signal-to-Noise Ratio (PSNR) in the image processing domain. Conventional CPI inherently cannot reduce the delay of intrinsic critical paths (CPs), which may significantly restrict the power saving effect. On the other hand, the proposed methodology tries to reduce both intrinsic and non-intrinsic CPs. Therefore, our design dramatically reduces the supply voltage and power dissipation while satisfying the quality constraint. Moreover, for reducing co-design exploration space, the proposed methodology utilizes the exclusiveness of the paths targeted by CPI and BWS, where CPI aims at reducing the minimum supply voltage of non-intrinsic CP, and BWS focuses on intrinsic CPs in arithmetic units. From this key exclusiveness, the proposed design splits the simultaneous optimization problem into three sub-problems; (1) the determination of bit-width reduction, (2) the timing optimization for non-intrinsic CPs, and (3) investigating the minimum supply voltage of the BWS and CPI-applied circuit under quality constraint, for reducing power dissipation. Thanks to the problem splitting, the proposed methodology can efficiently find quality-constrained minimum-power design. Evaluation results show that CPI and BWS are highly compatible, and they significantly enhance the efficacy of VOS. In a case study of a GPGPU processor, the proposed design saves the power dissipation by 42.7% with an image processing workload and by 51.2% with a neural network inference workload.

  • Neural Network Calculations at the Speed of Light Using Optical Vector-Matrix Multiplication and Optoelectronic Activation

    Naoki HATTORI  Jun SHIOMI  Yutaka MASUDA  Tohru ISHIHARA  Akihiko SHINYA  Masaya NOTOMI  

     
    PAPER

      Pubricized:
    2021/05/17
      Vol:
    E104-A No:11
      Page(s):
    1477-1487

    With the rapid progress of the integrated nanophotonics technology, the optical neural network architecture has been widely investigated. Since the optical neural network can complete the inference processing just by propagating the optical signal in the network, it is expected more than one order of magnitude faster than the electronics-only implementation of artificial neural networks (ANN). In this paper, we first propose an optical vector-matrix multiplication (VMM) circuit using wavelength division multiplexing, which enables inference processing at the speed of light with ultra-wideband. This paper next proposes optoelectronic circuit implementation for batch normalization and activation function, which significantly improves the accuracy of the inference processing without sacrificing the speed performance. Finally, using a virtual environment for machine learning and an optoelectronic circuit simulator, we demonstrate the ultra-fast and accurate operation of the optical-electronic ANN circuit.

  • Performance Evaluation of Software-Based Error Detection Mechanisms for Supply Noise Induced Timing Errors

    Yutaka MASUDA  Takao ONOYE  Masanori HASHIMOTO  

     
    PAPER

      Vol:
    E100-A No:7
      Page(s):
    1452-1463

    Software-based error detection techniques, which includes error detection mechanism (EDM) transformation, are used for error localization in post-silicon validation. This paper evaluates the performance of EDM for timing error localization with a noise-aware logic simulator and 65-nm test chips assuming the following two EDM usage scenarios; (1) localizing a timing error occurred in the original program, and (2) localizing as many potential timing errors as possible. Simulation results show that the EDM transformation customized for quick error detection cannot locate electrical timing errors in the original program in the first scenario, but it detects 86% of non-masked errors potential bugs in the second scenario, which mean the EDM performance of detecting electrical timing errors affecting execution results is high. Hardware measurement results show that the EDM detects 25% of original timing errors and 56% of non-masked errors. Here, these hardware measurement results are not consistent with the simulation results. To investigate the reason, we focus on the following two differences between hardware and simulation; (1) design of power distribution network, and (2) definition of timing error occurrence frequency. We update the simulation setup for filling the difference and re-execute the simulation. We confirm that the simulation and the chip measurement results are consistent.

  • Dynamic Verification Framework of Approximate Computing Circuits using Quality-Aware Coverage-Based Grey-Box Fuzzing

    Yutaka MASUDA  Yusei HONDA  Tohru ISHIHARA  

     
    PAPER

      Pubricized:
    2022/09/02
      Vol:
    E106-A No:3
      Page(s):
    514-522

    Approximate computing (AC) has recently emerged as a promising approach to the energy-efficient design of digital systems. For realizing the practical AC design, we need to verify whether the designed circuit can operate correctly under various operating conditions. Namely, the verification needs to efficiently find fatal logic errors or timing errors that violate the constraint of computational quality. This work focuses on the verification where the computational results can be observed, the computational quality can be calculated from computational results, and the constraint of computational quality is given and defined as the constraint which is set to the computational quality of designed AC circuit with given workloads. Then, this paper proposes a novel dynamic verification framework of the AC circuit. The key idea of the proposed framework is to incorporate a quality assessment capability into the Coverage-based Grey-box Fuzzing (CGF). CGF is one of the most promising techniques in the research field of software security testing. By repeating (1) mutation of test patterns, (2) execution of the program under test (PUT), and (3) aggregation of coverage information and feedback to the next test pattern generation, CGF can explore the verification space quickly and automatically. On the other hand, CGF originally cannot consider the computational quality by itself. For overcoming this quality unawareness in CGF, the proposed framework additionally embeds the Design Under Verification (DUV) component into the calculation part of computational quality. Thanks to the DUV integration, the proposed framework realizes the quality-aware feedback loop in CGF and thus quickly enhances the verification coverage for test patterns that violate the quality constraint. In this work, we quantitatively compared the verification coverage of the approximate arithmetic circuits between the proposed framework and the random test. In a case study of an approximate multiply-accumulate (MAC) unit, we experimentally confirmed that the proposed framework achieved 3.85 to 10.36 times higher coverage than the random test.

  • An Accuracy Reconfigurable Vector Accelerator based on Approximate Logarithmic Multipliers for Energy-Efficient Computing

    Lingxiao HOU  Yutaka MASUDA  Tohru ISHIHARA  

     
    PAPER

      Pubricized:
    2022/09/02
      Vol:
    E106-A No:3
      Page(s):
    532-541

    The approximate logarithmic multiplier proposed by Mitchell provides an efficient alternative for processing dense multiplication or multiply-accumulate operations in applications such as image processing and real-time robotics. It offers the advantages of small area, high energy efficiency and is suitable for applications that do not necessarily achieve high accuracy. However, its maximum error of 11.1% makes it challenging to deploy in applications requiring relatively high accuracy. This paper proposes a novel operand decomposition method (OD) that decomposes one multiplication into the sum of multiple approximate logarithmic multiplications to widely reduce Mitchell multiplier errors while taking full advantage of its area savings. Based on the proposed OD method, this paper also proposes an accuracy reconfigurable multiply-accumulate (MAC) unit that provides multiple reconfigurable accuracies with high parallelism. Compared to a MAC unit consisting of accurate multipliers, the area is significantly reduced to less than half, improving the hardware parallelism while satisfying the required accuracy for various scenarios. The experimental results show the excellent applicability of our proposed MAC unit in image smoothing and robot localization and mapping application. We have also designed a prototype processor that integrates the minimum functionality of this MAC unit as a vector accelerator and have implemented a software-level accuracy reconfiguration in the form of an instruction set extension. We experimentally confirmed the correct operation of the proposed vector accelerator, which provides the different degrees of accuracy and parallelism at the software level.

  • Virtualizing DVFS for Energy Minimization of Embedded Dual-OS Platform

    Takumi KOMORI  Yutaka MASUDA  Tohru ISHIHARA  

     
    PAPER

      Pubricized:
    2023/07/12
      Vol:
    E107-A No:1
      Page(s):
    3-15

    Recent embedded systems require both traditional machinery control and information processing, such as network and GUI handling. A dual-OS platform consolidates a real-time OS (RTOS) and general-purpose OS (GPOS) to realize efficient software development on one physical processor. Although the dual-OS platform attracts increasing attention, it often suffers from energy inefficiency in the GPOS for guaranteeing real-time responses of the RTOS. This paper proposes an energy minimization method called DVFS virtualization, which allows running multiple DVFS policies dedicated to the RTOS and GPOS, respectively. The experimental evaluation using a commercial microcontroller showed that the proposed hardware could change the supply voltage within 500 ns and reduce the energy consumption of typical applications by 60 % in the best case compared to conventional dual-OS platforms. Furthermore, evaluation using a commercial microprocessor achieved a 15 % energy reduction of practical open-source software at best.

  • Identification of Redundant Flip-Flops Using Fault Injection for Low-Power Approximate Computing Circuits

    Jiaxuan LU  Yutaka MASUDA  Tohru ISHIHARA  

     
    PAPER-VLSI Design Technology and CAD

      Pubricized:
    2023/08/31
      Vol:
    E107-A No:3
      Page(s):
    540-548

    Approximate computing (AC) saves energy and improves performance by introducing approximation into computation in error-torrent applications. This work focuses on an AC strategy that accurately performs important computations and approximates others. In order to make AC circuits practical, we need to determine which computation is how important carefully, and thus need to appropriately approximate the redundant computation for maintaining the required computational quality. In this paper, we focus on the importance of computations at the flip-flop (FF) level and propose a novel importance evaluation methodology. The key idea of the proposed methodology is a two-step fault injection algorithm to extract the near-optimal set of redundant FFs in the circuit. In the first step, the proposed methodology performs the FI simulation for each FF and extracts the candidates of redundant FFs. Then, in the second step, the proposed methodology extracts the set of redundant FFs in a binary search manner. Thanks to the two-step strategy, the proposed algorithm reduces the complexity of architecture exploration from an exponential order to a linear order without understanding the functionality and behavior of the target application program. Experimental results show that the proposed methodology identifies the candidates of redundant FFs depending on the given constraints. In a case study of an image processing accelerator, the truncation for identified redundant FFs reduces the circuit area by 29.6% and saves power dissipation by 44.8% under the ASIC implementation while satisfying the PSNR constraint. Similarly, the dynamic power dissipation is saved by 47.2% under the FPGA implementation.