
Author Search Result

[Author] Tohru ISHIHARA (30 hits)

Showing results 1-20 of 30

  • Programmable Power Management Architecture for Power Reduction

    Tohru ISHIHARA, Hiroto YASUURA

    PAPER, Vol. E81-C, No. 9, pp. 1473-1480

    This paper presents the Power-Pro architecture (Programmable Power Management Architecture), a novel processor architecture for power reduction. The Power-Pro architecture has two key capabilities: (i) the supply voltage and clock frequency of the microprocessor can be varied dynamically, and (ii) the active datapath width can be adjusted dynamically to the precision of each operation. The most distinctive point of this architecture is that software programmers can directly specify application requirements such as real-time constraints and the precision of operations. To make programmable power management possible, the Power-Pro architecture provides special instructions with which programmers can dynamically change the supply voltage, the clock frequency, and the active datapath width. Experimental results show that the power consumption of a variety of applications is dramatically reduced by the Power-Pro architecture.

  • A Memory Power Optimization Technique for Application Specific Embedded Systems

    Tohru ISHIHARA, Hiroto YASUURA

    PAPER, Vol. E82-A, No. 11, pp. 2366-2374

    In this paper, a novel application-specific power optimization technique is proposed that utilizes a small instruction ROM placed between the instruction cache (or main program memory) and the CPU core. The technique targets embedded systems with the following properties: (i) instruction memory is organized as two on-chip memories, a main program memory and a subprogram memory; (ii) the two memories can be powered up or down independently by a special instruction of the core processor; and (iii) a compiler optimizes the allocation of object code to the two memories so as to minimize the average read energy. In many application programs, only a few basic blocks are executed frequently, so allocating these frequently executed basic blocks to the low-power subprogram memory leads to significant energy reduction. Experiments with actual ROM (Read Only Memory) modules created in a 0.5 µm CMOS process technology and an MPEG2 codec program demonstrate energy reductions of more than 50% in the best case over a previous approach that applies only a divided bit-line and word-line structure.
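
    As a rough illustration of the allocation step described above (a simplified sketch, not the authors' exact algorithm; all sizes, counts, and per-access energies below are hypothetical placeholders), the following Python fragment greedily places the most frequently executed basic blocks into the small low-power subprogram memory until its capacity is exhausted:

      # Illustrative sketch only: greedy frequency-based allocation of basic
      # blocks between a low-power subprogram memory and the main memory.
      def allocate_blocks(blocks, sub_capacity, e_sub, e_main):
          """blocks: list of (name, size_in_bytes, exec_count) tuples.
          Returns (placement map, estimated instruction-read energy)."""
          placement, used = {}, 0
          # Place the hottest blocks first.
          for name, size, count in sorted(blocks, key=lambda b: b[2], reverse=True):
              if used + size <= sub_capacity:
                  placement[name] = "subprogram_memory"
                  used += size
              else:
                  placement[name] = "main_memory"
          energy = sum(count * (e_sub if placement[name] == "subprogram_memory" else e_main)
                       for name, size, count in blocks)
          return placement, energy

      # Hypothetical example: the two hot blocks fit in a 256-byte subprogram memory.
      blocks = [("bb0", 64, 100000), ("bb1", 128, 90000), ("bb2", 512, 300)]
      print(allocate_blocks(blocks, sub_capacity=256, e_sub=0.1, e_main=1.0))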

  • A Multi-Performance Processor for Reducing the Energy Consumption of Real-Time Embedded Systems

    Tohru ISHIHARA

    PAPER-High-Level Synthesis and System-Level Design, Vol. E93-A, No. 12, pp. 2533-2541

    This paper proposes an energy-efficient processor that can serve as a design alternative to dynamic voltage scaling (DVS) processors in embedded system design. The processor consists of multiple PE (processing element) cores and a selective set-associative cache memory. The PE cores share the same instruction set architecture but differ in clock speed and energy consumption. Only a single PE core is activated at a time, and the other PE cores are deactivated using clock-gating and signal-gating techniques. The major advantage over DVS processors is the small overhead for changing performance. Gate-level simulation demonstrates that our processor can change its performance within 1.5 microseconds while dissipating about 10 nanojoules, whereas conventional DVS processors need hundreds of microseconds and dissipate a few microjoules for a performance transition. This makes it possible to apply the multi-performance processor to many real-time systems and to perform finer-grained and more sophisticated dynamic voltage control.

  • Methods for Reducing Power and Area of BDD-Based Optical Logic Circuits

    Ryosuke MATSUO, Jun SHIOMI, Tohru ISHIHARA, Hidetoshi ONODERA, Akihiko SHINYA, Masaya NOTOMI

    PAPER, Vol. E102-A, No. 12, pp. 1751-1759

    Optical circuits using nanophotonic devices attract significant interest due to their ultra-high-speed operation. As a consequence, synthesis methods for optical circuits are also attracting increasing attention. However, existing methods for synthesizing optical circuits mostly rely on straightforward mappings from established data structures such as the Binary Decision Diagram (BDD). Simply mapping a BDD to an optical circuit can cause an explosion in circuit size and involves significant power losses in branches and optical devices. To address these issues, this paper proposes a method for reducing the size of BDD-based optical logic circuits by exploiting wavelength division multiplexing (WDM). The paper also proposes a method for reducing the number of branches in a BDD-based circuit, which reduces the power dissipated in the laser sources. Experimental results obtained using the partial-product accumulation circuit of a 4-bit parallel multiplier demonstrate significant advantages of our method over existing approaches in terms of area and power consumption.

  • On-Chip Cache Architecture Exploiting Hybrid Memory Structures for Near-Threshold Computing

    Hongjie XU, Jun SHIOMI, Tohru ISHIHARA, Hidetoshi ONODERA

    PAPER, Vol. E102-A, No. 12, pp. 1741-1750

    This paper focuses on the power-area trade-off in on-chip memory systems. In contrast to the power-performance-area trade-off explored for traditional high-performance caches, it targets the edge-processing environment, which is becoming increasingly important in the Internet of Things (IoT) era, and proposes a new power-oriented trade-off for on-chip cache architecture. As a case study, the paper exploits the good energy efficiency of Standard-Cell Memory (SCM) operating in the near-threshold voltage region and the good area efficiency of Static Random Access Memory (SRAM). A hybrid two-level on-chip cache structure is first introduced to replace the 6T-SRAM L0 cache and reduce energy consumption. The paper then proposes a method for finding the capacity combination of SCM and SRAM that minimizes the energy consumption of the hybrid cache under a given cache-area constraint. Simulation results using a 65-nm process technology show that up to 80% of the energy consumption is eliminated, without increasing the die area, by replacing the conventional SRAM instruction cache with the hybrid two-level cache. The results also show that energy consumption can be reduced whenever the area constraint for the proposed hybrid cache is smaller than the area of an 8 kB SRAM, and that energy reduction is achieved when the target operating frequency is below 100 MHz, which implies that the proposed cache system is suitable for low-power systems requiring only moderate processing speed.
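
    For intuition about the capacity-exploration step, the following Python snippet is a minimal sketch (not the authors' method): it enumerates SCM/SRAM capacity pairs under an area budget and keeps the pair with the lowest modeled energy. The area and energy models here are hypothetical placeholders.

      # Minimal sketch of an SCM/SRAM capacity search under an area constraint.
      def best_split(area_budget, area_per_kb_scm, area_per_kb_sram,
                     energy_model, max_kb=64):
          best = None
          for scm_kb in range(max_kb + 1):
              for sram_kb in range(max_kb + 1):
                  area = scm_kb * area_per_kb_scm + sram_kb * area_per_kb_sram
                  if area > area_budget:
                      continue
                  energy = energy_model(scm_kb, sram_kb)
                  if best is None or energy < best[0]:
                      best = (energy, scm_kb, sram_kb)
          return best  # (energy, scm_kb, sram_kb)

      # Hypothetical energy model: more capacity in either memory lowers access energy.
      print(best_split(area_budget=10.0, area_per_kb_scm=1.5, area_per_kb_sram=1.0,
                       energy_model=lambda scm, sram: 1.0 / (1 + scm) + 0.2 / (1 + sram)))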

  • Statistical Timing Modeling Based on a Lognormal Distribution Model for Near-Threshold Circuit Optimization

    Jun SHIOMI, Tohru ISHIHARA, Hidetoshi ONODERA

    PAPER, Vol. E98-A, No. 7, pp. 1455-1466

    Near-threshold computing has emerged as one of the most promising solutions for highly energy-efficient yet high-performance computation in microprocessors. This paper proposes architecture-level statistical static timing analysis (SSTA) models for near-threshold voltage computing in which the path delay distribution is approximated by a lognormal distribution. First, we prove several theorems that help guide architectural design strategies for high-performance and energy-efficient near-threshold computing. We then present numerical experiments based on Monte Carlo simulations with a commercial 28 nm process technology model and demonstrate that the properties established by the theorems hold for practical near-threshold logic circuits.
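
    As a worked sketch of the delay model suggested by the abstract (notation assumed here for illustration, not taken from the paper): in the near-threshold region a gate delay is roughly exponential in its threshold-voltage deviation, so each gate delay is approximately lognormal, and a path delay composed of such gates is again modeled as lognormal:

      % Hedged LaTeX sketch of a lognormal delay model (assumed notation).
      d_i = d_0 \exp\!\left(\frac{\Delta V_{\mathrm{th},i}}{S}\right),
      \qquad \Delta V_{\mathrm{th},i} \sim \mathcal{N}(0, \sigma_V^2)
      \;\Rightarrow\; \ln d_i \sim \mathcal{N}\!\left(\ln d_0, (\sigma_V / S)^2\right),

      D_{\mathrm{path}} = \sum_{i=1}^{n} d_i \approx \mathrm{Lognormal}(\mu, \sigma^2),
      \qquad
      f_D(x) = \frac{1}{x \sigma \sqrt{2\pi}}
               \exp\!\left(-\frac{(\ln x - \mu)^2}{2 \sigma^2}\right).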

  • Standard Cell Structure with Flexible P/N Well Boundaries for Near-Threshold Voltage Operation

    Shinichi NISHIZAWA, Tohru ISHIHARA, Hidetoshi ONODERA

    PAPER-Physical Level Design, Vol. E96-A, No. 12, pp. 2499-2507

    This paper proposes a standard-cell structure in which the P/N boundary ratio of each cell can be customized independently for near-threshold operation. Lowering the supply voltage is one of the most promising approaches for reducing the power consumption of VLSI circuits; however, it increases the imbalance between rise and fall delays for cells containing transistor stacks. A conventional cell library with a fixed P/N boundary cannot efficiently compensate for this delay imbalance. The proposed structure allows the P/N boundary ratio to be optimized individually for each standard cell and therefore cancels the imbalance between rise and fall delays at the expense of cell area. The structure is verified using measured results from ring-oscillator circuits and simulation results for benchmark circuits in a 65 nm CMOS process. Experiments with the ISCAS'85 benchmark circuits demonstrate that a standard-cell library consisting of the proposed cells reduces the power consumption of the benchmark circuits by 16% on average, without increasing circuit area, compared with the same circuits synthesized using a library that is not optimized for near-threshold operation.

  • A Minimum Energy Point Tracking Algorithm Based on Dynamic Voltage Scaling and Adaptive Body Biasing

    Shu HOKIMOTO, Tohru ISHIHARA, Hidetoshi ONODERA

    PAPER, Vol. E100-A, No. 12, pp. 2776-2784

    Dynamically scaling the supply voltage (Vdd) and threshold voltage (Vth) to minimize the energy consumption of processors is highly desirable for applications such as wireless sensor networks and the Internet of Things (IoT). In this paper, we refer to the pair of Vdd and Vth that minimizes the energy consumption of the processor under a given operating condition as the minimum energy point (MEP). Since the MEP depends heavily on the operating condition, which is determined by the chip temperature, the activity factor, process variation, and the performance required of the processor, closely tracking the MEP at runtime is not easy. This paper proposes a simple yet effective algorithm for dynamically tracking the MEP of a processor over a wide range of operating conditions. Gate-level simulation of a 32-bit RISC processor in a 65 nm process demonstrates that the proposed algorithm tracks the MEP even when the operating conditions vary widely.
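
    The following Python fragment sketches one generic way such runtime tracking could work: a hill-climbing loop over a (Vdd, Vbb) grid driven by an energy monitor. It is offered only as an illustration; the paper's actual algorithm, voltage steps, and sensing mechanism are not reproduced here, and measure_energy() is a hypothetical per-interval energy monitor.

      # Generic hill-climbing sketch of minimum-energy-point tracking (illustrative).
      def track_mep(measure_energy, vdd=0.6, vbb=0.0, dv=0.05, db=0.1,
                    vdd_range=(0.3, 1.2), vbb_range=(-1.0, 0.5)):
          best_e = measure_energy(vdd, vbb)
          while True:
              improved = False
              for cand_vdd, cand_vbb in ((vdd + dv, vbb), (vdd - dv, vbb),
                                         (vdd, vbb + db), (vdd, vbb - db)):
                  if not (vdd_range[0] <= cand_vdd <= vdd_range[1]
                          and vbb_range[0] <= cand_vbb <= vbb_range[1]):
                      continue
                  e = measure_energy(cand_vdd, cand_vbb)
                  if e < best_e:
                      best_e, vdd, vbb, improved = e, cand_vdd, cand_vbb, True
              if not improved:          # local minimum for the current condition
                  return vdd, vbb, best_e

      # Toy convex energy surface standing in for real measurements.
      print(track_mep(lambda v, b: (v - 0.55) ** 2 + 0.5 * (b + 0.2) ** 2))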

  • A System Level Optimization Technique for Application Specific Low Power Memories

    Tohru ISHIHARA, Kunihiro ASADA

    PAPER-Optimization of Power and Timing, Vol. E84-A, No. 11, pp. 2755-2761

    A system-level approach for memory power reduction is proposed in this paper. The basic idea is to allocate frequently executed object code to a small subprogram memory and to optimize the supply voltage and threshold voltage of that memory. Since a large-scale memory contains many direct paths from the power supply to ground, the power dissipated by subthreshold leakage current is more serious than dynamic power dissipation. Our approach optimizes the size of the subprogram memory, the supply voltage, and the threshold voltage so as to minimize the total memory power, including the static power caused by leakage current. A heuristic algorithm that simultaneously determines code allocation, supply voltage, and threshold voltage so as to minimize memory power dissipation is also proposed. Experiments with several benchmark programs demonstrate energy reductions of up to 80% over a program memory that does not employ our approach.

  • DC-DC Converter-Aware Task Scheduling and Dynamic Reconfiguration for Energy Harvesting Embedded Systems

    Kyungsoo LEE, Tohru ISHIHARA

    PAPER-High-Level Synthesis and System-Level Design, Vol. E96-A, No. 12, pp. 2660-2667

    Energy-harvesting devices allow ambient energy sources to be converted into usable electrical power. Whereas conventional embedded systems are powered by batteries, energy-harvesting embedded systems are powered by such devices, and this calls for energy management techniques different from previous ones. Higher overall efficiency in an energy-harvesting system can be obtained through higher generation efficiency, higher consumption efficiency, or higher transfer efficiency. This paper presents a generalized technique for dynamic reconfiguration and task scheduling that accounts for the power lost in the DC-DC converters of the system. The proposed technique minimizes the power loss in the DC-DC converters and the charger. Experiments with an actual application demonstrate that our approach reduces total energy consumption by 22% on average over the conventional approach.
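
    For intuition, the power drawn from the source through a DC-DC converter can be expressed in terms of its conversion efficiency; a simple hedged loss model (not necessarily the one used in the paper) is

      % Illustrative converter-loss model (assumed notation).
      P_{\mathrm{in}} = \frac{P_{\mathrm{out}}}{\eta(V_{\mathrm{in}}, V_{\mathrm{out}}, I_{\mathrm{out}})},
      \qquad
      P_{\mathrm{loss}} = P_{\mathrm{in}} - P_{\mathrm{out}} = P_{\mathrm{out}} \left(\frac{1}{\eta} - 1\right),

    so a converter-aware schedule selects task configurations not only by the load power itself but also by how efficiently the converter delivers that power.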

  • A High-Performance and Low-Power Cache Architecture with Speculative Way-Selection

    Koji INOUE, Tohru ISHIHARA, Kazuaki MURAKAMI

    PAPER, Vol. E83-C, No. 2, pp. 186-194

    This paper proposes a new approach to achieving high performance and low energy consumption in set-associative caches. The cache, called a way-predicting set-associative cache, speculatively selects a single way that is likely to contain the data desired by the processor from the set designated by the memory address, before starting a normal cache access. By accessing only the predicted way instead of all the ways in a set, energy consumption is reduced. For the way-predicting cache to perform well, the accuracy of way prediction is important. This paper shows that the accuracy of an MRU (most recently used)-based way prediction is higher than 90% for most of the benchmark programs. The proposed way-predicting cache improves the ED (energy-delay) product by 60-70% compared with a conventional set-associative cache.
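
    A minimal Python sketch of the MRU-based prediction idea (illustrative only; the cache geometry, replacement policy, and timing of the paper are not modeled): each set remembers its most recently used way, the predicted way is probed first, and the remaining ways are probed only on a first-probe miss.

      # Minimal model of a way-predicting set-associative cache lookup (illustrative).
      class WayPredictingCache:
          def __init__(self, num_sets, ways):
              self.ways = ways
              self.tags = [[None] * ways for _ in range(num_sets)]
              self.mru = [0] * num_sets      # predicted (most recently used) way per set

          def access(self, set_index, tag):
              """Returns (hit, ways_probed); probing fewer ways saves energy."""
              pred = self.mru[set_index]
              if self.tags[set_index][pred] == tag:      # hit on the predicted way
                  return True, 1
              for way in range(self.ways):               # fall back to the other ways
                  if way != pred and self.tags[set_index][way] == tag:
                      self.mru[set_index] = way          # update the prediction
                      return True, self.ways
              self.tags[set_index][pred] = tag           # naive fill on a miss
              return False, self.ways

      cache = WayPredictingCache(num_sets=4, ways=4)
      print(cache.access(0, 0x1234))   # miss, all ways probed
      print(cache.access(0, 0x1234))   # hit on the predicted way, one probe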

  • Architectural-Level Soft-Error Modeling for Estimating Reliability of Computer Systems

    Makoto SUGIHARA, Tohru ISHIHARA, Kazuaki MURAKAMI

    PAPER-VLSI Design Technology, Vol. E90-C, No. 10, pp. 1983-1991

    This paper proposes a soft-error model for accurately estimating the reliability of a computer system at the architectural level within reasonable computation time. The architectural-level soft-error model identifies which parts of the memory modules are utilized, temporally and spatially, and which single event upsets (SEUs) are critical to program execution, at the level of cycle-accurate instruction set simulation (ISS). The model can estimate the reliability of a computer system with several levels of memory hierarchy and can identify which memory modules in the system are vulnerable. Such reliability estimation helps system designers apply reliable design techniques to the vulnerable parts of their design. The experimental results show that the soft-error model achieves more accurate reliability estimation than conventional approaches and demonstrate that the reliability of computer systems depends not only on the soft error rates (SERs) of the memories but also on the behavior of the software running on the system.
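
    One simplified way to reason about which upsets are critical, in the spirit of the abstract (a first-order sketch, not the paper's model): a stored bit is vulnerable from the time it is written until its last read before being overwritten, so summing these live intervals over an instruction-set-simulation access trace gives a rough vulnerability estimate for a memory module.

      # Simplified vulnerable-interval accounting over an access trace (illustrative).
      # Trace entries are (cycle, op, address) with op in {"R", "W"}.
      def vulnerable_cycles(trace):
          last_write, last_read, total = {}, {}, 0
          for cycle, op, addr in sorted(trace):
              if op == "W":
                  if addr in last_write and addr in last_read and last_read[addr] >= last_write[addr]:
                      total += last_read[addr] - last_write[addr]   # close the live range
                  last_write[addr] = cycle
                  last_read.pop(addr, None)
              else:
                  last_read[addr] = cycle
          for addr, write_cycle in last_write.items():   # ranges still open at the end
              if addr in last_read and last_read[addr] >= write_cycle:
                  total += last_read[addr] - write_cycle
          return total

      trace = [(0, "W", 0x10), (5, "R", 0x10), (9, "R", 0x10), (12, "W", 0x10)]
      print(vulnerable_cycles(trace))   # 9 cycles: written at 0, last read at 9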

  • System LSI Design Methods for Low Power LSIs

    Hiroto YASUURA, Tohru ISHIHARA

    INVITED PAPER, Vol. E83-C, No. 2, pp. 143-152

    Low-power design has emerged as a theme that is attractive both practically and theoretically in modern LSI system design. This paper presents system-level power optimization techniques: a brief survey of system-level low-power design approaches together with several examples described in detail. It reviews techniques that have been proposed to overcome the power issue and gives guidelines for prospective system-level solutions.

  • Way-Scaling to Reduce Power of Cache with Delay Variation

    Maziar GOUDARZI, Tadayuki MATSUMURA, Tohru ISHIHARA

    PAPER-High-Level Synthesis and System-Level Design, Vol. E91-A, No. 12, pp. 3576-3584

    The share of leakage in cache power consumption increases with technology scaling. Choosing a higher threshold voltage (Vth) and/or gate-oxide thickness (Tox) for cache transistors reduces leakage but impacts cell delay. We show that, due to uncorrelated random within-die delay variation, only some (not all) of the cells actually violate the cache delay after this change. We propose adding a spare cache way so that delay-violating cache lines can be replaced separately in each cache set. Through SPICE and gate-level simulations in a commercial 90 nm process, we show that choosing higher Vth and Tox and adding one spare way to a 4-way 16 KB cache reduces leakage power by 42%, which, depending on the share of leakage in total cache power, yields up to 22.59% and 41.37% reductions in total energy in the L1 instruction cache and L2 unified cache, respectively, with a negligible delay penalty and without sacrificing cache capacity or timing yield.

  • Approximate Minimum Energy Point Tracking and Task Scheduling for Energy-Efficient Real-Time Computing

    Takumi KOMORI, Yutaka MASUDA, Jun SHIOMI, Tohru ISHIHARA

    PAPER, Publicized 2021/09/06, Vol. E105-A, No. 3, pp. 518-529

    In the upcoming Internet of Things era, reducing the energy consumption of embedded processors is highly desired. Minimum Energy Point Tracking (MEPT) is one of the most efficient methods for reducing both the dynamic and static energy consumption of a processor. A variety of MEPT methods have been proposed over the years, but none of them integrate their algorithms with a practical real-time operating system, although edge computing applications often require low-energy task execution while guaranteeing real-time properties. The difficulty comes from the time complexity of identifying an MEP and changing the voltages, which often interferes with real-time task scheduling. Conventional Dynamic Voltage and Frequency Scaling (DVFS) scales only the supply voltage, whereas MEPT must also adjust the body bias voltage; this additional tuning knob makes MEPT considerably more complicated. This paper proposes an approximate MEPT algorithm that reduces the complexity of identifying an MEP to that of DVFS. The key idea is to linearly approximate the relationship between the processor frequency, the supply voltage, and the body bias voltage, so that the optimal voltages for a specified clock frequency can be derived immediately. We also propose a task scheduling algorithm that adjusts processor performance to the workload and thereby provides a soft real-time capability: the operating system stochastically adjusts the average response time of the processor to equal a specified deadline. MEPT itself is performed as an ordinary task, and its overhead is accounted for when calculating the frequency. Experiments using a fabricated test chip and on-chip sensors show that the proposed algorithm is up to 16 times more energy-efficient than DVFS, that the energy loss induced by the approximation is at most 3%, and that the algorithm does not sacrifice the fundamental real-time properties.
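
    A hedged sketch of the key approximation (symbols assumed here for illustration; the coefficients would come from characterizing the actual chip): the supply and body-bias voltages that realize a target clock frequency near the MEP are treated as linear functions of that frequency,

      % Illustrative linear MEP approximation (assumed notation).
      V_{\mathrm{dd}}^{*}(f) \approx \alpha_1 f + \beta_1,
      \qquad
      V_{\mathrm{bb}}^{*}(f) \approx \alpha_2 f + \beta_2,

    so that, once the scheduler has chosen a frequency, the operating point can be set with roughly the same constant-time effort as a DVFS table lookup.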

  • Low-Power Design Methodology of Voltage Over-Scalable Circuit with Critical Path Isolation and Bit-Width Scaling (Open Access)

    Yutaka MASUDA, Jun NAGAYAMA, TaiYu CHENG, Tohru ISHIHARA, Yoichi MOMIYAMA, Masanori HASHIMOTO

    PAPER, Publicized 2021/08/31, Vol. E105-A, No. 3, pp. 509-517

    This work proposes a design methodology that reduces power dissipation under voltage over-scaling (VOS) operation. The key idea is to combine critical path isolation (CPI) and bit-width scaling (BWS) under a constraint on computational quality, e.g., Peak Signal-to-Noise Ratio (PSNR) in the image-processing domain. Conventional CPI inherently cannot reduce the delay of intrinsic critical paths (CPs), which may significantly limit the power saving. The proposed methodology, in contrast, reduces the delay of both intrinsic and non-intrinsic CPs, and therefore dramatically lowers the supply voltage and power dissipation while satisfying the quality constraint. Moreover, to shrink the co-design exploration space, the methodology exploits the exclusiveness of the paths targeted by CPI and BWS: CPI aims at reducing the minimum supply voltage of non-intrinsic CPs, while BWS focuses on the intrinsic CPs in arithmetic units. Based on this exclusiveness, the simultaneous optimization problem is split into three sub-problems: (1) determining the bit-width reduction, (2) optimizing the timing of non-intrinsic CPs, and (3) finding the minimum supply voltage of the BWS- and CPI-applied circuit under the quality constraint. Thanks to this splitting, the methodology can efficiently find a quality-constrained minimum-power design. Evaluation results show that CPI and BWS are highly compatible and significantly enhance the efficacy of VOS. In a case study of a GPGPU processor, the proposed design reduces power dissipation by 42.7% for an image-processing workload and by 51.2% for a neural-network inference workload.

  • An Integrated Framework for Energy Optimization of Embedded Real-Time Applications

    Hideki TAKASE, Gang ZENG, Lovic GAUTHIER, Hirotaka KAWASHIMA, Noritoshi ATSUMI, Tomohiro TATEMATSU, Yoshitake KOBAYASHI, Takenori KOSHIRO, Tohru ISHIHARA, Hiroyuki TOMIYAMA, Hiroaki TAKADA

    PAPER-High-Level Synthesis and System-Level Design, Vol. E97-A, No. 12, pp. 2477-2487

    This paper presents a framework for reducing the energy consumption of embedded real-time systems, implemented as both an optimization toolchain and an energy-aware real-time operating system. The framework integrates multiple techniques for optimizing energy consumption. The main idea behind our approach is to exploit the trade-offs between energy consumption and performance of different processor configurations at task checkpoints, and to maintain memory allocation across task context switches. A target application is statically analyzed at both the intra-task and inter-task levels, and based on the analysis results, runtime optimization is performed in response to the application's behavior. A case study shows that our toolchain and real-time operating system achieve energy reduction while satisfying real-time requirements. The toolchain has also been applied successfully to a practical application.

  • Implementation of Stack Data Placement and Run Time Management Using a Scratch-Pad Memory for Energy Consumption Reduction of Embedded Applications

    Lovic GAUTHIER, Tohru ISHIHARA

    PAPER-High-Level Synthesis and System-Level Design, Vol. E94-A, No. 12, pp. 2597-2608

    Memory accesses are a major cause of energy consumption in embedded systems. This paper presents the implementation of a fully software-based technique that places stack and static data in a scratch-pad memory (SPM) in order to reduce the energy consumed by the processor when accessing them. Since an SPM is usually too small to hold all of these data, some of them must be left in the external main memory (MM); further energy reduction is therefore achieved by moving some stack data between the two memories at run time. The technique employs integer linear programming to find, at compile time, the optimal placement of static data and management of the stack, and implements the result by inserting stack operations into the code. Experimental results show that with an SPM of only 1 KB, our technique reduces the energy consumption related to static and stack data accesses by more than 90% for several applications, and by 57% on average, compared with the case where these data are placed entirely in main memory.
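
    A minimal sketch of the kind of 0-1 integer linear program involved (a simplified static version for illustration; the paper's formulation additionally models moving stack frames between the SPM and MM at run time): with x_i = 1 if data object i is placed in the SPM, n_i its access count, s_i its size, and e_SPM < e_MM the per-access energies,

      % Illustrative static placement ILP (assumed notation).
      \min \; \sum_i n_i \bigl( x_i \, e_{\mathrm{SPM}} + (1 - x_i) \, e_{\mathrm{MM}} \bigr)
      \qquad
      \text{s.t.} \;\; \sum_i x_i \, s_i \le C_{\mathrm{SPM}},
      \quad x_i \in \{0, 1\}.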

  • Neural Network Calculations at the Speed of Light Using Optical Vector-Matrix Multiplication and Optoelectronic Activation

    Naoki HATTORI, Jun SHIOMI, Yutaka MASUDA, Tohru ISHIHARA, Akihiko SHINYA, Masaya NOTOMI

    PAPER, Publicized 2021/05/17, Vol. E104-A, No. 11, pp. 1477-1487

    With the rapid progress of integrated nanophotonics technology, optical neural network architectures have been widely investigated. Since an optical neural network can complete its inference processing simply by propagating optical signals through the network, it is expected to be more than one order of magnitude faster than electronics-only implementations of artificial neural networks (ANNs). In this paper, we first propose an optical vector-matrix multiplication (VMM) circuit using wavelength division multiplexing, which enables inference processing at the speed of light with ultra-wide bandwidth. We then propose an optoelectronic circuit implementation of batch normalization and the activation function, which significantly improves the accuracy of the inference processing without sacrificing speed. Finally, using a virtual environment for machine learning and an optoelectronic circuit simulator, we demonstrate the ultra-fast and accurate operation of the optical-electronic ANN circuit.

  • A Synthesis Method Based on Multi-Stage Optimization for Power-Efficient Integrated Optical Logic Circuits

    Ryosuke MATSUO, Jun SHIOMI, Tohru ISHIHARA, Hidetoshi ONODERA, Akihiko SHINYA, Masaya NOTOMI

    PAPER, Publicized 2021/05/18, Vol. E104-A, No. 11, pp. 1546-1554

    Optical logic circuits based on integrated nanophotonics attract significant interest due to their ultra-high-speed operation. However, the power dissipation of conventional optical logic circuits grows exponentially with the number of inputs of the target logic function. This paper proposes a synthesis method that reduces the power dissipation to a polynomial order in the number of inputs while retaining the high-speed nature of optical circuits. Our method divides the target logic function into multiple sub-functions separated by Optical-to-Electrical (OE) converters. Each sub-function has fewer inputs than the original function, which exponentially reduces the power dissipated by the optical logic circuit representing that sub-function, and the OE-converter delay overhead can be mitigated by parallelizing the sub-functions. We apply the proposed synthesis method to the ISCAS'85 benchmark circuits. The power consumption of conventional circuits based on the Binary Decision Diagram (BDD) is at least three orders of magnitude larger than that of the optical logic circuits synthesized by the proposed method, which keeps the power consumption at about 100 mW. The delay of almost all circuits synthesized by the proposed method is kept to less than four times that of the conventional BDD-based circuits.
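
    To see why the splitting helps, consider a simple hedged model (not the paper's exact analysis) in which a single BDD-based optical stage with n inputs dissipates power proportional to 2^n because of branching and splitting losses, whereas dividing the function into k sub-functions of at most ceil(n/k) inputs, each followed by its own OE conversion, gives

      % Illustrative power model for single-stage vs. multi-stage synthesis.
      P_{\mathrm{single}} \propto 2^{\,n},
      \qquad
      P_{\mathrm{split}} \propto k \cdot 2^{\,\lceil n/k \rceil},

    which stays polynomial in n as long as the sub-function width n/k is kept bounded, at the cost of the OE-converter stages whose delay the method mitigates by parallelizing the sub-functions.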

Showing results 1-20 of 30