The search functionality is under construction.

Author Search Result

[Author] Jun SHIOMI(15hit)

1-15hit
  • A Hardware Efficient Reservoir Computing System Using Cellular Automata and Ensemble Bloom Filter

    Dehua LIANG  Jun SHIOMI  Noriyuki MIURA  Masanori HASHIMOTO  Hiromitsu AWANO  

     
    PAPER-Computer System

      Pubricized:
    2022/04/08
      Vol:
    E105-D No:7
      Page(s):
    1273-1282

    Reservoir computing (RC) is an attractive alternative to machine learning models owing to its computationally inexpensive training process and simplicity. In this work, we propose EnsembleBloomCA, which utilizes cellular automata (CA) and an ensemble Bloom filter to organize an RC system. In contrast to most existing RC systems, EnsembleBloomCA eliminates all floating-point calculation and integer multiplication. EnsembleBloomCA adopts CA as the reservoir in the RC system because it can be implemented using only binary operations and is thus energy efficient. The rich pattern dynamics created by CA can map the original input into a high-dimensional space and provide more features for the classifier. Utilizing an ensemble Bloom filter as the classifier, the features provided by the reservoir can be effectively memorized. Our experiment revealed that applying the ensemble mechanism to the Bloom filter resulted in a significant reduction in memory cost during the inference phase. In comparison with Bloom WiSARD, one of the state-of-the-art reference work, the EnsembleBloomCA model achieves a 43× reduction in memory cost while maintaining the same accuracy. Our hardware implementation also demonstrated that EnsembleBloomCA achieved over 23× and 8.5× reductions in area and power, respectively.

  • Methods for Reducing Power and Area of BDD-Based Optical Logic Circuits

    Ryosuke MATSUO  Jun SHIOMI  Tohru ISHIHARA  Hidetoshi ONODERA  Akihiko SHINYA  Masaya NOTOMI  

     
    PAPER

      Vol:
    E102-A No:12
      Page(s):
    1751-1759

    Optical circuits using nanophotonic devices attract significant interest due to its ultra-high speed operation. As a consequence, the synthesis methods for the optical circuits also attract increasing attention. However, existing methods for synthesizing optical circuits mostly rely on straight-forward mappings from established data structures such as Binary Decision Diagram (BDD). The strategy of simply mapping a BDD to an optical circuit sometimes results in an explosion of size and involves significant power losses in branches and optical devices. To address these issues, this paper proposes a method for reducing the size of BDD-based optical logic circuits exploiting wavelength division multiplexing (WDM). The paper also proposes a method for reducing the number of branches in a BDD-based circuit, which reduces the power dissipation in laser sources. Experimental results obtained using a partial product accumulation circuit used in a 4-bit parallel multiplier demonstrates significant advantages of our method over existing approaches in terms of area and power consumption.

  • On-Chip Cache Architecture Exploiting Hybrid Memory Structures for Near-Threshold Computing

    Hongjie XU  Jun SHIOMI  Tohru ISHIHARA  Hidetoshi ONODERA  

     
    PAPER

      Vol:
    E102-A No:12
      Page(s):
    1741-1750

    This paper focuses on power-area trade-off axis to memory systems. Compared with the power-performance-area trade-off application on the traditional high performance cache, this paper focuses on the edge processing environment which is becoming more and more important in the Internet of Things (IoT) era. A new power-oriented trade-off is proposed for on-chip cache architecture. As a case study, this paper exploits a good energy efficiency of Standard-Cell Memory (SCM) operating in a near-threshold voltage region and a good area efficiency of Static Random Access Memory (SRAM). A hybrid 2-level on-chip cache structure is first introduced as a replacement of 6T-SRAM cache as L0 cache to save the energy consumption. This paper proposes a method for finding the best capacity combination for SCM and SRAM, which minimizes the energy consumption of the hybrid cache under a specific cache area constraint. The simulation result using a 65-nm process technology shows that up to 80% energy consumption is reduced without increasing the die area by replacing the conventional SRAM instruction cache with the hybrid 2-level cache. The result shows that energy consumption can be reduced if the area constraint for the proposed hybrid cache system is less than the area which is equivalent to a 8kB SRAM. If the target operating frequency is less than 100MHz, energy reduction can be achieved, which implies that the proposed cache system is suitable for low-power systems where a moderate processing speed is required.

  • Statistical Timing Modeling Based on a Lognormal Distribution Model for Near-Threshold Circuit Optimization

    Jun SHIOMI  Tohru ISHIHARA  Hidetoshi ONODERA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1455-1466

    Near-threshold computing has emerged as one of the most promising solutions for enabling highly energy efficient and high performance computation of microprocessors. This paper proposes architecture-level statistical static timing analysis (SSTA) models for the near-threshold voltage computing where the path delay distribution is approximated as a lognormal distribution. First, we prove several important theorems that help consider architectural design strategies for high performance and energy efficient near-threshold computing. After that, we show the numerical experiments with Monte Carlo simulations using a commercial 28nm process technology model and demonstrate that the properties presented in the theorems hold for the practical near-threshold logic circuits.

  • Evaluation Metrics for the Cost of Data Movement in Deep Neural Network Acceleration

    Hongjie XU  Jun SHIOMI  Hidetoshi ONODERA  

     
    PAPER

      Pubricized:
    2021/06/01
      Vol:
    E104-A No:11
      Page(s):
    1488-1498

    Hardware accelerators are designed to support a specialized processing dataflow for everchanging deep neural networks (DNNs) under various processing environments. This paper introduces two hardware properties to describe the cost of data movement in each memory hierarchy. Based on the hardware properties, this paper proposes a set of evaluation metrics that are able to evaluate the number of memory accesses and the required memory capacity according to the specialized processing dataflow. Proposed metrics are able to analytically predict energy, throughput, and area of a hardware design without detailed implementation. Once a processing dataflow and constraints of hardware resources are determined, the proposed evaluation metrics quickly quantify the expected hardware benefits, thereby reducing design time.

  • A DLL-Based Body Bias Generator with Independent P-Well and N-Well Biasing for Minimum Energy Operation

    Kentaro NAGAI  Jun SHIOMI  Hidetoshi ONODERA  

     
    PAPER

      Pubricized:
    2021/04/20
      Vol:
    E104-C No:10
      Page(s):
    617-624

    This paper proposes an area- and energy-efficient DLL-based body bias generator (BBG) for minimum energy operation that controls p-well and n-well bias independently. The BBG can minimize total energy consumption of target circuits under a skewed process condition between nMOSFETs and pMOSFETs. The proposed BBG is composed of digital cells compatible with cell-based design, which enables energy- and area-efficient implementation without additional supply voltages. A test circuit is implemented in a 65-nm FDSOI process. Measurement results using a 32-bit RISC processor on the same chip show that the proposed BBG can reduce energy consumption close to a minimum within a 3% energy loss. In this condition, energy and area overheads of the BBG are 0.2% and 0.12%, respectively.

  • Approximate Minimum Energy Point Tracking and Task Scheduling for Energy-Efficient Real-Time Computing

    Takumi KOMORI  Yutaka MASUDA  Jun SHIOMI  Tohru ISHIHARA  

     
    PAPER

      Pubricized:
    2021/09/06
      Vol:
    E105-A No:3
      Page(s):
    518-529

    In the upcoming Internet of Things era, reducing energy consumption of embedded processors is highly desired. Minimum Energy Point Tracking (MEPT) is one of the most efficient methods to reduce both dynamic and static energy consumption of a processor. Previous works proposed a variety of MEPT methods over the past years. However, none of them incorporate their algorithms with practical real-time operating systems, although edge computing applications often require low energy task execution with guaranteeing real-time properties. The difficulty comes from the time complexity for identifying an MEP and changing voltages, which often prevents real-time task scheduling. The conventional Dynamic Voltage and Frequency Scaling (DVFS) only scales the supply voltage. On the other hand, MEPT needs to adjust the body bias voltage in addition. This additional tuning knob makes MEPT much more complicated. This paper proposes an approximate MEPT algorithm, which reduces the complexity of identifying an MEP down to that of DVFS. The key idea is to linearly approximate the relationship between the processor frequency, supply voltage, and body bias voltage. Thanks to the approximation, optimal voltages for a specified clock frequency can be derived immediately. We also propose a task scheduling algorithm, which adjusts processor performance to the workload and then provides a soft real-time capability to the system. The operating system stochastically adjusts the average response time of the processor to be equal to a specified deadline. MEPT will be performed as a general task, and its overhead is considered in the calculation of the frequency. The experiments using a fabricated test chip and on-chip sensors show that the proposed algorithm is a maximum of 16 times more energy-efficient than DVFS. Also, the energy loss induced by the approximation is only 3% at most, and the algorithm does not sacrifice the fundamental real-time properties.

  • Neural Network Calculations at the Speed of Light Using Optical Vector-Matrix Multiplication and Optoelectronic Activation

    Naoki HATTORI  Jun SHIOMI  Yutaka MASUDA  Tohru ISHIHARA  Akihiko SHINYA  Masaya NOTOMI  

     
    PAPER

      Pubricized:
    2021/05/17
      Vol:
    E104-A No:11
      Page(s):
    1477-1487

    With the rapid progress of the integrated nanophotonics technology, the optical neural network architecture has been widely investigated. Since the optical neural network can complete the inference processing just by propagating the optical signal in the network, it is expected more than one order of magnitude faster than the electronics-only implementation of artificial neural networks (ANN). In this paper, we first propose an optical vector-matrix multiplication (VMM) circuit using wavelength division multiplexing, which enables inference processing at the speed of light with ultra-wideband. This paper next proposes optoelectronic circuit implementation for batch normalization and activation function, which significantly improves the accuracy of the inference processing without sacrificing the speed performance. Finally, using a virtual environment for machine learning and an optoelectronic circuit simulator, we demonstrate the ultra-fast and accurate operation of the optical-electronic ANN circuit.

  • A Synthesis Method Based on Multi-Stage Optimization for Power-Efficient Integrated Optical Logic Circuits

    Ryosuke MATSUO  Jun SHIOMI  Tohru ISHIHARA  Hidetoshi ONODERA  Akihiko SHINYA  Masaya NOTOMI  

     
    PAPER

      Pubricized:
    2021/05/18
      Vol:
    E104-A No:11
      Page(s):
    1546-1554

    Optical logic circuits based on integrated nanophotonics attract significant interest due to their ultra-high-speed operation. However, the power dissipation of conventional optical logic circuits is exponential to the number of inputs of target logic functions. This paper proposes a synthesis method reducing power dissipation to a polynomial order of the number of inputs while exploiting the high-speed nature. Our method divides the target logic function into multiple sub-functions with Optical-to-Electrical (OE) converters. Each sub-function has a smaller number of inputs than that of the original function, which enables to exponentially reduce the power dissipated by an optical logic circuit representing the sub-function. The proposed synthesis method can mitigate the OE converter delay overhead by parallelizing sub-functions. We apply the proposed synthesis method to the ISCAS'85 benchmark circuits. The power consumption of the conventional circuits based on the Binary Decision Diagram (BDD) is at least three orders of magnitude larger than that of the optical logic circuits synthesized by the proposed method. The proposed method reduces the power consumption to about 100mW. The delay of almost all the circuits synthesized by the proposed method is kept less than four times the delay of the conventional BDD-based circuit.

  • Supply and Threshold Voltage Scaling for Minimum Energy Operation over a Wide Operating Performance Region

    Shoya SONODA  Jun SHIOMI  Hidetoshi ONODERA  

     
    PAPER

      Pubricized:
    2021/05/14
      Vol:
    E104-A No:11
      Page(s):
    1566-1576

    A method for runtime energy optimization based on the supply voltage (Vdd) and the threshold voltage (Vth) scaling is proposed. This paper refers to the optimal voltage pair, which minimizes the energy consumption of LSI circuits under a target delay constraint, as a Minimum Energy Point (MEP). The MEP dynamically fluctuates depending on the operating conditions determined by a target delay constraint, an activity factor and a chip temperature. In order to track the MEP, this paper proposes a closed-form continuous function that determines the MEP over a wide operating performance region ranging from the above-threshold region down to the sub-threshold region. Based on the MEP determination formula, an MEP tracking algorithm is also proposed. The MEP tracking algorithm estimates the MEP even though the operating conditions widely change. Measurement results based on a 32-bit RISC processor fabricated in a 65-nm Silicon On Thin Buried oxide (SOTB) process technology show that the proposed method estimates the MEP within a 5% energy loss in comparison with the actual MEP operation.

  • A Necessary and Sufficient Condition of Supply and Threshold Voltages in CMOS Circuits for Minimum Energy Point Operation

    Jun SHIOMI  Tohru ISHIHARA  Hidetoshi ONODERA  

     
    PAPER

      Vol:
    E100-A No:12
      Page(s):
    2764-2775

    Scaling supply voltage (VDD) and threshold voltage (Vth) dynamically has a strong impact on energy efficiency of CMOS LSI circuits. Techniques for optimizing VDD and Vth simultaneously under dynamic workloads are thus widely investigated over the past 15 years. In this paper, we refer to the optimum pair of VDD and Vth, which minimizes the energy consumption of a circuit under a specific performance constraint, as a minimum energy point (MEP). Based on the simple transregional models of a CMOS circuit, this paper derives a simple necessary and sufficient condition for the MEP operation. The simple condition helps find the MEP of CMOS circuits. Measurement results using standard-cell based memories (SCMs) fabricated in a 65-nm process technology also validate the condition derived in this paper.

  • A Design Method of a Cell-Based Amplifier for Body Bias Generation

    Takuya KOYANAGI  Jun SHIOMI  Tohru ISHIHARA  Hidetoshi ONODERA  

     
    PAPER

      Vol:
    E102-C No:7
      Page(s):
    565-572

    Body bias generators are useful circuits that can reduce variability and power dissipation in LSI circuits. However, the amplifier implemented into the body bias generator is difficult to design because of its complexity. To overcome the difficulty, this paper proposes a clearer cell-based design method of the amplifier than the existing cell-based design methods. The proposed method is based on a simple analytical model, which enables to easily design the amplifiers under various operating conditions. First, we introduce a small signal equivalent circuit of two-stage amplifiers by which we approximate a three-stage amplifier, and introduce a method for determining its design parameters based on the analytical model. Second, we propose a method of tuning parameters such as cell-based phase compensation elements and drive-strength of the output stage. Finally, based on the test chip measurement, we show the advantage of the body bias generator we designed in a cell-based flow over existing designs.

  • Analytical Stability Modeling for CMOS Latches in Low Voltage Operation

    Tatsuya KAMAKARI  Jun SHIOMI  Tohru ISHIHARA  Hidetoshi ONODERA  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2463-2472

    In synchronous LSI circuits, memory subsystems such as Flip-Flops and SRAMs are essential components and latches are the base elements of the common memory logics. In this paper, a stability analysis method for latches operating in a low voltage region is proposed. The butterfly curve of latches is a key for analyzing a retention failure of latches. This paper discusses a modeling method for retention stability and derives an analytical stability model for latches. The minimum supply voltage where the latches can operate with a certain yield can be accurately derived by a simple calculation using the proposed model. Monte-Carlo simulation targeting 65nm and 28nm process technology models demonstrates the accuracy and the validity of the proposed method. Measurement results obtained by a test chip fabricated in a 65nm process technology also demonstrate the validity. Based on the model, this paper shows some strategies for variation tolerant design of latches.

  • Approximation-Based System Implementation for Real-Time Minimum Energy Point Tracking over a Wide Operating Performance Region

    Shoya SONODA  Jun SHIOMI  Hidetoshi ONODERA  

     
    PAPER

      Pubricized:
    2022/10/07
      Vol:
    E106-A No:3
      Page(s):
    542-550

    This paper refers to the optimal voltage pair, which minimizes the energy consumption of LSI circuits under a target delay constraint, as a Minimum Energy Point (MEP). This paper proposes an approximation-based implementation method for an MEP tracking system over a wide voltage region. This paper focuses on the MEP characteristics that the energy loss is sufficiently small even though the voltage point changes near the MEP. For example, the energy loss is less than 5% even though the estimated MEP differs by a few tens of millivolts in comparison with the actual MEP. Therefore, the complexity for determining the MEP is relaxed by approximating complex operations such as the logarithmic or the exponential functions in the MEP tracking algorithm, which leads to hardware-/software-efficient implementation. When the MEP tracking algorithm is implemented in software, the MEP estimation time is reduced from 1ms to 13µs by the proposed approximation. When implemented in hardware, the proposed method can reduce the area of an MEP estimation circuit to a quarter. Measurement results of a 32-bit RISC-V processor fabricated in a 65-nm SOTB process technology show that the energy loss introduced by the proposed approximation is less than 2% in comparison with the MEP operation. Furthermore, we show that the MEP can be tracked within about 45 microseconds by the proposed MEP tracking system.

  • Nonvolatile Storage Cells Using FiCC for IoT Processors with Intermittent Operations

    Yuki ABE  Kazutoshi KOBAYASHI  Jun SHIOMI  Hiroyuki OCHI  

     
    PAPER

      Pubricized:
    2023/04/13
      Vol:
    E106-C No:10
      Page(s):
    546-555

    Energy harvesting has been widely investigated as a potential solution to supply power for Internet of Things (IoT) devices. Computing devices must operate intermittently rather than continuously, because harvested energy is unstable and some of IoT applications can be periodic. Therefore, processors for IoT devices with intermittent operation must feature a hibernation mode with zero-standby-power in addition to energy-efficient normal mode. In this paper, we describe the layout design and measurement results of a nonvolatile standard cell memory (NV-SCM) and nonvolatile flip-flops (NV-FF) with a nonvolatile memory using Fishbone-in-Cage Capacitor (FiCC) suitable for IoT processors with intermittent operations. They can be fabricated in any conventional CMOS process without any additional mask. NV-SCM and NV-FF are fabricated in a 180nm CMOS process technology. The area overhead by nonvolatility of a bit cell are 74% in NV-SCM and 29% in NV-FF, respectively. We confirmed full functionality of the NV-SCM and NV-FF. The nonvolatile system using proposed NV-SCM and NV-FF can reduce the energy consumption by 24.3% compared to the volatile system when hibernation/normal operation time ratio is 500 as shown in the simulation.