
Author Search Result

[Author] Hiroshi NAKAMURA(24hit)

1-20hit(24hit)

  • Evaluation of a New Power-Gating Scheme Utilizing Data Retentiveness on Caches

    Kyundong KIM  Seidai TAKEDA  Shinobu MIWA  Hiroshi NAKAMURA  

     
    PAPER-Logic Synthesis, Test and Verification

      Vol:
    E95-A No:12
      Page(s):
    2301-2308

    Caches are among the most leakage-consuming components in modern processors because of their massive number of transistors. To reduce the leakage power of caches, several techniques using power gating (PG) have been proposed. Despite its high leakage savings, a side effect of PG for caches is the loss of data during sleep. If useful data is lost in sleep mode, it must be fetched again from lower-level memory. This consumes a considerable amount of energy, which offsets the leakage savings. This paper proposes a new PG scheme that considers the data retentiveness of SRAM. After entering sleep mode, the data of an SRAM cell is not lost immediately and remains usable provided its validity is checked. Therefore, we utilize the data retentiveness of SRAM to avoid the energy overhead of data recovery, which creates further opportunities for leakage savings. To check availability, we introduce simple hardware whose overhead is negligible. Our experimental results show that utilizing data retentiveness saves up to 32.42% more leakage than conventional PG.

  • A Fine-Grained Power Gating Control on Linux Monitoring Power Consumption of Processor Functional Units

    Atsushi KOSHIBA  Motoki WADA  Ryuichi SAKAMOTO  Mikiko SATO  Tsubasa KOSAKA  Kimiyoshi USAMI  Hideharu AMANO  Masaaki KONDO  Hiroshi NAKAMURA  Mitaro NAMIKI  

     
    PAPER

      Vol:
    E98-C No:7
      Page(s):
    559-568

    The authors have been researching ways to reduce the power consumption of microprocessors and have developed a low-power processor called “Geyser” by applying a power gating (PG) function to the individual functional units of the processor. The PG function on Geyser reduces the power consumption of functional units by shutting off the supply voltage of idle units. However, the energy overhead of switching the supply voltage for units on and off causes power increases. The amount of energy overhead varies with the behavior of each functional unit, which is influenced by the running application, and also with the core temperature. It is therefore necessary to switch the PG function itself on or off according to the state of the processor at runtime to reduce power consumption more effectively. In this paper, the authors propose a PG control method in which the operating system (OS) takes the power overhead into account. In the proposed method, to achieve greater power reduction, the OS periodically calculates the power consumption of each functional unit and inhibits the PG function of any unit whose energy overhead is judged too high. The method was implemented in the Linux process scheduler and evaluated. The results show that the average power consumption of the functional units is reduced by up to 17.2%.

  • Fine-Grained Run-Time Power Gating through Co-optimization of Circuit, Architecture, and System Software Design Open Access

    Hiroshi NAKAMURA  Weihan WANG  Yuya OHTA  Kimiyoshi USAMI  Hideharu AMANO  Masaaki KONDO  Mitaro NAMIKI  

     
    INVITED PAPER

      Vol:
    E96-C No:4
      Page(s):
    404-412

    Power consumption has recently emerged as a first-class design constraint in system LSI designs. In particular, leakage power has come to occupy a large part of the total power consumption. Therefore, reduction of leakage power is indispensable for efficient design of high-performance system LSIs. Since 2006, we have carried out a research project called “Innovative Power Control for Ultra Low-Power and High-Performance System LSIs”, supported by the Japan Science and Technology Agency as a CREST research program. One of the major objectives of this project is reducing the leakage power consumption of system LSIs by innovative power control through tight cooperation and co-optimization of circuit technology, architecture, and system software designs. In this project, we focused on power gating as a circuit technique for reducing leakage power. Temporal granularity is one of the most important issues in power gating. Thus, we have developed a series of Geysers as proof-of-concept CPUs which provide several mechanisms of fine-grained run-time power gating. In this paper, we describe their concept and design, and explain why co-optimization of different design layers is important. Then, three kinds of power gating implementations and their evaluation are presented from the viewpoint of power saving and temporal granularity.

  • An Energy-Efficient Task Scheduling for Near-Realtime Systems with Execution Time Variation

    Takashi NAKADA  Tomoki HATANAKA  Hiroshi UEKI  Masanori HAYASHIKOSHI  Toru SHIMIZU  Hiroshi NAKAMURA  

     
    PAPER-Software System

      Publicized:
    2017/06/26
      Vol:
    E100-D No:10
      Page(s):
    2493-2504

    Improving energy efficiency is critical for embedded systems in our rapidly evolving information society. Near real-time data processing tasks, such as multimedia streaming applications, share the property that their deadline periods are longer than their input intervals due to buffering. In general, executing tasks at lower performance is more energy efficient. On the other hand, higher performance is necessary for large tasks to meet their deadlines. To minimize energy consumption while strictly meeting deadlines, adaptive task scheduling, including dynamic performance mode selection, is very important. In this work, we propose an energy-efficient slack-based task scheduling algorithm for such tasks that adapts to task size variations and applies DVFS with the help of statistical analysis. We confirmed that our proposal can further reduce energy consumption when compared to oracle frame-based scheduling.

  • FOREWORD

    Kenkichi HIRADE  Hiroshi SUZUKI  Hideichi SASAOKA  Hiroshi NAKAMURA  Yukitsuna FURUYA  

     
    FOREWORD

      Vol:
    E77-B No:5
      Page(s):
    533-534

  • Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV

    Siyi HU  Makiko ITO  Takahide YOSHIKAWA  Yuan HE  Hiroshi NAKAMURA  Masaaki KONDO  

     
    PAPER

      Publicized:
    2023/07/20
      Vol:
    E106-D No:12
      Page(s):
    2015-2025

    Widely adopted by machine learning and graph processing applications, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite efficient storage formats for sparse data (such as CSR or CSC), SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both integer and floating-point data used in SpMV kernels are handled plainly, without any pre-processing. Therefore, we believe bandwidth conservation techniques, such as data compression, may dramatically help SpMV kernels when data is transferred between the main memory and the Last Level Cache (LLC). Furthermore, we also observe that convergence conditions in some typical scientific computation benchmarks (based on SpMV kernels) are not degraded when adopting lower-precision floating-point data. Based on these findings, in this work, we propose a simple yet effective data compression scheme that can be extended to general-purpose computing architectures or, preferably, HPC systems. When it is adopted, a best-case speedup of 1.92x is achieved. In addition, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both the convergence speed and the accuracy of final results.
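    For readers unfamiliar with the kernel this abstract refers to, the following is a minimal sketch of SpMV over the CSR format, together with a helper that rounds values to 32-bit precision to mimic lower-precision storage. It is not the compression scheme proposed in the paper; the function names `spmv_csr` and `to_float32` and the sample matrix are assumptions made purely for illustration.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x where A is stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] its value.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]

y_full = spmv_csr(row_ptr, col_idx, values, x)
# Same product with the nonzeros rounded to 32-bit precision first.
y_low = spmv_csr(row_ptr, col_idx, [to_float32(v) for v in values], x)
print(y_full, y_low)
```

    Only the nonzero values and their indices are read from memory, which is why the volume of data moved, rather than arithmetic, tends to bound this kernel.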

  • Mobile Service Control Point for Intelligent and Multimedia Mobile Communications

    Hiroshi NAKAMURA  Kenichi KIMURA  Akihisa NAKAJIMA  

     
    PAPER

      Vol:
    E77-B No:9
      Page(s):
    1089-1095

    To provide personal, intelligent, and multimedia services through a mobile communications network, a Mobile Service Control Point (M-SCP) was developed, which performs both the location register and service control functions. The M-SCP was constructed on a common platform to allow quick introduction of new services. Software techniques to reduce the frequency of process-switching, assign the highest priority to real-time tasks, and operate a multiple-CPU structure provide faster real-time processing. This is confirmed by computer simulation and research in the field.

  • An Operating System Guided Fine-Grained Power Gating Control Based on Runtime Characteristics of Applications

    Atsushi KOSHIBA  Mikiko SATO  Kimiyoshi USAMI  Hideharu AMANO  Ryuichi SAKAMOTO  Masaaki KONDO  Hiroshi NAKAMURA  Mitaro NAMIKI  

     
    PAPER

      Vol:
    E99-C No:8
      Page(s):
    926-935

    Fine-grained power gating (FGPG) is a power-saving technique that switches off circuit blocks while the blocks are idle. Although FGPG can reduce power consumption without compromising computational performance, switching the power supply on and off causes energy overhead. To prevent the power increase caused by this energy overhead, in our prior research we proposed an FGPG control method for the operating system (OS) based on pre-analyzing applications' power usage. However, modern computing systems have a wide variety of use cases and run many types of applications; this makes it difficult to analyze the behavior of all these applications in advance. This paper therefore proposes a new FGPG control method that does not require profiling application programs in advance. In the newly proposed method, the OS periodically monitors a circuit's idle interval while application programs are running. The OS enables FGPG only if the interval is long enough to reduce the power consumption. The experimental results in this paper show that the proposed method reduces power consumption by 9.8% on average and up to 17.2% at 25°C. The results also show that the proposed method achieves almost the same power-saving efficiency as the previous profile-based method.
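    The decision rule described in this abstract (enable FGPG only when the observed idle intervals are long enough to pay back the switching overhead) can be illustrated with a minimal sketch. The break-even model, the function names, and all numeric values below are assumptions made for illustration; they are not taken from the paper.

```python
def break_even_cycles(overhead_energy, leakage_saved_per_cycle):
    """Idle length (in cycles) at which saved leakage equals the switching overhead.

    Both arguments use the same (arbitrary) energy unit.
    """
    return overhead_energy / leakage_saved_per_cycle


def should_enable_pg(idle_intervals, overhead_energy, leakage_saved_per_cycle):
    """Return True if the mean observed idle interval exceeds the break-even length."""
    if not idle_intervals:
        return False  # nothing observed yet: keep power gating disabled
    mean_idle = sum(idle_intervals) / len(idle_intervals)
    return mean_idle > break_even_cycles(overhead_energy, leakage_saved_per_cycle)


# Hypothetical monitoring windows for one functional unit (idle lengths in cycles).
short_idles = [4, 6, 5]
long_idles = [40, 60, 50]

# With an overhead of 10.0 units and 0.5 units saved per idle cycle,
# the break-even idle length is 20 cycles.
print(should_enable_pg(short_idles, 10.0, 0.5))
print(should_enable_pg(long_idles, 10.0, 0.5))
```

    In a real system the overhead would also depend on temperature and on the unit in question, so the threshold would have to be re-evaluated periodically rather than fixed.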

  • A New 90 MBPS 68 APSK Modem with Honeycomb Constellation for Digital Radio Relay Systems

    Hiroshi NAKAMURA  Noboru IIZUKA  Eisuke FUKUDA  Morihiko MINOWA  Yoshimasa DAIDO  Sadao TAKENAKA  

     
    PAPER-Radio Communication

      Vol:
    E71-E No:6
      Page(s):
    591-599

    This paper describes the first realization of the APSK system with a honeycomb constellation (HC) for high-capacity digital radio links. A partial Gray coding method to improve receiver sensitivity is also described. It is shown that the 68 APSK modulation can increase receiver sensitivity by 0.7 dB compared with 64 QAM. The feasibility of the system with the HC using the present state of the art is confirmed by theoretical estimation of tolerances for modem impairments. Techniques necessary to realize a system with the HC are also described. Using these techniques, the 68 APSK modem with the HC was experimentally fabricated. The modem we fabricated has a transmission capacity of 90 Mbps within the FCC-authorized bandwidth of 20 MHz in the 4 GHz band. The BER performance and signature for multipath fading have been measured. It is confirmed that the signature of the 68 APSK is almost the same as that of the conventional 64 QAM. The signature improved using a 5-tap transversal equalizer corresponds to an outage of 2 seconds per year per hop.

  • Reducing Memory System Energy by Software-Controlled On-Chip Memory

    Masaaki KONDO  Hiroshi NAKAMURA  

     
    PAPER-Architecture and Algorithms

      Vol:
    E86-C No:4
      Page(s):
    580-588

    In recent computer systems, a large portion of energy is consumed by on-chip cache accesses and data movement between cache and off-chip main memory. Reducing this memory system energy is indispensable for future microprocessors because power and thermal issues will certainly become key factors limiting processor performance. In this paper, we discuss and evaluate how our architecture, called SCIMA, contributes to energy saving. SCIMA integrates software-controllable memory (SCM) into the processor chip. SCIMA can save total memory system energy by using SCM with compiler support. The evaluation results reveal that SCIMA can reduce memory system energy by 5-50% and is still faster than a conventional cache-based architecture.

  • A Double-Level-Vth Select Gate Array Architecture for Multilevel NAND Flash Memories

    Ken TAKEUCHI  Tomoharu TANAKA  Hiroshi NAKAMURA  

     
    PAPER-Memory

      Vol:
    E79-C No:7
      Page(s):
    1013-1020

    In multilevel flash memories, the threshold voltages of the memory cells should be controlled precisely. This paper describes how, in a conventional NAND flash memory, the threshold voltages of the memory cells fluctuate due to array noise during the bit-by-bit program verify operation, and as a result, the threshold voltage distribution becomes wider. This paper describes a new array architecture, "A double-level-Vth select gate array architecture", to eliminate the array noise, together with a reduction of the cell area. The array noise is mainly caused by interbitline capacitive coupling noise and by the high resistance of the diffused source-line. The threshold voltage fluctuation can be as much as 0.7 V in a conventional array. In the proposed array, bitlines are alternately selected, and the unselected bitlines are used as low-resistance source-lines. Moreover, the unselected bitlines form a shield between the neighboring selected bitlines. As a result, the array noise is strongly suppressed. The threshold voltage fluctuation is estimated to be as small as 0.03 V in the proposed array, and reliable operation of a multilevel NAND flash memory can be realized.

  • Synthesis of Serial Local Clock Controllers for Asynchronous Circuit Design

    Nattha SRETASEREEKUL  Hiroshi SAITO  Euiseok KIM  Metehan OZCAN  Masashi IMAI  Hiroshi NAKAMURA  Takashi NANYA  

     
    PAPER-IP Design

      Vol:
    E86-A No:12
      Page(s):
    3028-3037

    Asynchronous controllers effectively control highly concurrent datapath operations for high speed. Signal Transition Graphs (STGs) can effectively represent these concurrent events. However, highly concurrent STGs cause the state explosion problem in asynchronous synthesis tools. Many small but highly concurrent STGs cannot be synthesized to obtain control circuits. Moreover, STGs also lead to some control-time overhead from the four-phase handshake protocol. In this paper, we propose a method for deriving serial control nodes from Control Data Flow Graphs (CDFGs) such that the concurrency of datapath operations is still preserved. The STGs derived from the serialized control nodes are serial STGs, which are simpler to synthesize than concurrent STGs. We also propose an implementation using these serialized controllers to generate local clocks whenever necessary. The implementation results in very small control-time overhead. The experimental results show that the number of synthesis states is proportional to the number of control signals, and that circuits with satisfactorily small control-time overhead are obtained.

  • An Energy-Efficient Task Scheduling for Near Real-Time Systems on Heterogeneous Multicore Processors

    Takashi NAKADA  Hiroyuki YANAGIHASHI  Kunimaro IMAI  Hiroshi UEKI  Takashi TSUCHIYA  Masanori HAYASHIKOSHI  Hiroshi NAKAMURA  

     
    PAPER-Software System

      Publicized:
    2019/11/01
      Vol:
    E103-D No:2
      Page(s):
    329-338

    Near real-time periodic tasks, which are popular in multimedia streaming applications, have deadline periods that are longer than the input intervals thanks to buffering. For such applications, conventional frame-based scheduling cannot realize the optimal schedule because of its shortsighted deadline assumptions. To realize globally energy-efficient execution of these applications, we propose a novel task scheduling algorithm that takes advantage of the long deadline period. We confirm that our approach can exploit the longer deadline period and reduce the average power consumption by up to 18%.

  • Power Penalty of Multilevel QAM Modem Caused by Two Simultaneously Existing Impairments

    Yoshimasa DAIDO  Sadao TAKENAKA  Hiroshi NAKAMURA  

     
    PAPER-Radio Wave and Satellite Communication

      Vol:
    E70-E No:7
      Page(s):
    628-633

    Since power penalty of multilevel QAM system is caused by many simultaneously existing impairments, it is important to know whether the sum of the penalties caused by each impairment coincides with the power penalty caused by the actual channel condition or not. To examine the combined effect caused by plural impairments, theoretical power penalty is estimated in detail when any two impairments among the typical seven exist simultaneously. The excess penalty is defined as deviation of the power penalty from the sum of penalties caused by each impairment. Calculation of the excess penalty shows that there are four categories for combinations of impairments. It is shown that there is a typical category of combinations which includes more than half of all possible combinations. Calculated excess penalties are very close for all combinations within this category. A simple algebraic equation is given to approximate the excess penalty for the category. The excess penalties of other categories will be shown and their characteristics will be discussed in detail.

  • Power Efficient High-Level Modulation for High-Capacity Digital Radio Systems

    Hiroshi NAKAMURA  Yoshimasa DAIDO  

     
    PAPER-Radio Communication

      Vol:
    E72-E No:5
      Page(s):
    633-640

    This paper describes a theoretical estimation of the power efficiency improvement obtained by adopting amplitude and phase modulation (APSK) with a honeycomb constellation (HC) instead of multilevel QAM modulation. Nonlinear distortion caused by a power amplifier is considered in the estimation, and the nonlinearity of the amplifier is approximated by third- and fifth-order nonlinearity. To eliminate the difficulty in carrier reconstruction, a pilot carrier injection method is assumed for the APSK with the HC. However, injecting the carrier reduces the power efficiency improvement, so the dependence of power efficiency on the injected carrier level is estimated theoretically. The SNR of the recovered carrier as a function of the pilot carrier level is also estimated experimentally. From these two estimations, an optimum pilot carrier level is determined for a 64 APSK system with the HC. The possibility of reducing the maximum available power of the amplifier by 2.0 dB is confirmed at the optimum pilot carrier level, which corresponds to an offset of 1/4 data space in the constellation. At the optimized level, the SNR of the recovered carrier is 40 dB, which guarantees satisfactory operation of the system.

  • Area-Efficient Microarchitecture for Reinforcement of Turbo Mode

    Shinobu MIWA  Takara INOUE  Hiroshi NAKAMURA  

     
    PAPER-Computer System

      Vol:
    E97-D No:5
      Page(s):
    1196-1210

    Turbo mode, which accelerates many applications without major changes to existing systems, is widely used in commercial processors. Since the duration or strength of turbo mode depends on the peak temperature of a processor chip, reducing the peak temperature can reinforce turbo mode. This paper shows that adding a small amount of hardware allows microprocessors to reduce the peak temperature drastically and thus to reinforce turbo mode successfully. Our approach is to identify a few small units that become heat sources in a processor and to duplicate them appropriately to reduce their power density. By duplicating only these units and using the copies evenly, the processor can show significant performance improvement while remaining area-efficient. The experimental results show that the proposed method achieves up to 14.5% performance improvement in exchange for a 2.8% area increase.

  • Design Method of High Performance and Low Power Functional Units Considering Delay Variations

    Kouichi WATANABE  Masashi IMAI  Masaaki KONDO  Hiroshi NAKAMURA  Takashi NANYA  

     
    PAPER-Circuit Synthesis

      Vol:
    E89-A No:12
      Page(s):
    3519-3528

    As VLSI technology advances, delay variations will become more serious. Delay-insensitive asynchronous dual-rail circuits tolerate any delay variation, but their energy consumption is more than double that of single-rail circuits because signal transitions occur every cycle in all bits regardless of the input bit pattern. However, in functional units, a significant number of input bits may not change from the previous input in many cases. In such a situation, calculation of these bits is not required. Thus, we propose a method, called unflip-bits control, that makes use of this situation to reduce energy consumption. We evaluate the energy consumption and performance penalty of the method using HSPICE and the Verilog-XL simulator, and compare the method with the conventional dual-rail circuit and a synchronous circuit. Our evaluation results reveal that the proposed asynchronous dual-rail circuit has 12-60% lower energy consumption than a conventional asynchronous dual-rail circuit.

  • A 256 QAM Digital Radio System with a Low Rolloff Factor of 20% for Attaining 6.75 bps/Hz

    Hiroshi NAKAMURA  Eisuke FUKUDA  Noburu IIZUKA  Yoshimasa DAIDO  Sadao TAKENAKA  

     
    PAPER-Radio Communication

      Vol:
    E71-E No:1
      Page(s):
    43-50

    This paper describes a newly developed 4 GHz 135 Mbps 256 QAM system with a rolloff factor of 20%, which can attain a spectrum efficiency of 6.75 bps/Hz. The key techniques are theoretically investigated to realize this system. It was predicted theoretically that the simultaneous incorporation of 7-tap transversal equalizers (TEQL) and a recursive slope equalizer (SEQL) would be required as a countermeasure for multipath fading. The 256 QAM system was designed considering the results of the theoretical investigation. Excellent BER performance was obtained with the aid of forward error correction and pilot carrier injection. Since remarkable improvement in the signature was obtained by the simultaneous use of TEQL and SEQL, the 256 QAM system with a very low rolloff factor is promising.
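    As a quick consistency check on the figures quoted in this abstract, and assuming that spectrum efficiency here means bit rate divided by channel bandwidth (the abstract does not state the definition), the implied channel bandwidth works out as follows. The symbols R_b, B, and \eta are introduced here for illustration only.

```latex
\eta = \frac{R_b}{B}
\quad\Longrightarrow\quad
B = \frac{R_b}{\eta}
  = \frac{135\ \text{Mbps}}{6.75\ \text{bps/Hz}}
  = 20\ \text{MHz}
```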

  • A Runtime Optimization Selection Framework to Realize Energy Efficient Networks-on-Chip

    Yuan HE  Masaaki KONDO  Takashi NAKADA  Hiroshi SASAKI  Shinobu MIWA  Hiroshi NAKAMURA  

     
    PAPER-Architecture

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2881-2890

    Networks-on-Chip (or NoCs, for short) play important roles in modern and future multi-core processors, as they are highly related to both the performance and the power consumption of the entire chip. To date, many optimization techniques have been developed to improve the bandwidth, latency and power consumption of NoCs. However, a clear answer to how these optimization techniques affect energy efficiency is yet to be found, since each technique comes with its own benefits and overheads and there are too many of them. This raises the problem of when and how such optimization techniques should be applied. In order to solve this problem, we build a runtime framework to throttle these optimization techniques based on concise performance and energy models. With the help of this framework, we can successfully establish adaptive selections over multiple optimization techniques to further improve the performance or energy efficiency of the network at runtime.

  • Sleep Transistor Sizing Method Using Accurate Delay Estimation Considering Input Vector Pattern and Non-linear Current Model

    Seidai TAKEDA  Kyundong KIM  Hiroshi NAKAMURA  Kimiyoshi USAMI  

     
    PAPER-Physical Level Design

      Vol:
    E94-A No:12
      Page(s):
    2499-2509

    In the era beyond deep sub-micron, Power Gating (PG) is one of the most effective techniques for reducing the leakage power of circuits. The most important issue in PG circuit design is how to decide the width of the sleep transistor. A smaller total sleep transistor width provides smaller leakage power in standby mode; however, insufficient sleep transistor insertion causes significant performance degradation. In this paper, we present an accurate and fast gate-level delay estimation method for PG circuits and a novel sleep transistor sizing method that utilizes our delay estimation for module-based PG circuits. This method achieves high accuracy within acceptable computation time by utilizing accurate discharge current estimation based on delayed logic simulations with limited input vector patterns and by realizing precise current characteristics for logic gates and sleep transistors. Experimental results show that our delay estimation successfully achieves high accuracy and avoids the overestimation and underestimation seen in the conventional method. Also, our sleep transistor sizing method successfully reduces the width of sleep transistors by 40% on average compared to conventional methods, within an acceptable computation time.
