Toshihiro OZAKI Tetsuya HIROSE Takahiro NAGAI Keishi TSUBAKI Nobutaka KUROKI Masahiro NUMA
This paper presents a fully integrated voltage boost converter consisting of a charge pump (CP) and maximum power point tracking (MPPT) controller for ultra-low power energy harvesting. The converter is based on a conventional CP circuit and can deliver a wide range of load current by using nMOS and pMOS driver circuits for highly efficient charge transfer operation. The MPPT controller we propose dissipates nano-watt power to extract maximum power regardless of the harvester's power generation conditions and load current. The measurement results demonstrated that the circuit converted a 0.49-V input to a 1.46-V output with 73% power conversion efficiency when the output power was 348µW. The circuit can operate at an extremely low input voltage of 0.21V.
Heming SUN Dajiang ZHOU Shuping ZHANG Shinji KIMURA
In this paper, we present a low-power system for the de-quantization and inverse transform of HEVC. Firstly, we present a low-delay circuit to process the coded results of the syntax elements, and then reduce the number of multipliers from 16 to 4 for the de-quantization process of each 4x4 block. Secondly, we give two efficient data mapping schemes for the memory between de-quantization and inverse transform, and the memory for transpose. Thirdly, the zero information is utilized through the whole system. For two memory parts, the write and read operation of zero blocks/ rows/ coefficients can all be skipped to save the power consumption. The results show that up to 86% power consumption can be saved for the memory part under the configuration of “Random-access” and common QPs. For the logical part, the proposed architecture for de-quantization can reduce 77% area consumption. Overall, our system can support real-time coding for 8K x 4K 120fps video sequences and the normalized area consumption can be reduced by 68% compared with the latest work.
Yoshihiro MASUI Kotaro WADA Akihiro TOYA Masaki TANIOKA
We propose a low-noise and low-power dynamic comparator with an offset calibration circuit for Low-Power ADCs. The proposed comparator equips the control circuit in order to switching the comparison accuracy and the current consumption. When high accuracy is not required, current consumption is reduced by allowing the noise increase. Compared with a traditional dynamic comparator, the proposed architecture reduced the current consumption to 78% at 100MHz operating and 1.8V supply voltage. Furthermore, the offset voltage is corrected with minimal current consumption by controlling the on/off operation of the offset calibration circuit.
Kenichiro YASHIKI Toshinori UEMURA Mitsuru KURIHARA Yasuyuki SUZUKI Masatoshi TOKUSHIMA Yasuhiko HAGIHARA Kazuhiko KURATA
Aiming to solve the input/output (I/O) bottleneck concerning next-generation interconnections, 5×5-millimeters-squared silicon-photonics-based chip-scale optical transmitters/receivers (TXs/RXs) — called “optical I/O cores” — were developed. In addition to having a compact footprint, by employing low-power-consumption integrated circuits (ICs), as well as providing multimode-fiber (MMF) transmission in the O band and a user-friendly interface, the developed optical I/O cores allow common ease of use with applications such as multi-chip modules (MCMs) and active optical cables (AOCs). The power consumption of their hybrid-integrated ICs is 5mW/Gbps. Their high-density user-friendly optical interface has a spot-size-converter (SSC) function and permits the physical contact against the outer waveguides. As a result, they provide large enough misalignment tolerance to allow use of passive alignment and visual alignment. In a performance test, they demonstrated 25-Gbps/ch error-free operation over 300-m MMF.
Yuzuru SHIZUKU Tetsuya HIROSE Nobutaka KUROKI Masahiro NUMA Mitsuji OKADA
In this paper, we propose a low-power circuit-shared static flip-flop (CS2FF) for extremely low power digital VLSIs. The CS2FF consists of five static NORs and two inverters (INVs). The CS2FF utilizes a positive edge of a buffered clock signal, which is generated from a root clock, to take data into a master latch and a negative edge of the root clock to hold the data in a slave latch. The total number of transistors is only 24, which is the same as the conventional transmission-gate flip flop (TGFF) used in the most standard cell libraries. SPICE simulations in 0.18-µm standard CMOS process demonstrated that our proposed CS2FF achieved clock-to-Q delay of 18.3ns, setup time of 10.0ns, hold time of 5.5ns, and power dissipation of 9.7nW at 1-MHz clock frequency and 0.5-V power supply. The physical design area increased by 16% and power dissipation was reduced by 13% compared with those of conventional TGFF. Measurement results demonstrated that our proposed CS2FF can operate at 0.352V with extremely low energy of 5.93fJ.
RXv2 is the new generation of Renesas's processor architecture for microcontrollers with high-capacity flash memory. An enhanced instruction set and pipeline structure with an advanced fetch unit (AFU) provide an effective balance between power consumption performance and high processing performance. Enhanced instructions such as DSP function and floating point operation and a five-stage dual-issue pipeline synergistically boost the performance of digital signal applications. The RXv2 processor delivers 1.9 - 3.7 the cycle performance of the RXv1 in these applications. The decrease of the number of Flash memory accesses by AFU is a dominant determiner of reducing power consumption. AFU of RXv2 benefits from adopting branch target cache, which has a comparatively smaller area than that of typical cache systems. High code density delivers low power consumption by reducing instruction memory bandwidth. The implementation of RXv2 delivers up to 46% reduction in static code size, up to 30% reduction in dynamic code size relative to RISC architectures. RXv2 reaches 4.0 Coremark per MHz and operates up to 240MHz. The RXv2 processor delivers approximately more than 2.2 - 5.7x the power efficiency of the RXv1. The RXv2 microprocessor achieves the best possible computing performance in various applications such as building automation, medical, motor control, e-metering, and home appliances which lead to the higher memory capacity, frequency and processing performance.
Gong CHEN Yu ZHANG Qing DONG Ming-Yu LI Shigetoshi NAKATAKE
As semiconductor manufacturing processing scaling down, leakage current of CMOS circuits is becoming a dominant contributor to power dissipation. This paper provides an efficient leakage current reduction (LCR) technique for low-power and low-frequency circuit designs in terms of design rules and layout parameters related to layout dependent effects. We address the LCR technique both for analog and digital circuits, and present a design case when applying the LCR techniqe to a successive-approximation-register (SAR) analog-to-digital converter (ADC), which typically employs analog and digital transistors. In the post-layout simulation results by HSPICE, an SAR-ADC with the LCR technique achieves 38.6-nW as the total power consumption. Comparing with the design without the LCR technique, we attain about 30% total energy reduction.
Keishi TSUBAKI Tetsuya HIROSE Nobutaka KUROKI Masahiro NUMA
This paper proposes an ultra-low power fully on-chip CMOS relaxation oscillator (ROSC) for a real-time clock application. The proposed ROSC employs a compensation circuit of a comparator's non-idealities caused by offset voltage and delay time. The ROSC can generate a stable, and 32-kHz oscillation clock frequency without increasing power dissipation by using a low reference voltage and employing a novel compensation architecture for comparators. Measurement results in a 0.18-$mu$m CMOS process demonstrated that the circuit can generate a stable clock frequency of 32.55,kHz with low power dissipation of 472,nW at 1.8-V power supply. Measured line regulation and temperature coefficient were 1.1%/V and 120,ppm/$^{circ}$C, respectively.
Juha PETÄJÄJÄRVI Heikki KARVONEN Konstantin MIKHAYLOV Aarno PÄRSSINEN Matti HÄMÄLÄINEN Jari IINATTI
This paper discusses the perspectives of using a wake-up receiver (WUR) in wireless body area network (WBAN) applications with event-driven data transfers. First we compare energy efficiency between the WUR-based and the duty-cycled medium access control protocol -based IEEE 802.15.6 compliant WBAN. Then, we review the architectures of state-of-the-art WURs and discuss their suitability for WBANs. The presented results clearly show that the radio frequency envelope detection based architecture features the lowest power consumption at a cost of sensitivity. The other architectures are capable of providing better sensitivity, but consume more power. Finally, we propose the design modification that enables using a WUR to receive the control commands beside the wake-up signals. The presented results reveal that use of this feature does not require complex modifications of the current architectures, but enables to improve energy efficiency and latency for small data blocks transfers.
Masanori FURUTA Hidenori OKUNI Masahiro HOSOYA Akihide SAI Junya MATSUNO Shigehito SAIGUSA Tetsuro ITAKURA
This paper presents an analog front-end circuit for a 60-GHz proximity wireless communication receiver. The feature of the proposed analog front-end circuit is a bandwidth more than 1-GHz wide. To expand the bandwidth of a low-pass filter and a voltage gain amplifier, a technique to reduce the parasitic capacitance of a transconductance amplifier is proposed. Since the bandwidth is also limited by on-resistance of the ADC sampling switch, a switch separation technique for reduction of the on-resistance is also proposed. In a high-speed ADC, the SNDR is limited by the sampling jitter. The developed high resolution VCO auto tuning effectively reduces the jitter of PLL. The prototype is fabricated in 65nm CMOS. The analog front-end circuit achieves over 1-GHz bandwidth and 27.2-dB SNDR with 224mW Power consumption.
Keishi TSUBAKI Tetsuya HIROSE Yuji OSAKI Seiichiro SHIGA Nobutaka KUROKI Masahiro NUMA
A fully on-chip CMOS relaxation oscillator (ROSC) with a PVT variation compensation circuit is proposed in this paper. The circuit is based on a conventional ROSC and has a distinctive feature in the compensation circuit that compensates for comparator's non-idealities caused by not only offset voltage, but also delay time. Measurement results demonstrated that the circuit can generate a stable clock frequency of 6.66kHz. The current dissipation was 320nA at 1.0-V power supply. The measured line regulation and temperature coefficient were 0.98%/V and 56ppm/°C, respectively.
Masamitsu TANAKA Atsushi KITAYAMA Masakazu OKADA Tomohito KOUKETSU Takumi TAKINAMI Masato ITO Akira FUJIMAKI
We report the successful operation of a low-power arithmetic logic unit (ALU) based on a low-voltage rapid single-flux-quantum (LV-RSFQ) logic circuit, whereby a dc bias current is fed to circuits from lowered constant-voltage sources through small resistors. Both the static and dynamic energy consumptions are reduced because of the reduction in the amplitudes of voltage pulses across the Josephson junctions, with a trade-off of slightly slower switching speeds. The designed bias voltage was set to 0.25mV, which is one-tenth that of our standard RSFQ circuit design. We investigated several issues related to such low-voltage operation, including margins and timing design. To achieve successful operation, we tuned the circuit parameters in the logic gate design and carefully controlled the timing by considering the interference of pulse signals. We show test results for the low-voltage ALU in on-chip high-speed testing. The circuit was fabricated using the AIST Nb/AlOx/Nb Advanced Process with a critical current density of 10kA/cm2. We verified that arithmetic and logical operations were correctly implemented and obtained dc bias margins of 18% at a target clock frequency of 20GHz and achieved a maximum clock frequency of 28GHz with a power consumption of 28µW. These experimental results indicate energy efficiency of 3.6 times that of the standard RSFQ circuit design.
For battery based real-time embedded systems, high performance to meet their real-time constraints and energy efficiency to extend battery life are both essential. Real-Time Dynamic Voltage Scaling (RT-DVS) has been a key technique to satisfy both requirements. This paper presents EccEDF (Enhanced ccEDF), an efficient algorithm based on ccEDF. ccEDF is one of the most simple but efficient RT-DVS algorithms. Its simple structure enables it to be easily and intuitively coupled with a real-time operating system without incurring any significant cost. ccEDF, however, overlooks an important factor in calculating the available slacks for reducing the operating frequency. It calculates the saved utilization simply by dividing the slack by the period without considering the time needed to run the task. If the elapsed time is considered, the maximum utilization saved by the slack on completion of the task can be found. The proposed EccEDF can precisely calculate the maximum unused utilization with consideration of the elapsed time while keeping the structural simplicity of ccEDF. Further, we analytically establish the feasibility of EccEDF using the fluid scheduling model. Our simulation results show that the proposed algorithm outperforms ccEDF in all simulations. A simulation shows that EccEDF consumes 27% less energy than ccEDF.
Masanori FURUTA Ippei AKITA Junya MATSUNO Tetsuro ITAKURA
This paper presents a 7-bit 1.5-GS/s time-interleaved (TI) SAR ADC. The scheme achieves better isolation between sub-ADCs thanks to embedding a track-and-hold (T/H) amplifier and reference voltage buffer in each sub-ADC. The proposed dynamic T/H circuit enables high-speed, low-power operation. The prototype is fabricated in a 65-nm CMOS technology. The total active area is 0.14,mm2 and the ADC consumes 36 mW from a 1.2-V supply. The measured results show the peak spurious-free dynamic range (SFDR) and signal-to-noise-and-distortion ratio (SNDR) are 52.4 dB and 39.6 dB, respectively, and an figure of Merit (FoM) of 300 fJ/conv. is achieved.
Shan-Chun KUO Hong-Yuan JHENG Fan-Chieh CHENG Shanq-Jang RUAN
In this letter, a design of inverse discrete cosine transform for energy-efficient watermarking mechanism based on DS-CDMA with significant energy and area reduction is presented. Taking advantage of converged input data value set as a precomputation concept, the proposed one-dimensional IDCT is a multiplierless hardware which differs from Loeffler architecture and has benefits of low complexity and low power consumption. The experimental results show that our design can reduce 85.2% energy consumption and 58.6% area. Various spectrum and spatial attacks are also tested to corroborate the robustness.
Kosuke MIZUNO Kenta TAKAGI Yosuke TERACHI Shintaro IZUMI Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper describes a Histogram of Oriented Gradients (HOG) feature extraction accelerator that features a VLSI-oriented HOG algorithm with early classification in Support Vector Machine (SVM) classification, dual core architecture for parallel feature extraction and multiple object detection, and detection-window-size scalable architecture with reconfigurable MAC array for processing objects of several shapes. To achieve low-power consumption for mobile applications, early classification reduces the amount of computations in SVM classification efficiently with no accuracy degradation. The dual core architecture enables parallel feature extraction in one frame for high-speed or low-power computing and detection of multiple objects simultaneously with low power consumption by HOG feature sharing. Objects of several shapes, a vertically long object, a horizontally long object, and a square object, can be detected because of cooperation between the two cores. The proposed methods provide processing capability for HDTV resolution video (19201080 pixels) at 30 frames per second (fps). The test chip, which has been fabricated using 65 nm CMOS technology, occupies 4.22.1 mm2 containing 502 Kgates and 1.22 Mbit on-chip SRAMs. The simulated data show 99.5 mW power consumption at 42.9 MHz and 1.1 V.
Hiroshi NAKAMURA Weihan WANG Yuya OHTA Kimiyoshi USAMI Hideharu AMANO Masaaki KONDO Mitaro NAMIKI
Power consumption has recently emerged as a first class design constraint in system LSI designs. Specially, leakage power has occupied a large part of the total power consumption. Therefore, reduction of leakage power is indispensable for efficient design of high-performance system LSIs. Since 2006, we have carried out a research project called “Innovative Power Control for Ultra Low-Power and High-Performance System LSIs”, supported by Japan Science and Technology Agency as a CREST research program. One of the major objectives of this project is reducing the leakage power consumption of system LSIs by innovative power control through tight cooperation and co-optimization of circuit technology, architecture, and system software designs. In this project, we focused on power gating as a circuit technique for reducing leakage power. Temporal granularity is one of the most important issue in power gating. Thus, we have developed a series of Geysers as proof-of-concept CPUs which provide several mechanisms of fine-grained run-time power gating. In this paper, we describe their concept and design, and explain why co-optimization of different design layers are important. Then, three kinds of power gating implementations and their evaluation are presented from the view point of power saving and temporal granularity.
Toshihiro KONISHI Keisuke OKUNO Shintaro IZUMI Masahiko YOSHIMOTO Hiroshi KAWAGUCHI
This paper presents a second-order ΔΣ analog-to-digital converter (ADC) operating in a time domain. In the proposed ADC architecture, a voltage-controlled delay unit (VCDU) converts an input analog voltage to a delay time. Then, the clocks outputs from a gated ring oscillator (GRO) are counted during the delay time. No switched capacitor or opamp is used. Therefore, the proposed ADC can be implemented in a small area and with low power. For that reason, it has process scalability: it can keep pace with Moore's law. A time error is propagated to the second GRO by a multi-stage noise-shaping (MASH) topology, which provides second-order noise-shaping. In a standard 40-nm CMOS process, a SNDR of 45 dB is achievable at input bandwidth of 16 kHz and a sampling rate of 8 MHz, where the power is 408.5 µW. Its area is 608 µm2.
Kyundong KIM Seidai TAKEDA Shinobu MIWA Hiroshi NAKAMURA
Caches are one of the most leakage consuming components in modern processor because of massive amount of transistors. To reduce leakage power of caches, several techniques using power-gating (PG) were proposed. Despite of its high leakage saving, a side effect of PG for caches is the loss of data during a sleep. If useful data is lost in sleep mode, it should be fetched again from a lower level memory. This consumes a considerable amount of energy, which very unfortunately mitigates the leakage saving. This paper proposes a new PG scheme considering data retentiveness of SRAM. After entering the sleep mode, data of an SRAM cell is not lost immediately and is usable by checking the validity of the data. Therefore, we utilize data retentiveness of SRAM to avoid energy overhead for data recovery, which results in further chance of leakage saving. To check availability, we introduce a simple hardware whose overhead is ignorable. Our experimental result shows that utilizing data retentiveness saves up to 32.42% of more leakage than conventional PG.
An analytical performance evaluation model is presented in this paper. A time-splitting transmitter circuit employing a selectively activated flip-driver (SAFD) is presented and its performance is estimated by the new model. The optimal partitioning method which maximizes the performance of a given bus-invert (BI) coding circuit is also presented. When a bus is optimally partitioned, an ordinary BI circuit can reduce the number of bus transitions by about 25%, while an SAFD circuit can remove about 35% of them. The newly developed method is verified by simulations whose results correspond very well to the values predicted by the model.