This letter introduces an innovation for the heterogeneous storage architecture of AI chips, specifically focusing on the integration of six transistors(6T) and eight transistors(8T) hybrid SRAM. Traditional approaches to reducing SRAM power consumption typically involve lowering the operating voltage, a method that often substantially diminishes the recognition rate of neural networks. However, the innovative design detailed in this letter amalgamates the strengths of both SRAM types. It operates at a voltage lower than conventional SRAM, thereby significantly reducing the power consumption in neural networks without compromising performance.
Akira KITAYAMA Goichi ONO Tadashi KISHIMOTO Hiroaki ITO Naohiro KOHMU
Reducing power consumption is crucial for edge devices using convolutional neural network (CNN). The zero-skipping approach for CNNs is a processing technique widely known for its relatively low power consumption and high speed. This approach stops multiplication and accumulation (MAC) when the multiplication results of the input data and weight are zero. However, this technique requires large logic circuits with around 5% overhead, and the average rate of MAC stopping is approximately 30%. In this paper, we propose a precise zero-skipping method that uses input data and simple logic circuits to stop multipliers and accumulators precisely. We also propose an active data-skipping method to further reduce power consumption by slightly degrading recognition accuracy. In this method, each multiplier and accumulator are stopped by using small values (e.g., 1, 2) as input. We implemented single shot multi-box detector 500 (SSD500) network model on a Xilinx ZU9 and applied our proposed techniques. We verified that operations were stopped at a rate of 49.1%, recognition accuracy was degraded by 0.29%, power consumption was reduced from 9.2 to 4.4 W (-52.3%), and circuit overhead was reduced from 5.1 to 2.7% (-45.9%). The proposed techniques were determined to be effective for lowering the power consumption of CNN-based edge devices such as FPGA.
Analog and digital collaborative design techniques for wireless SoCs are reviewed in this paper. In wireless SoCs, delicate analog performance such as sensitivity of the receiver is easily degraded due to interferences from digital circuit blocks. On the other hand, an analog performance such as distortion is strongly compensated by digital assist techniques with low power consumption. In this paper, a sensitivity recovery technique using the analog and digital collaborative design, and digital assist techniques to achieve low-power and high-performance analog circuits are presented. Such analog and digital collaborative design is indispensable for wireless SoCs.
Katsuhiko UEDA Zuiko RIKUHASHI Kentaro HAYASHI Hiroomi HIKAWA
It is important to reduce the power consumption of complementary metal oxide semiconductor (CMOS) logic circuits, especially those used in mobile devices. A CMOS logic circuit consists of metal-oxide-semiconductor field-effect transistors (MOSFETs), which consume electrical power dynamically when they charge and discharge load capacitance that is connected to their output. Load capacitance mainly exists in wiring or buses, and transitions between logic 0 and logic 1 cause these charges and discharges. Many methods have been proposed to reduce these transitions. One novel method (called segmentation coding) has recently been proposed that reduces power consumption of CMOS buses carrying band-limited signals, such as audio data. It improves performance by employing dedicated encoders for the upper and lower bits of transmitted data, in which the transition characteristics of band-limited signals are utilized. However, it uses a conventional majority voting circuit in the encoder for lower bits, and the circuit uses many adders to count the number of 1s to calculate the Hamming distance between the transmitted data. This paper proposes segmentation coding with pseudo-majority voting. The proposed pseudo-majority voting circuit counts the number of 1s with fewer circuit resources than the conventional circuit by further utilizing the transition characteristics of band-limited signals. The effectiveness of the proposed method was demonstrated through computer simulations and experiments.
Yukihiro TSUCHIDA Koichi MAEDA Ryuichi SUGIZAKI
We propose multi-core erbium-doped fiber amplifiers for next-generation optical amplifiers utilized by space-division multiplexing technologies. Multi-core erbium-doped fiber amplifiers were studied widely as a means for overcoming exponential growth of internet traffic in the backbone network. We consider two approaches to excitation of erbium irons; One is core-pumping scheme, the other is cladding-pumping scheme. For a core-pumping configuration, we evaluate its applicability to future ultra long-haul network. In addition, we demonstrate that cladding-pumping configuration will enable reduction of power consumption, size, and cost because one multimode pumping laser diode can excite several cores simultaneously embedded in a common cladding and amplify several signals passed through the multi-core erbium-doped fiber cores.
Yoshikazu MIYANAGA Wataru TAKAHASHI Shingo YOSHIZAWA
This paper introduces our developed noise robust speech communication techniques and describes its implementation to a smart info-media system, i.e., a small robot. Our designed speech communication system consists of automatic speech detection, recognition, and rejection. By using automatic speech detection and recognition, an observed speech waveform can be recognized without a manual trigger. In addition, using speech rejection, this system only accepts registered speech phrases and rejects any other words. In other words, although an arbitrary input speech waveform can be fed into this system and recognized, the system responds only to the registered speech phrases. The developed noise robust speech processing can reduce various noises in many environments. In addition to the design of noise robust speech recognition, the LSI design of this system has been introduced. By using the design of speech recognition application specific IC (ASIC), we can simultaneously realize low power consumption and real-time processing. This paper describes the LSI architecture of this system and its performances in some field experiments. In terms of current speech recognition accuracy, the system can realize 85-99% under 0-20dB SNR and echo environments.
Akira SAKAIGAWA Masaaki KABE Tsutomu HARADA Fumitaka GOTO Naoyuki TAKASAKI Masashi MITSUI Tae NAKAHARA Kojiro IKEDA Kenta SEKI Toshiyuki NAGATSUMA Amane HIGASHI
Battery life and outdoor visibility are two of the most important features for mobile applications today. It is desirable to achieve both low power consumption and excellent outdoor visibility on the display device at the same time. We have previously reported a new RGBW method to realize low power consumption and high luminance with high image quality. In this paper, the basic concept of a new RGBW calculation utilizing an “Extended HSV color space” model is described, and also its performance, such as low power consumption, color image reproducibility and outdoor visibility is presented. The new method focuses on the luminance-increase ratio by means of a White signal for the display image data, and derives the appropriate RGBW signal and backlight PWM signal for every frame period. This dynamically controlled system solves the problems of conventional RGBW systems, and realizes the same image quality as a corresponding RGB display. In order to quantify its color image reproducibility, a spectroscopic measurement has been completed using the Macbeth Color Chart. In addition, the advantages of high luminance by the new RGBW method is described. The converted tone curve with an RGBW method provides very high luminance, such as 1,000cd/m2, and improved outdoor visibility. Finally, a newly developed 4.38-inch full-HD (1,080 × 1,920) 503ppi prototype LCD utilizing this new RGBW technology is described.
Sung-Sun CHOI Han-Yeol YU Yong-Hoon KIM
This paper presents a current-reused quadrature voltage-controlled oscillator (QVCO) which adopts a source-connection coupling structure. The QVCO simultaneously achieves low phase noise and low power consumption by newly combining current-reused VCOs and coupling transistors. The measured QVCO obtains good FoM of -188.2 dBc at a frequency of 2.2 GHz with 3.96 mW power consumption.
Mohammad SOLEIMANI Abdollah KHOEI Khayrollah HADIDI Vahid Fagih DINAVARI
In this paper, new structure of Voltage-Mode MAX-MIN circuit are presented for nonlinear systems, fuzzy applications, neural network and etc. A differential pair with improved cascode current mirror is used to choose the desired input. The advantages of the proposed structure are high operating frequency, high precision, low power consumption, low area and simple expansion for multiple inputs by adding only three transistors for each extra input. The proposed circuit which is simulated by HSPICE in 0.35 µm CMOS process shows the total power consumption of 85 µW in 5 MHz operating frequency from a single 3.3-V supply. Also, the total area of the proposed circuit is about 420 µm2 for two input voltages, and would be negligibly increased for each extra input.
Kyoya TAKANO Mizuki MOTOYOSHI Minoru FUJISHIMA
To realize low-power wireless transceivers, it is necessary to improve the performance of frequency synthesizers, which are typically frequency multipliers composed of a phase-locked loop (PLL). However, PLLs generally consume a large amount of power and occupy a large area. To improve the frequency multiplier, we propose a pulse-injection-locked frequency multiplier (PILFM), where a spurious signal is suppressed using a pulse input signal. An injection-locked oscillator (ILO) in a PILFM was fabricated by a 0.18 µm 1P5M CMOS process. The core size is 10.8 µm10.5 µm. The power consumption of the ILO is 9.6 µW at 250 MHz, 255 µW at 2.4 GHz and 1.47 mW at 4.8 GHz. The phase noise is -105 dBc/Hz at a 1 MHz offset.
Takahide SATO Isamu MATSUMOTO Shigetaka TAKAGI Nobuo FUJII
This paper proposes a low power and high speed track and hold circuit (T/H circuit) based on the two-stage structure. The proposed circuit consists of two internal T/H circuits connected in cascade. The first T/H circuit converts an input signal into a step voltage and it is applied to the following second T/H circuit which drives large load capacitors and consumes large power. Applying the step voltage to the second T/H circuit prevents the second T/H circuit from charging and discharging its load capacitor during an identical track phase and enables low power operation. Thanks to the two-stage structure the proposed T/H circuit can save 29% of the power consumption compared with the conventional one. An optimum design procedure of the proposed two stage T/H circuit is explained and its validity is confirmed by HSPICE simulations.
Qin LIU Seiichiro HIRATSUKA Kazunori SHIMIZU Shinsuke USHIKI Satoshi GOTO Takeshi IKENAGA
Video surveillance systems have a huge market, as indicated by the number of installed cameras, particularly for low-power systems. In this paper, we propose a low-power quadtree video encoder for video surveillance systems. It features a low-complexity motion estimation algorithm, an application-specific ME-MC processor, a dedicated quadtree encoder engine and a processor control-based clock-gating technique. A chip capable of encoding 30 fps VGA (640480) at 80 MHz is fabricated using 0.18 µm CMOS technology. A total of 153 K gates with 558 kbits SRAM have been integrated into a 5.0 mm3.5 mm die. The power consumption is 40.87 mW at 80 MHz for VGA at 30 fps and 1.97 mW at 3.3 MHz for QCIF at 15 fps.
Takahiro SHIMADA Hiromi NOTANI Yasunobu NAKASE Hiroshi MAKINO Shuhei IWADE
We proposed a push-pull output buffer that maintains the data transmission rate for lower supply voltages. It operates at an internal supply voltage (VDD) of 0.7-1.6 V and an interface supply voltage (VDDX) of 1.0-3.6 V. In low VDDX operation, the output buffer utilizes parasitic bipolar transistors instead of MOS transistors to maintain drivability. Furthermore forward body bias (FBB) control is provided for the level converter in low VDD operation. We fabricated a test chip with a standard 0.15 µm CMOS process. Measurement results indicate that the proposed output buffer achieves 200 Mbps operation at VDD of 0.7 V and VDDX of 1.0 V.
Nobuhide YOSHIDA Masahiro FUJII Takao ATSUMO Keiichi NUMATA Shuji ASAI Michihisa KOHNO Hirokazu OIKAWA Hiroaki TSUTSUI Tadashi MAEDA
An emitter coupled logic (ECL) compatible low-power GaAs 8:1 multiplexer (MUX) and 1:8 demultiplexer (DEMUX) for 10-Gb/s optical communication systems has been developed. In order to decrease the power consumption and to maximize the timing margin, we estimated the power consumption for direct-coupled FET logic (DCFL) and source-coupled FET logic (SCFL) circuits in terms of the D-type flip-flop (D-FF) operating speed and the duty-ratio variation. Based on the result, we used SCFL circuits in the clock-generating circuit and the circuits operating at 10 Gb/s, and we used DCFL circuits in the circuits operating below 5 Gb/s. These ICs, which are mounted on ceramic packages, operate at up to 10 Gb/s with power consumption of 1.2 W for the 8:1 MUX and 1.0 W for the 1:8 DEMUX. This is the lowest power consumption yet reported for 10-Gb/s 8:1 MUX and 1:8 DEMUX.
Shunichi ISHIWATA Takayasu SAKURAI
Media processors have emerged so that a single LSI can realize multiple multimedia functions, such as graphics, video, audio and telecommunication with effectively shared hardware and flexible software. First, the difference between media processors and general-purpose microprocessors with multimedia extensions is clarified. Features for processes and data in the multimedia applications are summarized and are followed by the multimedia enhancements that the recent general-purpose microprocessors use. The architecture for media processors reflects the further optimized utilization of these features and realizes better price-performance ratio than the general-purpose microprocessors. Finally, the future directions of media processors are estimated, based on the performance, the power dissipation and the die size of the present microprocessors with multimedia extensions and the present media processors. The demand to improve the price-performance ratio for the whole system and to reduce the power consumption makes the media processor evolve into a system processor, which integrates not only the media processor but also the function of a general-purpose microprocessor, various interfaces and DRAMs.
Koji ASARI Hiroshige HIRANO Toshiyuki HONDA Tatsumi SUMI Masato TAKEO Nobuyuki MORIWAKI George NAKANE Tetsuji NAKAKUMA Shigeo CHAYA Toshio MUKUNOKI Yuji JUDAI Masamichi AZUMA Yasuhiro SHIMADA Tatsuo OTSUKI
Ferroelectric non-volatile memory (FeRAM) has been inspiring interests since bismuth layer perovskite material family was found to provide "Fatigue Free" endurance, superior retention and imprint characteristics. In this paper, we will provide new circuits technology for FeRAM developed to implement high speed operation, low voltage operation and low power consumption. Performance of LSI embedded with FeRAM for contactless IC card is also provided to demonstrate the feasibility of the circuit technology.
Koji TERADA Seimi SASAKI Kazuhiro TANAKA Tsuyoshi YAMAMOTO Tadashi IKEUCHI Kazunori MIURA Mitsuhiro YANO
This letter describes our DFB-LD module for use in WDM optical access networks. We realized an isolator-free DFB-LD module with a thermo-electric cooler in aim of stabilizing the emission wavelength for WDM systems. Silicon waferboard technology was employed to achieve simple assembly and small size of the module. This small size contributed to low TEC power. Our fabricated module demonstrated low-noise and stable emission wavelength characteristics under 156 Mbit/s pseudo random modulation.
Yasuaki SUMI Kouichi SYOUBU Kazutoshi TSUDA Shigeki OBOTE Yutaka FUKUI
In this paper, in order to achieve the low power consumption of programmable divider in a PLL frequency synthesizer, we propose a new prescaler method for low power consumption. A fixed prescaler is inserted in front of the (N +1/2) programmable divider which is designed based on the new principle. The divider ratio in the loop does not vary at all even if such a prescaler is utilized. Then the permissible delay periods of a programmable divider can be extended to two times as long as the conventional method, and the low power consumption and low cost in a PLL frequency synthesizer have been achieved.
Taketora SHIRAISI Koji KAWAMOTO Kazuyuki ISHIKAWA Eiichi TERAOKA Hidehiro TAKATA Takeshi TOKUDA Kouichi NISHIDA
A low power consumption 16-bit fixed point Digital Signal Processor (DSP) has been developed to realize a half-rate CODEC for the Personal Digital Cellular (PDC) system. Dual datapath architecture has been employed to execute multiply-accumulate (MAC) operations with a high degree of efficiency. With this architecture. 86.3% of total MAC operations in the Pitch Synchronous Innovation Code Excited Linear Prediction (PSI-CELP) program are executed in parallel, so that total instruction cycles are reduced by 23.1%. The area overhead for the dual datapath architecture is only 3.0% of the total area. Furthermore, in order to reduce power consumption, circuit design techniques are also extensively applied to RAMs. ROMs, and clock circuits, which consume the great majority of power. By reducing the number of precharging bit lines, a power reduction of 49.8% is achieved in RAMs, and above 40% in ROMs. By applying gated clock to clock lines, a power reduction of 5.0% is achieved in the DSP that performs the PSI-CELP algorithm. The DSP is fabricated in 0.5 µm single-poly, double-metal CMOS technology. The PSI-CELP algorithm for the PDC half-rate CODEC can operate at 22.5 MHz instruction frequency and 1.6 V supply voltage. resulting in a low-power consumption of 28 mW.
Katsuhiko UEDA Toshio SUGIMURA Toshihiro ISHIKAWA Minoru OKAMOTO Mikio SAKAKIHARA Shinichi MARUI
This paper describes a new, low power 16-bit Digital Signal Processor (DSP). The DSP has a double-speed MAC mechanism, an accelerator for Viterbi decoding, and a block floating section which contribute to lower power consumption. The double-speed MAC can perform two multiply and accumulate operations in one instruction cycle. Since MAC operations are so common in digital signal processing, this mechanism can reduce the average clock frequency of the DSP resulting in lower power consumption. The Viterbi accelerator and block floating circuitry also reduce the clock frequency by minimizing the number of required cycles needed to be executed. The DSP was fabricated using a 0.8 µm CMOS 2-aluminum layer process technology to integrate 644 K transistors on a 9.30 mm9.09 mm die. It can realize an 11.2 kbps VSELP speech CODEC while consuming only 70 mW at 3.5 V Vdd.