The search functionality is under construction.

IEICE TRANSACTIONS on Electronics

  • Impact Factor

    0.48

  • Eigenfactor

    0.002

  • article influence

    0.1

  • Cite Score

    1.3

Advance publication (published online immediately after acceptance)

Volume E94-C No.4  (Publication Date:2011/04/01)

    Special Section on Circuits and Design Techniques for Advanced Large Scale Integration
  • FOREWORD Open Access

    Kunio UCHIYAMA  

     
    FOREWORD

      Page(s):
    385-385
  • Prospective Silicon Applications and Technologies in 2025 Open Access

    Koji KAI  Minoru FUJISHIMA  

     
    INVITED PAPER

      Page(s):
    386-393

    Today, practical semiconductor products are an integral part of our lives and the infrastructure of society, and this trend will continue in the future. New areas of application will expand into medical, environmental, and agriculture (food)-related fields in addition to the conventional information and communication technology (ICT)-related field. Low-cost semiconductor devices with advanced functions have thus far been realized by miniaturization. However, we are now approaching the physical limit of miniaturization, and also, the investment required for new semiconductor manufacturing facilities has become huge. Under such circumstances, we propose an approach based on semiconductor devices called microcube chips and ideas of semiconductor development, i.e., agile integration and "inch-fab." Our approach is expected to contribute to expanding the range of companies that can fabricate semiconductor devices to include small-size companies, exploring new applications of semiconductor devices, and providing a wide variety of semiconductor devices at a low cost from the semiconductor industry.

  • Low Power Platform for Embedded Processor LSIs Open Access

    Toru SHIMIZU  Kazutami ARIMOTO  Osamu NISHII  Sugako OTANI  Hiroyuki KONDO  

     
    INVITED PAPER

      Page(s):
    394-400

    Various low power technologies have been developed and applied to LSIs from the point of device and circuit design. A lot more CPU cores as well as function IPs are integrated on a single chip LSI today. Therefore, not only the device and circuit low power technologies, but software power control technologies are becoming more important to reduce active power of application systems. This paper overviews the low power technologies and defines power management platform as a combination of hardware functions and software programming interface. This paper discusses importance of the power management platform and direction of its development.

  • Multiple Region-of-Interest Based H.264 Encoder with a Detection Architecture in Macroblock Level Pipelining

    Tianruo ZHANG  Chen LIU  Minghui WANG  Satoshi GOTO  

     
    PAPER

      Page(s):
    401-410

    This paper proposes a region-of-interest (ROI) based H.264 encoder and the VLSI architecture of the ROI detection algorithm. In ROI based video coding system, pre-processing unit to detect ROI should only introduce low computational complexity overhead due to the low power requirement. The Macroblocks (MBs) in ROIs are detected sequentially in the same order of H.264 encoding to satisfy the MB level pipelining of ROI detector and H.264 encoder. ROI detection is performed in a novel estimation-and-verification process with an ROI contour template. Proposed architecture can be configured to detect either single ROI or multiple ROIs in each frame and the throughput of single detection mode is 5.5 times of multiple detection mode. 98.01% and 97.89% of MBs in ROIs can be detected in single and multiple detection modes respectively. Hardware cost of proposed architecture is only 4.68 k gates. Detection speed is 753 fps for CIF format video at the operation frequency of 200 MHz in multiple detection mode with power consumption of 0.47 mW. Compared with previous fast ROI detection algorithms for video coding application, the proposed architecture obtains more accurate and smaller ROI. Therefore, more efficient ROI based computation complexity and compression efficiency optimization can be implemented in H.264 encoder.

  • Optimized 2-D SAD Tree Architecture of Integer Motion Estimation for H.264/AVC

    Yibo FAN  Xiaoyang ZENG  Satoshi GOTO  

     
    PAPER

      Page(s):
    411-418

    Integer Motion Estimation (IME) costs much computation in H.264/AVC video encoder. 2-D SAD tree IME architecture provides very high performance for encoder, and it has been used by many video codec designs. This paper proposes an optimized hardware design of 2-D SAD tree IME. Firstly, a new hardware architecture is proposed to reduce on-chip memory size. Secondly, a new search pattern is proposed to fully use memory bandwidth and reduce external memory access. Thirdly, the data-path is redesigned, and the performance is greatly improved. In order to compare with other IME designs, an IME design support D1 size, 30 fps with search range [32, 32] is implemented. The hardware cost of this design includes 118 KGates and 8 Kb SRAM, the maximum clock frequency is 200 MHz. Compared to the original 2-D SAD tree IME, our design saves 87.5% on-chip memory, and achieves 3 times performance than original one. Our design provides a new way to design a low cost and high performance IME for H.264/AVC encoder.

  • A 530 Mpixels/s Intra Prediction Architecture for Ultra High Definition H.264/AVC Encoder

    Gang HE  Dajiang ZHOU  Jinjia ZHOU  Tianruo ZHANG  Satoshi GOTO  

     
    PAPER

      Page(s):
    419-427

    Intra coding in H.264/AVC significantly enhances video compression efficiency. However, due to the high data dependency of intra prediction in H.264, both pipelining and parallel processing techniques are limited to be applied. Moreover, it is difficult to get high hardware utilization and throughput because of the long block/MB-level reconstruction loops. This paper proposes a high-performance intra prediction architecture that can support H.264/AVC high profile. The proposed MB/block co-reordering can avoid data dependency and improve pipeline utilization. Therefore, the timing constraint of real-time 40962160 encoding can be achieved with negligible quality loss. 1616 prediction engine and 88 prediction engine work parallel for prediction and coefficients generating. A reordering interlaced reconstruction is also designed for fully pipelined architecture. It takes only 160 cycles to process one macroblock (MB). Hardware utilization of prediction and reconstruction modules is almost 100%. Furthermore, PE-reusable 88 intra predictor and hybrid SAD & SATD mode decision are proposed to save hardware cost. The design is implemented by 90 nm CMOS technology with 113.2 k gates and can encode 40962160 video sequences at 60 fps with operation frequency of 332 MHz.

  • Highly Parallel and Fully Reused H.264/AVC High Profile Intra Predictor Generation Engine for Super Hi-Vision 4k4k@60 fps

    Yiqing HUANG  Xiaocong JIN  Jin ZHOU  Jia SU  Takeshi IKENAGA  

     
    PAPER

      Page(s):
    428-438

    One high profile intra predictor generation engine is proposed in this paper. Firstly, hardware level algorithm optimization for intra 88 (I8MB) mode is introduced. The original candidate pixels for generating prediction samples of I8MB are replaced with boundary pixels of intra 44 (I4MB) blocks. Based on this adoption, full data reuse between predictors of I4MB and filtered samples of I8MB can be achieved with almost no quality loss. Secondly, one lossless two-44-block based parallel predictor generation flow is proposed. The original predictor generation flow is optimized from 16 stages to 10 stages for I4MB and Intra 1616 (I16MB), which saves 37.5% processing cycles. For I8MB, similar methodology with different processing order of 44 scaled blocks is introduced. Thirdly, fully utilized hardwired engines for I4MB, I16MB and I8MB are proposed in this paper. Except DC (direct current) and plane modes, full data reuse among all intra modes of high profile can be achieved. Fourthly, for DC mode, one combined predictor generation process is introduced and predictor generation of I16MB's DC mode is merged into the process of I4MB's DC mode. Moreover, by configuring proposed hardwired engines, predictor generation of I16MB's plane mode and chrominance plane mode can be accomplished with only 50% cycles of original design. Totally, when compared with original full-mode design and latest dynamic mode reused design, the proposed predictor generation engine can achieve 89.5% and 73.2% saving of processing cycles, respectively. Synthesized by TSMC 0.18 µm technology under worst work conditions (1.62 V, 125°C), with 380 MHz and 37.2 k gates, the proposed design can handle real-time high profile intra predictor generation of Super Hi-Vision 4 k4 k@60 fps. The maximum work frequency of our design under worst condition is 468 MHz.

  • Cache Based Motion Compensation Architecture for Quad-HD H.264/AVC Video Decoder

    Jinjia ZHOU  Dajiang ZHOU  Gang HE  Satoshi GOTO  

     
    PAPER

      Page(s):
    439-447

    In this paper, we present a cache based motion compensation (MC) architecture for Quad-HD H.264/AVC video decoder. With the significantly increased throughput requirement, VLSI design for MC is greatly challenged by the huge area cost and power consumption. Moreover, the long memory system latency leads to performance drop of the MC pipeline. To solve these problems, three optimization schemes are proposed in this work. Firstly, a high-performance interpolator based on Horizontal-Vertical Expansion and Luma-Chroma Parallelism (HVE-LCP) is proposed to efficiently increase the processing throughput to at least over 4 times as the previous designs. Secondly, an efficient cache memory organization scheme (4S×4) is adopted to improve the on-chip memory utilization, which contributes to memory area saving of 25% and memory power saving of 3949%. Finally, by employing a Split Task Queue (STQ) architecture, the cache system is capable of tolerating much longer latency of the memory system. Consequently, the cache idle time is saved by 90%, which contributes to reducing the overall processing time by 2440%. When implemented with SMIC 90 nm process, this design costs a logic gate count and on-chip memory of 108.8 k and 3.1 kB respectively. The proposed MC architecture can support real-time processing of 3840×2160@60 fps with less than 166 MHz.

  • A Low-Power Real-Time SIFT Descriptor Generation Engine for Full-HDTV Video Recognition

    Kosuke MIZUNO  Hiroki NOGUCHI  Guangji HE  Yosuke TERACHI  Tetsuya KAMINO  Tsuyoshi FUJINAGA  Shintaro IZUMI  Yasuo ARIKI  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER

      Page(s):
    448-457

    This paper describes a SIFT (Scale Invariant Feature Transform) descriptor generation engine which features a VLSI oriented SIFT algorithm, three-stage pipelined architecture and novel systolic array architectures for Gaussian filtering and key-point extraction. The ROI-based scheme has been employed for the VLSI oriented algorithm. The novel systolic array architecture drastically reduces the number of operation cycle and memory access. The cycle counts of Gaussian filtering module is reduced by 82%, compared with the SIMD architecture. The number of memory accesses of the Gaussian filtering module and the key-point extraction module are reduced by 99.8% and 66% respectively, compared with the results obtained assuming the SIMD architecture. The proposed schemes provide processing capability for HDTV resolution video (1920 1080 pixels) at 30 frames per second (fps). The test chip has been fabricated in 65 nm CMOS technology and occupies 4.2 4.2 mm2 containing 1.1 M gates and 1.38 Mbit on-chip memory. The measured data demonstrates 38.2 mW power consumption at 78 MHz and 1.2 V.

  • VLSI Architecture of GMM Processing and Viterbi Decoder for 60,000-Word Real-Time Continuous Speech Recognition

    Hiroki NOGUCHI  Kazuo MIURA  Tsuyoshi FUJINAGA  Takanobu SUGAHARA  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER

      Page(s):
    458-467

    We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel Viterbi architecture, and pipeline operation between Viterbi transition and GMM processing. Results show that our architecture achieves 88.24% required frequency reduction (66.74 MHz) and 84.04% memory bandwidth reduction (549.91 MB/s) for real-time 60-k word continuous speech recognition.

  • A Novel Cache Replacement Policy via Dynamic Adaptive Insertion and Re-Reference Prediction

    Xi ZHANG  Chongmin LI  Zhenyu LIU  Haixia WANG  Dongsheng WANG  Takeshi IKENAGA  

     
    PAPER

      Page(s):
    468-476

    Previous research illustrates that LRU replacement policy is not efficient when applications exhibit a distant re-reference interval. Recently RRIP policy is proposed to improve the performance for such kind of workloads. However, the lack of access recency information in RRIP confuses the replacement policy to make the accurate prediction. To enhance the robustness of RRIP for recency-friendly workloads, we propose an Dynamic Adaptive Insertion and Re-reference Prediction (DAI-RRP) policy which evicts data based on both re-reference prediction value and the access recency information. DAI-RRP makes adaptive adjustment on insertion position and prediction value for different access patterns, which makes the policy robust across different workloads and different phases. Simulation results show that DAI-RRP outperforms LRU and RRIP. For a single-core processor with a 1 MB 16-way set last-level cache (LLC), DAI-RRP reduces CPI over LRU and Dynamic RRIP by an average of 8.1% and 2.7% respectively. Evaluations on quad-core CMP with a 4 MB shared LLC show that DAI-RRP outperforms LRU and Dynamic RRIP (DRRIP) on the weighted speedup metric by an average of 8.1% and 15.7% respectively. Furthermore, compared to LRU, DAI-RRP consumes the similar hardware for 16-way cache, or even less hardware for high-associativity cache. In summary, the proposed policy is practical and can be easily integrated into existing hardware approximations of LRU.

  • A Dynamic Continuous Signature Monitoring Technique for Reliable Microprocessors

    Makoto SUGIHARA  

     
    PAPER

      Page(s):
    477-486

    Reliability issues such as a soft error and NBTI (negative bias temperature instability) have become a matter of concern as integrated circuits continue to shrink. It is getting more and more important to take reliability requirements into account even for consumer products. This paper presents a dynamic continuous signature monitoring (DCSM) technique for high reliable computer systems. The DCSM technique dynamically generates reference signatures as well as runtime ones during executing a program. The DCSM technique stores the generated signatures in a signature table, which is a small storage circuit in a microprocessor, unlike the conventional static continuous signature monitoring techniques and contributes to saving program or data memory space that stores the signatures. Our experiments showed that our DCSM technique protected 1.4-100.0% of executed instructions depending on the size of signature tables.

  • All-Digital On-Chip Monitor for PMOS and NMOS Process Variability Utilizing Buffer Ring with Pulse Counter

    Tetsuya IIZUKA  Jaehyun JEONG  Toru NAKURA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER

      Page(s):
    487-494

    This paper proposes an all-digital process variability monitor which utilizes a simple buffer ring with a pulse counter. The proposed circuit monitors the process variability according to a count number of a single pulse which propagates on the buffer ring and a fixed logic level after the pulse vanishes. The proposed circuit has been fabricated in 65 nm CMOS process and the measurement results demonstrate that we can monitor the PMOS and NMOS variabilities independently using the proposed monitoring circuit. The proposed monitoring technique is suitable not only for the on-chip process variability monitoring but also for the in-field monitoring of aging effects such as negative/positive bias instability (NBTI/PBTI).

  • A Continuous-Time Waveform Monitoring Technique for On-Chip Power Noise Measurements in VLSI Circuits

    Yoji BANDO  Satoshi TAKAYA  Toru OHKAWA  Toshiharu TAKARAMOTO  Toshio YAMADA  Masaaki SOUDA  Shigetaka KUMASHIRO  Tohru MOGAMI  Makoto NAGATA  

     
    PAPER

      Page(s):
    495-503

    A continuous-time waveform monitoring technique for quality on-chip power noise measurements features matched probing performance among a variety of voltage domains of interest in a VLSI circuit, covering digital Vdd, analog Vdd, as well as at Vss, and multiple probing capability at various locations on power planes. A calibration flow eliminates the offset as well as gain errors among probing channels. The consistency of waveforms acquired by the proposed continuous-time monitoring and sampled-time precise digitization techniques is ensured. A 90-nm CMOS on-chip monitor prototype demonstrates dynamic power supply noise measurements with 200 mV at 2.5 V, 1.0 V, and 0.0 V, respectively, with less than 4 mV deviation among 240 probing channels.

  • Stochastic Non-homogeneous Arnoldi Method for Analysis of On-Chip Power Grid Networks under Process Variations

    Zhihua GUI  Fan YANG  Xuan ZENG  

     
    PAPER

      Page(s):
    504-510

    In this paper, a Stochastic Non-Homogeneous ARnoldi (SNHAR) method is proposed for the analysis of the on-chip power grid networks in the presence of process variations. In SNHAR method, the polynomial chaos based stochastic method is employed to handle the variations of power grids. Different from the existing StoEKS method which uses extended Krylov Subspace (EKS) method to compute the coefficients of the polynomial chaos, a computation-efficient and numerically stable Non-Homogeneous ARnoldi (NHAR) method is employed in SNHAR method to compute the coefficients of the polynomial chaos. Compared with EKS method, NHAR method has superior numerical stability and can achieve remarkably higher accuracy with even lower computational cost. As a result, SNHAR can capture the stochastic characteristics of the on-chip power grid networks with higher accuracy, but even lower computational cost than StoEKS.

  • On-Chip Resonant Supply Noise Canceller Utilizing Parasitic Capacitance of Sleep Blocks for Power Mode Switch

    Jinmyoung KIM  Toru NAKURA  Hidehiro TAKATA  Koichiro ISHIBASHI  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER

      Page(s):
    511-519

    This paper presents an on-chip resonant supply noise canceller utilizing parasitic capacitance of sleep blocks. The test chip was fabricated in a 0.18 µm CMOS process and measurement results show 43.3% and 12.5% supply noise reduction on the abrupt supply voltage switching and the abrupt wake-up of a sleep block, respectively. The proposed method requires 1.5% area overhead for four 100 k-gate blocks, which is 7.1 X noise reduction efficient comparing with the conventional decap for the same power supply noise, while achieves 47% improvement of settling time. These results make fast switching of power mode possible for dynamic voltage scaling and power gating.

  • Short Term Cell-Flipping Technique for Mitigating SNM Degradation Due to NBTI

    Yuji KUNITAKE  Toshinori SATO  Hiroto YASUURA  

     
    PAPER

      Page(s):
    520-529

    Negative Bias Temperature Instability (NBTI) is one of the major reliability problems in advanced technologies. NBTI causes threshold voltage shift in a PMOS transistor. When the PMOS transistor is biased to negative voltage, threshold voltage shifts to negatively. On the other hand, the threshold voltage recovers if the PMOS transistor is positively biased. In an SRAM cell, due to NBTI, threshold voltage degrades in the load PMOS transistors. The degradation has the impact on Static Noise Margin (SNM), which is a measure of read stability of a 6-T SRAM cell. In this paper, we discuss the relationship between NBTI degradation in an SRAM cell and the dynamic stress and recovery condition. There are two important characteristics. One is a stress probability, which is defined as the rate that the PMOS transistor is negatively biased. The other is a stress and recovery cycle, which is defined as the switching interval of an SRAM value. In our observations, in order to mitigate the NBTI degradation, the stress probability should be small and the stress and recovery cycle should be shorter than 10 msec. Based on the observations, we propose a novel cell-flipping technique, which makes the stress probability close to 50%. In addition, we show results of the case studies, which apply the cell-flipping technique to register file and cache memories.

  • A Large “Read” and “Write” Margins, Low Leakage Power, Six-Transistor 90-nm CMOS SRAM

    Tadayoshi ENOMOTO  Nobuaki KOBAYASHI  

     
    PAPER

      Page(s):
    530-538

    We developed and applied a new circuit, called the “Self-controllable Voltage Level (SVL)” circuit, to achieve an expanded “read” and “write” margins and low leakage power in a 90-nm, 2-kbit, six-transistor CMOS SRAM. At the threshold voltage fluctuation of 6σ, the minimum supply voltage of the newly developed (dvlp.) SRAM for “write” operation was significantly reduced to 0.11 V, less than half that of an equivalent conventional (conv.) SRAM. The standby leakage power of the dvlp. SRAM was only 1.17 µW, which is 4.64% of that of the conv. SRAM at supply voltage of 1.0 V. Moreover, the maximum operating clock frequency of the dvlp. SRAM was 138 MHz, which is 15% higher than that (120 MHz) of the conv. SRAM at VMM of 0.4 V. An area overhead was 0.81% that of the conv. SRAM.

  • Improvement of Read Disturb, Program Disturb and Data Retention by Memory Cell VTH Optimization of Ferroelectric (Fe)-NAND Flash Memories for Highly Reliable and Low Power Enterprise Solid-State Drives (SSDs)

    Teruyoshi HATANAKA  Mitsue TAKAHASHI  Shigeki SAKAI  Ken TAKEUCHI  

     
    PAPER

      Page(s):
    539-547

    This paper presents an improvement of the memory cell reliability by the memory cell VTH optimization of the ferroelectric (Fe)-NAND flash memory. The effects of the memory cell VTH on the reliability of the Fe-NAND flash memory are experimentally analyzed for the first time. The reliability is evaluated by the measured VTH shift due to the read disturb, program disturb and data retention. Three types of Fe-NAND flash memory cells, a positive, zero and negative VTH memory cell, are defined on the basis of the memory cell VTH. The middle of VTH of programmed and erased states is 1 V, 0 V and -0.3 V in a positive, zero and negative VTH memory cell, respectively. The VTH shift of the positive, zero and negative VTH memory cells show similar characteristics in the program/erase and the VPASS and VPGM disturbs because the external electric field is so high that the internal depolarization field does not affect the VTH shift. On the other hand, in the data retention, the VTH shift of the three types of VTH memory cells show different characteristics. The reliability of the Fe-NAND flash memory is best optimized in the zero VTH memory cell. In the proposed zero VTH Fe-NAND flash memory cell scheme, the measured VTH shift due to the read disturb, program disturb and data retention decreases by 32%, 24% and 10%, respectively, compared with conventional positive VTH Fe-NAND flash memory cell scheme. Contrarily, in the negative VTH memory cell, the VTH shift during the data retention is 0.49 V and unacceptably large because of the depolarization field. The conventional positive VTH memory cell suffers from a sever read and program disturb. The measured results are drastically different from those of the conventional floating-gate NAND flash memory cell where the negative VTH memory cell is most suitable in terms of the reliability.

  • A Genuine Power-Gatable Reconfigurable Logic Chip with FeRAM Cells

    Masahiro IIDA  Masahiro KOGA  Kazuki INOUE  Motoki AMAGASAKI  Yoshinobu ICHIDA  Mitsuro SAJI  Jun IIDA  Toshinori SUEYOSHI  

     
    PAPER

      Page(s):
    548-556

    An advantage of an RLD (reconfigurable logic device) such as an FPGA (field programmable gate array) is that it can be customized after being manufactured. Due to the aggressive technology scaling, device density is increasing, and it has become a serious problem in power consumption accordingly. In SoC of embedded systems, power gating is one of the major power reduction techniques. However, it is difficult to adopt SRAM-based RLDs because of the high overhead and SRAM being volatile. In this paper, we describe a TEG (test element group) chip of a reconfigurable logic based FeRAM (ferroelectric random access memory) technology. FeRAM brings reconfigurable logic devices the advantage of being a genuine power gater. The chip employs island-style routing architecture and uses a variable grain logic cell as a logic block. A NV-FF (non-volatile flip-flop), which contains FeRAM, a FF, and power-gating control circuits, is used as both configuration memories and FFs in a logic block. The NV-FF can transmit data between FeRAM and FF automatically when a power source is turned off/on. Thus chip-level power gating is possible. The hibernate/restore time is less than 1 ms. The chip has 1818 logic blocks and an area of 54.76 mm2.

  • A Single-Chip RF Tuner/OFDM Demodulator for Mobile Digital TV Application

    Yoshimitsu TAKAMATSU  Ryuichi FUJIMOTO  Tsuyoshi SEKINE  Takaya YASUDA  Mitsumasa NAKAMURA  Takuya HIRAKAWA  Masato ISHII  Motohiko HAYASHI  Hiroya ITO  Yoko WADA  Teruo IMAYAMA  Tatsuro OOMOTO  Yosuke OGASAWARA  Masaki NISHIKAWA  Yoshihiro YOSHIDA  Kenji YOSHIOKA  Shigehito SAIGUSA  Hiroshi YOSHIDA  Nobuyuki ITOH  

     
    PAPER

      Page(s):
    557-566

    This paper presents a single-chip RF tuner/OFDM demodulator for a mobile digital TV application called “1-segment broadcasting.” To achieve required performances for the single-chip receiver, a tunable technique for a low-noise amplifier (LNA) and spurious suppression techniques are proposed in this paper. Firstly, to receive all channels from 470 MHz to 770 MHz and to relax distortion characteristics of following circuit blocks such as an RF variable-gain amplifier and a mixer, a tunable technique for the LNA is proposed. Then, to improve the sensitivity, spurious signal suppression techniques are also proposed. The single-chip receiver using the proposed techniques is fabricated in 90 nm CMOS technology and total die size is 3.26 mm 3.26 mm. Using the tunable LNA and suppressing undesired spurious signals, the sensitivities of less than -98.6 dBm are achieved for all the channels.

  • A 500 Mb/s Differential Input Non-coherent BPSK Receiver for UWB-IR Communication

    Mohiuddin HAFIZ  Nobuo SASAKI  Takamaro KIKKAWA  

     
    PAPER

      Page(s):
    567-574

    A differential input non-coherent BPSK receiver for the UWB-IR communication, based on threshold detection, has been presented in this paper. The chip can recover BPSK modulated Gaussian monocycle pulses (GMP), along with its first derivative, at a data rate of 500 Mb/s. No clock reception is required, as the receiver recovers data based on the relative phase of the two simultaneously received inputs. While retrieving the data, it consumes a power of 63 mW from a supply voltage of 1.8 V. A shunt-peaked narrow band amplifier, matched to the input antenna, is used to amplify the received GMP. Wireless data have been successfully recovered using a pair of horn antennas at a distance of 6 cm. The chip, developed in a 180 nm CMOS technology, occupies a die area of 3.4 mm2. The receiver is suitable for the non-coherent (self-synchronized) UWB-IR communication.

  • A 66-dBc Fundamental Suppression Frequency Doubler IC for UWB Sensor Applications

    Jiangtao SUN  Qing LIU  Yong-Ju SUH  Takayuki SHIBATA  Toshihiko YOSHIMASU  

     
    PAPER

      Page(s):
    575-581

    A balanced push-push frequency doubler has been demonstrated in 0.25-µm SOI (Silicon on Insulator) SiGe BiCMOS technology operating from 22 GHz to 29 GHz with high fundamental frequency suppression and high conversion gain. A series LC resonator circuit is connected in parallel with the differential outputs of the doubler core circuit. The LC resonator is effective to improve the fundamental frequency suppression. In addition, the LC resonator works as a matching circuit between the output of the doubler core and the input of the output buffer amplifier, which increases the conversion gain of the whole circuit. A measured fundamental frequency suppression of greater than 46 dBc is achieved at an input power of -10 dBm in the output frequency band of 22-29 GHz. Moreover, maximum fundamental frequency suppression of 66 dBc is achieved at an input frequency of 13 GHz and an input power of -10 dBm. The frequency doubler works at a supply voltage of 3.3 V.

  • An Injection-Controlled 10-Gb/s Burst-Mode CDR Circuit for a 1G/10G PON System

    Hiroaki KATSURAI  Hideki KAMITSUNA  Hiroshi KOIZUMI  Jun TERADA  Yusuke OHTOMO  Tsugumichi SHIBATA  

     
    PAPER

      Page(s):
    582-588

    As a future passive optical network (PON) system, the 10 Gigabit Ethernet PON (10G-EPON) has been standardized in IEEE 802.3av. As conventional Gigabit Ethernet PON (GE-PON) systems have already been widely deployed, 1G/10G co-existence technologies are strongly required for the next system. A gated voltage-controlled-oscillator (G-VCO)-based 10-Gb/s burst-mode clock and data recovery (CDR) circuit is presented for a 1G/10G co-existence PON system. It employs two new circuits to improve jitter transfer and provide tolerance to 1G/10G operation. An injection-controlled jitter-reduction circuit reduces output-clock jitter by 7 dB from 200-MHz input data jitter while keeping a short lock time of 20 ns. A frequency-variation compensation circuit reduces frequency mismatch among the three VCOs on the chip and offers large tolerance to consecutive identical digits. With the compensation, the proposed CDR circuit can employ multi VCOs, which provide tolerance to the 1G/10G co-existence situation. It achieves error-free (bit-error rate < 10-12) operation for 10-G bursts following bursts of other rates, obviously including 1G bursts. It also provides tolerance to a 256-bit sequence without a transition in the data, which is more than enough tolerance for 65-bit CIDs in the 64B/66B code of 10 Gigabit Ethernet.

  • Device Modeling Techniques for High-Frequency Circuits Design Using Bond-Based Design at over 100 GHz

    Ryuichi FUJIMOTO  Kyoya TAKANO  Mizuki MOTOYOSHI  Uroschanit YODPRASIT  Minoru FUJISHIMA  

     
    PAPER

      Page(s):
    589-597

    Device modeling techniques for high-frequency circuits operating at over 100 GHz are presented. We have proposed the bond-based design as an accurate high-frequency circuit design method. Because layout parasitic extractions (LPE) are not required in the bond-based design, it can be applied high-frequency circuit design at over 100 GHz. However, customized device models are indispensable for the bond-based design. In this paper, device modeling techniques for high-frequency circuit design using the bond-based design are proposed. The customized device model for MOSFETs, transmission lines and pads are introduced. By using customized device models, the difference between the simulated and measured gains of an amplifier is improved to less than 0.6 dB at 120 GHz.

  • 0.18-V Input Charge Pump with Forward Body Bias to Startup Boost Converter for Energy Harvesting Applications

    Po-Hung CHEN  Koichi ISHIDA  Xin ZHANG  Yasuyuki OKUMA  Yoshikatsu RYU  Makoto TAKAMIYA  Takayasu SAKURAI  

     
    PAPER

      Page(s):
    598-604

    In this paper, a 0.18-V input three-stage charge pump circuit applying forward body bias is proposed for energy harvesting applications. In the developed charge pump, all the MOSFETs are forward body biased by using the inter-stage/output voltages. By applying the proposed charge pump as the startup in the boost converter, the kick-up input voltage of the boost converter is reduced to 0.18 V. To verify the circuit characteristics, the conventional zero body bias charge pump and the proposed forward body bias charge pump were fabricated with 65 nm CMOS process. The measured output current of the proposed charge pump under 0.18-V input voltage is increased by 170% comparing to the conventional one at the output voltage of 0.5 V. In addition, the boost converter successfully boosts the 0.18-V input to higher than 0.65-V output.

  • An Energy Efficiency 4-bit Multiplier with Two-Phase Non-overlap Clock Driven Charge Recovery Logic

    Yimeng ZHANG  Leona OKAMURA  Tsutomu YOSHIHARA  

     
    PAPER

      Page(s):
    605-612

    A novel charge-recovery logic structure called Pulse Boost Logic (PBL) is proposed in this paper. PBL is a high-speed low-energy-dissipation charge-recovery logic with dual-rail evaluation tree structure. It is driven by 2-phase non-overlap clock, and requires no DC power supply. PBL belongs to boost logic family, which includes boost logic, enhanced boost logic and subthreshold boost logic. In this paper, PBL has been compared with other charge-recovery logic technologies. To demonstrate the performance of PBL structure, a 4-bit pipeline multiplier is designed and fabricated with 0.18 µm CMOS process technology. The simulation results indicate that the 4-bit multiplier can work at a frequency of 1.8 GHz, while the measurement of test chip is at operation frequency of 161 MHz, and the power dissipation at 161 MHz is 772 µW.

  • Dicode Partial Response Signaling over Inductively-Coupled Channel

    Koichi YAMAGUCHI  Masayuki MIZUNO  

     
    PAPER

      Page(s):
    613-618

    Dicode partial response signaling system over inductively-coupled channel has been developed to achieve higher data rate than self-resonant frequencies of inductors. The developed system operates at five times higher data rates than conventional systems with the same inductor. A current-mode equalization in the transmitter designed in a 90-nm CMOS successfully reshapes waveforms to obtain dicode signals at the receiver. For a 5-Gb/s signaling through the coupled inductors with a 120-µm diameter and a 120-µm distance, 20-mV eye opening was observed. The power consumption value of the transmitter was 58 mW at the 5-Gb/s operation.

  • A Duobinary Signaling for Asymmetric Multi-Chip Communication

    Koichi YAMAGUCHI  Masayuki MIZUNO  

     
    PAPER

      Page(s):
    619-626

    Duobinary signaling has been introduced into asymmetric multi-chip communications such as DRAM or display interfaces, which allows a controlled amount of ISI to reduce signaling bandwidth by 2/3. A × 2 oversampled equalization has been developed to realize Duobinary signaling. Symbol-rate clock recovery form Duobinary signal has been developed to reduce power consumption for receivers. A Duobinary transmitter test chip was fabricated with 90-nm CMOS process. A 3.5 dB increase in eye height and a 1.5 times increase in eye width was observed.

  • A 0.18-µm CMOS X-Band Shock Wave Generator with an On-Chip Dipole Antenna and a Digitally Programmable Delay Circuit for Pulse Beam-Formability

    Nguyen Ngoc MAI KHANH  Masahiro SASAKI  Kunihiro ASADA  

     
    PAPER

      Page(s):
    627-634

    In this paper, we present a 0.18-µm CMOS fully integrated X-band shock wave generator (SWG) with an on-chip dipole antenna and a digitally programmable delay circuit (DPDC) for pulse beam-formability in short-range and hand-held microwave active imaging applications. This chip includes a SWG, a 5-bit DPDC and an on-chip wide-band meandering dipole antenna. By using an integrated transformer, output pulse of the SWG is sent to the on-chip meandering dipole antenna. The SWG operates based on damping conditions to produce a 0.4-V peak-to-peak (p-p) pulse amplitude at the antenna input terminals in HSPICE simulation. The DPDC is designed to adjust delays of shock-wave outputs for the purpose of steering beams in antenna array systems. The wide-band dipole antenna element designed in the meandering shape is located in the top metal of a 5-metal-layer 0.18-µm CMOS chip. By simulating in Momentum of ADS 2009, the minimum value of antenna's return loss, S 11, and antenna's bandwidth (BW) are -19.37 dB and 25.3 GHz, respectively. The measured return loss of a stand-alone integrated meandering dipole is from -26 dB to -10 dB with frequency range of 7.5-12 GHz. In measurements of the SWG with the integrated antenna, by using a 20-dB standard gain horn antenna placed at a 38-mm distance from the chip's surface, a 1.1-mVp-p shock wave with a 9-11-GHz frequency response is received. A measured 3-ps pulse delay resolution is also obtained. These results prove that our proposed circuit is suitable for the purpose of fully integrated pulse beam-forming system.

  • A 500 MS/s 600 µW 300 µm2 Single-Stage Gain-Improved and Kickback Noise Rejected Comparator in 0.35 µm 3.3 v CMOS Process

    Sarang KAZEMINIA  Morteza MOUSAZADEH  Kayrollah HADIDI  Abdollah KHOEI  

     
    BRIEF PAPER

      Page(s):
    635-640

    This paper presents a high speed single-stage latched comparator which is scheduled in time for both amplification and latch operations. Small active area and simple switching strategy besides desired power consumption at high comparison rates qualifies the proposed comparator to be repeatedly employed in high speed flash A/D converters. A strategy of kickback noise elimination besides gain enhancement is also introduced. A low power holding read-out circuit is presented. Post-Layout simulation results confirm 500 MS/s comparison rate with 5 mv resolution for a 1.6 v peak-to-peak input signal range and 600 µw power consumption from a 3.3 v power supply by using TSMC model of 0.35 µm CMOS technology. Total active area of proposed comparator and read-out circuit is about 300 µm2.

  • Regular Section
  • Broadening Adjustable Range on Post-Fabrication Resonance Wavelength Trimming of Long-Period Fiber Gratings and the Mechanisms of Resonance Wavelength Shifts

    Fatemeh ABRISHAMIAN  Katsumi MORISHITA  

     
    PAPER-Optoelectronics

      Page(s):
    641-647

    The adjustable range on post-fabrication resonance wavelength trimming of long-period fiber gratings was broadened toward the blue side, and the mechanisms of the resonance wavelength shifts caused by heating were investigated. It can be concluded that the glass structure relaxes more slowly than the residual stress with decreasing heating temperature and the blue shift caused by the residual stress relaxation appears more strongly at the early stage of heating. The blue shift of 41 nm was obtained by heating a long-period grating at 600 for 3500 minutes. The changes of the index difference inducing the wavelength shifts of -41 nm and 35 nm were estimated at about -1.210-4 and +1.0 10-4 by numerical analysis, respectively.

  • A 7-GHz, Low-Power, Low Phase-Noise Differential Current-Reused VCO Utilizing a Trifilar-Transformer-Feedback Technique

    Yan-Ru TSENG  Tzuen-Hsi HUANG  Shang-Hsun WU  

     
    PAPER-Microwaves, Millimeter-Waves

      Page(s):
    648-653

    This paper presents a 7 GHz differential current-reused voltage-controlled oscillator (CR-VCO) with low power consumption and low phase noise using 0.18-µm CMOS technology. The output power of this CR-VCO is enhanced by utilizing a trifilar-transformer-feedback technique. The lower phase noise is achieved by the more symmetric voltage swings resulting from the improved balance of switching current. At a 1.5-V DC supply voltage, the power dissipation is only 3.4 mW. The total tuning range is 1.4 GHz (17.9%) as the tuning voltage ranges from 0 V to 1.8 V. The optimum phase noise is around -117.3 dBc/Hz at a frequency offset of 1 MHz from the center frequency of 7.07 GHz. The corresponding output power is around -6.8 dBm. For the proposed CR-VCO, the calculated figures-of-merit, FOM and FOMT , are -188.9 and -193.9 dBc/Hz, respectively.

  • Cascaded Time Difference Amplifier with Differential Logic Delay Cell

    Shingo MANDAI  Toru NAKURA  Tetsuya IIZUKA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-Electronic Circuits

      Page(s):
    654-662

    We introduce a 16 × cascaded time difference amplifier (TDA) using a differential logic delay cell with 0.18 µm CMOS process. By employing the differential logic delay cell in the delay chain instead of the CMOS logic delay cell, less than 8% TD gain offset with 150 ps input range is achieved. The input referred standard deviation of the output time difference error is 2.7 ps and the input referred is improved by 17% compared with that of the previous TDA using the CMOS logic delay cell.

  • A 45-nm 37.3 GOPS/W Heterogeneous Multi-Core SOC with 16/32 Bit Instruction-Set General-Purpose Core

    Osamu NISHII  Yoichi YUYAMA  Masayuki ITO  Yoshikazu KIYOSHIGE  Yusuke NITTA  Makoto ISHIKAWA  Tetsuya YAMADA  Junichi MIYAKOSHI  Yasutaka WADA  Keiji KIMURA  Hironori KASAHARA  Hideo MAEJIMA  

     
    PAPER-Integrated Electronics

      Page(s):
    663-669

    We built a 12.4 mm12.4 mm, 45-nm CMOS, chip that integrates eight 648-MHz general purpose cores, two matrix processor (MX-2) cores, four flexible engine (FE) cores and media IP (VPU5) to establish heterogeneous multi-core chip architecture. The general purpose core had its IPC (instructions per cycle) performance enhanced by adding 32-bit instructions to the existing 16-bit fixed-length instruction set and executing up to two 32-bit instructions per cycle. Considering these five-to-seven years of embedded LSI and increasing trend of access-master within LSI, we predict that the memory usage of single core will not exceed 32-bit physical area (i.e. 4 GB), but chip-total memory usage will exceed 4 GB. Based on this prediction, the physical address was expanded from 32-bit to 40-bit. The fabricated chip was tested and a parallel operation of eight general purpose cores and four FE cores and eight data transfer units (DTU) is obtained on AAC (Advanced Audio Coding) encode processing.

  • A Resistor-Compensation Technique for CMOS Bandgap and Current Reference with Simplified Start-Up Circuit

    Guo-Ming SUNG  Ying-Tsu LAI  Chien-Lin LU  

     
    BRIEF PAPER-Electronic Circuits

      Page(s):
    670-673

    This paper presents a resistor-compensation technique for a CMOS bandgap and current reference, which utilizes various high positive temperature coefficient (TC) resistors, a two-stage operational transconductance amplifier (OTA) and a simplified start-up circuit in the 0.35-µm CMOS process. In the proposed bandgap and current reference, numerous compensated resistors, which have a high positive temperature coefficient (TC), are added to the parasitic n-p-n and p-n-p bipolar junction transistor devices, to generate a temperature-independent voltage reference and current reference. The measurements verify a current reference of 735.6 nA, the voltage reference of 888.1 mV, and the power consumption of 91.28 µW at a supply voltage of 3.3 V. The voltage TC is 49 ppm/ in the temperature range from 0 to 100 and 12.8 ppm/ from 30 to 100. The current TC is 119.2 ppm/ at temperatures of 0 to 100. Measurement results also demonstrate a stable voltage reference at high temperature (> 30), and a constant current reference at low temperature (< 70).