
IEICE TRANSACTIONS on Electronics

  • Impact Factor

    0.48

  • Eigenfactor

    0.002

  • article influence

    0.1

  • Cite Score

    1.3


Volume E74-C No.11  (Publication Date:1991/11/25)

    Special Issue on the High Performance ASIC and Microprocessor
  • FOREWORD

    Osamu TOMISAWA  

     
    FOREWORD

      Page(s):
    3747-3748
  • A Master Chip Design of 0.5 µm Mixed BiCMOS/CMOS Channelless Gate Array Family

    Yoji NISHIO  Noriaki OKA  Shigeru TAKAHASHI  Manabu SHIBATA  

     
    PAPER-Circuit Design

      Page(s):
    3749-3756

    A mixed BiCMOS/CMOS channelless gate array family with three-metal-layer wiring, using a 5 V, 0.5 µm BiCMOS technology, is discussed. For light capacitive loads, CMOS gates outperform BiCMOS gates in both speed and power: the power-delay product of CMOS gates at light load is 50% lower than that of BiCMOS gates. Therefore, the performance of the BiCMOS gate array is enhanced by using CMOS and BiCMOS gates selectively according to the capacitive load. A new mixed BiCMOS/CMOS basic cell structure was developed that can be used as either BiCMOS or CMOS gates, or as wiring channels. The area efficiency of the new basic cell is 16% better than that of the conventional basic cell, as determined from design automation experience and related evaluations. The wiring of the third-metal-layer power supply reinforcement lines in a large chip was examined in terms of the number of usable basic cells. Placing a reinforcement line at every basic cell yields about 14% more usable basic cells than placing the lines at certain intervals of basic cells. The propagation delay of the 2-input NAND gate is 190 ps at a fan-out of 10. Under a light load, a pure CMOS NAND gate is faster, achieving a 140 ps gate delay at a fan-out of 2. This gate array family can be applied to high-speed processors.

  • Self-Timed Clocking Design for a Data-Driven Microprocessor

    Fumiyasu ASAI  Shinji KOMORI  Toshiyuki TAMURA  Hisakazu SATO  Hidehiro TAKATA  Yoshihiro SEGUCHI  Takeshi TOKUDA  Hiroaki TERADA  

     
    PAPER-Circuit Design

      Page(s):
    3757-3765

    This paper details a unique VLSI design scheme that employs self-timed circuits. A 32-bit, 50-MFLOPS data-driven microprocessor has been designed using a self-timed clocking scheme. This high-performance data-driven microprocessor with sophisticated functions was built by combining several kinds of self-timed components, and all of its functional blocks are driven by self-timed clocks. The microprocessor integrates 700,000 devices on a 14.65 mm × 14.65 mm die using double-polysilicon, double-metal 0.8 µm CMOS technology.

  • A High Performance 32-Bit Microcontroller for Realtime Applications

    Masafumi TAKAHASHI  Yasuo YAMADA  Emi KANEKO  Shinichi YOSHIOKA  Haruyuki TAGO  

     
    PAPER-Core and Macrocells

      Page(s):
    3766-3774

    A single-chip MCU (Micro Controller Unit) core with 10-MIPS peak performance has been developed for real-time applications. The following features are implemented to improve cost-effectiveness and the worst-case interrupt response: (1) a large on-chip register file with overlapping register windows, (2) a mechanism that accepts exception requests while another instruction executes concurrently, (3) a load-store architecture, (4) optimized instruction pipelining, and (5) two 32-bit internal buses and a 16-bit external bus. The core delivers about 5-MIPS average performance on several benchmark programs. The worst-case overall interrupt response time, defined as the time from receiving an interrupt request to saving the general registers in the invoked interrupt routine, was measured at 2.38 µs, about a threefold improvement over a commercial 32-bit MCU. The prototype contains only 15,000 gates and is cost-effective for 16/32-bit single-chip MCUs.
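The abstract lists overlapping register windows among the features but gives no window count or register layout. As a hypothetical sketch borrowing a SPARC-style arrangement (8 global registers plus windows of 8 ins, 8 locals and 8 outs, with a caller's outs physically shared with its callee's ins), a logical-to-physical register mapping might look like this; `N_WINDOWS`, the numbering and the function name are assumptions, not the MCU's actual design.

```python
# Hypothetical SPARC-style overlapping register windows; all sizes are assumptions.
N_GLOBALS = 8          # r0-r7, shared by every window
N_WINDOWS = 8          # windows in the circular register file
WINDOW_STRIDE = 16     # each window adds 8 locals + 8 outs; its ins overlap the caller's outs
N_WINDOWED = N_WINDOWS * WINDOW_STRIDE


def physical_reg(window: int, logical: int) -> int:
    """Map logical register 0-31 in `window` to a physical register index.

    Logical layout (assumed): 0-7 globals, 8-15 outs, 16-23 locals, 24-31 ins.
    Window w + 1 is the callee of window w, so the outs of w are the ins of w + 1.
    """
    if not 0 <= logical < 32:
        raise ValueError("logical register must be in 0..31")
    if logical < 8:                       # globals are not windowed
        return logical
    base = window * WINDOW_STRIDE
    if logical >= 24:                     # ins
        offset = base + (logical - 24)
    elif logical >= 16:                   # locals
        offset = base + 8 + (logical - 16)
    else:                                 # outs, shared with the next window's ins
        offset = base + WINDOW_STRIDE + (logical - 8)
    return N_GLOBALS + offset % N_WINDOWED


# A caller in window 2 writes an argument to out register 8; the callee
# (window 3) sees the same physical register as its in register 24.
print(physical_reg(2, 8), physical_reg(3, 24))
```

With this layout a procedure call only advances the window pointer, so arguments pass through registers without memory traffic until the circular file overflows; the sketch does not model overflow handling.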

  • A 64 b CMOS Mainframe Execution Unit Macrocell with Error Detecting Circuit

    Takehisa HAYASHI  Toshio DOI  Mikio YAMAGISHI  Kazuo KOIDE  Akira ISHIYAMA  Masataka HIRAMATSU  Akira YAMAGIWA  

     
    PAPER-Core and Macrocells

      Page(s):
    3775-3779

    A 64 b CMOS mainframe execution unit macrocell with error detecting circuits is proposed. The conventional techniques for maintaining high reliability have been parity checking and duplication of the ALU (Arithmetic Logic Unit). However, the time required to generate the parity from the ALU's sum output is undesirable for high-speed operation. To achieve a short ALU delay time, a parity predicting logic structure is newly adopted. With this structure, a one-bit error detecting function is integrated without duplicating the entire ALU circuit. A novel CMOS precharged circuit is also developed to shorten the time required to precharge the whole circuit: when the number of circuit stages is reduced, the precharge time, as well as the delay time, limits the ALU cycle time. This new circuitry solves the problem of precharge time accumulation in conventional circuits. A 64 b BCD ALU adopting this technology has been designed and fabricated. The parity predict architecture and the high-speed precharge circuit reduce the delay time by 23% and the precharge time by 42%. A 30% faster cycle time has been achieved with a small (4%) increase in ALU area. The execution unit macrocell, which includes this ALU, contains 45 k transistors and measures 4.3 mm × 4.1 mm in 0.8 µm triple-metal-layer CMOS technology.
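The parity-predict idea can be illustrated in software. This is a minimal sketch assuming a plain binary adder and even parity; the paper's ALU is a 64 b BCD ALU whose predictor circuit is not described in the abstract, and all function names here are hypothetical. Because every sum bit is s_i = a_i ⊕ b_i ⊕ c_i, the sum parity equals P(A) ⊕ P(B) ⊕ P(C), where C is the vector of internal carries, so the check bit need not be regenerated from the finished sum.

```python
# Sketch of parity prediction for a binary adder (assumed: plain binary
# addition, even parity). This is not the paper's BCD circuit.


def parity(x: int) -> int:
    """Parity bit of x: 1 if x has an odd number of set bits."""
    return bin(x).count("1") & 1


def ripple_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Add two width-bit operands bit by bit.

    Returns (sum mod 2**width, carries), where bit i of `carries` is the
    carry INTO bit position i.
    """
    s = carries = carry = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carries |= carry << i
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return s, carries


def predicted_parity(pa: int, pb: int, carries: int) -> int:
    """Since s_i = a_i ^ b_i ^ c_i, P(S) = P(A) ^ P(B) ^ P(C).

    pa and pb are the parity bits that travel with the operands, so only
    the carry parity has to be formed alongside the addition.
    """
    return pa ^ pb ^ parity(carries)


def check_add(a: int, b: int, width: int = 64, fault_mask: int = 0) -> bool:
    """Return True if the (optionally corrupted) sum passes the parity check."""
    s, c = ripple_add(a, b, width)
    s ^= fault_mask                       # inject an error into the sum output
    return predicted_parity(parity(a), parity(b), c) == parity(s)


A, B = 0x0123_4567_89AB_CDEF, 0xFEDC_BA98_7654_3210
print(check_add(A, B))                        # fault-free sum passes
print(check_add(A, B, fault_mask=1 << 17))    # single-bit error is flagged
```

Any single flipped sum bit changes P(S) but not the predicted parity, which is what makes a one-bit error detectable without a duplicate ALU.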

  • A 5 ns Embedded RAM for CMOS ASICs and Its Applications to a One-Chip 4096-Channel Time Switch VLSI for Digital Switching Systems

    Masao MIZUKAMI  Yasuo MIKAMI  Osamu MATSUBARA  Yoichi SATOH  Koichi SUDOH  

     
    PAPER-Core and Macrocells

      Page(s):
    3780-3786

    This paper describes circuit techniques for a 5 ns, 4 kw × 9 b embedded RAM for standard cell ASICs in 0.8 µm pure CMOS triple-metal technology. The design goals of these techniques were high speed, low power consumption, and stable access time even when the RAM configuration is changed in its numbers of words and bits. A one-chip 4,096-channel time switch VLSI for digital switching systems is also described as an example of applying these RAMs to standard cell CMOS ASICs. This chip consumes 600 mW at 32 MHz operation.

  • A High-Performance Reconfigurable Line Memory Macrocell for Video Signal Processing ASICs

    Tetsuya MATSUMURA  Masahiko YOSHIMOTO  Atsushi MAEDA  Yasutaka HORIBA  

     
    PAPER-Core and Macrocells

      Page(s):
    3787-3795

    This paper describes a high-performance reconfigurable line memory macrocell for video signal processing ASICs. The macrocell features a three-transistor memory cell array with a divided word line structure for the write word lines. The transistor sizes of the memory cell were determined by analyzing the access time so as to achieve a throughput rate above 50 MHz for various aspect ratios. An embedded testing circuit enables video-rate testing with high fault coverage at a minimal circuit count. Moreover, the macrocell is highly reconfigurable in word length, bit width and aspect ratio. A 1152-word × 8-bit line memory has been implemented experimentally in 1.0 µm CMOS technology, and 60 MHz operation has been observed, allowing real-time processing of HDTV signals. Applying the macrocells to HDTV system LSIs has verified their reconfigurability and the usefulness of the testing circuits.

  • An Intelligent Cache Memory Chip Suitable for Logical Inference

    Kenichi YASUDA  Kiyohiro FURUTANI  Atsushi MAEDA  Shoichi WAKANO  Hiroshi NAKASHIMA  Yasutaka TAKEDA  Michihiro YAMADA  

     
    PAPER-System VLSI

      Page(s):
    3796-3802

    We have developed a VLSI intelligent cache memory chip that constitutes one processor element of a Parallel Inference Machine (PIM/m) system. The chip contains 610 k transistors, including 80 kbits of memory cells, measures 14.47 mm × 14.84 mm, and is fabricated in 1.0 µm double-metal CMOS technology. It implements a hardware support called "Trail Buffer" that is suited to executing logic programming languages. The cache memory size was determined by practical simulation, taking into account the relationship between chip size and cache hit ratio. The scan test method and special commands for accessing every memory cell are applied to enhance testability. The chip itself operates at 30 MHz. The typical power consumption is 2.5 W with a 5.5 V power supply at 16.7 MHz operation. With this cache memory chip, the CPU board of the PIM/m is currently tuned for 16.7 MHz operation and has attained 1.5 MLIPS (million logical inferences per second), which is the highest performance of any inference machine in the world.

  • The Performance Evaluation of a 64 b Microprocessor with a Two-Level Cache

    Ryuichi YAMAGUCHI  Joel BONEY  Douglas DUSCHATKO  Tetsuya TANAKA  Jiro MIYAKE  Yoshito NISHIMICHI  Hisakazu EDAMATSU  Shigeru WATARI  Shigeo KUNINOBU  

     
    PAPER-System VLSI

      Page(s):
    3803-3809

    This paper describes the performance evaluation, and its methodology, for a two-level cache composed of the internal caches of a 64-bit RISC microprocessor, an external cache, and a write buffer. The internal caches of this processor consist of a 6 Kbyte instruction cache and a 2 Kbyte data cache. We configured the two-level cache by adding an external 256 Kbyte cache and a write buffer to this processor and analyzed the system performance. Cache behavior is analyzed using direct measurement as well as software simulation. Direct measurement takes less time for cache analysis than software simulation, and using software simulation together with direct measurement makes it easy to model various configurations. We used the SPEC benchmarks for the performance analysis. The results show that the external cache and the write buffer are effective in increasing system performance.
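The software-simulation side of such an evaluation can be sketched as a trace-driven model. This is a minimal illustration assuming direct-mapped caches with hypothetical 16-byte lines and read-only traffic; the abstract gives only the 2 Kbyte data cache and 256 Kbyte external cache sizes, so the organization, the class name and the omission of the write buffer are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DirectMappedCache:
    """Hypothetical direct-mapped cache model that counts hits and misses."""
    size_bytes: int
    line_bytes: int = 16                            # assumed line size
    tags: dict[int, int] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def access(self, addr: int) -> bool:
        line = addr // self.line_bytes
        n_sets = self.size_bytes // self.line_bytes
        index, tag = line % n_sets, line // n_sets
        if self.tags.get(index) == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag                      # allocate the line on a miss
        return False


def simulate(trace, l1: DirectMappedCache, l2: DirectMappedCache) -> None:
    """Reads only: an L1 miss is looked up in L2 (and fills both levels)."""
    for addr in trace:
        if not l1.access(addr):
            l2.access(addr)


# Two sequential passes over a 4 Kbyte array with 4-byte loads.
l1 = DirectMappedCache(2 * 1024)
l2 = DirectMappedCache(256 * 1024)
simulate(list(range(0, 4096, 4)) * 2, l1, l2)
print(l1.hits, l1.misses, l2.hits, l2.misses)
```

The array is twice the size of the 2 Kbyte level-1 model, so the second pass misses in L1 on every line but hits in the larger L2, which is the kind of effect an external cache is meant to capture.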

  • VLSI Implementation of a Parallel Computer Network

    Katsuyuki KANEKO  Ichiro OKABAYASHI  Shingo KARINO  Yasuhiro NAKAKURA  Tetsuji KISHI  Manabu MIGITA  

     
    PAPER-System VLSI

      Page(s):
    3810-3818

    A VLSI implementation of the network for a highly parallel computer system using a three-ASIC chip set, and some related results, are described. The chip set consists of two network-component chips, BMU and SRC, with which a crossbar-like network of arbitrary size can be realized, and a versatile network controller, TCU. A new FIFO circuit, an inter-chip pipelined data transmission scheme, and a novel cooperation method between computation and communication called 'FIFO emulation' were introduced to address communication overheads in parallel processing, enhancing effective computing performance and the actual transfer rate. The chip set employs a synchronous 9-bit bus with a peak data transmission rate of 20 MB/s per channel. All chips are fabricated in 1.2 µm two-layer Al N-well CMOS technology and contain 350 k, 25 k and 165 k transistors, respectively. Experiments show that the measured communication overhead is less than 30% when the ratio of computation (in MFLOPS) to communication (in Mword/s) is 1, and that the 'FIFO emulation' scheme reduces communication time by 21% in the same case.

  • Design of a Matrix Multiply-Addition VLSI Processor for Robot Inverse Dynamics Computation

    Somchai KITTICHAIKOONKIT  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Dedicated Processors

      Page(s):
    3819-3828

    This paper proposes the design of a matrix multiply-addition VLSI processor (MMP) for minimum-delay-time inverse dynamics computation based on a linear array architecture. The MMP mainly consists of four multiply-adders and thus performs 4 × 4 matrix multiply-additions with a regular data flow. The delay time is minimized based on the concept of "odd-even alternative computation". A VLSI-oriented architecture that supports high-speed odd-even alternative computation at both the MMP level and the array level is achieved through the use of two types of data-dependence graphs. Layout evaluation demonstrates that the MMP can easily be implemented on a single chip. A linear array of MMPs can perform the inverse dynamics computation of any manipulator with minimum delay time. The estimated performance with regard to delay time is the highest among the architectures reported to date.
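The abstract says only that four multiply-adders perform 4 × 4 matrix multiply-additions with a regular data flow. A minimal sketch under a hypothetical schedule (each multiply-adder owns one output row, and all four fire once per cycle) shows why the 64 multiply-adds of D = A·B + C fit in 16 cycles; the odd-even alternative computation and the array-level data flow are not modeled, and the function name is illustrative.

```python
Matrix = list[list[float]]


def mmp_multiply_add(a: Matrix, b: Matrix, c: Matrix) -> tuple[Matrix, int]:
    """Compute D = A @ B + C for 4 x 4 matrices with four lockstep multiply-adders.

    Hypothetical schedule: unit r owns output row r; in each cycle every unit
    performs one multiply-add on its accumulator.
    """
    d = [row[:] for row in c]        # accumulators start from C
    cycles = 0
    for j in range(4):               # output column
        for k in range(4):           # inner-product index
            for r in range(4):       # the four units operate in parallel
                d[r][j] += a[r][k] * b[k][j]
            cycles += 1              # one cycle covers all four units
    return d, cycles


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
d, cycles = mmp_multiply_add(identity, b, ones)
print(d[0], cycles)
```

Keeping every unit busy on every cycle is what a regular data flow buys; the sketch's fixed loop order is only one of many schedules with that property.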

  • A Special-Purpose LSI for Inverse Kinematics Computation

    Michitaka KAMEYAMA  Takao MATSUMOTO  Hideki EGAMI  Tatsuo HIGUCHI  

     
    PAPER-Dedicated Processors

      Page(s):
    3829-3837

    This paper presents a special-purpose LSI chip for the inverse kinematics computation of robot manipulators. It is shown that the inverse kinematic solutions of kinematically simple manipulators can be systematically described using two-dimensional (2-D) vector rotation. The chip is fabricated using a 1.5 µm CMOS gate array. Its arithmetic unit is designed using COordinate Rotation DIgital Computer (CORDIC) algorithms and performs six types of operations based on 2-D vector rotation at high speed. Pipelining is used to raise the operating ratio (utilization) of the unit to 100%. The computation time of a special-purpose processor composed of the chip and a few memory chips is approximately 50 µs for a typical six-degree-of-freedom manipulator. Moreover, the chip can be used for various types of manipulators, and software development is very easy.
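The two basic CORDIC modes behind 2-D vector rotation can be sketched as follows. This is a floating-point illustration with an assumed 24 iterations; the chip's word length, iteration count and its six specific operation types are not given in the abstract, so only rotation and vectoring modes are shown, and the function names are hypothetical.

```python
import math

# Float sketch of CORDIC; fixed-point hardware would replace each
# multiplication by 2**-i with an arithmetic right shift by i bits.
N_ITER = 24
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]                # micro-rotation angles
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER))  # CORDIC gain, about 1.6468


def cordic_rotate(x: float, y: float, theta: float) -> tuple[float, float]:
    """Rotation mode: rotate (x, y) by theta radians (|theta| <= about 1.74)."""
    z = theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0          # turn toward the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / GAIN, y / GAIN                # undo the constant gain


def cordic_vector(x: float, y: float) -> tuple[float, float]:
    """Vectoring mode (x > 0 assumed): return (sqrt(x^2 + y^2), atan2(y, x))."""
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y >= 0 else 1.0          # turn the vector toward the x-axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]                   # accumulate the angle rotated away
    return x / GAIN, z


print(cordic_rotate(1.0, 0.0, 0.5))   # approximately (cos 0.5, sin 0.5)
print(cordic_vector(3.0, 4.0))        # approximately (5.0, atan2(4, 3))
```

Vectoring mode yields a magnitude and an atan2-style angle in one pass, which is the kind of primitive a closed-form joint-angle solution such as θ = atan2(p_y, p_x) needs; rotation mode supplies the sine and cosine terms.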

  • Built-In Self-Test in a 24 Bit Floating Point Digital Signal Processor

    Narumi SAKASHITA  Hisako SAWAI  Eiichi TERAOKA  Toshiki FUJIYAMA  Tohru KENGAKU  Yukihiko SHIMAZU  Akiharu TADA  Takeshi TOKUDA  

     
    PAPER-Dedicated Processors

      Page(s):
    3838-3844

    A built-in self-test (BIST) based on signature analysis (a data compression technique) has been implemented in a 24-bit floating-point digital signal processor (DSP). Using only a single pair of linear feedback shift registers (LFSRs) and 253 words of DSP instructions, 95% of the functional blocks are self-tested. The number of test patterns is 35 million, and the test takes only 2.6 seconds at fc = 26.7 MHz. The BIST hardware overhead is about 2.0% of the die size. Comparing the pass rate under a conventional function test with that under the BIST shows that nearly the same fault coverage is obtained. This result shows that BIST is effective for VLSI processors such as DSPs. With improvements to this method, manufacturing go/no-go tests without expensive test equipment should become possible.
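Signature analysis with a pair of LFSRs can be sketched as one LFSR generating pseudo-random patterns and a second, multiple-input LFSR (MISR) compressing the responses into a single signature. The 16-bit width, the polynomial x^16 + x^14 + x^13 + x^11 + 1, the seed and the toy "units under test" are all assumptions; the DSP's actual register lengths and polynomials are not stated in the abstract.

```python
# Sketch of signature-analysis BIST with a pair of LFSRs (assumed: 16-bit
# registers, Galois polynomial x^16 + x^14 + x^13 + x^11 + 1).
MASK = 0xFFFF
TAPS = 0xB400            # feedback taps for x^16 + x^14 + x^13 + x^11 + 1
SEED = 0xACE1            # any non-zero start state


def lfsr_step(state: int) -> int:
    """Advance a 16-bit Galois LFSR by one clock (maximal length: period 65535)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def misr_step(signature: int, response: int) -> int:
    """Multiple-input signature register: clock the LFSR, then XOR in a response word."""
    return lfsr_step(signature) ^ (response & MASK)


def run_bist(unit, n_patterns: int, seed: int = SEED) -> int:
    """Drive `unit` with LFSR patterns and compress its outputs into one signature."""
    pattern, signature = seed, 0
    for _ in range(n_patterns):
        signature = misr_step(signature, unit(pattern))
        pattern = lfsr_step(pattern)
    return signature


def good_unit(x: int) -> int:
    """Hypothetical fault-free block under test."""
    return (3 * x + 1) & MASK


def faulty_unit(x: int) -> int:
    """Same block with one output bit wrong for a single input pattern."""
    return good_unit(x) ^ (1 << 5) if x == SEED else good_unit(x)


golden = run_bist(good_unit, 10_000)
print(run_bist(faulty_unit, 10_000) != golden)
```

Only the final signature is compared with a stored golden value, which is why the test needs so little extra hardware. Because each LFSR step is invertible, an error confined to a single clock can never cancel out, so this fault is always caught; errors spread over many clocks can alias with a probability of roughly 2^-16 for a 16-bit register.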

  • A 400 MFLOPS FFT Processor VLSI Architecture

    Hiroshi MIYANAGA  Hironori YAMAUCHI  

     
    PAPER-Dedicated Processors

      Page(s):
    3845-3851

    We propose a single-chip 400-MFLOPS 2-D FFT processor VLSI architecture. The processor integrates 380,000 transistors in an area of 11.58 mm × 11.58 mm using 0.8 µm CMOS technology with a typical machine cycle time of 25 ns, and executes 2^n × 2^n-point 2-D FFTs in real time; for example, a 256 × 256-point FFT is executed in 14 ms. This high performance in terms of both speed and dynamic range makes real-time processing practical for video as well as speech processing.
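A 2^n × 2^n 2-D FFT is commonly computed by the row-column method: a 1-D FFT of every row, then of every column. The sketch below assumes that decomposition (the abstract does not say which one the processor uses) and counts radix-2 butterflies. For 256 × 256 points the count is 2 × 256 × (128 × 8) = 524,288; if one butterfly completed per 25 ns cycle, which the abstract does not state, that would take about 13.1 ms, the same order as the 14 ms quoted.

```python
import cmath


def fft(x: list[complex], counter: list[int]) -> list[complex]:
    """Recursive radix-2 DIT FFT (len(x) a power of 2); counter[0] tallies butterflies."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], counter)
    odd = fft(x[1::2], counter)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
        counter[0] += 1
    return out


def fft2d(img: list[list[complex]]) -> tuple[list[list[complex]], int]:
    """Row-column 2-D FFT: 1-D FFT of every row, then of every column."""
    counter = [0]
    rows = [fft(list(r), counter) for r in img]
    n_rows, n_cols = len(rows), len(rows[0])
    cols = [fft([rows[i][j] for i in range(n_rows)], counter) for j in range(n_cols)]
    spectrum = [[cols[j][i] for j in range(n_cols)] for i in range(n_rows)]  # back to row-major
    return spectrum, counter[0]


spectrum, butterflies = fft2d([[1, 2], [3, 4]])
print(spectrum, butterflies)
```

Each butterfly in this model is the complex operation that a dedicated butterfly execution unit would perform, so the butterfly count is a convenient first estimate of processing time.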

  • Architecture of a Floating-Point Butterfly Execution Unit in a 400-MFLOPS Processor VLSI and Its Implementation

    Hironori YAMAUCHI  Hiroshi MIYANAGA  

     
    PAPER-Dedicated Processors

      Page(s):
    3852-3860

    Dedicated floating-point hardware arithmetic modules designed as processing elements for butterfly operations are described. They consist of Input Data Converters (IDC), Output Data Converters (ODC), and a two's-complement 24-bit (16E8) floating-point Butterfly Execution Unit (BEU). The BEU executes the four multiplications and six additions/subtractions required for a complex butterfly operation in each 25 ns execution cycle by implementing four multipliers and four 3-input adders/subtracters. The arithmetic modules are fabricated using 0.8 µm CMOS technology. An overview of the hardware unit is presented, with special attention given to the BEU for parallel pipelined processing. In addition, module design methodologies for hardware implementation and some sophisticated high-speed execution techniques for floating-point multiplication and addition are discussed.
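The operation count quoted above matches the standard radix-2 decimation-in-time butterfly written out on real components. This sketch assumes that butterfly form (the abstract does not say which form the BEU implements) and uses Python floats rather than the 16E8 format; `OpCount` is a hypothetical bookkeeping helper.

```python
from dataclasses import dataclass


@dataclass
class OpCount:
    """Tally of real arithmetic operations (illustrative bookkeeping only)."""
    mul: int = 0
    addsub: int = 0


def butterfly(a: complex, b: complex, w: complex, ops: OpCount) -> tuple[complex, complex]:
    """Radix-2 decimation-in-time butterfly: X = a + w*b, Y = a - w*b."""
    # Complex product t = w * b: 4 real multiplications, 2 additions/subtractions.
    t_re = w.real * b.real - w.imag * b.imag
    t_im = w.real * b.imag + w.imag * b.real
    ops.mul += 4
    ops.addsub += 2
    # Sum and difference outputs: 4 more real additions/subtractions.
    x = complex(a.real + t_re, a.imag + t_im)
    y = complex(a.real - t_re, a.imag - t_im)
    ops.addsub += 4
    return x, y


ops = OpCount()
x, y = butterfly(1 + 2j, 3 - 1j, 0.6 + 0.8j, ops)
print(ops)  # OpCount(mul=4, addsub=6)
```

A 3-input adder can fold a product pair and the a term into one operation, e.g. a_re + (w_re·b_re − w_im·b_im), so one 3-input adder per output component is one plausible way four multipliers and four 3-input adders/subtracters could cover a butterfly per cycle; the abstract does not confirm this mapping.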

  • Regular Section
  • Ni-Plated Anodic Alumina Film for Optical Polarizer

    Takashi SEKI  Mitsunori SAITO  Mitsunobu MIYAGI  

     
    PAPER-Opto-Electronics

      Page(s):
    3861-3866

    Porous anodic alumina films were studied as a way to realize artificial metal-dielectric composites for polarizers. Using improved anodizing and electroplating procedures, nickel was successfully deposited in the pores to a thickness of as much as 50 µm. The optical transmittance of the film was measured in the wavelength range of 0.63-1.55 µm, and a notable polarizing function was confirmed. The extinction ratio and insertion loss were evaluated to be 41 dB and 1.3 dB, respectively, at a wavelength of 1.55 µm.
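For orientation, the two decibel figures can be converted into linear power ratios. This assumes the usual definitions, which the abstract does not spell out: extinction ratio ER = 10 log10(T_pass / T_block) between the transmitted and blocked polarizations, and insertion loss IL = −10 log10 T_pass for the transmitted polarization.

$$
\frac{T_{\mathrm{pass}}}{T_{\mathrm{block}}} = 10^{\mathrm{ER}/10} = 10^{41/10} \approx 1.26 \times 10^{4},
\qquad
T_{\mathrm{pass}} = 10^{-\mathrm{IL}/10} = 10^{-1.3/10} \approx 0.74
$$

Under these definitions, about 74% of the power in the pass polarization is transmitted, while the blocked polarization emerges with roughly 1/12,600 of that power.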

  • Temperature Characteristics of Short-Cavity AlGaAs/GaAs Surface Emitting Lasers

    Takemasa TAMANUKI  Kazuhiko HOUJOU  Fumio KOYAMA  Kenichi IGA  

     
    LETTER-Opto-Electronics

      Page(s):
    3867-3869

    We have measured the temperature range ΔT_SM over which short-cavity surface emitting (SE) lasers operate in one particular longitudinal mode. ΔT_SM was increased to as much as 103 K by reducing the cavity length to 4 µm. A short-cavity SE laser is therefore expected to show stable single-mode operation over a wide temperature range.

  • Effect of Annealing on Electrooptic Constant of the Undoped and the MgO-Doped Lithium Niobate Optical Waveguides

    Thi Thi LAY  Yukiko KONDO  Yoichi FUJII  

     
    LETTER-Opto-Electronics

      Page(s):
    3870-3872

    The effect of annealing on the electrooptic constant r33 of proton-exchanged optical waveguides on three types of lithium niobate crystals (undoped, titanium-indiffused and MgO-doped) is investigated. Precise estimation shows that annealing recovers the deterioration of the electrooptic constant; the recovery is best in the undoped crystal, followed by the titanium-indiffused crystal, and worst in the MgO-doped crystal.