Hang Liu Fei Wu
Keiji GOTO Toru KAWANO Ryohei NAKAMURA
Takahiro SASAKI Yukihiro KAMIYA
Xiang XIONG Wen LI Xiaohua TAN Yusheng HU
Anton WIDARTA
Hiroshi OKADA Mao FUKINAKA Yoshiki AKIRA
Shun-ichiro Ohmi
Tohgo HOSODA Kazuyuki SAITO
Shohei Matsuhara Kazuyuki Saito Tomoyuki Tajima Aditya Rakhmadi Yoshiki Watanabe Nobuyoshi Takeshita
Koji Abe Mikiya Kuzutani Satoki Furuya Jose A. Piedra-Lorenzana Takeshi Hizawa Yasuhiko Ishikawa
Yihan ZHU Takashi OHSAWA
Shengbao YU Fanze MENG Yihan SHEN Yuzhu HAO Haigen ZHOU
Ryo KUMAGAI Ryosuke SUGA Tomoki UWANO
Jun SONODA Kazusa NAKAMICHI
Kaiji Owaki Yusuke Kanda Hideaki Kimura
Takuya FUJIMOTO
Yuji Wada
Fuyuki Kihara Chihiro Matsui Ken Takeuchi
Keito YUASA Michihiro IDE Sena KATO Kenichi OKADA Atsushi SHIRANE
Tomoo Ushio Yuuki Wada Syo Yoshida
Futoshi KUROKI
Jun FURUTA Shotaro SUGITANI Ryuichi NAKAJIMA Takafumi ITO Kazutoshi KOBAYASHI
Yuya Ichikawa Ayumu Yamada Naoko Misawa Chihiro Matsui Ken Takeuchi
Ayumu Yamada Zhiyuan Huang Naoko Misawa Chihiro Matsui Ken Takeuchi
Yoshinori ITOTAGAWA Koma ATSUMI Hikaru SEBE Daisuke KANEMOTO Tetsuya HIROSE
Hikaru SEBE Daisuke KANEMOTO Tetsuya HIROSE
Zhibo CAO Pengfei HAN Hongming LYU
Takuya SAKAMOTO Itsuki IWATA Toshiki MINAMI Takuya MATSUMOTO
Koji YAMANAKA Kazuhiro IYOMASA Takumi SUGITANI Eigo KUWATA Shintaro SHINJO
Minoru MIZUTANI Takashi OHIRA
Katsumi KAWAI Naoki SHINOHARA Tomohiko MITANI
Baku TAKAHARA Tomohiko MITANI Naoki SHINOHARA
Akihiko ISHIWATA Yasumasa NAKA Masaya TAMURA
Atsushi Fukuda Hiroto Yamamoto Junya Matsudaira Sumire Aoki Yasunori Suzuki
Ting DING Jiandong ZHU Jing YANG Xingmeng JIANG Chengcheng LIU
Fan Liu Zhewang Ma Masataka Ohira Dongchun Qiao Guosheng Pu Masaru Ichikawa
Ludovico MINATI
Minoru Fujishima
Hyunuk AHN Akito IGUCHI Keita MORIMOTO Yasuhide TSUJI
Kensei ITAYA Ryosuke OZAKI Tsuneki YAMASAKI
Akira KAWAHARA Jun SHIBAYAMA Kazuhiro FUJITA Junji YAMAUCHI Hisamatsu NAKANO
Seiya Kishimoto Ryoya Ogino Kenta Arase Shinichiro Ohnuki
Yasuo OHTERA
Tomohiro Kumaki Akihiko Hirata Tubasa Saijo Yuma Kawamoto Tadao Nagatsuma Osamu Kagaya
Haonan CHEN Akito IGUCHI Yasuhide TSUJI
Keiji GOTO Toru KAWANO Munetoshi IWAKIRI Tsubasa KAWAKAMI Kazuki NAKAZAWA
Yoji NISHIO Noriaki OKA Shigeru TAKAHASHI Manabu SHIBATA
A mixed BiCMOS/CMOS channelless gate array family with three-metal-layer wiring, using a 5 V, 0.5 µm BiCMOS technology, is discussed. For light load capacitance, CMOS gates outperform BiCMOS gates in both speed and power: the power-delay product of CMOS gates at light load is 50% lower than that of BiCMOS gates. Therefore, by using CMOS and BiCMOS gates selectively according to the load capacitance, the performance of the BiCMOS gate array is enhanced. To this end, a new mixed BiCMOS/CMOS basic cell structure was developed that can be used as either BiCMOS or CMOS gates, or as wiring channels. Based on design automation experience, the area efficiency of the developed basic cell is 16% better than that of the conventional basic cell. The wiring method for the third-metal-layer power supply reinforcement lines in a large chip was examined from the viewpoint of the number of usable basic cells. Locating the reinforcement lines at every basic cell yields about 14% more usable basic cells than an alternative method in which the lines are placed at fixed intervals of basic cells. The propagation delay time of the 2-input NAND is 190 ps at a fan-out of 10. Under a light load, a pure CMOS NAND is faster, achieving a 140 ps gate delay at a fan-out of 2. This gate array family can be applied to high-speed processors.
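The load-dependent choice between CMOS and BiCMOS gates can be sketched with a simple linear delay model per gate type. The Python below is a minimal illustration assuming delay grows linearly with fan-out. Only the two anchor points come from the abstract (140 ps for the CMOS NAND at fan-out 2, and 190 ps at fan-out 10, taken here to be the BiCMOS NAND); the intercepts, slopes, and the names `GateModel` and `pick_gate` are hypothetical, and power is ignored.

```python
from dataclasses import dataclass


@dataclass
class GateModel:
    name: str
    intrinsic_ps: float   # hypothetical delay at zero fan-out
    ps_per_fanout: float  # hypothetical added delay per unit of fan-out

    def delay(self, fanout: int) -> float:
        """Linear delay model: intrinsic delay plus a per-fan-out slope."""
        return self.intrinsic_ps + self.ps_per_fanout * fanout


def pick_gate(fanout: int, *candidates: GateModel) -> GateModel:
    """Choose the candidate gate with the smallest modeled delay at this load."""
    return min(candidates, key=lambda g: g.delay(fanout))


# Hypothetical linear fits: CMOS passes through 140 ps at fan-out 2 and
# BiCMOS through 190 ps at fan-out 10; the slopes are invented for illustration.
CMOS = GateModel("CMOS", 80.0, 30.0)
BICMOS = GateModel("BiCMOS", 150.0, 4.0)

for fo in (1, 2, 3, 10):
    g = pick_gate(fo, CMOS, BICMOS)
    print(f"fan-out {fo}: use {g.name} ({g.delay(fo):.0f} ps)")
```

With these made-up slopes the two lines cross near a fan-out of 2.7, so the rule picks CMOS for fan-outs of 2 or less and BiCMOS above that; real crossover points would come from characterized cell libraries.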
Fumiyasu ASAI Shinji KOMORI Toshiyuki TAMURA Hisakazu SATO Hidehiro TAKATA Yoshihiro SEGUCHI Takeshi TOKUDA Hiroaki TERADA
This paper details a unique VLSI design scheme that employs self-timed circuits. A 32-bit 50-MFLOPS data-driven microprocessor has been designed using a self-timed clocking scheme. This high-performance data-driven microprocessor with sophisticated functions was designed by combining several kinds of self-timed components, and all functional blocks in the microprocessor are driven by self-timed clocks. The microprocessor integrates 700,000 devices in a 14.65 mm
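Self-timed designs commonly sequence pipeline stages with handshake control built from Muller C-elements. The abstract does not say which handshake protocol or control circuit this processor uses, so the Python sketch below of a generic Muller pipeline is an illustrative assumption: each stage's C-element changes state only when its predecessor offers new data and its successor has freed space, so data advance without a global clock. The function names are hypothetical.

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return a if a == b else prev


def settle_muller_pipeline(stages: list, req: int, ack: int) -> list:
    """Evaluate a Muller pipeline until no control signal changes.

    stages[i] is the output of the C-element controlling stage i.
    Stage i sees its predecessor's output (req for stage 0) and the
    inverted output of its successor (ack for the last stage).
    """
    c = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(c)):
            left = req if i == 0 else c[i - 1]
            right = ack if i == len(c) - 1 else c[i + 1]
            new = c_element(left, 1 - right, c[i])
            if new != c[i]:
                c[i] = new
                changed = True
    return c


print(settle_muller_pipeline([0, 0, 0, 0], req=1, ack=0))
```

For example, raising the request while the right-hand environment has not acknowledged lets the first event ripple through all four stages, and every control signal settles at 1. A second request event then advances only as far as the stage behind the still-occupied last stage.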
Masafumi TAKAHASHI Yasuo YAMADA Emi KANEKO Shinichi YOSHIOKA Haruyuki TAGO
A 10-MIPS peak performance single-chip MCU (Micro Controller Unit) core has been developed for real-time applications. The following features are implemented to improve cost-effectiveness and worst-case interrupt response: (1) a large on-chip register file with overlapping register windows, (2) a mechanism for accepting exception requests while another instruction executes concurrently, (3) a load-store architecture, (4) optimized instruction pipelining, and (5) two 32-bit internal buses and a 16-bit external bus. It delivers about 5-MIPS average performance on several benchmark programs. The worst-case overall interrupt response time, defined as the time from receiving an interrupt request to saving the general registers in the invoked interrupt routine, was measured at 2.38 µs, about a threefold improvement over a commercial 32-bit MCU. The prototype contains only 15,000 gates, making it cost-effective as a 16/32-bit single-chip MCU.
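Overlapping register windows, feature (1) above, let a callee's input registers alias the caller's output registers, so arguments pass on a call without memory traffic. The Python sketch below assumes a SPARC-style layout of 8 in, 8 local, and 8 out registers per window and four windows in total; these sizes and the class name `RegisterWindows` are hypothetical, and spilling to memory on window overflow or underflow is omitted.

```python
class RegisterWindows:
    """Circular register file with overlapping windows (SPARC-style layout).

    Each window sees `group` in, `group` local, and `group` out registers.
    The outs of window w are the same physical registers as the ins of
    window w + 1. Overflow/underflow spilling to memory is not modeled.
    """

    OFFSETS = {"in": 0, "local": 1, "out": 2}

    def __init__(self, n_windows: int = 4, group: int = 8):
        self.n_windows = n_windows
        self.group = group
        # Each window owns only its ins and locals; its outs are the next window's ins.
        self.phys = [0] * (n_windows * 2 * group)
        self.cwp = 0  # current window pointer

    def _index(self, kind: str, i: int) -> int:
        base = self.cwp * 2 * self.group
        return (base + self.OFFSETS[kind] * self.group + i) % len(self.phys)

    def read(self, kind: str, i: int) -> int:
        return self.phys[self._index(kind, i)]

    def write(self, kind: str, i: int, value: int) -> None:
        self.phys[self._index(kind, i)] = value

    def call(self) -> None:
        """Advance to a fresh window; the caller's outs become the callee's ins."""
        self.cwp = (self.cwp + 1) % self.n_windows

    def ret(self) -> None:
        """Return to the caller's window."""
        self.cwp = (self.cwp - 1) % self.n_windows


rw = RegisterWindows()
rw.write("out", 0, 42)   # caller places an argument in out register 0
rw.call()
print(rw.read("in", 0))  # the callee sees it in in register 0
```

Because no registers are copied on a call, the call cost is a pointer update. This is one reason register windows can help shorten the path into an interrupt routine.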
Takehisa HAYASHI Toshio DOI Mikio YAMAGISHI Kazuo KOIDE Akira ISHIYAMA Masataka HIRAMATSU Akira YAMAGIWA
A 64 b CMOS mainframe execution unit macrocell with error-detecting circuits is proposed. Conventional techniques for maintaining high reliability have been parity checking and duplication of the ALU (Arithmetic Logic Unit). However, the time required to generate the parity from the ALU's sum output is undesirable for high-speed operation. To achieve a short ALU delay time, a parity-predicting logic structure is newly adopted. With this structure, a one-bit-error detecting function is integrated without duplicating the entire ALU circuit. A novel CMOS precharged circuit is also developed to shorten the time required to precharge the whole circuit: when the number of circuit stages is reduced, the precharge time, as well as the delay time, limits the ALU cycle time. This new circuitry solves the problem of precharge time accumulating in conventional circuits. A 64 b BCD ALU adopting this technology has been designed and fabricated. The parity-predict architecture and the high-speed precharge circuit reduce the delay time by 23% and the precharge time by 42%. A 30% faster cycle time has been achieved with only a small (4%) increase in ALU area. The execution unit macrocell, which includes the ALU described above, contains 45 k transistors, and its area is 4.3 mm
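The parity-prediction idea can be checked in software for a binary adder. Because each sum bit is s_i = a_i ⊕ b_i ⊕ c_i, where c_i is the carry into bit i, the parity of the sum equals the XOR of the two operand parities and the parity of the internal carries. The sum parity can therefore be predicted from the carry chain instead of being regenerated from the finished sum. The Python sketch below assumes plain 64-bit binary addition; the paper's ALU is a BCD ALU, and its exact prediction logic is not given in the abstract.

```python
def parity(x: int) -> int:
    """Even/odd parity of a non-negative integer (1 if an odd number of 1 bits)."""
    return bin(x).count("1") & 1


def carries_in(a: int, b: int, cin: int, width: int) -> int:
    """Bit vector whose bit i is the carry INTO bit i of a ripple-carry adder."""
    c = cin
    vec = 0
    for i in range(width):
        vec |= c << i
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        c = (ai & bi) | (ai & c) | (bi & c)  # majority function = carry out
    return vec


def predicted_sum_parity(a: int, b: int, cin: int = 0, width: int = 64) -> int:
    """Predict parity(a + b + cin mod 2^width) without forming the sum."""
    return parity(a) ^ parity(b) ^ parity(carries_in(a, b, cin, width))


MASK = (1 << 64) - 1
a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
s = (a + b + 1) & MASK
print(predicted_sum_parity(a, b, 1) == parity(s))      # prediction matches
print(predicted_sum_parity(a, b, 1) == parity(s ^ 8))  # a one-bit sum error is caught
```

Comparing the predicted parity against the parity of the actual sum output flags any single-bit error in the sum, which is the one-bit-error detection the abstract describes.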
Masao MIZUKAMI Yasuo MIKAMI Osamu MATSUBARA Yoichi SATOH Koichi SUDOH
This paper describes circuit techniques of a 5 ns, 4 kw
Tetsuya MATSUMURA Masahiko YOSHIMOTO Atsushi MAEDA Yasutaka HORIBA
This paper describes a high-performance reconfigurable line memory macrocell for video signal processing ASICs. The macrocell features a three-transistor memory cell array with a divided word line structure for the write word lines. The transistor sizes of the memory cell were determined by analyzing access time so as to achieve a throughput of more than 50 MHz for various aspect ratios. A testing circuit embedded in the macrocell provides video-rate testing and high fault coverage with a minimum circuit count. Moreover, the word length, bit width, and aspect ratio of the macrocell are highly reconfigurable. A 1152 words
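In video processing, a line memory typically delays the pixel stream by one scan line. The Python sketch below assumes that usage, which the abstract does not state explicitly. It models the macrocell as a RAM that is read and then written at a single circular address per sample, with the word count and bit width as configurable parameters. `LineMemory` and its methods are hypothetical names.

```python
class LineMemory:
    """Delay line of `words` samples built from a RAM and a circular address."""

    def __init__(self, words: int, bit_width: int):
        self.ram = [0] * words
        self.addr = 0
        self.mask = (1 << bit_width) - 1

    def shift(self, sample: int) -> int:
        """Read the oldest word, overwrite it with the new sample, advance."""
        out = self.ram[self.addr]                  # sample stored `words` cycles ago
        self.ram[self.addr] = sample & self.mask   # truncate to the configured bit width
        self.addr = (self.addr + 1) % len(self.ram)
        return out


lm = LineMemory(words=3, bit_width=8)
print([lm.shift(x) for x in [1, 2, 3, 4, 5]])  # each input reappears 3 samples later
```

Reconfiguring the word length changes the delay, and the bit-width mask changes the sample precision. Sustaining one read and one write per sample is what a video-rate throughput target requires of the memory array.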
Kenichi YASUDA Kiyohiro FURUTANI Atsushi MAEDA Shoichi WAKANO Hiroshi NAKASHIMA Yasutaka TAKEDA Michihiro YAMADA
We have newly developed a VLSI intelligent cache memory chip that constitutes one processor element of a Parallel Inference Machine (PIM/m) system. This cache memory chip contains 610 k transistors, including 80 kbits of memory cells. The chip measures 14.47 mm
Ryuichi YAMAGUCHI Joel BONEY Douglas DUSCHATKO Tetsuya TANAKA Jiro MIYAKE Yoshito NISHIMICHI Hisakazu EDAMATSU Shigeru WATARI Shigeo KUNINOBU
This paper describes the performance evaluation, and the evaluation methodology, of a two-level cache composed of the internal caches of a 64-bit RISC microprocessor, an external cache, and a write buffer. The internal caches of this processor consist of a 6 Kbyte instruction cache and a 2 Kbyte data cache. We configured the two-level cache by adding an external 256 Kbyte cache and a write buffer to this processor and analyzed the system performance. Cache behavior is analyzed using direct measurement as well as software simulation. Direct measurement shortens the time required for cache analysis compared with software simulation, and using software simulation together with direct measurement makes it easy to model various configurations. We used the SPEC benchmarks for the performance analysis. The results show that the external cache and the write buffer are effective in increasing system performance.
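Software cache simulation of this kind is usually trace-driven. Each address in a memory trace is looked up in the first-level cache, and only misses are forwarded to the second level. The Python sketch below assumes direct-mapped caches and a 32-byte line, neither of which the abstract states. It models only the 2 Kbyte data cache and the 256 Kbyte external cache and does not model writes or the write buffer.

```python
class DirectMappedCache:
    """Direct-mapped cache that records only tags (no data), counting hits/misses."""

    def __init__(self, size_bytes: int, line_bytes: int):
        self.line = line_bytes
        self.sets = size_bytes // line_bytes
        self.tags = [None] * self.sets
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.line
        idx = block % self.sets
        tag = block // self.sets
        if self.tags[idx] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[idx] = tag  # allocate on miss, evicting the previous line
        return False


def simulate(trace, l1: DirectMappedCache, l2: DirectMappedCache) -> None:
    """Send every address to L1; forward only L1 misses to L2."""
    for addr in trace:
        if not l1.access(addr):
            l2.access(addr)


l1 = DirectMappedCache(2 * 1024, 32)    # internal data cache
l2 = DirectMappedCache(256 * 1024, 32)  # external cache
simulate([0, 4, 2048, 0], l1, l2)
print(l1.hits, l1.misses, l2.hits, l2.misses)
```

In this toy trace, addresses 0 and 2048 conflict in the small L1 but map to different sets in the large L2, so the re-reference to address 0 misses in L1 and hits in L2. Running a benchmark address trace through `simulate` gives the per-level miss ratios.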
Katsuyuki KANEKO Ichiro OKABAYASHI Shingo KARINO Yasuhiro NAKAKURA Tetsuji KISHI Manabu MIGITA
A VLSI implementation of the network for a highly parallel computer system using a three-chip ASIC set, and some related results, are described. The chip set consists of two network-component chips, BMU and SRC, with which a crossbar-like network of arbitrary size can be realized, and a versatile network controller, TCU. A new FIFO circuit, an inter-chip pipelined data transmission scheme, and a novel method of cooperation between computation and communication called 'FIFO emulation' were introduced to reduce communication overheads in parallel processing and thereby enhance effective computing performance and actual transfer rate. The chip set employs a synchronous 9-bit bus with a peak data transmission rate of 20 MB/s per channel. All chips are fabricated in 1.2 µm two-layer Al, N-well CMOS technology and contain 350 k, 25 k, and 165 k transistors, respectively. Experiments show that the measured communication overhead is less than 30% when the ratio of computation (in MFLOPS) to communication (in Mword/s) is 1, and that 'FIFO emulation' reduces communication time by 21% in the same case.
Somchai KITTICHAIKOONKIT Michitaka KAMEYAMA Tatsuo HIGUCHI
This paper proposes the design of a matrix multiply-addition VLSI processor (MMP) for minimum-delay-time inverse dynamics computation based on a linear array architecture. The MMP mainly consists of four multiply-adders, thus performing 4
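One plausible way to keep four multiply-adders busy on a matrix multiply-addition C = A·B + D is to assign one row of the result to each unit and step through the inner-product index. The abstract is truncated and does not give the MMP's actual schedule, so this Python sketch for 4 × 4 matrices uses an assumed mapping. The innermost loop over `r` stands for the four units operating in the same cycle.

```python
def matmul_add_4(A, B, D):
    """C = A @ B + D for 4x4 matrices on four (simulated) multiply-adders.

    Returns the result and the number of cycles, where one cycle is one
    multiply-accumulate step performed by all four units in parallel.
    """
    n = 4
    C = [[D[r][c] for c in range(n)] for r in range(n)]  # accumulators start at D
    cycles = 0
    for j in range(n):          # output column
        for k in range(n):      # inner-product index
            for r in range(n):  # the four multiply-adders, one per result row
                C[r][j] += A[r][k] * B[k][j]
            cycles += 1
    return C, cycles


I = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
B = [[4 * r + c for c in range(4)] for r in range(4)]
Z = [[0] * 4 for _ in range(4)]
C, cycles = matmul_add_4(I, B, Z)
print(C == B, cycles)
```

Under this mapping a full 4 × 4 multiply-addition takes 16 cycles, one quarter of the 64 multiply-accumulates a single unit would need.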
Michitaka KAMEYAMA Takao MATSUMOTO Hideki EGAMI Tatsuo HIGUCHI
This paper presents a special-purpose LSI chip for inverse kinematics computation of robot manipulators. It is shown that the inverse kinematic solutions of kinematically simple manipulators can be systematically described using two-dimensional (2-D) vector rotation. The chip is fabricated with a 1.5-µm CMOS gate array. Its arithmetic unit is designed using COordinate Rotation DIgital Computer (CORDIC) algorithms and performs six types of operations based on 2-D vector rotation at high speed. Pipelining is used to raise the utilization of the unit to 100%. The computation time of a special-purpose processor composed of the chip and a few memory chips is approximately 50 µs for a typical six-degree-of-freedom manipulator. Moreover, the chip can be used for various types of manipulators, and software development for it is very easy.
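CORDIC computes 2-D vector rotation using only shifts, adds, and a small table of angles. The Python sketch below shows the two standard modes in floating point. Rotation mode rotates a vector by a given angle, and vectoring mode returns a vector's magnitude and angle (an atan2, which inverse-kinematic solutions typically rely on). The abstract gives neither the chip's word length, its iteration count, nor the list of its six operations, so the choice of 40 iterations here is arbitrary.

```python
import math

N_ITER = 40
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]  # elementary rotation angles
# Product of the per-iteration scale factors 1/sqrt(1 + 2^-2i).
GAIN = 1.0
for i in range(N_ITER):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(x, y, theta):
    """Rotation mode: rotate (x, y) by theta radians (|theta| <= ~1.74)."""
    z = theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0  # rotate toward the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x * GAIN, y * GAIN


def cordic_vector(x, y):
    """Vectoring mode: drive y to zero; return (magnitude, angle). Assumes x > 0."""
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y >= 0 else 1.0  # rotate toward the x-axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x * GAIN, z


print(cordic_rotate(1.0, 0.0, math.pi / 4))
print(cordic_vector(3.0, -4.0))
```

Both modes converge only for angles within about ±1.74 rad, so `cordic_vector` as written assumes x > 0. A hardware unit would typically pre-rotate by ±π/2 to cover the full circle.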
Narumi SAKASHITA Hisako SAWAI Eiichi TERAOKA Toshiki FUJIYAMA Tohru KENGAKU Yukihiko SHIMAZU Akiharu TADA Takeshi TOKUDA
A built-in self-test (BIST) based on signature analysis (a data compression technique) has been implemented in a 24 bit floating-point digital signal processor (DSP). Using only a single pair of linear feedback shift registers (LFSRs) and 253 instruction words of the DSP, 95% of the functional blocks are self-tested. The number of test patterns is 35 million. The test takes only 2.6 seconds at fc
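Signature analysis pairs a pattern-generating LFSR with a second shift register that compresses the circuit's responses into a short signature. This signature is then compared against the fault-free value. The Python sketch below uses a 4-bit LFSR with feedback from bits 3 and 2, which cycles through all 15 nonzero states, and a 16-bit serial signature register with polynomial x^16 + x^12 + x^5 + 1. These widths and polynomials are illustrative assumptions, not the DSP's actual registers.

```python
def lfsr_patterns(seed, taps, width, count):
    """Fibonacci LFSR: shift left, feed back the XOR of the tapped bits."""
    mask = (1 << width) - 1
    state = seed & mask
    for _ in range(count):
        yield state
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & mask


def signature(bits, poly_low, width):
    """Serial signature register: remainder of the bit stream divided by
    P(x) = x^width + poly_low (poly_low holds the lower-order terms)."""
    mask = (1 << width) - 1
    reg = 0
    for b in bits:
        msb = (reg >> (width - 1)) & 1
        reg = ((reg << 1) & mask) ^ b
        if msb:
            reg ^= poly_low  # subtract P(x) when the x^width term appears
    return reg


# Stand-in "circuit response": the low bit of each test pattern.
stream = [p & 1 for p in lfsr_patterns(1, (3, 2), 4, 40)]
good = signature(stream, 0x1021, 16)
bad = list(stream)
bad[10] ^= 1
print(hex(good), signature(bad, 0x1021, 16) != good)
```

Because the signature is the remainder of a polynomial division whose divisor has a nonzero constant term, any single-bit error in the response stream changes the signature. Multi-bit errors can occasionally alias to the fault-free value.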
Hiroshi MIYANAGA Hironori YAMAUCHI
We propose a single-chip 400-MFLOPS 2-D FFT processor VLSI architecture. This processor integrates 380,000 transistors in an area of 11.58
Hironori YAMAUCHI Hiroshi MIYANAGA
Some dedicated floating-point hardware arithmetic modules designed as processing elements for butterfly operations are described. They consist of Input Data Converters (IDC), Output Data Converters (ODC), and a two's-complement 24-bit (16E8) floating-point Butterfly Execution Unit (BEU). The BEU executes the four multiplications and six additions/subtractions required for a complex butterfly operation in each 25-ns execution cycle by implementing four multipliers and four 3-input adders/subtracters. The arithmetic modules are fabricated using 0.8-µm CMOS technology. An overview of the hardware unit is presented, with special attention given to the BEU for parallel pipelined processing. In addition, module design methodologies for hardware implementation and some sophisticated high-speed execution techniques for floating-point multiplication and addition are discussed.
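Written out on real components, a radix-2 complex butterfly X = a + w·b, Y = a − w·b needs four real products. Counted with 2-input adders, it needs six additions/subtractions: two for the complex product and four for the outputs, which matches the count above. The Python sketch below shows one way, assumed here rather than taken from the paper, that four 3-input adders/subtracters can absorb those six operations by combining a product pair with an input term in a single step. It uses ordinary floating point, not the 16E8 format.

```python
def butterfly(ar, ai, br, bi, wr, wi):
    """Radix-2 butterfly X = a + w*b, Y = a - w*b on real components."""
    # Four real multiplications (one per multiplier).
    p1 = br * wr
    p2 = bi * wi
    p3 = br * wi
    p4 = bi * wr
    # Four 3-input additions/subtractions produce the outputs directly,
    # folding Re(w*b) = p1 - p2 and Im(w*b) = p3 + p4 into each output sum.
    xr = ar + p1 - p2
    xi = ai + p3 + p4
    yr = ar - p1 + p2
    yi = ai - p3 - p4
    return xr, xi, yr, yi


a, b, w = 1 + 2j, 3 - 1j, 0.6 + 0.8j
print(butterfly(a.real, a.imag, b.real, b.imag, w.real, w.imag))
print(a + w * b, a - w * b)  # reference values from Python complex arithmetic
```

Merging each product pair with an input term in one adder stage removes the intermediate complex-product additions from the critical path, which is consistent with completing a butterfly in each 25-ns cycle.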
Takashi SEKI Mitsunori SAITO Mitsunobu MIYAGI
Porous anodic alumina films were studied to realize artificial metal-dielectric composites for polarizers. Using improved anodizing and electroplating procedures, nickel was successfully deposited as thick as
Takemasa TAMANUKI Kazuhiko HOUJOU Fumio KOYAMA Kenichi IGA
We have measured the temperature range ΔTSM over which short-cavity surface emitting (SE) lasers operate in one particular longitudinal mode. ΔTSM was increased to as much as 103 K by reducing the cavity length to 4 µm. Short-cavity SE lasers are therefore expected to show stable single-mode operation over a wide temperature range.
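A standard reason short cavities favor single-mode operation is that the longitudinal mode spacing of a Fabry-Perot cavity, Δλ ≈ λ²/(2 n_g L), grows as the cavity length L shrinks. As a result, fewer modes fall under the gain spectrum as it shifts with temperature. The abstract does not state this mechanism or give the wavelength and group index, so the Python sketch below uses hypothetical values and treats L as an effective cavity length.

```python
def mode_spacing_nm(wavelength_nm: float, group_index: float, cavity_length_um: float) -> float:
    """Longitudinal mode spacing (nm) of a Fabry-Perot cavity: lambda^2 / (2 n_g L)."""
    lam = wavelength_nm * 1e-9      # m
    length = cavity_length_um * 1e-6  # m
    return lam ** 2 / (2.0 * group_index * length) * 1e9


# Hypothetical values: 1000 nm emission, group index 3.5.
for L in (4.0, 10.0):
    print(f"L = {L} um: mode spacing ~ {mode_spacing_nm(1000.0, 3.5, L):.1f} nm")
```

Since the spacing scales as 1/L, shortening the cavity from 10 µm to 4 µm widens it by a factor of 2.5 under these assumptions.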
Thi Thi LAY Yukiko KONDO Yoichi FUJII
The effect of annealing on the electrooptic constant r33 in proton-exchanged optical waveguides on three types of lithium niobate crystals (undoped, titanium-indiffused, and MgO-doped) is investigated. Precise estimation shows that the deterioration of the electrooptic constant is recovered by annealing. Recovery is best in undoped crystals, followed by titanium-indiffused crystals, and worst in MgO-doped crystals.