Yoshifumi KAWAMURA Naoya OKADA Yoshio MATSUDA Tetsuya MATSUMURA Hiroshi MAKINO Kazutami ARIMOTO
A Field Programmable Sequencer and Memory (FPSM), a programmable unit optimized exclusively for peripherals on a microcontroller unit, is proposed. The FPSM functions not only as peripherals but also as standard built-in memory. The FPSM provides easier programmability with a smaller area overhead, especially when compared with an FPGA. The FPSM is implemented on an FPGA, and its programmability and performance for basic peripherals such as an 8-bit counter and 8-bit-accuracy Pulse Width Modulation are emulated on the FPGA. Furthermore, the FPSM core with a 4-Kbit SRAM is fabricated in 0.18-µm 5-metal CMOS process technology. The FPSM occupies half the area of the FPGA, and its power consumption is less than one-fifth.
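As a rough illustration of the two basic peripherals named above, the following sketch models an 8-bit free-running counter and an 8-bit-accuracy PWM output derived from it. The counter-compare scheme and the name `pwm_waveform` are assumptions made for this example; the abstract does not describe how the FPSM actually implements these peripherals.

```python
def pwm_waveform(duty: int, cycles: int = 256) -> list[int]:
    """Return one output sample per clock for a counter-compare PWM.

    Hypothetical model: an 8-bit counter wraps from 255 to 0, and the
    output is high while the counter value is below `duty`.
    """
    if not 0 <= duty <= 255:
        raise ValueError("duty must fit in 8 bits")
    counter = 0
    samples = []
    for _ in range(cycles):
        samples.append(1 if counter < duty else 0)
        counter = (counter + 1) & 0xFF  # 8-bit wrap-around
    return samples


# Over one full period of 256 clocks, the output is high for `duty` clocks.
high_clocks = sum(pwm_waveform(64))
```

With `duty = 64`, the output is high for 64 of the 256 clocks in one period, which is a 25% duty cycle.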
Hiroaki SUZUKI Hiroyuki KAWAI Hiroshi MAKINO Yoshio MATSUDA
A VLIW (Very Long Instruction Word) architecture with a new code compaction method is proposed. For a 3D-geometry processor, we consider two types of 2-issue VLIW architectures, the floating-point execution accelerating VLIW (FP-VLIW) and the data-move enhancing VLIW (MV-VLIW), as extensions of a Single-Streaming Single Instruction, Multiple Data (SS-SIMD) architecture. To solve the code bloat problem common to VLIW architectures, the proposed method compacts the original code into VLIW code with software tools and decompacts the VLIW code on chip with a simple hardware decompactor composed of an instruction swap circuit. The speeds and code densities of the two VLIWs with code compaction are compared with those of the SS-SIMD with the same instruction set and the same building blocks. The FP-VLIW is the fastest in the evaluation with the viewperf CDRS-03 benchmark programs: it is 36% faster than the SS-SIMD used as the reference. The proposed compaction method keeps the code density at 95% of that of the SS-SIMD. One test program shows that the code density of the MV-VLIW is higher than that of the SS-SIMD, which demonstrates that the merit of compacting nops can outweigh the VLIW penalty. The FP-VLIW architecture with code compaction achieves 1.36 times the speed without significant code-density deterioration.
Akira YAMADA Yasuhiro NUNOMURA Hiroaki SUZUKI Hisakazu SATO Niichi ITOH Tetsuya KAGEMOTO Hironobu ITO Takashi KURAFUJI Nobuharu YOSHIOKA Jingo NAKANISHI Hiromi NOTANI Rei AKIYAMA Atsushi IWABU Tadao YAMANAKA Hidehiro TAKATA Takeshi SHIBAGAKI Takahiko ARAKAWA Hiroshi MAKINO Osamu TOMISAWA Shuhei IWADE
A high-speed 32-bit RISC microcontroller has been developed. To realize high-speed operation with minimal hardware resources, we have developed new design and analysis methods, including clock distribution, bus-line layout, and IR-drop analysis. As a result, high-speed operation at 400 MHz has been achieved with a power dissipation of 0.96 W at 1.8 V.
Hiromi NOTANI Masayuki KOYAMA Ryuji MANO Hiroshi MAKINO Yoshio MATSUDA Osamu TOMISAWA Shuhei IWADE
A 64-bit 100-MHz multimedia DSP core has been designed using 0.15-µm CMOS technology. An improved Auto-Backgate-Controlled MT-CMOS (ABC-MT-CMOS) circuit with a charge pump is adopted to suppress the standby leakage current. The dynamic active current of the whole chip was simulated to optimize the size of the switch for power supply control. The DSP core chip, which integrates 300-kgate logic, a 64-kbyte SRAM, and a charge pump circuit, has an 8-µA standby leakage current. This is a reduction to 1/250.
Hiroaki SUZUKI Hiroshi MAKINO Koichiro MASHIKO
This paper describes a new floating-point divider (FDIV), in which the key features of redundant binary circuits and an asynchronous clock scheme reduce the delay time and area penalty. The redundant binary representation of +1 = (1, 0), 0 = (0, 0), -1 = (0, 1) is applied to all the mantissa division circuits. The simple and unified representation reduces circuit delay for the quotient determination. Additionally, the local clock generator circuit for the asynchronous clock scheme eliminates clock margin overhead. The generator circuit guarantees the worst delay-time operation by the feedback loop of the replica delay paths via a C-element. The internal iterative operation by the asynchronous scheme and the modified redundant-binary addition/subtraction circuit keep the area small. The architecture design avoids extra calculation time for the post processes, whose main role is to produce the floating-point status flags. The FDIV core using the proposed techniques operates at 42.1 ns with 0.35-µm CMOS technology and triple metal interconnections. The small core of 13.5 k transistors is laid out in a 730 µm × 910 µm area.
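The digit encoding quoted above, +1 = (1, 0), 0 = (0, 0), -1 = (0, 1), can be illustrated with a small sketch that converts a redundant binary digit string into an ordinary integer. The function names and the most-significant-digit-first ordering are assumptions for this example, and the pair (1, 1), which the abstract does not list, is simply rejected here; none of this is taken from the FDIV circuit itself.

```python
# Digit encoding as stated in the abstract: (plus bit, minus bit).
DIGIT_TO_PAIR = {+1: (1, 0), 0: (0, 0), -1: (0, 1)}
PAIR_TO_DIGIT = {pair: digit for digit, pair in DIGIT_TO_PAIR.items()}


def rb_value(pairs):
    """Evaluate a redundant binary number given MSB-first (plus, minus) pairs."""
    value = 0
    for pair in pairs:
        # A pair outside the encoding, such as (1, 1), raises KeyError.
        value = 2 * value + PAIR_TO_DIGIT[pair]
    return value


def rb_to_binary(pairs):
    """Same conversion as one subtraction: (plus bits) - (minus bits)."""
    plus = 0
    minus = 0
    for p, m in pairs:
        if (p, m) not in PAIR_TO_DIGIT:
            raise ValueError("pair is not part of the encoding")
        plus = (plus << 1) | p
        minus = (minus << 1) | m
    return plus - minus


# Digits +1, -1, 0, -1 represent 8 - 4 + 0 - 1.
example = [(1, 0), (0, 1), (0, 0), (0, 1)]
```

Both functions give the same result for `example`. The representation is redundant in the sense that one value can have several digit strings: the digits 0, 0, +1, +1 evaluate to the same number as `example`.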
Yasumasa TSUKAMOTO Tatsuya KUNIKIYO Koji NII Hiroshi MAKINO Shuhei IWADE Kiyoshi ISHIKAWA Yasuo INOUE Norihiko KOTANI
It is still an open problem to elucidate the scaling merits of an embedded SRAM with Low Operating Power (LOP) MOSFETs fabricated in 50, 70 and 100 nm CMOS technology nodes. Taking into account a realistic SRAM cell layout, we evaluated the parasitic capacitance of the bit line (BL) as well as the word line (WL) in each generation. Using a 3-dimensional (3D) interconnect simulator (Raphael), we examined the scaling merit by comparing the simulated SRAM BL delay for each CMOS technology node. In this paper, we propose two kinds of original interconnect structures that modify the ITRS (International Technology Roadmap for Semiconductors) structure, and show that the original interconnect structures with reduced gate overlap capacitance guarantee the scaling merits of SRAM cells fabricated with LOP MOSFETs in 50 and 70 nm CMOS technology nodes.
Hiroshi MAKINO Hiroaki SUZUKI Hiroyuki MORINAKA Yasunobu NAKASE Hirofumi SHINOHARA Koichiro MASHIKO Tadashi SUMI Yasutaka HORIBA
This paper describes the design of a high-speed 4-2 compressor for fast multipliers. Through a survey of six kinds of representative conventional 4-2 compressors (RBA 1-3 and NBA 1-3) in both the redundant binary (RB) and the normal binary (NB) schemes, we identified two problems that degrade the operating speed. The first is the use of multi-input complex gates, and the second is the existence of transmission gates (TG) at the input and/or output stages. To solve these problems, we propose high-speed 4-2 compressors using the RB scheme, which we call the high-speed redundant binary adders (HSRBAs). Six kinds of HSRBAs, HSRBA 1-6, were derived by making the Boolean equations suitable for high-speed CMOS circuits. Among them, HSRBA2, HSRBA4 and HSRBA6 have no multi-input complex gate and no input/output TG, and operate with a delay time of 0.89 ns, which is the fastest of all 4-2 compressors. We investigated the logical relation between HSRBAs and conventional 4-2 compressors by analyzing the Boolean equations for each circuit. This investigation shows that all the conventional redundant binary adders RBA1-3 have the same logic structure as HSRBA2. We also show that the conventional normal binary adders NBA1-3 have the same logic structures as HSRBA1, HSRBA3 and HSRBA5, respectively. This implies that all 4-2 compressors can be derived from the same equation regardless of RB or NB. We applied the HSRBA2 to a 54×54-bit multiplier using 0.5-µm CMOS technology. The multiplication time at the supply voltage of 3.3 V was 8.8 ns. This is the fastest 54×54-bit multiplier with 0.5-µm CMOS so far, and 83% of the speed improvement is due to the high-speed 4-2 compressor.
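For readers unfamiliar with the term, a 4-2 compressor in the normal binary scheme can be sketched as two chained full adders, and its defining property can be checked exhaustively. This is the textbook construction, shown only as an assumption-labeled illustration; it is not one of the HSRBA or NBA circuits from the paper, and the function names are invented for this example.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) of three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Textbook normal-binary 4-2 compressor built from two full adders.

    Returns (s, carry, cout). cout depends only on x1..x3, so it does
    not wait for cin and no carry ripples along a row of compressors.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def check_all_inputs():
    """Verify x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout) for all 32 cases."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

The check confirms that the five input bits are reduced to one sum bit of weight 1 and two carry bits of weight 2, which is what lets a multiplier halve the number of partial-product rows at each compressor stage.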
Takahiro SHIMADA Hiromi NOTANI Yasunobu NAKASE Hiroshi MAKINO Shuhei IWADE
We propose a push-pull output buffer that maintains the data transmission rate at lower supply voltages. It operates at an internal supply voltage (VDD) of 0.7-1.6 V and an interface supply voltage (VDDX) of 1.0-3.6 V. In low-VDDX operation, the output buffer utilizes parasitic bipolar transistors instead of MOS transistors to maintain drivability. Furthermore, forward body bias (FBB) control is provided for the level converter in low-VDD operation. We fabricated a test chip with a standard 0.15-µm CMOS process. Measurement results indicate that the proposed output buffer achieves 200-Mbps operation at a VDD of 0.7 V and a VDDX of 1.0 V.
Hiroshi MAKINO Hiroaki SUZUKI Hiroyuki MORINAKA Yasunobu NAKASE Koichiro MASHIKO Tadashi SUMI
This paper presents a high-speed 64-b floating-point (FP) multiplier that has a useful function for computer graphics (CG). The critical path delay is minimized by using high-speed logic gates and limiting the number of stages of series transmission gates (TGs). The high-speed redundant binary architecture is applied to the multiplication of significands. This FP multiplier has a special function of "CG multiplication" that directly multiplies pixel data by FP data. This multiplier was fabricated in 0.5-µm CMOS technology with triple-level metal interconnection. The active area size is 4.2 × 5.1 mm². The operating cycle time is 3.5 ns at the supply voltage of 3.3 V, which corresponds to a frequency of 286 MHz. Implementation of CG multiplication increases the transistor count by only 4%. Also, CG multiplication has no effect on the delay in the critical path.
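The stated correspondence between the 3.5 ns cycle time and 286 MHz is simply the reciprocal relation f = 1/T. A minimal sketch of that arithmetic follows; the helper name `cycle_time_ns_to_mhz` is invented for this example.

```python
def cycle_time_ns_to_mhz(cycle_time_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz (f = 1/T)."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    # 1 / (T * 1e-9 s) in Hz equals 1000 / T in MHz.
    return 1000.0 / cycle_time_ns


frequency_mhz = cycle_time_ns_to_mhz(3.5)
```

The result is roughly 285.7 MHz, which the abstract rounds to 286 MHz.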
Hisakazu SATO Yasuhiro NUNOMURA Niichi ITOH Koji NII Kanako YOSHIDA Hironobu ITO Jingo NAKANISHI Hidehiro TAKATA Yasunobu NAKASE Hiroshi MAKINO Akira YAMADA Takahiko ARAKAWA Toru SHIMIZU Yuichi HIRANO Takashi IPPOSHI Shuhei IWADE
A low-power microcontroller has been developed with 0.10-µm bulk-compatible body-tied SOI technology. For this work, only two new masks are required. For the other layers, existing masks of a prior work developed with 0.18-µm bulk CMOS technology can be applied without any changes. With the SOI technology, high-speed operation at over 600 MHz has been achieved at a supply voltage of 1.2 V, which is 1.5 times faster than the prior work. Also, a fivefold improvement in the power-delay product has been achieved at a supply voltage of 0.8 V. Moreover, the compatibility of the SOI technology with bulk CMOS has been verified, because all circuit blocks of the chip, including logic, memory, analog circuit, and PLL, are completely functional, even though only two new masks are used.
Masako FUJII Koji NII Hiroshi MAKINO Shigeki OHBAYASHI Motoshige IGARASHI Takeshi KAWAMURA Miho YOKOTA Nobuhiro TSUDA Tomoaki YOSHIZAWA Toshikazu TSUTSUI Naohiko TAKESHITA Naofumi MURATA Tomohiro TANAKA Takanari FUJIWARA Kyoko ASAHINA Masakazu OKADA Kazuo TOMITA Masahiko TAKEUCHI Shigehisa YAMAMOTO Hiromitsu SUGIMOTO Hirofumi SHINOHARA
We propose a new large-scale logic test element group (TEG), called a flip-flop RAM (FF-RAM), to improve the total process quality before and during initial mass production. It is designed to be as convenient as an SRAM for measurement and to imitate a logic LSI. We implemented a 10-Mgate FF-RAM using our 65-nm CMOS process. The FF-RAM enables us to make fail-bit maps (FBMs) of logic cells because it has a cell array structure like an SRAM. The FF-RAM has an additional structure to detect open and short failures of the upper metal layers. The test results show that it can detect failure locations and layers effortlessly using FBMs. We measured and analyzed both the cell arrays and the upper metal layers. The results provided many important clues for improving our processes. We also measured the neutron-induced soft error rate (SER) of the FF-RAM; soft errors are becoming a serious problem as transistors become smaller. We compared the results with those of previous generations: 180 nm, 130 nm, and 90 nm. With this TEG, we can considerably shorten the development period for advanced CMOS technology.
Minoru NODA Hiroshi MATSUOKA Norio HIGASHISAKA Masaaki SHIMADA Hiroshi MAKINO Shuichi MATSUE Yasuo MITSUI Kazuo NISHITANI Akiharu TADA
Air-bridge metal interconnection technology is used for upper-level power supply line interconnections in GaAs LSIs to reduce the signal propagation delay time. This technology reduces both the parasitic capacitance between the signal line and the power supply line, and the propagation delay in the signal line, to about 10% and about 50%, respectively, compared with conventional 3-level interconnections without air-bridges. Under standard load conditions (FI=FO=2, length of load line=2 mm), the air-bridge technique leads to gate propagation delays that are about 60% of those in conventional interconnections. We fabricated 2.1-k gate Gate Arrays and 4-kb SRAMs using the air-bridge structure to interconnect power supply lines. For a Gate Array with 0.7-µm gate Buried P-layer Lightly Doped Drain (BPLDD) FETs, the typical gate propagation delay under standard load conditions was about 110 ps with a power dissipation of 1.4 mW/gate. SRAMs with 0.5-µm gate BPLDD FETs had a typical access time (tacc) of 1.5 ns with a power dissipation of 700 mW/chip.
Hiroyuki MORINAKA Hiroshi MAKINO Yasunobu NAKASE Hiroaki SUZUKI Koichiro MASHIKO Tadashi SUMI
We present a 64-b adder having a 2.6-ns delay time at a 3.3-V power supply within 0.27 mm² using 0.5-µm CMOS technology. We derived our adder design from architectural-level considerations. The considerations include not only the gate intrinsic delay but also the wiring delay and the gate capacitance delay. As a result, a 64-b adder (56-b Carry Look-ahead Adder (CLA) + 8-b Carry Select Adder (CSA)) was designed. In this design, a new carry select scheme called Modified Carry Select (MCS) is also proposed.
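The 56-b + 8-b split can be illustrated with a behavioral sketch of conventional carry selection: the 8-b block computes its sum for both possible incoming carries, and the carry out of the 56-b block picks one. Several points are assumptions for this example: that the 8-b block covers the upper bits, that plain integer addition stands in for the 56-b CLA, and the name `add64_carry_select`. The sketch shows the ordinary carry select idea, not the paper's Modified Carry Select (MCS) scheme.

```python
MASK56 = (1 << 56) - 1
MASK8 = 0xFF


def add64_carry_select(a: int, b: int):
    """Add two 64-bit values; return (64-bit sum, carry out)."""
    a_lo, b_lo = a & MASK56, b & MASK56
    a_hi, b_hi = (a >> 56) & MASK8, (b >> 56) & MASK8

    # Lower 56 bits: stand-in for the carry look-ahead adder.
    lo = a_lo + b_lo
    carry_from_lo = lo >> 56

    # Upper 8 bits: both candidate sums are formed without waiting
    # for the lower carry, then one is selected (carry select).
    hi_if_carry0 = a_hi + b_hi
    hi_if_carry1 = a_hi + b_hi + 1
    hi = hi_if_carry1 if carry_from_lo else hi_if_carry0

    result = ((hi & MASK8) << 56) | (lo & MASK56)
    carry_out = hi >> 8
    return result, carry_out


# A carry generated in the lower 56 bits selects the "+1" upper sum.
example_sum, example_carry = add64_carry_select(MASK56, 1)
```

Because the two upper sums are prepared in parallel with the lower addition, the only delay added after the lower carry arrives is the final selection.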
Hiroshi MAKINO Shunsuke KAMIJO
ITS R&D spans a wide variety of research areas, such as mechanical engineering, road engineering, traffic engineering, information and communication engineering, and electrical engineering. Although initiatives across these engineering fields are essential for solving the problems of practical social systems, collaboration among the fields is difficult. Based on the joint research of the Japan Society of Civil Engineers and the Institute of Electrical Engineers conducted at the time of the Great East Japan Earthquake, this paper discusses the necessity of collaboration among academic societies in ITS R&D. International collaboration is also important for ITS R&D. Asian countries could share the same problems and solutions, since many megacities are located in Asia and suffer from heavy traffic. Therefore, we need to discuss common solutions to our problems.