Baoquan ZHONG Zhiqun CHENG Minshi JIA Bingxin LI Kun WANG Zhenghao YANG Zheming ZHU
Kazuya TADA
Suguru KURATOMI Satoshi USUI Yoko TATEWAKI Hiroaki USUI
Yoshihiro NAKA Masahiko NISHIMOTO Mitsuhiro YOKOTA
Hiroki Hoshino Kentaro Kusama Takayuki Arai
Tsuneki YAMASAKI
Kengo SUGAHARA
Cuong Manh BUI Hiroshi SHIRAI
Hiroyuki DEGUCHI Masataka OHIRA Mikio TSUJI
Hiroto Tochigi Masakazu Nakatani Ken-ichi Aoshima Mayumi Kawana Yuta Yamaguchi Kenji Machida Nobuhiko Funabashi Hideo Fujikake
Yuki Imamura Daiki Fujii Yuki Enomoto Yuichi Ueno Yosei Shibata Munehiro Kimura
Keiya IMORI Junya SEKIKAWA
Naoki KANDA Junya SEKIKAWA
Yongzhe Wei Zhongyuan Zhou Zhicheng Xue Shunyu Yao Haichun Wang
Mio TANIGUCHI Akito IGUCHI Yasuhide TSUJI
Kouji SHIBATA Masaki KOBAYASHI
Zhi Earn TAN Kenjiro MATSUMOTO Masaya TAKAGI Hiromasa SAEKI Masaya TAMURA
Misato ONISHI Kazuhiro YAMAGUCHI Yuji SAKAMOTO
Koya TANIKAWA Shun FUJII Soma KOGURE Shuya TANAKA Shun TASAKA Koshiro WADA Satoki KAWANISHI Takasumi TANABE
Shotaro SUGITANI Ryuichi NAKAJIMA Keita YOSHIDA Jun FURUTA Kazutoshi KOBAYASHI
Ryosuke Ichikawa Takumi Watanabe Hiroki Takatsuka Shiro Suyama Hirotsugu Yamamoto
Chan-Liang Wu Chih-Wen Lu
Umer FAROOQ Masayuki MORI Koichi MAEZAWA
Ryo ITO Sumio SUGISAKI Toshiyuki KAWAHARAMURA Tokiyoshi MATSUDA Hidenori KAWANISHI Mutsumi KIMURA
Paul Cain
Arie SETIAWAN Shu SATO Naruto YONEMOTO Hitoshi NOHMI Hiroshi MURATA
Seiichiro Izawa
Hang Liu Fei Wu
Keiji GOTO Toru KAWANO Ryohei NAKAMURA
Takahiro SASAKI Yukihiro KAMIYA
Xiang XIONG Wen LI Xiaohua TAN Yusheng HU
Tohgo HOSODA Kazuyuki SAITO
Yihan ZHU Takashi OHSAWA
Shengbao YU Fanze MENG Yihan SHEN Yuzhu HAO Haigen ZHOU
This paper presents media processor architectures for automotive applications. Media processing applications and their requirements for LSI implementation are first described for vision-based driver assistance and for graphical user interfaces for car navigation using 3D graphics. Then, parallel processing architectures for vision and graphics in these applications are reviewed in terms of performance and cost. After that, future trends in automotive media processing, such as the integration of vision and 3D graphics functions, are shown together with their applications and required performance. Moreover, parallel processing architectures for the integration of vision and graphics are discussed. Finally, a prospect for a next-generation automotive media processing LSI is provided.
This paper reviews and discusses devices, circuits, and signal processing techniques for CMOS imaging SoCs based on a column-parallel processing architecture. Pinned photodiode technology improves the noise characteristics at the device level to be comparable to those of CCD image sensors; as a result, low-noise design in CMOS image sensors has shifted to the reduction of noise at the circuit level. Techniques for reducing the circuit noise are discussed. The performance of the imaging SoCs greatly depends on that of the analog-to-digital converter (ADC) used at the column. Three possible architectures of the column-parallel ADC are reviewed and their advantages and disadvantages are discussed. Finally, a few applications of the device and circuit techniques and the column-parallel processing architecture are described.
Junichi AKITA Hiroaki TAKAGI Keisuke DOUMAE Akio KITAGAWA Masashi TODA Takeshi NAGASAKI Toshio KAWASHIMA
Although line of sight (LoS) is expected to be useful as an input method for computer systems, conventional LoS detection systems composed of a video camera and an image processor are restricted to specialized areas, such as academic research, because of their large size and high cost. Eye motion includes a rapid movement called a 'saccade', which is expected to be useful for various applications. Because the saccade is very fast, it cannot be tracked without a high-speed camera. The authors have previously proposed a high-speed vision chip for LoS detection, including saccades, based on a pixel-parallel processing architecture; however, its resolution is very low because of its large pixel size. In this paper, we propose and discuss a vision chip architecture for LoS detection, including saccades, based on column-parallel processing, which increases the resolution while maintaining high processing speed.
Atsushi IWASHITA Takashi KOMURO Masatoshi ISHIKAWA
A 128
Kenji IDE Ryusuke KAWAHARA Satoshi SHIMIZU Takayuki HAMAMOTO
We have investigated real-time object tracking using a wide view imaging system. For the system, we have designed and fabricated a new smart image sensor with four functions that are effective in wide view imaging, such as a random access function. In this system, eight smart sensors and an octagonal mirror are used, and each image obtained by the sensors is equivalent to a partial image of the wide view. In addition, by using an FPGA for processing, the circuits in this system can be scaled down and a panoramic image can be obtained in real time. For object tracking using this system, an object-detection method based on background subtraction is used. When moving objects are detected in the panoramic image, the objects are constantly displayed on the monitor at higher resolution in real time. In this paper, we describe the random access image sensor and show some results obtained using this sensor. In addition, we describe the wide view imaging system using eight sensors. Furthermore, we explain the method of object tracking in this system and show the results of real-time multiple-object tracking.
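The abstract above names background subtraction as its detection method without giving details. As a minimal sketch only, assuming 8-bit grayscale frames stored as nested lists and a fixed difference threshold (the function names, the threshold value, and the bounding-box step are illustrative assumptions, not the authors' implementation), the idea can be written as:

```python
def detect_moving_pixels(frame, background, threshold=30):
    """Return a binary mask: 1 where |frame - background| exceeds threshold.

    `threshold` is a hypothetical value chosen for illustration.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Usage: a flat 4x4 background and a frame with a bright two-pixel object.
background = [[10] * 4 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][2] = 200
frame[2][2] = 180
mask = detect_moving_pixels(frame, background)
box = bounding_box(mask)
```

In a system like the one described, a region such as `box` could then be read out at higher resolution through a sensor's random access function; that link is an interpretation of the abstract, not something it states.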
Satoshi SHIGEMATSU Hiroki MORIMURA Toshishige SHIMAMURA Takahiro HATANO Namiko IKEDA Yukio OKAZAKI Katsuyuki MACHIDA Mamoru NAKANISHI
This paper describes logic and analog test schemes that improve the testability of a pixel-parallel fingerprint identification circuit. Each pixel contains a processing circuit and a capacitive fingerprint sensor circuit. For the logic test, we propose a test method that uses a pseudo scan circuit to check the processing circuits of all pixels simultaneously. In the analog test, the sensor circuit employs a dummy capacitance to mimic the state of a finger touching the chip. This enables the sensitivity of all sensor circuits to be evaluated on a logic LSI tester without touching the chip with a finger. To check the effectiveness of the schemes, we applied them to a pixel array in a fingerprint identification LSI. The pseudo scan circuit achieved a 100% failure-detection rate for the processing circuit. The analog test determined that the sensitivities of the sensor circuits in all pixels are in the proper range. The results of the tests confirmed that the proposed schemes can completely detect defects in the circuits. Thus, the schemes will pave the way to logic and analog tests of chips integrating highly functional devices stacked on an LSI.
Masahiro FUKUI Sayaka IWAKOSHI Tatsuya KOYAGI
With the rapid spread of portable equipment, it has become very important to extend battery lifetime without increasing battery size. Especially in the coming age of ubiquitous computing, long battery lifetime under tight size limits will be in high demand. It will also be invaluable for intelligent sensors for cars and robots. This paper proposes an algorithm that optimizes battery lifetime under a total size constraint by simultaneously analyzing the operating conditions of the battery, the buck converter, and the LSI. We also discuss accurate design models of these components.
Xu ZHANG Xiaohong JIANG Susumu HORIGUCHI
The evolution of VLSI chips towards larger die size, smaller feature size and faster clock speed makes clock distribution an increasingly important issue. In this paper, we propose a new clock distribution network (CDN), namely the Variant X-Tree, based on the X-Architecture recently proposed for efficient wiring within VLSI chips. The Variant X-Tree CDN keeps the desirable equal-clock-path and symmetric-structure properties of the typical H-Tree CDN, but results in both a lower maximal clock delay and a lower clock skew than its H-Tree counterpart, as verified by an extensive simulation study that simultaneously incorporates the effects of process variations and on-chip inductance. We also propose closed-form statistical models for evaluating the skew and delay of the Variant X-Tree CDN. The comparison between the theoretical results and the simulation results indicates that the proposed statistical models can be used to efficiently and rapidly evaluate the performance of Variant X-Tree CDNs.
Kazutoshi KOBAYASHI Kazuya KATSUKI Manabu KOTANI Yuuri SUGIHARA Yohei KUME Hidetoshi ONODERA
We have fabricated a LUT-based FPGA device with functionalities measuring within-die variations in a 90 nm process. Variations are measured using ring oscillators implemented as a configuration of the FPGA. Random variations are dominant in a 48
Hiroki SHIMANO Fukashi MORISHITA Katsumi DOSAKA Kazutami ARIMOTO
The advanced-DFM (Design For Manufacturability) RAM provides a solution to the limits of SRAM voltage scaling and a countermeasure against process fluctuations. The characteristics of this RAM are voltage scalability (operation at 0.6 V) with a wide operating margin and the reliability of a long data retention time. The memory cell uses a 2 Cell/bit configuration with complementary dynamic memory operation and has a 1 Cell/bit test mode for accelerated screening of marginal cells. The GND bitline pre-charge sensing scheme and SSW (Sense Synchronized Write) peripheral circuit technologies are also adopted for the low-voltage, DFV (Dynamic Frequency and Voltage) controllable SoC, which will be strongly required by many kinds of applications. This RAM supports the DFM functions with both good cell/bit for advanced process technologies and the voltage scalable SoC memory platform.
Tadahiko SUGIBAYASHI Takeshi HONDA Noboru SAKIMURA Shuichi TAHARA Naoki KASAI
Apart from magnetic random access memories (MRAMs), nonvolatile memories cannot be used without causing fatigue. Because MRAMs can solve fatigue problems, they have great potential to open up large new markets. However, the manufacturing cost of LSIs cannot be reduced until they are mass-produced. To increase the size of the MRAM market, new applications in which MRAMs create added value are needed. A demo system that models a drive recorder was developed to introduce the novel features of MRAMs, and a 4-Mb MRAM was developed for use in the demo system.
Shoichiro KAWASHIMA Isao FUKUSHI Keizo MORITA Ken-ichi NAKABAYASHI Mitsuharu NAKAZAWA Kazuaki YAMANE Tomohisa HIRAYAMA Toru ENDO
A robust 1T1C FeRAM sensing technique is demonstrated that employs both word base access and reference level generation architecture to track the thermal history of the cells by utilizing a Feedback inverter Input Push-down (FIP) method for a Bit line Ground Sensing (BGS) pre-amplifier and a self-timing latch Sense Amplifier (SA) which is immune to increasing non-switching charges due to thermal depolarization or imprint of ferroelectric capacitor. The word base access unit consists of one 2T2C cell that stores 0/1 data and also generates '0' and '1' reference levels by which other 1T1C signals are compared. A 0.18-µm CMOS 3-V 1-Mbit device was qualified by a 250
Yasuhiro MORITA Hidehiro FUJIWARA Hiroki NOGUCHI Yusuke IGUCHI Koji NII Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper shows that an 8T SRAM cell is superior to a 6T cell in terms of cell area in a future process. At the 65-nm node and beyond, a 6T cell composed of minimum-channel-length transistors cannot achieve the minimum area because of threshold-voltage variation. In contrast, the 8T cell can employ optimized transistors and achieves the minimum area even when it is used as a single-port SRAM. In a 32-nm process, the 8T-cell area is smaller than that of the 6T cell by 14.6% at a supply voltage of 0.8 V. We also compare the area and access time of the 6T-SRAM and 8T-SRAM macros.
Hidehiro TOYODA Shinji NISHIMURA Michitaka OKUNO Matsuaki TERADA
A high-speed physical-layer architecture for next-generation higher-speed Ethernet for VSR and backplane applications was developed. VSR and backplane networks provide 100-Gb/s data transmission in "mega data centers" and blade servers, which have new and broad potential markets of LAN technologies. It supports 100-Gb/s-throughput, high-reliability, and low-latency data transmission, making it well suited to VSR and backplane applications for intra-building and intra-cabinet networks. Its links comprise ten 10-Gb/s high-speed serial lanes. Payload data are transmitted by ribbon fiber cables for very short reach and by copper channels for the backplane board. Ten lanes convey 320-bit data synchronously (32 bits
Qi WANG Kazunori SHIMIZU Takeshi IKENAGA Satoshi GOTO
In this paper, we introduce an area- and power-efficient fully parallel LDPC decoder design, which maintains the BER performance while consuming fewer hardware resources and less power than conventional decoders. For this decoder, we first propose two improved simplified min-sum algorithms, which reduce the hardware implementation complexity and area of the decoder: the hardware consumption of the check operation module is reduced by 40%, with a negligible performance loss compared with the general min-sum algorithm. To reduce the power dissipation of the decoder, we also propose a power-saving strategy in which the message evolution halts as soon as the parity-check condition is satisfied. This strategy reduces power by more than 50% under good channel conditions. The synthesis result in 0.18 µm CMOS technology shows that our decoder, based on the (648,540) irregular LDPC code of the WLAN (802.11n) protocol, achieves a throughput of 810 Mbps with a power consumption of 283 mW.
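The abstract above refers to the general min-sum algorithm and to halting once the parity checks are satisfied. The sketch below shows only the textbook min-sum check-node update and a syndrome test. It is not the paper's simplified variants, and the function names and data layout (LLRs as a Python list, checks as lists of bit indices) are assumptions made for illustration.

```python
def check_node_update(llrs):
    """General min-sum check-node update.

    For each edge i, the outgoing message takes the product of the signs
    and the minimum of the magnitudes of all *other* incoming LLRs.
    """
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = 1
        for v in others:
            if v < 0:
                sign = -sign
        out.append(sign * min(abs(v) for v in others))
    return out


def parity_satisfied(hard_bits, checks):
    """True when every parity check (a list of bit indices) sums to 0 mod 2."""
    return all(sum(hard_bits[j] for j in check) % 2 == 0 for check in checks)


# Usage: one degree-3 check node, then an early-termination test.
messages = check_node_update([2.0, -3.0, 0.5])
stop = parity_satisfied([1, 1, 0], [[0, 1]])
```

A decoder loop would call `parity_satisfied` after each iteration and stop updating messages once it returns `True`, which is the general idea behind an early-termination power saving.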
Mariko SAKAMOTO Akira KATSUNO Go SUGIZAKI Toshio YOSHIDA Aiichiro INOUE Koji INOUE Kazuaki MURAKAMI
Broadcast and synchronization techniques are used for cache coherence control in conventional large-scale snoop-based SMP systems. The penalty for synchronization is directly proportional to system size. Meanwhile, advances in LSI technology now make it possible to place a memory controller on a CPU die. An on-die controller drastically reduces the latency of access to directly linked memory. Developing an enterprise server system with these CPUs gives us an opportunity to achieve higher performance. However, because the synchronization penalty is incurred whenever a cache miss occurs, the coherence method must be improved to receive the full benefit of this effect. In this paper, we demonstrate a coherence directory organization that fits DSM enterprise server systems. Originally, the directory-based method was adopted in high-performance computing systems because of its much greater scalability in comparison with the snoop-based method. Although directory capacity misses and long directory access latency are the major problems of this method, the relaxed scalability requirement of enterprise servers, along with advanced LSI technology, helps us solve these problems. Our proposed directory solves both problems by implementing a full bit vector level map of the coherence directory on an LSI chip. Our experimental results validate that a system controlled by our proposed directory can surpass a snoop-based system in performance even without applying data localization optimization to an online transaction processing (OLTP) workload.
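For readers unfamiliar with the term, a full bit vector directory keeps one presence bit per node for each tracked memory block. The following is a generic sketch of that data structure only. The class name, method names, and the simplified read/write protocol are hypothetical and do not describe the proposed directory's actual organization.

```python
class BitVectorDirectory:
    """Toy full-bit-vector coherence directory (illustrative assumption)."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.sharers = {}  # block address -> integer bitmask of sharing nodes

    def record_read(self, block, node):
        """Mark `node` as holding a copy of `block`."""
        self.sharers[block] = self.sharers.get(block, 0) | (1 << node)

    def record_write(self, block, node):
        """Make `node` the sole holder and return the nodes to invalidate."""
        others = self.sharers.get(block, 0) & ~(1 << node)
        self.sharers[block] = 1 << node
        return [n for n in range(self.num_nodes) if (others >> n) & 1]


# Usage: nodes 0 and 2 read a block, then node 2 writes it.
directory = BitVectorDirectory(num_nodes=4)
directory.record_read(0x40, 0)
directory.record_read(0x40, 2)
to_invalidate = directory.record_write(0x40, 2)
```

Because the directory knows exactly which nodes share a block, invalidations go only to those nodes instead of being broadcast to the whole system, which is the contrast with snooping that the abstract draws.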
Makoto SUGIHARA Tohru ISHIHARA Kazuaki MURAKAMI
This paper proposes a soft-error model for accurately estimating the reliability of a computer system at the architectural level within a reasonable computation time. The architectural-level soft-error model identifies which parts of the memory modules are utilized temporally and spatially and which single event upsets (SEUs) are critical to the program execution of the computer system at the cycle-accurate instruction set simulation (ISS) level. The soft-error model is capable of estimating the reliability of a computer system that has several levels of memory hierarchy and of finding which memory module in the computer system is vulnerable. Reliability estimation helps system designers apply reliable design techniques to the vulnerable parts of their design. The experimental results show that the soft-error model achieves more accurate reliability estimation than conventional approaches. They also demonstrate that the reliability of computer systems depends not only on the soft error rates (SERs) of memories but also on the behavior of the software running on them.
Makoto ISHIKAWA George SAIKALIS Shigeru OHO
We review practical case studies of a method for developing highly reliable real-time embedded control systems using CPU model-based hardware/software co-simulation. We take an approach that enables us to fully simulate a virtual mechanical control system including a mechatronics plant, microcontroller hardware, and object code level software. This full virtual system approach simulates control system behavior, especially that of the microcontroller hardware and software. It enables design space exploration of the microarchitecture, control design validation, robustness evaluation of the system, and software optimization before component design. It also avoids potential problems. The advantage of this work is that it comprises all the components in a typical control system, enabling designers to analyze effects across different domains, for example, mechanical analysis of behavior due to differences in controller microarchitecture. To further improve system design, evaluation and analysis, we implemented an integrated behavior analyzer in the development environment. This analyzer can graphically display the processor behavior during the simulation, such as task level CPU load, interrupt statistics, and the software variable transition chart, without affecting simulation results. It also provides useful information on the system behavior. This virtual system analysis does not require software modification, does not change the control timing, and does not require any processing power from the target microcontroller. Therefore, this method is suitable for real-time embedded control system design, in particular automotive control system design that requires a high level of reliability, robustness, quality, and safety. In this study, a Renesas SH-2A microcontroller model was developed on a CoMET™ platform from VaST Systems Technology. An electronic throttle control (ETC) system and an engine control system were chosen to prove this concept.
The electronic throttle body (ETB) model on the Saber® simulator from Synopsys® and the engine model on the MATLAB®/Simulink® simulator from MathWorks can be simulated with the SH-2A model using a newly developed co-simulation interface between MATLAB®/Simulink® and CoMET™. Although the SH-2A chip was still being developed while the project was being executed, we were able to complete the OSEK OS development, control software design, and verification of the entire system using the virtual environment. After a working sample chip was released at a later stage of the project, we found that the software could run on both the actual ETC system and the engine control system without critical problems. This demonstrates that our models and simulation environment are sufficiently credible and trustworthy.
Yasuhiro TAKAHASHI Toshikazu SEKINE Michio YOKOYAMA
Adiabatic logic is a technique for designing low-power digital VLSIs. This paper describes the design and VLSI implementation of a multiplier using a two-phase drive adiabatic dynamic CMOS logic (2PADCL) circuit. Circuit operation and performance have been evaluated using a 4
Keiichiro KAGAWA Makoto SHOUHO Kazuo HASHIGUCHI Masahiro NUNOSHITA Jun OHTA
We demonstrate low-voltage operation of a CMOS imager with an in-pixel large-gain comparator without degradation of the dynamic range by using a pulse-width-modulation scheme in pixel readout. Experimental results showed a dynamic range of 57 dB with a 1.0 V power supply voltage at the pixel array block, which demonstrates the possibility of low-voltage, single-power-supply operation of imagers fabricated with deep-submicron CMOS technologies.
Koji KIKUSHIMA Toshihito FUJIWARA
This paper clarifies the sideband suppression ratio (SSR) value needed for multi-carrier signal modulation in an optical single sideband (SSB) modulator. An SSR value of about 25 dB is found to be sufficient for broadcast satellite (BS) multi-carrier signal modulation. For FM converted CATV signal modulation, an SSR value of about 10 dB is sufficient. In addition, the properties of cascaded lithium niobate Mach-Zehnder (LN MZ) optical SSB modulators are shown to be better than those of the conventional single LN MZ optical SSB modulator with nearly the same SSR value of 27 dB.
Hitoshi HAYASHI Tadao NAKAGAWA Kazuhiro UEHARA Yoshihiro TAKIGAWA
This paper describes a miniaturized in-phase power divider with a DC block function. We first propose three types of miniaturized in-phase power dividers composed of two distributed transmission lines, a resistor, and three capacitors to function as a DC block. Then, we use a simulation to compare the dividers. The simulation results show that, by properly selecting circuit configuration, we both achieve broadband frequency characteristics and miniaturize circuitry as compared to the conventional Wilkinson power divider with two DC block capacitors. Finally, an experimental UHF power divider fabricated to test the design concept is presented. Over the frequency range from 0.44 to 0.66 GHz, the experimental power divider exhibits power splits of -3.2
Toshihisa KAMEI Yozo UTSUMI Nguyen Quoc DINH Nguyen THANH
Targeting the transition from a coaxial waveguide to a coplanar waveguide (CPW), a microwave and millimeter-wave wide-band coaxial-to-coplanar transition is proposed. This design connects the coaxial inner conductor to the CPW center conductor perpendicularly so as to directly couple, on the same plane, the radial high-frequency electric field in the coax to the gap between the CPW's center conductor and ground plane. The performance of the proposed transition was compared with that of the conventional transition by experiments and electromagnetic field simulations. It was found that the proposed method is independent of CPW shape and that it exhibits good matching performance in comparison to the conventional method, especially in high-frequency bands above 15 GHz.
Young-Ju KIM Young-Jae CHO Doo-Hwan SA Seung-Hoon LEE
This work proposes a 10 b 200 MS/s 1.8 mm² 83 mW 0.13 µm CMOS ADC based on highly linear integrated capacitors for high-quality video system applications such as next-generation DTV and radar vision and wireless communication system applications such as WLAN, WiMax, SDR, LMDS, and MMDS simultaneously requiring low voltage, low power, and small area at high speed. The proposed 3-stage pipeline ADC optimizes chip area and power dissipation at the target resolution and sampling rate. The proposed ADC employs two versions of the SHA with gate-bootstrapped NMOS switches and conventional CMOS switches to verify and compare the input sampling effectiveness. Both versions of the wide-band low-noise SHA maintain 10 b input accuracy at 200 MS/s. The proposed all signal-isolated 3-D completely symmetric capacitor layout reduces the device mismatch of two MDACs by isolating each unit capacitor from all neighboring signal lines with all the employed metal lines and by placing extra internal metal lines with a fixed internal bias voltage between signal lines connecting the bottom plate of each unit capacitor. The low-noise on-chip current and voltage references with internal RC filters can select optional off-chip voltage references. The prototype ADC is implemented in a 0.13 µm 1P8M CMOS process. The measured DNL and INL are within 0.24 LSB and 0.35 LSB while the ADC shows a maximum SNDR of 54 dB and 48 dB and a maximum SFDR of 67 dB and 61 dB at 200 MS/s and 250 MS/s, respectively. The ADC with an active die area of 1.8 mm² consumes 83 mW at 200 MS/s and at a 1.2 V supply.
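As a reading aid, the reported maximum SNDR values can be converted to an effective number of bits (ENOB) with the standard relation for a full-scale sine-wave input. This conversion is not part of the abstract and assumes the quoted SNDR figures are measured at full scale.

```latex
\mathrm{ENOB} = \frac{\mathrm{SNDR} - 1.76\,\mathrm{dB}}{6.02\,\mathrm{dB/bit}}
\qquad
\mathrm{ENOB}_{200\,\mathrm{MS/s}} = \frac{54 - 1.76}{6.02} \approx 8.68\ \text{bits},
\qquad
\mathrm{ENOB}_{250\,\mathrm{MS/s}} = \frac{48 - 1.76}{6.02} \approx 7.68\ \text{bits}
```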
Luis H.C. FERREIRA Tales C. PIMENTA Robson L. MORENO
This paper describes a CMOS voltage reference that makes use of weak inversion CMOS transistors and linear resistors, without the need for bipolar transistors. Its operation is analogous to the bandgap reference voltage, but the reference voltage is based on the threshold voltage of an nMOS transistor. The circuit implemented using 0.35 µm n-well CMOS TSMC process generates a reference of 741 mV under just 390 nW for a power supply of only 950 mV. The circuit presented a variation of 39 ppm/
Yusuke KOBAYASHI C. Raghunathan MANOJ Kazuo TSUTSUI Venkanarayan HARIHARAN Kuniyuki KAKUSHIMA V. Ramgopal RAO Parhat AHMET Hiroshi IWAI
In this paper, we systematically investigate parasitic effects due to gate and source-drain engineering in multi-gate transistors. The potential impact of high-K dielectrics on multi-gate MOSFETs (MuGFETs), such as FinFETs, is evaluated through 2D and 3D device simulations over a wide range of proposed dielectric values. It is observed that the introduction of high-K dielectrics significantly worsens the short channel effects (SCEs); however, a combination of an oxide and a high-K stack can effectively control this degradation. The degradation is mainly due to the increase in the internal fringe capacitance coupled with the decrease in gate-channel capacitance. From the circuit perspective, an optimum K value has been identified through mixed mode simulations. Further, as a part of this work, the importance of optimizing the shape of the spacer region is highlighted through full 3D simulations.
Emad HAMIDI Mahmoud MOHAMMAD-TAHERI
A comparison is made between the performance of MMIC matrix amplifiers and distributed amplifiers. Analytical formulations show that, in most typical cases, a cascaded dual-stage distributed amplifier has more gain than a two-tier matrix amplifier with the same number of transistors; however, the difference is not significant. The results of the analytical approach are then compared with simulated and measured results, and good agreement is obtained. The other scattering parameters of the matrix and distributed amplifiers are then compared.
Emad HAMIDI Mahmoud MOHAMMAD-TAHERI
A new method is presented to improve the transient response of distributed amplifiers. The method is based on fitting the parameters of the distributed amplifier to those of a predesigned lowpass filter. Analytical expressions are derived to show the performance of the new structure. Three distributed amplifiers are designed based on the proposed method, and it is shown that the new method can significantly improve the transient response of the amplifier. It is also shown that the new method can improve the other characteristics of the distributed amplifier. The effects of parasitic and lossy elements have also been considered, and it is shown that such effects do not violate the generality of the proposed theory.