Zhenhai TAN Yun YANG Xiaoman WANG Fayez ALQAHTANI
Chenrui CHANG Tongwei LU Feng YAO
Takuma TSUCHIDA Rikuho MIYATA Hironori WASHIZAKI Kensuke SUMOTO Nobukazu YOSHIOKA Yoshiaki FUKAZAWA
Shoichi HIROSE Kazuhiko MINEMATSU
Toshimitsu USHIO
Yuta FUKUDA Kota YOSHIDA Takeshi FUJINO
Qingping YU Yuan SUN You ZHANG Longye WANG Xingwang LI
Qiuyu XU Kanghui ZHAO Tao LU Zhongyuan WANG Ruimin HU
Lei Zhang Xi-Lin Guo Guang Han Di-Hui Zeng
Meng HUANG Honglei WEI
Yang LIU Jialong WEI Shujian ZHAO Wenhua XIE Niankuan CHEN Jie LI Xin CHEN Kaixuan YANG Yongwei LI Zhen ZHAO
Ngoc-Son DUONG Lan-Nhi VU THI Sinh-Cong LAM Phuong-Dung CHU THI Thai-Mai DINH THI
Lan XIE Qiang WANG Yongqiang JI Yu GU Gaozheng XU Zheng ZHU Yuxing WANG Yuwei LI
Jihui LIU Hui ZHANG Wei SU Rong LUO
Shota NAKAYAMA Koichi KOBAYASHI Yuh YAMASHITA
Wataru NAKAMURA Kenta TAKAHASHI
Chunfeng FU Renjie JIN Longjiang QU Zijian ZHOU
Masaki KOBAYASHI
Shinichi NISHIZAWA Masahiro MATSUDA Shinji KIMURA
Keisuke FUKADA Tatsuhiko SHIRAI Nozomu TOGAWA
Yuta NAGAHAMA Tetsuya MANABE
Baoxian Wang Ze Gao Hongbin Xu Shoupeng Qin Zhao Tan Xuchao Shi
Maki TSUKAHARA Yusaku HARADA Haruka HIRATA Daiki MIYAHARA Yang LI Yuko HARA-AZUMI Kazuo SAKIYAMA
Guijie LIN Jianxiao XIE Zejun ZHANG
Hiroki FURUE Yasuhiko IKEMATSU
Longye WANG Lingguo KONG Xiaoli ZENG Qingping YU
Ayaka FUJITA Mashiho MUKAIDA Tadahiro AZETSU Noriaki SUETAKE
Xingan SHA Masao YANAGISAWA Youhua SHI
Jiqian XU Lijin FANG Qiankun ZHAO Yingcai WAN Yue GAO Huaizhen WANG
Sei TAKANO Mitsuji MUNEYASU Soh YOSHIDA Akira ASANO Nanae DEWAKE Nobuo YOSHINARI Keiichi UCHIDA
Kohei DOI Takeshi SUGAWARA
Yuta FUKUDA Kota YOSHIDA Takeshi FUJINO
Mingjie LIU Chunyang WANG Jian GONG Ming TAN Changlin ZHOU
Hironori UCHIKAWA Manabu HAGIWARA
Atsuko MIYAJI Tatsuhiro YAMATSUKI Tomoka TAKAHASHI Ping-Lun WANG Tomoaki MIMOTO
Kazuya TANIGUCHI Satoshi TAYU Atsushi TAKAHASHI Mathieu MOLONGO Makoto MINAMI Katsuya NISHIOKA
Masayuki SHIMODA Atsushi TAKAHASHI
Yuya Ichikawa Naoko Misawa Chihiro Matsui Ken Takeuchi
Katsutoshi OTSUKA Kazuhito ITO
Rei UEDA Tsunato NAKAI Kota YOSHIDA Takeshi FUJINO
Motonari OHTSUKA Takahiro ISHIMARU Yuta TSUKIE Shingo KUKITA Kohtaro WATANABE
Iori KODAMA Tetsuya KOJIMA
Yusuke MATSUOKA
Yosuke SUGIURA Ryota NOGUCHI Tetsuya SHIMAMURA
Tadashi WADAYAMA Ayano NAKAI-KASAI
Li Cheng Huaixing Wang
Beining ZHANG Xile ZHANG Qin WANG Guan GUI Lin SHAN
Sicheng LIU Kaiyu WANG Haichuan YANG Tao ZHENG Zhenyu LEI Meng JIA Shangce GAO
Kun ZHOU Zejun ZHANG Xu TANG Wen XU Jianxiao XIE Changbing TANG
Soh YOSHIDA Nozomi YATOH Mitsuji MUNEYASU
Ryo YOSHIDA Soh YOSHIDA Mitsuji MUNEYASU
Nichika YUGE Hiroyuki ISHIHARA Morikazu NAKAMURA Takayuki NAKACHI
Ling ZHU Takayuki NAKACHI Bai ZHANG Yitu WANG
Toshiyuki MIYAMOTO Hiroki AKAMATSU
Yanchao LIU Xina CHENG Takeshi IKENAGA
Kengo HASHIMOTO Ken-ichi IWATA
Shota TOYOOKA Yoshinobu KAJIKAWA
Kyohei SUDO Keisuke HARA Masayuki TEZUKA Yusuke YOSHIDA
Hiroshi FUJISAKI
Tota SUKO Manabu KOBAYASHI
Akira KAMATSUKA Koki KAZAMA Takahiro YOSHIDA
Tingyuan NIE Jingjing NIE Kun ZHAO
Xinyu TIAN Hongyu HAN Limengnan ZHOU Hanzhou WU
Shibo DONG Haotian LI Yifei YANG Jiatianyi YU Zhenyu LEI Shangce GAO
Kengo NAKATA Daisuke MIYASHITA Jun DEGUCHI Ryuichi FUJIMOTO
Jie REN Minglin LIU Lisheng LI Shuai LI Mu FANG Wenbin LIU Yang LIU Haidong YU Shidong ZHANG
Ken NAKAMURA Takayuki NOZAKI
Yun LIANG Degui YAO Yang GAO Kaihua JIANG
Guanqun SHEN Kaikai CHI Osama ALFARRAJ Amr TOLBA
Zewei HE Zixuan CHEN Guizhong FU Yangming ZHENG Zhe-Ming LU
Bowen ZHANG Chang ZHANG Di YAO Xin ZHANG
Zhihao LI Ruihu LI Chaofeng GUAN Liangdong LU Hao SONG Qiang FU
Kenji UEHARA Kunihiko HIRAISHI
David CLARINO Shohei KURODA Shigeru YAMASHITA
Qi QI Zi TENG Hongmei HUO Ming XU Bing BAI
Ling Wang Zhongqiang Luo
Zongxiang YI Qiuxia XU
Donghoon CHANG Deukjo HONG Jinkeon KANG
Xiaowu LI Wei CUI Runxin LI Lianyin JIA Jinguo YOU
Zhang HUAGUO Xu WENJIE Li LIANGLIANG Liao HONGSHU
Seonkyu KIM Myoungsu SHIN Hanbeom SHIN Insung KIM Sunyeop KIM Donggeun KWON Deukjo HONG Jaechul SUNG Seokhie HONG
Manabu HAGIWARA
This paper proposes a novel cache architecture for low power consumption, called "Adaptive Way-Predicting Cache (AWP cache)." The AWP cache has multi-operation modes and dynamically adapts the operation mode based on the accuracy of way-prediction results. A confidence counter for way prediction is implemented to each cache set. In order to analyze the effectiveness of the AWP cache, we perform a SRAM design using 0.18 µm CMOS technology and cycle-accurate processor simulations. As the results, for a benchmark program (179.art), it is observed that a performance-aware AWP cache reduces the 49% of performance overhead caused by an original way-predicting cache to 17%. Furthermore, a energy-aware AWP cache achieves 73% of energy reduction, whereas that obtained from the original way-predicting scheme is only 38%, compared to an non-optimized conventional cache. For the consideration of energy-performance efficiency, we see that the energy-aware AWP cache produces better results; the energy-delay product of conventional organization is reduced to only 35% in average which is 6% better than the original way-predicting scheme.
Satoshi KOMATSU Masahiro FUJITA
Energy consumption is one of the most critical constraints in the current VLSI system designs. In addition, fault tolerance of VLSI systems will be also one of the most important requirements in the future shrunk VLSIs. This paper proposes practical low power and fault tolerant bus encoding methods in on-chip data transfer. The proposed encoding methods use the combination of simple low power code and fault tolerant code. Experimental results show that the proposed methods can reduce signal transitions by 23% on the bus with fault tolerance. In addition, circuit implementation results with bus signal swing optimization show the effectiveness of the proposed encoding methods. We show also the selection methodology of the optimum encoding method under the given requirements.
Kentaro KAWAKAMI Miwako KANAMORI Yasuhiro MORITA Jun TAKEMURA Masayuki MIYAMA Masahiko YOSHIMOTO
To achieve both of a high peak performance and low average power characteristics, frequency-voltage cooperative control processor has been proposed. The processor schedules its operating frequency according to the required computation power. Its operating voltage or body bias voltage is adequately modulated simultaneously to effectively cut down either switching current or leakage current, and it results in reduction of total power dissipation of the processor. Since a frequency-voltage cooperative control processor has two or more operating frequencies, there are countless scheduling methods exist to realize a certain number of cycles by deadline time. This proposition is frequently appears in a hard real-time system. This paper proves two important theorems, which give the power-minimum frequency scheduling method for any types of frequency-voltage cooperative control processor, such as Vdd-control type, Vth-control type and Vdd-Vth-control type processors.
Weisheng CHONG Masanori HARIYAMA Michitaka KAMEYAMA
A low-power field-programmable VLSI (FPVLSI) is presented to overcome the problem of large power consumption in field-programmable gate arrays (FPGAs). To reduce power consumption in routing networks, the FPVLSI consists of cells that are based on a bit-serial pipeline architecture which reduces routing block complexity. Moreover, a level-converter-less multiple-supply-voltage scheme using dynamic circuits is proposed, where the cells in non-critical paths use a low supply voltage for low power under a speed constraint. The FPVLSI is evaluated based on a 0.18-µm CMOS design rule. The power consumption of the FPVLSI using multiple supply voltages is reduced to 17% or less compared to that of the static-circuit-based FPVLSI using multiple supply voltages.
A method for fast but yet accurate performance evaluation of processor architecture is mostly desirable in modern processors design. This paper proposes one such method which can measure cycle counts and power consumption of pipelined processors. The method first develops a trace-driven performance simulation model and then employs simulation reuse in simulation of the model. The trace-driven performance modeling is for accuracy in which performance simulation uses the same execution traces as constructed in simulation for functional verification. Fast performance simulation can be achieved in a way that performance for each instruction in the traces is evaluated without evaluation of the instruction itself. Simulation reuse supports simulation speedup by elimination of an evaluation at the current state, which is identical to that at a previous state. The reuse approach is based on the property that application programs, especially multimedia applications, have many iterative loops in general. A performance simulator for pipeline architecture based on the proposed method has been developed through which greater speedup has been made compared with other approaches in performance evaluation.
Takeshi MATSUMOTO Hiroshi SAITO Masahiro FUJITA
In this paper, an efficient equivalence checking method for two C descriptions is described. The equivalence of two C descriptions is proved by symbolic simulation. Symbolic simulation used in this paper can prove the equivalence of all of the variables in the descriptions. However, it takes long time to verify the equivalence of all of the variables if large descriptions are given. Therefore, in order to improve the verification, our method identifies textual differences between descriptions. The identified textual differences are used to reduce the number of equivalence checkings among variables. The proposed method has been implemented in C language and evaluated with several C descriptions.
Masao MORIMOTO Yoshinori TANAKA Makoto NAGATA Kazuo TAKI
This paper proposes a logic synthesis technique for asymmetric slope differential dynamic logic (ASDDL) circuits. The technique utilizes a commercially available logic synthesis tool that has been well established for static CMOS logic design, where an intermediate library is devised for logic synthesis likely as static CMOS, and then a resulting synthesized circuit is translated automatically into ASDDL implementation at the gate-level logic schematic level as well as at the physical-layout level. A design example of an ASDDL 16-bit multiplier synthesized in a 0.18-µm CMOS technology shows an operation delay time of 1.82 nsec, which is a 32% improvement over a static CMOS design with a static logic standard-cell library that is finely tuned for energy-delay products. Design with the 16-bit multiplier led to a design time for an ASDDL based dynamic digital circuit 300 times shorter than that using a fully handcrafted design, and comparable with a static CMOS design.
Debatosh DEBNATH Tsutomu SASAO
Fixed polarity Reed-Muller expressions (FPRMs) exhibit several useful properties that make them suitable for many practical applications. This paper presents an exact minimization algorithm for FPRMs for incompletely specified functions. For an n-variable function with α unspecified minterms there are 2n+α distinct FPRMs, and a minimum FPRM is one with the fewest product terms. To find a minimum FPRM the algorithm requires to determine an assignment of the incompletely specified minterms. This is accomplished by using the concept of integer-valued functions in conjunction with an extended truth vector and a weight vector. The vectors help formulate the problem as an assignment of the variables of integer-valued functions, which are then efficiently manipulated by using multi-terminal binary decision diagrams for finding an assignment of the unspecified minterms. The effectiveness of the algorithm is demonstrated through experimental results for code converters, adders, and randomly generated functions.
Hiroki NAKAHARA Tsutomu SASAO Munehiro MATSUURA
This paper shows a design method for a sequential circuit by using a Look-Up Table (LUT) ring. The method consists of two steps: The first step partitions the outputs into groups. The second step realizes them by LUT cascades, and allocates the cells of the cascades into the memory. The system automatically finds a fast implementation by maximally utilizing available memory. With the presented algorithm, we can easily design sequential circuits satisfying given specifications. The paper also compares the LUT ring with logic simulator to realize sequential circuits: the LUT ring is 25 to 237 times faster than a logic simulator that uses the same amount of memory.
Yuichi NAKAMURA Ko YOSHIKAWA Takeshi YOSHIMURA
This paper describes a novel engineering change order (ECO) design method for large-scale, high performance LSIs, based on a patchwork-like partitioning technique. In conventional design methods, even when only small changes are made to the design after the placement and routing process, a whole re-layout must be done, and this is very time consuming. Using the proposed method, we can partition the design into several parts after logic synthesis. When design changes occur in HDL, only the parts related to the changes need to be redesigned. The netlist for the changed design remains almost the same as the original, except for the small changed parts. For partitioning, we used multiple-fan-out-points as partition borders. An experimental evaluation of our method showed that when a small change was made in the RTL description, the revised circuit part had only about 87 gates on average. This greatly reduces the re-layout time required for implementing an ECO. In actual commercial designs in which several design changes are required, it takes only one day to redesign.
Hidenari NAKASHIMA Junpei INOUE Kenichi OKADA Kazuya MASU
Interconnect Length Distribution (ILD) represents the correlation between the number of interconnects and their length. The ILD can predict power consumption, clock frequency, chip size, etc. High core utilization and small circuit area have been reported to improve chip performance. We propose an ILD model to predict the correlation between core utilization and chip performance. The proposed model predicts the influences of interconnect length and interconnect density on circuit performances. As core utilization increases, small and simple circuits improve the performances. In large complex circuits, decreasing the wire coupling capacitance is more important than decreasing the total interconnect length for improvement of chip performance. The proposed ILD model expresses the actual ILD more accurately than conventional models.
Zhangcai HUANG Atsushi KUROKAWA Jun PAN Yasuaki INOUE
In deep submicron designs, predicting gate slews and delays for interconnect loads is vitally important for Static Timing Analysis (STA). The effective capacitance Ceff concept is usually used to calculate the gate delay of interconnect loads. Many Ceff algorithms have been proposed to compute gate delay of interconnect loads. However, less work has been done to develop a Ceff algorithm which can accurately predict gate slew. In this paper, we propose a novel method for calculating the Ceff of interconnect load for gate slew. We firstly establish a new expression for Ceff in 0.8Vdd point. Then the Integration Approximation method is used to calculate the value of Ceff in 0.8Vdd point. In this method, the integration of a complicated nonlinear gate output is approximated with that of a piecewise linear waveform. Based on the value of Ceff in 0.8Vdd point, Ceff of interconnect load for gate slew is obtained. The simulation results demonstrate a significant improvement in accuracy.
Masanori HASHIMOTO Tomonori YAMAMOTO Hidetoshi ONODERA
This paper discusses clock skew due to manufacturing variability and environmental change. In clock tree design, transition time constraint is an important design parameter that controls clock skew and power dissipation. In this paper, we evaluate clock skew under several variability models, and demonstrate relationship among clock skew, transition time constraint and power dissipation. Experimental results show that constraint of small transition time reduces clock skew under manufacturing and supply voltage variabilities, whereas there is an optimum constraint value for temperature gradient. Our experiments in a 0.18 µm technology indicate that clock skew is minimized when clock buffer is sized such that the ratio of output and input capacitance is four.
Takashi SATO Junji ICHIMIYA Nobuto ONO Koutaro HACHIYA Masanori HASHIMOTO
This paper quantitatively analyzes thermal gradient of SoC and proposes a thermal flattening procedure. First, the impact of dominant parameters, such as area occupancy of memory/logic block, power density, and floorplan on thermal gradient are studied quantitatively. Temperature difference is also evaluated from timing and reliability standpoints. Important results obtained here are 1) the maximum temperature difference increases with higher memory area occupancy and 2) the difference is very floorplan sensitive. Then, we propose a procedure to amend thermal gradient. A slight floorplan modification using the proposed procedure improves on-chip thermal gradient significantly.
Hiroaki ITOGA Chikaaki KODAMA Kunihiro FUJIYOSHI
In the VLSI layout design, a floorplan is often obtained to define rough arrangement of modules in the early design stage. In the stage, the aspect ratio of each soft module is also determined. The aspect ratio can be changed in the designated range keeping its area of each module. In this paper, in order to determine the aspect ratio, we propose a graph-based one dimensional compaction method which determines the aspect ratio quickly under the constraint that topology of a floorplan must not be changed. The proposed method is divided into two steps: (1) Selection of a minimal set of soft modules to adjust the aspect ratio. (2) Decision on the aspect ratio. (1) is formulated as the minimal cut problem in graph theory. We solve the problem by transforming it to the shortest path problem. (2) is divided into two operations. One is to determine the increment limit in height or width of each soft module and the other is to determine the aspect ratio of each soft module by Newton-Raphson method. The experimental comparisons show effectiveness of the proposed method.
Jing LI Juebang YU Hiroshi MIYASHITA
Incremental modification and optimization in VLSI physical design is of fundamental importance. Based on the O-tree (ordered tree) representation which has more prominent advantages in comparison with other topological representations of non-slicing floorplans, in this paper, we present an incremental placement algorithm for BBL (Building Block Layout) design in VLSI physical design. The good performance of experimental results in dealing with some instances proves the effectiveness of our algorithm.
Yongqiang LU Chin-Ngai SZE Xianlong HONG Qiang ZHOU Yici CAI Liang HUANG Jiang HU
With VLSI design development, the increasingly severe power problem requests to minimize clock routing wirelength so that both power consumption and power supply noise can be alleviated. In contrast to most of traditional works that handle this problem only in clock routing, we propose to navigate standard cell register placement to locations that enable further less clock routing wirelength and power. To minimize adverse impacts to conventional cell placement goals such as signal net wirelength and critical path delay, the register placement is carried out in the context of a quadratic placement. The proposed technique is particularly effective for the recently popular prescribed skew clock routing. Experiments on benchmark circuits show encouraging results.
Yun YANG Atsushi KUROKAWA Yasuaki INOUE Wenqing ZHAO
In this paper we propose a novel and efficient method for the optimization of the power/ground (P/G) network in VLSI circuit layouts with reliability constraints. Previous algorithms in the P/G network sizing used the sequence-of-linear-programming (SLP) algorithm to solve the nonlinear optimization problems. However the transformation from nonlinear network to linear subnetwork is not optimal enough. Our new method is inspired by the biological evolution and use the grid-genetic-algorithm (GGA) to solve the optimization problem. Experimental results show that new P/G network sizes are smaller than previous algorithms, as the fittest survival in the nature. Another significant advance is that GGA method can be applied for all P/G network problems because it can get the results directly no matter whether these problems are linear or not. Thus GGA can be adopted in the transient behavior of the P/G network sizing in the future, which recently faces on the obstacles in the solution of the complex nonlinear problems.
Hiroyuki TSUJIKAWA Kenji SHIMAZAKI Shozo HIRANO Kazuhiro SATO Masanori HIROFUJI Junichi SHIMADA Mitsumi ITO Kiyohito MUKAI
In the move toward higher clock rates and advanced process technologies, designers of the latest electronic products are finding increasing silicon failure with respect to noise. On the other hand, the minimum dimension of patterns on LSIs is much smaller than the wavelength of exposure, making it difficult for LSI manufacturers to obtain high yield. In this paper, we present a solution to reduce power-supply noise in LSI microchips. The proposed design methodology also considers design for manufacturability (DFM) at the same time as power integrity. The method was successfully applied to the design of a system-on-chip (SOC), achieving a 13.1-13.2% noise reduction in power-supply voltage and uniformity of pattern density for chemical mechanical polishing (CMP).
Takashi SATO Masanori HASHIMOTO Hidetoshi ONODERA
An efficient pad assignment methodology to minimize voltage drop on a power distribution network is proposed. A combination of successive pad assignment (SPA) with incremental matrix inversion (IMI) determines both location and number of power supply pads to satisfy drop voltage constraint. The SPA creates an equivalent resistance matrix which preserves both pad candidates and power consumption points as external ports so that topological modification due to connection or disconnection between voltage sources and candidate pads is consistently represented. By reusing sub-matrices of the equivalent matrix, the SPA greedily searches the next pad location that minimizes the worst drop voltage. Each time a candidate pad is added, the IMI reduces computational complexity significantly. Experimental results including a 400 pad problem show that the proposed procedures efficiently enumerate pad order in a practical time.
Hidenari NAKASHIMA Naohiro TAKAGI Junpei INOUE Kenichi OKADA Kazuya MASU
In this paper, we propose a new Interconnect Length Distribution (ILD) model to evaluate X architecture. X architecture uses 45
Takanori KYOGOKU Junpei INOUE Hidenari NAKASHIMA Takumi UEZONO Kenichi OKADA Kazuya MASU
This paper concerns a new model for estimating the wire length distribution (WLD) of a system-on-a-chip (SoC). The WLD represents the correlation between wire length and the number of interconnects, and we can predict circuit performances such as power consumption, maximum clock frequency, and chip size from the WLD. A WLD model considering core utilization has been proposed, and the core utilization has a large impact on circuit performance. However, the WLD model can treat only a one-function circuit. We propose a new WLD model considering core utilization to estimate the wire length distribution of SoC, which consists of several different-function macroblocks. We present an optimization method to determine each core utilization of macroblocks.
Atsushi KUROKAWA Masanori HASHIMOTO Akira KASEBE Zhangcai HUANG Yun YANG Yasuaki INOUE Ryosuke INAGAKI Hiroo MASUDA
Simple closed-form expressions for efficiently calculating on-chip interconnect capacitances are presented. The formulas are expressed with second-order polynomial functions which do not include exponential functions. The runtime of the proposed formulas is about 2-10 times faster than those of existing formulas. The root mean square (RMS) errors of the proposed formulas are within 1.5%, 1.3%, 3.1%, and 4.6% of the results obtained by a field solver for structures with one line above a ground plane, one line between ground planes, three lines above a ground plane, and three lines between ground planes, respectively. The proposed formulas are also superior in accuracy to existing formulas.
Toshiki KANAMOTO Tetsuya WATANABE Mitsutoshi SHIROTA Masayuki TERAI Tatsuya KUNIKIYO Kiyoshi ISHIKAWA Yoshihide AJIOKA Yasutaka HORIBA
This paper proposes a new non-destructive methodology to estimate physical parameters for LSIs. In order to resolve the estimation accuracy degradation issue for low-k dielectric films, we employ a parallel-plate capacitance measurement and a wire resistance measurement in our non-destructive method. Due to (1) the response surface functions corresponding to the parallel-plate capacitance measurement and the wire resistance measurement and (2) the searching of the physical parameter values using our cost function and simulated annealing, the proposed method attains higher precision than that of the existing method. We demonstrate the effectiveness of our method by application to our 90 nm SoC process using low-k materials.
Atsushi KUROKAWA Toshiki KANAMOTO Tetsuya IBE Akira KASEBE Wei Fong CHANG Tetsuro KAGE Yasuaki INOUE Hiroo MASUDA
Floating dummy metal fills inserted for planarization of multi-dielectric layers have created serious problems because of increased interconnect capacitance and the enormous number of fills. We present new dummy filling methods to reduce the interconnect capacitance and the number of dummy metal fills needed. These techniques include three ways of filling: 1) improved floating square fills, 2) floating parallel lines, and 3) floating perpendicular lines (with spacing between dummy metal fills above and below signal lines). We also present efficient formulas for estimating the appropriate spacing and number of fills. In our experiments, the capacitance increase using the conventional regular square method was 13.1%, while that using the methods of improved square fills, extended parallel lines, and perpendicular lines were 2.7%, 2.4%, and 1.0%, respectively. Moreover, the number of necessary dummy metal fills can be reduced by two orders of magnitude through use of the parallel line method.
Esteban TLELO-CUAUTLE Delia TORRES-MUÑOZ Leticia TORRES-PAPAQUI
A systematic method is introduced to the computational synthesis of CMOS voltage followers (VFs). The method is divided in three steps: generation of the small-signal circuitry by selection of nullators to model the behavior of a VF, and addition of norators to form nullator-norator joined-pairs; generation of the bias circuitry by addition of ideal biases according to the properties of nullators and norators; and synthesis of the joined-pairs by MOSFETs, and of the current-biases by CMOS current mirrors. It is shown that the proposed synthesis method has the capability to generate already known and new CMOS VF topologies.
Tetsuya IIZUKA Makoto IKEDA Kunihiro ASADA
This paper proposes flat and hierarchical approaches for generating a minimum-width transistor placement of CMOS cells in presence of non-dual P and N type transistors. Our approaches are the first exact method which can be applied to CMOS cells with any types of structure. Non-dual CMOS cells occupy a major part of an industrial standard-cell library. To generate the exact minimum-width transistor placement of non-dual CMOS cells, we formulate the transistor placement problem into Boolean Satisfiability (SAT) problem considering the P and N type transistors individually. Using the proposed method, the transistor placement problem of any types of CMOS cells can be solved exactly. In addition, the experimental results show that our flat approach generates smaller width placement for 29 out of 103 dual cells than that of the conventional method. Our hierarchical approach reduces the runtimes drastically. Although this approach has possibility to generate wider placements than that of the flat approach, the experimental results show that the width of only 3 out of 147 cells solved by our hierarchical approach are larger than that of the flat approach.
Yuichiro MURACHI Koji HAMANO Tetsuro MATSUNO Junichi MIYAKOSHI Masayuki MIYAMA Masahiko YOSHIMOTO
This paper describes a 95 mW MPEG2 MP@HL motion estimation processor core for portable and high-resolution video applications such as that in an HD camcorder. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920
Chi-Chia SUNG Shanq-Jang RUAN Bo-Yao LIN Mon-Chau SHIE
In recent years, the demand for multimedia mobile battery-operated devices has created a need for low power implementation of video compression. Many compression standards require the discrete cosine transform (DCT) function to perform image/video compression. For this reason, low power DCT design has become more and more important in today's image/video processing. This paper presents a new power-efficient Hybrid DCT architecture which combines Loeffler DCT and binDCT in terms of special property on luminance and chrominance difference. We use Synopsys PrimePower to estimate the power consumption in a TSMC 0.25-µm technology. Besides, we also adopt a novel quality assessment method based on structural distortion measurement to measure the quality instead of peak signal to noise rations (PSNR) and mean squared error (MSE). It is concluded that our Hybrid DCT offers similar quality performance to the Loeffler, and leads to 25% power consumption and 27% chip area savings.
This paper presents an efficient VLSI architecture of biorthogonal (9,7)/(5,3) lifting based discrete wavelet transform that is used by lossy or lossless compression of JPEG2000. To improve hardware utilization of RPA (Recursive Pyramid Algorithm) implementation, we make the filter that is responsible for row operations of the first level perform both column operations and row operations of the second and following levels. As a result, the architecture has 66.7-88.9% hardware utilization. It requires 9 multipliers, 12 adders, and 12N line memories for N
Masanori HARIYAMA Yasuhiro KOBAYASHI Haruka SASAKI Michitaka KAMEYAMA
This paper presents a processor architecture for high-speed and reliable stereo matching based on adaptive window-size control of SAD (Sum of Absolute Differences) computation. To reduce its computational complexity, SADs are computed using images divided into non-overlapping regions, and the matching result is iteratively refined by reducing a window size. Window-parallel-and-pixel-parallel architecture is also proposed to achieve to fully exploit the potential parallelism of the algorithm. The architecture also reduces the complexity of an interconnection network between memory and functional units based on the regularity of reference pixels. The stereo matching processor is implemented on an FPGA. Its performance is 80 times higher than that of a microprocessor (Pentium4@2 GHz), and is enough to generate a 3-D depth image at the video rate of 33 MHz.
Zhenyu LIU Yang SONG Takeshi IKENAGA Satoshi GOTO
Many parallel Fast Fourier Transform (FFT) algorithms adopt multiple stages architecture to increase performance. However, data permutation between stages consumes volume memory and processing time. One FFT array processing mapping algorithm is proposed in this paper to overcome this demerit. In this algorithm, arbitrary 2k butterfly units (BUs) could be scheduled to work in parallel on n=2s data (k=0,1,..., s-1). Because no inter stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, with the increasing of BUs, not only does throughput increase linearly, system latency also decreases linearly. This array processing orientated architecture provides flexible tradeoff between hardware cost and system performance. In theory, the system latency is (s
Yuan-Long JEANG Jer-Min JOU Win-Hsien HUANG
In this paper, a methodology based on a mix-mode interconnection architecture is proposed for constructing an application specific network on chip to minimize the total communication time. The proposed architecture uses a globally asynchronous communication network and a locally synchronous bus (or cross-bar or multistage interconnection network MIN). First, a local bus is given for a group of IP cores so that the communications within this local bus can be arranged to be exclusive in time. If the communications of some IP cores should be required to be completed within a given amount of time, then a non-blocking MIN or a crossbar switch should be made for those IP cores instead of a bus. Then, a communication ratio (CR) for each pair of local buses is provided by users, and based on the Huffman coding philosophy, a process is applied to construct a binary tree (BT) with switches on the internal nodes and buses on the leaves. Since the binary tree system is deadlock free (no cycle exists in any path), the router is just a relatively simple and cheap switch. Simulation results show that the proposed methodology and architecture of NOC is better on switching circuit cost and performance than the SPIN and the mesh architecture using our developed deadlock-free router.
Luca FANUCCI Massimo ROVINI Nicola E. L'INSALATA Francesco ROSSI
As an enhancement of the state-of-the-art solutions, a high-throughput architecture of a decoder for structured LDPC codes is presented in this paper. Thanks to the peculiar code definition and to the envisaged architecture featuring memory paging, the decoder is very flexible, and the support of different code rates is achieved with no significant hardware overhead. A top-down design flow of a real decoder is reported, starting from the analysis of the system performance in finite-precision arithmetic, up to the VLSI implementation details of the elementary modules. The synthesis of the whole decoder on 0.18µm standard cells CMOS technology showed remarkable performances: small implementation loss (0.2dB down to BER = 10-8), low latency (less than 6.0µs), high useful throughput (up to 940Mbps) and low complexity (about 375 Kgates).
Vasily G. MOSHNYAGA Tomoyuki YAMANAKA
Design of portable battery operated multimedia devices requires energy-efficient multiplication circuits. This paper proposes a novel architectural technique to reduce power consumption of digital multipliers. Unlike related approaches which focus on multiplier transition activity reduction, we concentrate on dynamic reduction of supply voltage. Two implementation schemes capable of dynamically adjusting a double voltage supply to input data variation are presented. Simulations show that using these schemes we can reduce energy consumption of 16
This paper presents a standard cell-based frequency synthesizer with dynamic frequency counting (DFC) for multiplying input reference frequency by N times. The dynamic frequency counting loop uses variable time period to estimate and tune the frequency of digitally-controlled oscillator (DCO) which enhances frequency detection's resolution and loop stability. Two ripple counters serve as frequency estimator. Conventional phase-frequency detector (PFD) thus is replaced with a digital arithmetic comparator to yield a divider-free circuit structure. Additionally, a 15 bits DCO with the least significant bit (LSB) resolution 1.55 ps is designed by using the gate capacitance difference of 2-input NOR gate in fine-tuning stage. A modified incremental data weighted averaging (IDWA) circuit is also designed to achieve improved linearity of DCO by dynamic element matching (DEM) skill. Based on the proposed standard cell-based frequency synthesizer, a test chip is designed and verified on 0.35-µm complementary metal oxide silicon (CMOS) process, and has a frequency range of (18-214) MHz at 3.3 V with peak-to-peak (Pk-Pk) jitter of less than 70 ps at 192 MHz/3.3 V.
Atsushi MURAMATSU Masanori HASHIMOTO Hidetoshi ONODERA
With increase of clock frequency, on-chip wire inductance starts to play an important role in power/ground distribution analysis, although it has not been considered so far. We perform a case study work that evaluates relation between decoupling capacitance position and noise suppression effect, and we reveal that placing decoupling capacitance close to current load is necessary for noise reduction. We experimentally show that impact of on-chip inductance becomes small when on-chip decoupling capacitance is well placed according to local power consumption. We also examine influences of grid pitch, wire area, and spacing between paired power and ground wires on power supply noise. When effect of on-chip inductance on power/ground noise is significant, minification of grid pitch is more efficient than increase in wire area, and small spacing reduces power noise as we expected.
Chia-Chi CHU Ming-Hong LAI Wu-Shiung FENG
An order selection scheme for two-sided oblique projection-based interconnect reduction will be investigated. It will provide a guideline for terminating the conventional nonsymmetric Pade via Lanczos (PVL) iteration process. By exploring the relationship of the system Grammians of the original network and those of the reduced network, it can be shown that the system matrix of the reduced-order system generated by the two-sided oblique projection can also be expressed as those of the original interconnect model with some additive perturbations. The perturbation matrix only involves bi-orthogonal vectors at the previous step of the nonsymmetric Lanczos algorithm. This perturbation matrix will provide the stopping criteria in the order selection scheme and achieve the desired accuracy of the approximate transfer function.
In this paper a straight-forward approach to extract the equivalent loop-gain of a continuous time Delta-Sigma modulator with an arbitrary DAC waveform in z-domain is presented. In this approach the arbitrary DAC waveform is approximated by the infinite number of rectangular pulse shapes. Then simply using the transformations available in literatures for a rectangular DAC pulse shape and applying superposition on each rectangular pulse shape, the loop-gain of the system is derived in z-domain.
Chun-Lung HSU Wen-Tso WANG Ying-Fu HONG
This work presents a frequency-scaling low-power (FSLP) design methodology for managing power consumption of cores in the tile-based network-on-chip (NOC) architecture. A moving picture experts group (MPEG) core is tested using the field-programmable gate array (FPGA) implementation to verify the feasibility of the proposed method. Measurement results show that about 30% power consumption can be saved in the MPEG core and reveal that the proposed FSLP design method can be suitable for cores in the tile-based NOC applications.
Marie NAKAZAWA Atsuhiro NISHIKATA
In this study, we propose a new acoustic model including the human ear canal and a thin tube earphone. The use of a tube earphone enables simultaneous listening of both virtual and real surrounding sound. First, we perform acoustic FDTD (finite difference time domain) simulations using an MRI head model with ear canals. The calculated external impedance viewed from the eardrum numerically shows that the influence of the inserted tube is small. A listening experiment with six subjects also confirms the effectiveness of a tube earphone. Second, we calculate HRTFs (head-related transfer functions) for eight directions in the horizontal plane to realize sound localization with a tube earphone. We also design inverse filters based on the propagation calculations including the characteristics of tube earphones. Finally we evaluate the localization system by another listening experiment with six subjects. The results reveal that the applicability of a system with tube earphones and inverse filters, particularly for the front directions.
Shoko ARAKI Shoji MAKINO Robert AICHNER Tsuyoki NISHIKAWA Hiroshi SARUWATARI
We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.
We consider the chains of integrators with nonlinear terms which allow state and input delays. We provide an output feedback controller which globally asymptotically stabilizes the given system under certain sufficient conditions. It turns out that the obtained result includes several existing results as particular cases. This point is shown through two applications of the main result. Also, industrial processes are presented to illustrate the practicability of our result.
Marcelo E. KAIHARA Naofumi TAKAGI
A hardware algorithm for modular multiplication/division which performs modular division, Montgomery multiplication, and ordinary modular multiplication is proposed. The modular division in our algorithm is based on the extended Euclidean algorithm. We employ our newly proposed computation method that consists of processing the multiplier from the most significant digit first to calculate Montgomery multiplication. Finally, the ordinary modular multiplication is based on shift-and-add multiplication. Each of these three operations is carried out through the iteration of simple operations such as shifts and additions/subtractions. To avoid carry propagation in all additions and subtractions, the radix-2 signed-digit representation is employed. A modular multiplier/divider based on the algorithm has a linear array structure with a bit-slice feature and carries out n-bit modular multiplication/division in O(n) clock cycles, where the length of the clock cycle is constant and independent of n. This multiplier/divider can be implemented using a hardware amount only slightly larger than that of the modular divider.
HeeSoo KIM Shigeru YAMADA DongHo PARK
In this paper, we propose a new software reliability growth model which is the mixture of two exponential reliability growth models, one of which has the reliability growth and the other one does not have the reliability growth after the software is released upon completion of testing phase. The mixture of two such models is characterized by a weighted factor p, which is the proportion of reliability growth part within the model. Firstly, this paper discusses an optimal software release problem with regard to the expected total software cost incurred during the warranty period under the proposed software reliability growth model, which generalizes Kimura, Toyota and Yamada's (1999) model with consideration of the weighted factor. The second main purpose of this paper is to apply the Bayesian approach to the optimal software release policy by assuming the prior distributions for the unknown parameters contained in the proposed software reliability growth model. Some numerical examples are presented for the purpose of comparing the optimal software release policies depending on the choice of parameters by the non-Bayesian and Bayesian methods.
Hachiro FUJITA Kohichi SAKANIWA
Low-density parity-check (LDPC) codes are one of the most promising next-generation error-correcting codes. For practical use, efficient methods for encoding of LDPC codes are needed and have to be studied. However, it seems that no general encoding methods suitable for hardware implementation have been proposed so far and for randomly constructed LDPC codes there have been no other methods than the simple one using generator matrices. In this paper we show that some classes of quasi-cyclic LDPC codes based on circulant permutation matrices, specifically LDPC codes based on array codes and a special class of Sridhara-Fuja-Tanner codes and Fossorier codes can be encoded by division circuits as cyclic codes, which are very easy to implement. We also show some properties of these codes.
Daiyuan PENG Pingzhi FAN Naoki SUEHIRO
In order to eliminate the co-channel and multi-path interference of quasi-synchronous code division multiple access (QS-CDMA) systems, spreading sequences with low or zero correlation zone (LCZ or ZCZ) can be used. The significance of LCZ/ZCZ to QS-CDMA systems is that, even there are relative delays between the transmitted spreading sequences due to the inaccurate access synchronization and the multipath propagation, the orthogonality (or quasi-orthogonality) between the transmitted signals can still be maintained, as long as the relative delay does not exceed certain limit. In this paper, several lower bounds on the aperiodic autocorrelation and crosscorrelation of binary LCZ/ZCZ sequence set with respect to the family size, sequence length and the aperiodic low or zero correlation zone, are derived. The results show that the new bounds are tighter than previous bounds for the LCZ/ZCZ sequences.
Weixing BI Xugang WANG Zheng TANG Hiroki TAMURA
One critical "drawback" of the backpropagation algorithm is the local minima problem. We have noted that the local minima problem in the backpropagation algorithm is usually caused by update disharmony between weights connected to the hidden layer and the output layer. To solve this kind of local minima problem, we propose a modified error function with two terms. By adding one term to the conventional error function, the modified error function can harmonize the update of weights connected to the hidden layer and those connected to the output layer. Thus, it can avoid the local minima problem caused by such disharmony. Simulations on some benchmark problems and a real classification task have been performed to test the validity of the modified error function.
In this Letter, linear least squares (LLS) techniques for phase estimation of real sinusoidal signals with known or unknown amplitudes are studied. It is proved that the asymptotic performance of the LLS approach attains Cramér-Rao lower bound. For the case of a single tone, a novel LLS algorithm with unit-norm constraint is derived. Simulation results are also included for algorithm evaluation.
It has been observed in the literature that the characteristic polynomial of a discrete system can be computed from the characteristic impulse response Gramian. In this letter it is shown that a given characteristic impulse response Gramian, in fact, contains information on two characteristic polynomials. The importance of this result is illustrated through an application to model reduction of discrete systems.
Kohsuke OGATA Toshinori YAMADA Shuichi UENO
This note shows an efficient implementation of de Bruijn networks by the Optical Transpose Interconnection System (OTIS) extending previous results by Coudert, Ferreira, and Perennes [2].
Although a proposed steganographic encoding scheme can reduce distortion caused by data hiding, it makes the system susceptible to active-warden attacks due to error spreading. Meanwhile, straightforward application of error correction encoding inevitably increases the required amount of bit alterations so that the risk of being detected will increase. To overcome the drawback in both cases, an integrated approach is introduced that combines the stego-encoding and error correction encoding to provide enhanced robustness against active attacks and channel noise while keeping good imperceptibility.
Mitsuhiro HATTORI Shoichi HIROSE Susumu YOSHIDA
The security of SHA-0 with various message schedules is discussed in this letter. SHA-0 employs a primitive polynomial of degree 16 over GF(2) in its message schedule. For each primitive polynomial, a SHA-0 variant can be constructed. The collision resistance and the near-collision resistance of SHA-0 variants to the Chabaud-Joux attack are evaluated. Moreover, the near-collision resistance of a variant to the Biham-Chen attack is evaluated. It is shown that the selection of primitive polynomials highly affects the resistance. However, it is concluded that these SHA-0 variants are not appropriate for making SHA-0 secure.
The Reed-Solomon code is a versatile channel code pervasively used for communication and storage systems. The bit-serial Reed-Solomon encoder has a simple structure, although it is somewhat difficult to understand the algorithm without considerable theoretical background. Some professionals and students, not able to understand the algorithm thoroughly, might need to implement the bit-serial encoder for themselves. In this letter, a step-by-step method is presented for the implementation of the bit-serial encoder even without understanding the internal algorithm, which would be helpful for VHDL, DSP, and simulation programming.
In the letter, properties of m-sequence are derived, based on these properties, a family of binary nonlinear constant weight codes is presented, these binary nonlinear constant weight codes can apply to automatic repeat request (ARQ) communication system, as detecting-error codes.
Jaesang LIM Yongchul SONG Jeongpyo KIM Beomsup KIM
This letter describes an efficient architecture for a Software Defined Radio (SDR) Wideband Code Division Multiple Access (WCDMA) receiver using for high performance wireless communication systems. The architecture is composed of a Radio Frequency (RF) front-end, an Analog-to-Digital Converter (ADC), and a Quadrature Amplitude Modulation (QAM) demodulator. A coherent demodulator, with a complete digital synchronization scheme, achieves the bit-error rate (BER) of 10-6 with the implementation loss of 0.5 dB for a raw Quadrature Phase Shift King (QPSK) signal.