
IEICE TRANSACTIONS on Fundamentals

  • Impact Factor: 0.48
  • Eigenfactor: 0.003
  • Article Influence: 0.1
  • CiteScore: 1.1


Volume E88-A No.12 (Publication Date: 2005/12/01)

    Special Section on VLSI Design and CAD Algorithms
  • FOREWORD

    Shinji KIMURA  

     
    FOREWORD

      Page(s):
    3273-3273
  • Adaptive Mode Control for Low-Power Caches Based on Way-Prediction Accuracy

    Hidekazu TANAKA  Koji INOUE  

     
    PAPER-Low Power Methodology

      Page(s):
    3274-3281

    This paper proposes a novel low-power cache architecture called the "Adaptive Way-Predicting Cache (AWP cache)." The AWP cache has multiple operation modes and dynamically adapts its operation mode based on the accuracy of way-prediction results. A confidence counter for way prediction is implemented for each cache set. To analyze the effectiveness of the AWP cache, we perform an SRAM design using 0.18 µm CMOS technology and cycle-accurate processor simulations. As a result, for a benchmark program (179.art), a performance-aware AWP cache is observed to reduce the 49% performance overhead of the original way-predicting cache to 17%. Furthermore, compared with a non-optimized conventional cache, an energy-aware AWP cache achieves a 73% energy reduction, whereas the original way-predicting scheme achieves only 38%. In terms of energy-performance efficiency, the energy-aware AWP cache produces better results: it reduces the energy-delay product to only 35% of that of the conventional organization on average, which is 6% better than the original way-predicting scheme.
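
The abstract states only that a confidence counter is attached to each cache set. The minimal Python sketch below shows how such a saturating counter could switch one set between a way-predicting mode and a conventional (all-ways) mode; the class name SetModeController, the 2-bit width, and the threshold are hypothetical choices for illustration, not the paper's parameters.

```python
class SetModeController:
    """Per-set confidence counter that selects the cache access mode (illustrative)."""

    def __init__(self, bits: int = 2, threshold: int = 2):
        self.max_value = (1 << bits) - 1   # saturating upper bound (hypothetical 2-bit width)
        self.threshold = threshold         # counter >= threshold -> trust way prediction
        self.counter = self.max_value      # start in the high-confidence state

    def mode(self) -> str:
        # High confidence: probe only the predicted way to save energy.
        # Low confidence: probe all ways in parallel to avoid the re-access penalty.
        return "way-predict" if self.counter >= self.threshold else "conventional"

    def update(self, prediction_correct: bool) -> None:
        # The correctness of the prediction is known after every access in
        # either mode, so the counter keeps training in both modes.
        if prediction_correct:
            self.counter = min(self.counter + 1, self.max_value)
        else:
            self.counter = max(self.counter - 1, 0)


ctrl = SetModeController()
for outcome in [False, False, True, True]:
    ctrl.update(outcome)
    print(ctrl.mode())
# prints: way-predict, conventional, way-predict, way-predict (one per line)
```

A single misprediction does not immediately disable way prediction here; the hysteresis of the saturating counter is what keeps a set from oscillating between modes.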

  • Low Power and Fault Tolerant Encoding Methods for On-Chip Data Transfer in Practical Applications

    Satoshi KOMATSU  Masahiro FUJITA  

     
    PAPER-Low Power Methodology

      Page(s):
    3282-3289

    Energy consumption is one of the most critical constraints in current VLSI system design. In addition, fault tolerance will become one of the most important requirements for future, further scaled VLSIs. This paper proposes practical low power and fault tolerant bus encoding methods for on-chip data transfer. The proposed encoding methods combine a simple low power code with a fault tolerant code. Experimental results show that the proposed methods reduce signal transitions on the bus by 23% while providing fault tolerance. In addition, circuit implementation results with bus signal swing optimization show the effectiveness of the proposed encoding methods. We also present a methodology for selecting the optimum encoding method under given requirements.
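
The abstract does not name the specific codes. As a hedged illustration only, the sketch below pairs bus-invert coding, a well-known simple low-power code, with one even-parity line, a simple error-detecting code; the function names are hypothetical, and the paper's fault tolerant code may well be stronger than single-bit parity.

```python
from __future__ import annotations


def popcount(x: int) -> int:
    return bin(x).count("1")


def encode(prev_bus: int, data: int, width: int) -> tuple[int, int, int]:
    """Bus-invert encode one word, then add an even-parity bit.

    prev_bus is the value currently driven on the data lines.
    Returns (bus_value, invert_line, parity_line).
    """
    mask = (1 << width) - 1
    # Bus-invert: send the complement if more than half of the data lines would toggle.
    if popcount((prev_bus ^ data) & mask) > width // 2:
        bus, inv = ~data & mask, 1
    else:
        bus, inv = data & mask, 0
    # Even parity over the data lines and the invert line detects any single-bit error.
    parity = (popcount(bus) + inv) % 2
    return bus, inv, parity


def decode(bus: int, inv: int, parity: int, width: int) -> int:
    mask = (1 << width) - 1
    if (popcount(bus) + inv + parity) % 2 != 0:
        raise ValueError("parity error detected on the bus")
    return ~bus & mask if inv else bus
```

For example, sending 0xFF after 0x00 on an 8-bit bus drives 0x00 with the invert line set, so only the invert and parity lines toggle instead of all eight data lines.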

  • Power-Minimum Frequency/Voltage Cooperative Management Method for VLSI Processor in Leakage-Dominant Technology Era

    Kentaro KAWAKAMI  Miwako KANAMORI  Yasuhiro MORITA  Jun TAKEMURA  Masayuki MIYAMA  Masahiko YOSHIMOTO  

     
    PAPER-Low Power Methodology

      Page(s):
    3290-3297

    To achieve both high peak performance and low average power, frequency-voltage cooperative control processors have been proposed. Such a processor schedules its operating frequency according to the required computation power. Its operating voltage or body bias voltage is appropriately modulated at the same time to cut down either switching current or leakage current, which reduces the total power dissipation of the processor. Since a frequency-voltage cooperative control processor has two or more operating frequencies, countless scheduling methods exist for completing a given number of cycles by a deadline. This problem frequently arises in hard real-time systems. This paper proves two important theorems that give the power-minimum frequency scheduling method for any type of frequency-voltage cooperative control processor, including Vdd-control, Vth-control, and Vdd-Vth-control processors.
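
The abstract does not state the two theorems. For context, a well-known result for dynamic voltage scaling with discrete levels is that, when power grows convexly with frequency, a fixed number of cycles is best finished by a deadline using only the two available frequencies adjacent to the ideal average frequency. The sketch below shows only that arithmetic, under that convexity assumption; it is not necessarily the paper's theorem for Vth-controlled processors, and the function name is hypothetical.

```python
def two_level_schedule(cycles, deadline, freqs):
    """Split [0, deadline] between the two available frequencies that bracket
    the ideal frequency cycles / deadline.

    Returns {frequency: time spent at that frequency}.
    """
    f_ideal = cycles / deadline
    levels = sorted(set(freqs))            # distinct levels, ascending
    for f_low, f_high in zip(levels, levels[1:]):
        if f_low <= f_ideal <= f_high:
            # Solve f_low * (deadline - t_high) + f_high * t_high = cycles.
            t_high = (cycles - f_low * deadline) / (f_high - f_low)
            return {f_low: deadline - t_high, f_high: t_high}
    raise ValueError("ideal frequency lies outside the available range")


print(two_level_schedule(150, 1.0, [100, 200, 300]))  # {100: 0.5, 200: 0.5}
```

Half the interval at 100 and half at 200 completes exactly 150 cycles by the deadline, with no slack left to waste at an unnecessarily high voltage.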

  • Low-Power Field-Programmable VLSI Using Multiple Supply Voltages

    Weisheng CHONG  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Low Power Methodology

      Page(s):
    3298-3305

    A low-power field-programmable VLSI (FPVLSI) is presented to overcome the large power consumption of field-programmable gate arrays (FPGAs). To reduce power consumption in routing networks, the FPVLSI consists of cells based on a bit-serial pipeline architecture, which reduces routing block complexity. Moreover, a level-converter-less multiple-supply-voltage scheme using dynamic circuits is proposed, in which cells on non-critical paths use a low supply voltage to save power under a speed constraint. The FPVLSI is evaluated based on a 0.18-µm CMOS design rule. The power consumption of the FPVLSI using multiple supply voltages is reduced to 17% or less of that of a static-circuit-based FPVLSI using multiple supply voltages.

  • Trace-Driven Performance Simulation Modeling for Fast Evaluation of Multimedia Processor by Simulation Reuse

    Ho Young KIM  Tag Gon KIM  

     
    PAPER-Simulation and Verification

      Page(s):
    3306-3314

    A fast yet accurate method for evaluating the performance of a processor architecture is highly desirable in modern processor design. This paper proposes such a method, which can measure the cycle counts and power consumption of pipelined processors. The method first develops a trace-driven performance simulation model and then applies simulation reuse when simulating the model. Trace-driven performance modeling provides accuracy: the performance simulation uses the same execution traces that were constructed during simulation for functional verification. Fast performance simulation is achieved because the performance of each instruction in the traces is evaluated without evaluating the instruction itself. Simulation reuse speeds up simulation by skipping the evaluation of a current state that is identical to a previously evaluated state. The reuse approach relies on the property that application programs, especially multimedia applications, generally contain many iterative loops. A performance simulator for pipelined architectures based on the proposed method has been developed, and it achieves greater speedup than other performance evaluation approaches.
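
As a minimal sketch of the reuse idea, assuming the timing model's state can be summarized as a hashable tuple, the loop below caches the result of each (state, instruction) pair and reuses it whenever the same pair recurs, which happens often in loop-heavy traces. The functions simulate and toy_evaluate and the load-use stall rule are hypothetical; the paper's model also measures power, not only cycles.

```python
def simulate(trace, evaluate):
    """Accumulate cycle counts over an instruction trace with simulation reuse.

    evaluate(state, instr) -> (cycles, next_state) stands in for the slow
    pipeline timing model; state must be hashable (e.g. a tuple).
    """
    reuse_table = {}
    state = ()
    total_cycles = 0
    evaluations = 0
    for instr in trace:
        key = (state, instr)
        if key not in reuse_table:
            # New (state, instruction) pair: run the timing model once.
            reuse_table[key] = evaluate(state, instr)
            evaluations += 1
        cycles, state = reuse_table[key]   # identical pair seen before: reuse it
        total_cycles += cycles
    return total_cycles, evaluations


def toy_evaluate(state, instr):
    # Hypothetical timing rule: an "add" right after a "load" pays a
    # one-cycle load-use stall; the state remembers the last two opcodes.
    stall = 1 if instr == "add" and state and state[-1] == "load" else 0
    return 1 + stall, (state + (instr,))[-2:]


print(simulate(["load", "add"] * 100, toy_evaluate))  # (300, 4)
```

The 200-instruction loop needs only four timing-model evaluations; every later iteration hits the reuse table.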

  • An Equivalence Checking Method for C Descriptions Based on Symbolic Simulation with Textual Differences

    Takeshi MATSUMOTO  Hiroshi SAITO  Masahiro FUJITA  

     
    PAPER-Simulation and Verification

      Page(s):
    3315-3323

    In this paper, an efficient equivalence checking method for two C descriptions is described. The equivalence of the two C descriptions is proved by symbolic simulation. The symbolic simulation used in this paper can prove the equivalence of all of the variables in the descriptions. However, verifying the equivalence of all of the variables takes a long time when large descriptions are given. Therefore, to make verification more efficient, our method identifies textual differences between the descriptions. The identified textual differences are used to reduce the number of equivalence checks among variables. The proposed method has been implemented in C and evaluated with several C descriptions.
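
The paper's symbolic simulation is not shown here. This Python sketch illustrates only the first idea: locating textual differences between two C descriptions with the standard difflib module and collecting the variables assigned on the differing lines, so that a checker could focus on them. The regular expression is a deliberately crude, hypothetical stand-in for a real C parser.

```python
from __future__ import annotations

import difflib
import re

# Crude pattern for "name =" assignments; it ignores ==, <=, >=, != and
# also misses compound assignments such as +=.
ASSIGNMENT = re.compile(r"\b([A-Za-z_]\w*)\s*=(?!=)")


def changed_variables(old_src: str, new_src: str) -> set[str]:
    """Names assigned on lines that differ textually between two C sources."""
    old_lines = old_src.splitlines()
    new_lines = new_src.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    names: set[str] = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # textually identical region: nothing new to check here
        for line in old_lines[i1:i2] + new_lines[j1:j2]:
            names.update(ASSIGNMENT.findall(line))
    return names


old = "int a = x + y;\nint b = a * 2;\nint c = b - 1;"
new = "int a = x + y;\nint b = a << 1;\nint c = b - 1;"
print(changed_variables(old, new))  # {'b'}
```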

  • Logic Synthesis Technique for High Speed Differential Dynamic Logic with Asymmetric Slope Transition

    Masao MORIMOTO  Yoshinori TANAKA  Makoto NAGATA  Kazuo TAKI  

     
    PAPER-Logic Synthesis

      Page(s):
    3324-3331

    This paper proposes a logic synthesis technique for asymmetric slope differential dynamic logic (ASDDL) circuits. The technique uses a commercially available logic synthesis tool that is well established for static CMOS logic design: an intermediate library is devised so that logic synthesis proceeds as for static CMOS, and the synthesized circuit is then translated automatically into an ASDDL implementation at both the gate-level schematic level and the physical-layout level. A design example of an ASDDL 16-bit multiplier synthesized in a 0.18-µm CMOS technology shows an operation delay time of 1.82 ns, a 32% improvement over a static CMOS design using a static logic standard-cell library finely tuned for energy-delay product. For the 16-bit multiplier, the design time of the ASDDL-based dynamic digital circuit was 300 times shorter than that of a fully handcrafted design and comparable to that of a static CMOS design.

  • Exact Minimization of FPRMs for Incompletely Specified Functions by Using MTBDDs

    Debatosh DEBNATH  Tsutomu SASAO  

     
    PAPER-Logic Synthesis

      Page(s):
    3332-3341

    Fixed polarity Reed-Muller expressions (FPRMs) exhibit several useful properties that make them suitable for many practical applications. This paper presents an exact minimization algorithm for FPRMs of incompletely specified functions. For an n-variable function with α unspecified minterms there are 2^n distinct FPRMs, and a minimum FPRM is one with the fewest product terms. To find a minimum FPRM, the algorithm must determine an assignment of the unspecified minterms. This is accomplished by using the concept of integer-valued functions in conjunction with an extended truth vector and a weight vector. The vectors help formulate the problem as an assignment of the variables of integer-valued functions, which are then efficiently manipulated using multi-terminal binary decision diagrams (MTBDDs) to find an assignment of the unspecified minterms. The effectiveness of the algorithm is demonstrated through experimental results for code converters, adders, and randomly generated functions.
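
The MTBDD formulation is what makes the paper's method practical. The sketch below is instead a naive exhaustive baseline, usable only for tiny functions, that makes the search space concrete: it tries every assignment of the unspecified minterms (None entries) and every polarity vector, computes Reed-Muller coefficients with the standard XOR butterfly, and keeps the expression with the fewest product terms. The function names and the indexing convention (bit i of a minterm index is x_i) are assumptions of this sketch.

```python
from itertools import product


def rm_coefficients(truth):
    """Positive-polarity Reed-Muller coefficients via the XOR butterfly."""
    c = list(truth)
    n = len(c).bit_length() - 1
    for i in range(n):
        bit = 1 << i
        for m in range(len(c)):
            if m & bit:
                c[m] ^= c[m ^ bit]
    return c


def min_fprm(truth):
    """truth: list of 0/1/None of length 2**n (None = unspecified minterm).
    Returns (product_terms, polarity, completed_truth) of a minimum FPRM."""
    size = len(truth)
    unspecified = [m for m, v in enumerate(truth) if v is None]
    best = None
    for bits in product((0, 1), repeat=len(unspecified)):
        full = list(truth)
        for m, b in zip(unspecified, bits):
            full[m] = b
        for pol in range(size):
            # Bit i of pol set means x_i appears complemented: g(y) = f(y XOR pol).
            g = [full[m ^ pol] for m in range(size)]
            terms = sum(rm_coefficients(g))
            if best is None or terms < best[0]:
                best = (terms, pol, full)
    return best
```

For f = x0 + x1 (truth vector [0, 1, 1, 1]) the positive-polarity form x0 ⊕ x1 ⊕ x0x1 has three products, while complementing both variables gives 1 ⊕ x̄0x̄1 with two, so min_fprm([0, 1, 1, 1]) returns 2 terms at polarity 3. Setting minterm 0 unspecified lets the search choose the constant 1, a single term.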

  • A Design Algorithm for Sequential Circuits Using LUT Rings

    Hiroki NAKAHARA  Tsutomu SASAO  Munehiro MATSUURA  

     
    PAPER-Logic Synthesis

      Page(s):
    3342-3350

    This paper presents a design method for sequential circuits using a Look-Up Table (LUT) ring. The method consists of two steps: the first step partitions the outputs into groups, and the second step realizes the groups by LUT cascades and allocates the cells of the cascades to memory. The system automatically finds a fast implementation by making maximal use of the available memory. With the presented algorithm, sequential circuits satisfying given specifications can be designed easily. The paper also compares the LUT ring with a logic simulator for realizing sequential circuits: the LUT ring is 25 to 237 times faster than a logic simulator that uses the same amount of memory.

  • An Engineering Change Orders Design Method Based on Patchwork-Like Partitioning for High Performance LSIs

    Yuichi NAKAMURA  Ko YOSHIKAWA  Takeshi YOSHIMURA  

     
    PAPER-Logic Synthesis

      Page(s):
    3351-3357

    This paper describes a novel engineering change order (ECO) design method for large-scale, high performance LSIs, based on a patchwork-like partitioning technique. In conventional design methods, even when only small changes are made to the design after placement and routing, the whole design must be laid out again, which is very time consuming. With the proposed method, the design is partitioned into several parts after logic synthesis. When design changes occur in the HDL, only the parts related to the changes need to be redesigned. The netlist of the changed design remains almost the same as the original, except for the small changed parts. For partitioning, we use multiple-fan-out points as partition borders. An experimental evaluation of our method showed that when a small change was made in the RTL description, the revised circuit part contained only about 87 gates on average. This greatly reduces the re-layout time required for implementing an ECO. In actual commercial designs requiring several design changes, redesign takes only one day.
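
How each patch is formed is not detailed in the abstract. The hypothetical sketch below assumes a gate-level netlist given as a Python dict and treats every gate that drives more than one gate, or a primary output, as a partition border; each border gate is grouped with the single-fan-out gates that feed only it. Under this assumption, an ECO that touches one gate only requires redesigning that gate's region.

```python
from collections import defaultdict


def partition_at_fanout_points(netlist, outputs):
    """netlist: gate -> list of input signals (gate names or primary inputs).
    outputs: set of gates that drive primary outputs.
    Returns border gate -> set of gates in its fan-out-free region."""
    fanout = defaultdict(int)
    for inputs in netlist.values():
        for sig in inputs:
            fanout[sig] += 1
    # Borders: gates whose fan-out is not exactly one, plus output drivers.
    borders = [g for g in netlist if g in outputs or fanout[g] != 1]
    regions = {}
    for root in borders:
        region, stack = set(), [root]
        while stack:
            g = stack.pop()
            region.add(g)
            for sig in netlist[g]:
                # Absorb internal single-fan-out gates; stop at borders and inputs.
                if sig in netlist and fanout[sig] == 1 and sig not in outputs:
                    stack.append(sig)
        regions[root] = region
    return regions


net = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g1", "d"], "g4": ["g2", "g3"]}
print(partition_at_fanout_points(net, {"g4"}))
```

Here g1 fans out to two gates, so it forms its own part, and a change inside g2 or g3 leaves the g1 part untouched.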

  • Circuit Performance Prediction Considering Core Utilization with Interconnect Length Distribution Model

    Hidenari NAKASHIMA  Junpei INOUE  Kenichi OKADA  Kazuya MASU  

     
    PAPER-Prediction and Analysis

      Page(s):
    3358-3366

    Interconnect Length Distribution (ILD) represents the correlation between the number of interconnects and their length. The ILD can be used to predict power consumption, clock frequency, chip size, etc. High core utilization and small circuit area have been reported to improve chip performance. We propose an ILD model to predict the correlation between core utilization and chip performance. The proposed model predicts the influence of interconnect length and interconnect density on circuit performance. For small and simple circuits, performance improves as core utilization increases. In large complex circuits, decreasing the wire coupling capacitance is more important than decreasing the total interconnect length for improving chip performance. The proposed ILD model expresses the actual ILD more accurately than conventional models.

  • Modeling the Effective Capacitance of Interconnect Loads for Predicting CMOS Gate Slew

    Zhangcai HUANG  Atsushi KUROKAWA  Jun PAN  Yasuaki INOUE  

     
    PAPER-Prediction and Analysis

      Page(s):
    3367-3374

    In deep submicron designs, predicting gate slews and delays for interconnect loads is vitally important for Static Timing Analysis (STA). The effective capacitance (Ceff) concept is usually used to calculate the gate delay of interconnect loads, and many Ceff algorithms have been proposed for this purpose. However, less work has been done to develop a Ceff algorithm that can accurately predict gate slew. In this paper, we propose a novel method for calculating the Ceff of an interconnect load for gate slew. We first establish a new expression for Ceff at the 0.8Vdd point. Then the Integration Approximation method is used to calculate the value of Ceff at the 0.8Vdd point; in this method, the integral of a complicated nonlinear gate output waveform is approximated by that of a piecewise linear waveform. Based on the value of Ceff at the 0.8Vdd point, the Ceff of the interconnect load for gate slew is obtained. Simulation results demonstrate a significant improvement in accuracy.

  • Statistical Analysis of Clock Skew Variation in H-Tree Structure

    Masanori HASHIMOTO  Tomonori YAMAMOTO  Hidetoshi ONODERA  

     
    PAPER-Prediction and Analysis

      Page(s):
    3375-3381

    This paper discusses clock skew due to manufacturing variability and environmental change. In clock tree design, the transition time constraint is an important design parameter that controls clock skew and power dissipation. In this paper, we evaluate clock skew under several variability models and demonstrate the relationship among clock skew, transition time constraint, and power dissipation. Experimental results show that a small transition time constraint reduces clock skew under manufacturing and supply voltage variability, whereas there is an optimum constraint value for temperature gradient. Our experiments in a 0.18 µm technology indicate that clock skew is minimized when clock buffers are sized such that the ratio of output to input capacitance is four.

  • On-Chip Thermal Gradient Analysis and Temperature Flattening for SoC Design

    Takashi SATO  Junji ICHIMIYA  Nobuto ONO  Koutaro HACHIYA  Masanori HASHIMOTO  

     
    PAPER-Prediction and Analysis

      Page(s):
    3382-3389

    This paper quantitatively analyzes the thermal gradient of SoCs and proposes a thermal flattening procedure. First, the impact of dominant parameters, such as the area occupancy of memory/logic blocks, power density, and floorplan, on the thermal gradient is studied quantitatively. Temperature difference is also evaluated from timing and reliability standpoints. Important results obtained here are that 1) the maximum temperature difference increases with higher memory area occupancy and 2) the difference is very sensitive to the floorplan. Then, we propose a procedure to reduce the thermal gradient. A slight floorplan modification using the proposed procedure improves the on-chip thermal gradient significantly.

  • A Graph Based Soft Module Handling in Floorplan

    Hiroaki ITOGA  Chikaaki KODAMA  Kunihiro FUJIYOSHI  

     
    PAPER-Floorplan and Placement

      Page(s):
    3390-3397

    In VLSI layout design, a floorplan is often obtained in the early design stage to define the rough arrangement of modules. At this stage, the aspect ratio of each soft module is also determined; it can be changed within a designated range while the area of each module is kept constant. In this paper, we propose a graph-based one-dimensional compaction method that quickly determines the aspect ratios under the constraint that the topology of the floorplan must not be changed. The proposed method is divided into two steps: (1) selection of a minimal set of soft modules whose aspect ratios are to be adjusted, and (2) decision on the aspect ratios. Step (1) is formulated as a minimum cut problem in graph theory, which we solve by transforming it into a shortest path problem. Step (2) is divided into two operations: one determines the increment limit in height or width of each soft module, and the other determines the aspect ratio of each soft module by the Newton-Raphson method. Experimental comparisons show the effectiveness of the proposed method.

  • An Incremental Placement Algorithm for Building Block Layout Design Based on the O-Tree Representation

    Jing LI  Juebang YU  Hiroshi MIYASHITA  

     
    PAPER-Floorplan and Placement

      Page(s):
    3398-3404

    Incremental modification and optimization are of fundamental importance in VLSI physical design. Based on the O-tree (ordered tree) representation, which has notable advantages over other topological representations of non-slicing floorplans, this paper presents an incremental placement algorithm for BBL (Building Block Layout) design in VLSI physical design. Experimental results on several instances show the effectiveness of our algorithm.

  • Navigating Register Placement for Low Power Clock Network Design

    Yongqiang LU  Chin-Ngai SZE  Xianlong HONG  Qiang ZHOU  Yici CAI  Liang HUANG  Jiang HU  

     
    PAPER-Floorplan and Placement

      Page(s):
    3405-3411

    As VLSI design advances, the increasingly severe power problem demands minimizing clock routing wirelength so that both power consumption and power supply noise can be reduced. In contrast to most traditional work, which handles this problem only during clock routing, we propose navigating standard-cell register placement toward locations that enable further reductions in clock routing wirelength and power. To minimize adverse impacts on conventional cell placement goals such as signal net wirelength and critical path delay, register placement is carried out in the context of quadratic placement. The proposed technique is particularly effective for the recently popular prescribed-skew clock routing. Experiments on benchmark circuits show encouraging results.

  • Efficient Large Scale Integration Power/Ground Network Optimization Based on Grid Genetic Algorithm

    Yun YANG  Atsushi KUROKAWA  Yasuaki INOUE  Wenqing ZHAO  

     
    PAPER-Power/Ground Network

      Page(s):
    3412-3420

    In this paper we propose a novel and efficient method for optimizing the power/ground (P/G) network in VLSI circuit layouts under reliability constraints. Previous P/G network sizing algorithms used the sequence-of-linear-programming (SLP) algorithm to solve the nonlinear optimization problem. However, the transformation from the nonlinear network to linear subnetworks is not sufficiently optimal. Our new method is inspired by biological evolution and uses a grid genetic algorithm (GGA) to solve the optimization problem. Experimental results show that the resulting P/G network sizes are smaller than those of previous algorithms, in keeping with the survival of the fittest in nature. Another significant advantage is that the GGA method can be applied to all P/G network problems, because it obtains results directly whether or not the problems are linear. Thus, GGA could in the future be applied to P/G network sizing that considers transient behavior, which currently faces obstacles in solving complex nonlinear problems.

  • Power-Supply Noise Reduction with Design for Manufacturability

    Hiroyuki TSUJIKAWA  Kenji SHIMAZAKI  Shozo HIRANO  Kazuhiro SATO  Masanori HIROFUJI  Junichi SHIMADA  Mitsumi ITO  Kiyohito MUKAI  

     
    PAPER-Power/Ground Network

      Page(s):
    3421-3428

    With the move toward higher clock rates and advanced process technologies, designers of the latest electronic products are encountering more noise-related silicon failures. Meanwhile, the minimum pattern dimension on LSIs is much smaller than the exposure wavelength, making it difficult for LSI manufacturers to obtain high yield. In this paper, we present a solution for reducing power-supply noise in LSI microchips. The proposed design methodology addresses design for manufacturability (DFM) together with power integrity. The method was successfully applied to the design of a system-on-chip (SOC), achieving a 13.1-13.2% reduction in power-supply voltage noise and uniform pattern density for chemical mechanical polishing (CMP).

  • Successive Pad Assignment for Minimizing Supply Voltage Drop

    Takashi SATO  Masanori HASHIMOTO  Hidetoshi ONODERA  

     
    PAPER-Power/Ground Network

      Page(s):
    3429-3436

    An efficient pad assignment methodology to minimize voltage drop on a power distribution network is proposed. A combination of successive pad assignment (SPA) with incremental matrix inversion (IMI) determines both the location and the number of power supply pads that satisfy a voltage drop constraint. The SPA creates an equivalent resistance matrix that preserves both pad candidates and power consumption points as external ports, so that topological modification due to connection or disconnection between voltage sources and candidate pads is represented consistently. By reusing sub-matrices of the equivalent matrix, the SPA greedily searches for the next pad location that minimizes the worst-case voltage drop. Each time a candidate pad is added, the IMI significantly reduces the computational complexity. Experimental results, including a 400-pad problem, show that the proposed procedures efficiently enumerate the pad order in practical time.
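
The SPA equivalent-matrix construction and the IMI update are not reproduced here. As a minimal sketch of only the greedy outer loop, assuming a 1-D resistive chain instead of a 2-D grid, one pad resistance for every candidate, and hypothetical function names, each step tries every remaining candidate pad, keeps the one that minimizes the worst-case drop, and stops once the constraint is met. It re-solves the nodal equations from scratch with NumPy at every trial, which is exactly the cost the paper's incremental matrix inversion is designed to avoid.

```python
import numpy as np


def chain_conductance(n, r_seg):
    """Laplacian conductance matrix of n nodes joined by equal segment resistors."""
    G = np.zeros((n, n))
    g = 1.0 / r_seg
    for k in range(n - 1):
        G[k, k] += g
        G[k + 1, k + 1] += g
        G[k, k + 1] -= g
        G[k + 1, k] -= g
    return G


def worst_drop(G_base, pads, r_pad, currents):
    """Solve (G_chain + G_pads) d = I for the drops d = Vdd - v; return max(d)."""
    G = G_base.copy()
    for p in pads:
        G[p, p] += 1.0 / r_pad      # pad ties node p to the ideal supply
    return np.linalg.solve(G, currents).max()


def greedy_pads(n, r_seg, r_pad, currents, max_drop, candidates):
    G_base = chain_conductance(n, r_seg)
    chosen, remaining = [], list(candidates)
    while remaining:
        best = min(remaining, key=lambda c: worst_drop(G_base, chosen + [c], r_pad, currents))
        chosen.append(best)
        remaining.remove(best)
        if worst_drop(G_base, chosen, r_pad, currents) <= max_drop:
            return chosen
    raise ValueError("drop constraint cannot be met with the given candidates")


print(greedy_pads(5, 1.0, 0.1, np.ones(5), 100.0, range(5)))  # [2]
```

With uniform loads, the first pad lands at the centre of the chain (worst drop 3.5 versus 6.5 for the neighbouring positions), as symmetry suggests.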

  • Evaluation of X Architecture Using Interconnect Length Distribution

    Hidenari NAKASHIMA  Naohiro TAKAGI  Junpei INOUE  Kenichi OKADA  Kazuya MASU  

     
    PAPER-Interconnect

      Page(s):
    3437-3444

    In this paper, we propose a new Interconnect Length Distribution (ILD) model to evaluate the X architecture. The X architecture uses 45° wire orientations in addition to 90° wire orientations, which helps reduce the total wire length and the number of vias. We evaluate the interconnect length distribution of diagonal (45° orientations) and all-directional wiring. The average and longest interconnect lengths are estimated, and the experimental results show that diagonal wiring yields an 18% reduction in power consumption and a 17% improvement in clock frequency. All-directional wiring offers no large advantage over diagonal wiring.
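
The paper's conclusions come from its ILD model, not from individual nets. To show only the underlying geometry, this sketch compares the shortest route between two pins offset by (dx, dy) under Manhattan (0°/90°) routing, diagonal routing that adds 45° segments, and unrestricted all-directional routing; the function names are illustrative.

```python
import math


def manhattan_length(dx, dy):
    """Shortest route using only 0° and 90° segments."""
    return abs(dx) + abs(dy)


def diagonal_length(dx, dy):
    """Shortest route that may also use 45° segments (X architecture)."""
    a, b = abs(dx), abs(dy)
    return (max(a, b) - min(a, b)) + math.sqrt(2) * min(a, b)


def euclidean_length(dx, dy):
    """Lower bound, reachable only with all-directional routing."""
    return math.hypot(dx, dy)


for name, f in [("Manhattan", manhattan_length),
                ("diagonal", diagonal_length),
                ("all-directional", euclidean_length)]:
    print(f"{name:>15}: {f(3, 4):.3f}")
```

For a 3-by-4 offset this prints 7.000, 5.243 and 5.000: the step from Manhattan to diagonal routing saves far more than the further step to all-directional routing, which is consistent in spirit with the abstract's final observation.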

  • Wire Length Distribution Model for System LSI

    Takanori KYOGOKU  Junpei INOUE  Hidenari NAKASHIMA  Takumi UEZONO  Kenichi OKADA  Kazuya MASU  

     
    PAPER-Interconnect

      Page(s):
    3445-3452

    This paper presents a new model for estimating the wire length distribution (WLD) of a system-on-a-chip (SoC). The WLD represents the correlation between wire length and the number of interconnects, and circuit performance metrics such as power consumption, maximum clock frequency, and chip size can be predicted from the WLD. A WLD model considering core utilization has been proposed, and core utilization has a large impact on circuit performance. However, that WLD model can treat only a single-function circuit. We propose a new WLD model considering core utilization to estimate the wire length distribution of an SoC consisting of several macroblocks with different functions. We also present an optimization method to determine the core utilization of each macroblock.

  • Second-Order Polynomial Expressions for On-Chip Interconnect Capacitance

    Atsushi KUROKAWA  Masanori HASHIMOTO  Akira KASEBE  Zhangcai HUANG  Yun YANG  Yasuaki INOUE  Ryosuke INAGAKI  Hiroo MASUDA  

     
    PAPER-Interconnect

      Page(s):
    3453-3462

    Simple closed-form expressions for efficiently calculating on-chip interconnect capacitances are presented. The formulas are expressed as second-order polynomial functions that do not include exponential functions. The proposed formulas run about 2-10 times faster than existing formulas. The root mean square (RMS) errors of the proposed formulas, relative to the results obtained by a field solver, are within 1.5%, 1.3%, 3.1%, and 4.6% for structures with one line above a ground plane, one line between ground planes, three lines above a ground plane, and three lines between ground planes, respectively. The proposed formulas are also more accurate than existing formulas.

  • A Method of Precise Estimation of Physical Parameters in LSI Interconnect Structures

    Toshiki KANAMOTO  Tetsuya WATANABE  Mitsutoshi SHIROTA  Masayuki TERAI  Tatsuya KUNIKIYO  Kiyoshi ISHIKAWA  Yoshihide AJIOKA  Yasutaka HORIBA  

     
    PAPER-Interconnect

      Page(s):
    3463-3470

    This paper proposes a new non-destructive methodology to estimate physical parameters of LSIs. To resolve the degradation of estimation accuracy for low-k dielectric films, our non-destructive method employs a parallel-plate capacitance measurement and a wire resistance measurement. By using (1) response surface functions corresponding to the parallel-plate capacitance measurement and the wire resistance measurement and (2) a search for the physical parameter values based on our cost function and simulated annealing, the proposed method attains higher precision than the existing method. We demonstrate the effectiveness of our method by applying it to our 90 nm SoC process using low-k materials.

  • Efficient Dummy Filling Methods to Reduce Interconnect Capacitance and Number of Dummy Metal Fills

    Atsushi KUROKAWA  Toshiki KANAMOTO  Tetsuya IBE  Akira KASEBE  Wei Fong CHANG   Tetsuro KAGE  Yasuaki INOUE  Hiroo MASUDA  

     
    PAPER-Interconnect

      Page(s):
    3471-3478

    Floating dummy metal fills inserted for planarization of multi-dielectric layers have created serious problems because of increased interconnect capacitance and the enormous number of fills. We present new dummy filling methods that reduce the interconnect capacitance and the number of dummy metal fills needed. These techniques include three ways of filling: 1) improved floating square fills, 2) floating parallel lines, and 3) floating perpendicular lines (with spacing between dummy metal fills above and below signal lines). We also present efficient formulas for estimating the appropriate spacing and number of fills. In our experiments, the capacitance increase using the conventional regular square method was 13.1%, while those using the improved square fills, extended parallel lines, and perpendicular lines were 2.7%, 2.4%, and 1.0%, respectively. Moreover, the parallel line method reduces the number of necessary dummy metal fills by two orders of magnitude.

  • On the Computational Synthesis of CMOS Voltage Followers

    Esteban TLELO-CUAUTLE  Delia TORRES-MUÑOZ  Leticia TORRES-PAPAQUI  

     
    PAPER-Circuit Synthesis

      Page(s):
    3479-3484

    A systematic method is introduced for the computational synthesis of CMOS voltage followers (VFs). The method is divided into three steps: (1) generation of the small-signal circuitry, by selecting nullators to model the behavior of a VF and adding norators to form nullator-norator joined pairs; (2) generation of the bias circuitry, by adding ideal biases according to the properties of nullators and norators; and (3) synthesis of the joined pairs by MOSFETs and of the current biases by CMOS current mirrors. It is shown that the proposed synthesis method can generate both already known and new CMOS VF topologies.

  • Exact Minimum-Width Transistor Placement for Dual and Non-dual CMOS Cells

    Tetsuya IIZUKA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-Circuit Synthesis

      Page(s):
    3485-3491

    This paper proposes flat and hierarchical approaches for generating a minimum-width transistor placement of CMOS cells in the presence of non-dual P and N type transistors. Our approaches are the first exact methods that can be applied to CMOS cells with any type of structure. Non-dual CMOS cells make up a major part of an industrial standard-cell library. To generate the exact minimum-width transistor placement of non-dual CMOS cells, we formulate the transistor placement problem as a Boolean Satisfiability (SAT) problem, considering the P and N type transistors individually. Using the proposed method, the transistor placement problem of any type of CMOS cell can be solved exactly. In addition, experimental results show that our flat approach generates smaller-width placements than the conventional method for 29 out of 103 dual cells. Our hierarchical approach reduces the runtimes drastically. Although this approach may generate wider placements than the flat approach, experimental results show that only 3 out of 147 cells solved by our hierarchical approach have larger widths than with the flat approach.

  • A 95 mW MPEG2 MP@HL Motion Estimation Processor Core for Portable High-Resolution Video Application

    Yuichiro MURACHI  Koji HAMANO  Tetsuro MATSUNO  Junichi MIYAKOSHI  Masayuki MIYAMA  Masahiko YOSHIMOTO  

     
    PAPER-VLSI Architecture

      Page(s):
    3492-3499

    This paper describes a 95 mW MPEG2 MP@HL motion estimation processor core for portable, high-resolution video applications such as HD camcorders. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920×1080 @ 30 fps video. The search range is 128×64 pixels. The ME core integrates 2.25 M transistors in 3.1 mm × 3.1 mm using 0.18-micron technology.

  • Quality and Power Efficient Architecture for the Discrete Cosine Transform

    Chi-Chia SUNG  Shanq-Jang RUAN  Bo-Yao LIN  Mon-Chau SHIE  

     
    PAPER-VLSI Architecture

      Page(s):
    3500-3507

    In recent years, the demand for battery-operated mobile multimedia devices has created a need for low power implementations of video compression. Many compression standards require the discrete cosine transform (DCT) to perform image/video compression, so low power DCT design has become increasingly important in image/video processing. This paper presents a new power-efficient Hybrid DCT architecture that combines the Loeffler DCT and the binDCT, exploiting a property of the difference between luminance and chrominance. We use Synopsys PrimePower to estimate the power consumption in a TSMC 0.25-µm technology. In addition, we adopt a novel quality assessment method based on structural distortion measurement, instead of peak signal-to-noise ratio (PSNR) and mean squared error (MSE), to measure quality. We conclude that our Hybrid DCT offers quality similar to that of the Loeffler DCT while saving 25% in power consumption and 27% in chip area.

  • VLSI Implementation of Lifting Wavelet Transform of JPEG2000 with Efficient RPA(Recursive Pyramid Algorithm) Realization

    Gab-Cheon JUNG  Seong-Mo PARK  

     
    PAPER-VLSI Architecture

      Page(s):
    3508-3515

    This paper presents an efficient VLSI architecture for the biorthogonal (9,7)/(5,3) lifting-based discrete wavelet transform used for lossy and lossless compression in JPEG2000. To improve the hardware utilization of the RPA (Recursive Pyramid Algorithm) implementation, the filter responsible for the row operations of the first level also performs the column and row operations of the second and subsequent levels. As a result, the architecture achieves 66.7-88.9% hardware utilization. It requires 9 multipliers, 12 adders, and 12N line memories for an N×N image, a lower hardware complexity than other architectures with comparable throughput.
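For the reversible (5,3) filter, lifting reduces each decomposition level to a predict step and an update step on integers. The sketch below is a minimal 1-D, single-level version following the JPEG2000 reversible 5/3 lifting equations with whole-sample symmetric extension, assuming an even-length integer signal. It does not model the (9,7) filter, the RPA scheduling, or the row/column filter sharing described above, and the function names are hypothetical.

```python
def fwd_53(x):
    """One level of the reversible 5/3 lifting DWT on an even-length int list.

    Returns (s, d): low-pass (approximation) and high-pass (detail) halves.
    Python's >> floors negative values, matching the floors in the equations.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("need an even length >= 2")
    half = n // 2
    d = []
    for i in range(half):
        left = x[2 * i]
        # Symmetric extension at the right edge: x[n] is taken as x[n - 2].
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        # Predict: odd sample minus floor of the mean of its even neighbours.
        d.append(x[2 * i + 1] - ((left + right) >> 1))
    s = []
    for i in range(half):
        # Symmetric extension at the left edge: d[-1] is taken as d[0].
        d_left = d[i - 1] if i > 0 else d[0]
        # Update: even sample plus floor((d[i-1] + d[i] + 2) / 4).
        s.append(x[2 * i] + ((d_left + d[i] + 2) >> 2))
    return s, d


def inv_53(s, d):
    """Exact inverse of fwd_53: undo the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        d_left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((d_left + d[i] + 2) >> 2)
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x
```

Because the inverse recomputes exactly the same rounded terms in reverse order, `inv_53(*fwd_53(x))` returns `x` bit-exactly, which is what makes the (5,3) path usable for lossless coding.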

  • FPGA Implementation of a Stereo Matching Processor Based on Window-Parallel-and-Pixel-Parallel Architecture

    Masanori HARIYAMA  Yasuhiro KOBAYASHI  Haruka SASAKI  Michitaka KAMEYAMA  

     
    PAPER-VLSI Architecture

      Page(s):
    3516-3522

    This paper presents a processor architecture for high-speed and reliable stereo matching based on adaptive window-size control of SAD (Sum of Absolute Differences) computation. To reduce the computational complexity, SADs are computed using images divided into non-overlapping regions, and the matching result is iteratively refined by reducing the window size. A window-parallel-and-pixel-parallel architecture is also proposed to fully exploit the potential parallelism of the algorithm. The architecture also reduces the complexity of the interconnection network between memory and functional units by exploiting the regularity of reference pixels. The stereo matching processor is implemented on an FPGA. Its performance is 80 times higher than that of a microprocessor (Pentium 4 @ 2 GHz), which is sufficient to generate 3-D depth images at the video rate of 33 MHz.
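The basic operation that the processor parallelizes is a SAD search along the epipolar line. The sketch below computes a winner-take-all disparity for one pixel with a fixed square window, assuming rectified grayscale images stored as lists of rows. The adaptive window-size refinement over non-overlapping regions described in the abstract is not reproduced, and the function name is hypothetical.

```python
def disparity_at(left, right, x, y, half_win, max_disp):
    """Winner-take-all disparity for pixel (x, y) of the left image.

    Compares the (2*half_win+1)^2 window around (x, y) in the left image with
    the window around (x - d, y) in the right image for d = 0..max_disp and
    returns the d with the smallest SAD. The caller must keep the left window
    inside the image; candidates that leave the right image are not tested.
    """
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d - half_win < 0:
            break  # the shifted window would leave the right image
        cost = 0
        for j in range(y - half_win, y + half_win + 1):
            for i in range(x - half_win, x + half_win + 1):
                cost += abs(left[j][i] - right[j][i - d])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Every pixel and every candidate disparity evaluates an independent SAD, which is the parallelism a window-parallel, pixel-parallel datapath can exploit.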

  • A VLSI Array Processing Oriented Fast Fourier Transform Algorithm and Hardware Implementation

    Zhenyu LIU  Yang SONG  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER-VLSI Architecture

      Page(s):
    3523-3530

    Many parallel Fast Fourier Transform (FFT) algorithms adopt a multi-stage architecture to increase performance. However, data permutation between stages consumes a large amount of memory and processing time. This paper proposes an FFT array-processing mapping algorithm to overcome this drawback. In this algorithm, an arbitrary number $2^k$ of butterfly units (BUs) can be scheduled to work in parallel on $n=2^s$ data points ($k=0,1,\ldots,s-1$). Because no inter-stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, as the number of BUs increases, throughput increases linearly and system latency decreases linearly. This array-processing-oriented architecture provides a flexible tradeoff between hardware cost and system performance. In theory, the system latency is $s\cdot 2^{s-k}\,t_{clk}$ and the throughput is $n/(s\cdot 2^{s-k}\,t_{clk})$, where $t_{clk}$ is the system clock period. Based on this mapping algorithm, several 18-bit word-length 1024-point FFT processors implemented in TSMC 0.18 µm CMOS technology are presented to demonstrate its scalability and high performance. The core area of the 4-BU design is 2.991×1.121 mm² and the clock frequency is 326 MHz under typical conditions (1.8 V, 25°C). This processor completes a 1024-point FFT calculation in 7.839 µs.
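The latency and throughput expressions can be checked numerically. The sketch below simply evaluates the stated formulas, latency $= s\cdot 2^{s-k}$ clock cycles and throughput $= n$ divided by the latency with $n = 2^s$, for a few BU counts. The function name is hypothetical, and the code says nothing about how the butterflies are actually scheduled.

```python
def fft_array_timing(s, k, f_clk_hz):
    """Evaluate latency = s * 2**(s - k) cycles and throughput = n / latency.

    s: log2 of the transform size n; k: log2 of the number of butterfly units.
    Returns (cycles, latency_seconds, throughput_points_per_second).
    """
    if not 0 <= k <= s - 1:
        raise ValueError("k must satisfy 0 <= k <= s - 1")
    cycles = s * 2 ** (s - k)
    latency = cycles / f_clk_hz
    return cycles, latency, 2 ** s / latency


for num_bu in (1, 2, 4, 8):
    k = num_bu.bit_length() - 1  # number of BUs = 2**k
    cycles, latency, _ = fft_array_timing(10, k, 326e6)
    print(f"{num_bu} BU(s): {cycles} cycles, {latency * 1e6:.3f} us")
```

For the 1024-point, 4-BU case ($s=10$, $k=2$) at 326 MHz this gives 2560 cycles, about 7.85 µs, close to the reported 7.839 µs. The abstract does not explain the small difference, so the formula is best read as a first-order estimate. Each doubling of the BU count halves the cycle count.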

  • A Binary Tree Based Methodology for Designing an Application Specific Network-on-Chip (ASNOC)

    Yuan-Long JEANG  Jer-Min JOU  Win-Hsien HUANG  

     
    PAPER-VLSI Architecture

      Page(s):
    3531-3538

    In this paper, a methodology based on a mixed-mode interconnection architecture is proposed for constructing an application-specific network on chip that minimizes the total communication time. The proposed architecture uses a globally asynchronous communication network and a locally synchronous bus (or a crossbar or a multistage interconnection network (MIN)). First, a local bus is assigned to a group of IP cores so that the communications on this local bus can be arranged to be mutually exclusive in time. If the communications of some IP cores must be completed within a given amount of time, a non-blocking MIN or a crossbar switch is used for those IP cores instead of a bus. Then, the user provides a communication ratio (CR) for each pair of local buses, and, following the Huffman coding philosophy, a binary tree (BT) is constructed with switches at the internal nodes and buses at the leaves. Since the binary tree is deadlock-free (it contains no cycles), each router can be a relatively simple and inexpensive switch. Simulation results show that the proposed methodology and NOC architecture outperform both SPIN and a mesh architecture using our deadlock-free router in switching circuit cost and performance.
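The abstract does not give the exact merging rule, so the sketch below assumes the classic Huffman procedure: each local bus gets a single traffic weight (hypothetically, an aggregate of its communication ratios), and the two lightest subtrees are repeatedly joined under a new switch, so heavier buses tend to end up closer to the root. The weighting and the names are illustrative assumptions, not the authors' procedure.

```python
import heapq
from itertools import count


def build_switch_tree(weights):
    """Huffman-style tree: leaves are bus names, each internal 2-tuple is a switch.

    weights maps bus name -> traffic weight (an assumed per-bus aggregate).
    """
    if not weights:
        raise ValueError("need at least one bus")
    tie = count()  # unique tie-breaker so heapq never compares subtrees
    heap = [(w, next(tie), name) for name, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Join the two lightest subtrees under a new switch.
        heapq.heappush(heap, (w1 + w2, next(tie), (a, b)))
    return heap[0][2]


def switch_depth(tree, depth=0):
    """Map each bus to the number of switches on its path up to the root."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    out = {}
    for child in tree:
        out.update(switch_depth(child, depth + 1))
    return out
```

With weights `{"A": 5, "B": 1, "C": 1, "D": 2}`, bus A hangs directly off the root switch while the light buses B and C sit three switches deep. Since the result is a tree, there is exactly one path between any two buses, which is the property behind the deadlock-freedom argument.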

  • High-Throughput Multi-Rate Decoding of Structured Low-Density Parity-Check Codes

    Luca FANUCCI  Massimo ROVINI  Nicola E. L'INSALATA  Francesco ROSSI  

     
    PAPER-VLSI Architecture

      Page(s):
    3539-3547

    As an enhancement of state-of-the-art solutions, this paper presents a high-throughput decoder architecture for structured LDPC codes. Thanks to the particular code definition and to the envisaged architecture featuring memory paging, the decoder is very flexible, and different code rates are supported with no significant hardware overhead. A top-down design flow of a real decoder is reported, from the analysis of system performance in finite-precision arithmetic down to the VLSI implementation details of the elementary modules. Synthesis of the whole decoder in 0.18 µm standard-cell CMOS technology showed remarkable performance: small implementation loss (0.2 dB down to BER = $10^{-8}$), low latency (less than 6.0 µs), high useful throughput (up to 940 Mbps), and low complexity (about 375 Kgates).

  • Multiplier Energy Reduction by Dynamic Voltage Variation

    Vasily G. MOSHNYAGA  Tomoyuki YAMANAKA  

     
    PAPER-VLSI Circuit

      Page(s):
    3548-3553

    The design of portable battery-operated multimedia devices requires energy-efficient multiplication circuits. This paper proposes a novel architectural technique to reduce the power consumption of digital multipliers. Unlike related approaches that focus on reducing multiplier transition activity, we concentrate on dynamically reducing the supply voltage. Two implementation schemes capable of dynamically adjusting a dual supply voltage to input data variation are presented. Simulations show that these schemes reduce the energy consumption of a 16×16-bit multiplier by 34% and 29% at peak and by 10% and 7% on average, with area overheads of 15% and 4%, respectively, while maintaining the performance of a traditional multiplier.

  • A Standard Cell-Based Frequency Synthesizer with Dynamic Frequency Counting

    Pao-Lung CHEN  Chen-Yi LEE  

     
    PAPER-VLSI Circuit

      Page(s):
    3554-3563

    This paper presents a standard cell-based frequency synthesizer with dynamic frequency counting (DFC) that multiplies the input reference frequency by N. The dynamic frequency counting loop uses a variable time period to estimate and tune the frequency of the digitally-controlled oscillator (DCO), which enhances the frequency-detection resolution and loop stability. Two ripple counters serve as the frequency estimator. The conventional phase-frequency detector (PFD) is thus replaced with a digital arithmetic comparator, yielding a divider-free circuit structure. Additionally, a 15-bit DCO with a least significant bit (LSB) resolution of 1.55 ps is designed by exploiting the gate capacitance difference of 2-input NOR gates in the fine-tuning stage. A modified incremental data weighted averaging (IDWA) circuit is also designed to improve the linearity of the DCO by the dynamic element matching (DEM) technique. Based on the proposed standard cell-based frequency synthesizer, a test chip was designed and verified in a 0.35-µm complementary metal-oxide-semiconductor (CMOS) process; it has a frequency range of 18-214 MHz at 3.3 V with peak-to-peak (Pk-Pk) jitter of less than 70 ps at 192 MHz/3.3 V.
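Data weighted averaging is a dynamic element matching technique in which unit elements are selected cyclically so that element mismatch errors average out over time. The sketch below shows classic DWA, not the modified incremental DWA (IDWA) circuit in the paper. Treating the DCO's fine-tuning cells as interchangeable unit elements is an assumption made for illustration, and the function name is hypothetical.

```python
def dwa_select(codes, num_elements):
    """Classic data weighted averaging (DWA) element selection.

    Each code is the number of unit elements to switch on; elements are taken
    cyclically starting from a pointer that advances by the previous code.
    """
    ptr = 0
    selections = []
    for c in codes:
        if not 0 <= c <= num_elements:
            raise ValueError("code out of range")
        selections.append([(ptr + i) % num_elements for i in range(c)])
        ptr = (ptr + c) % num_elements  # next code starts where this one stopped
    return selections
```

For example, `dwa_select([3, 2, 4], 8)` selects elements `[0, 1, 2]`, then `[3, 4]`, then `[5, 6, 7, 0]`. Whenever the codes sum to a multiple of the element count, every element has been used equally often, which is why a fixed mismatch in any one element does not accumulate.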

  • Effects of On-Chip Inductance on Power Distribution Grid

    Atsushi MURAMATSU  Masanori HASHIMOTO  Hidetoshi ONODERA  

     
    LETTER

      Page(s):
    3564-3572

    As clock frequencies increase, on-chip wire inductance starts to play an important role in power/ground distribution analysis, although it has not been considered so far. We perform a case study that evaluates the relation between the position of decoupling capacitance and its noise-suppression effect, and we show that placing decoupling capacitance close to the current load is necessary for noise reduction. We experimentally show that the impact of on-chip inductance becomes small when on-chip decoupling capacitance is well placed according to local power consumption. We also examine the influence of grid pitch, wire area, and the spacing between paired power and ground wires on power supply noise. When the effect of on-chip inductance on power/ground noise is significant, reducing the grid pitch is more effective than increasing the wire area, and, as expected, small spacing reduces power noise.

  • Perturbation Approach for Order Selections of Two-Sided Oblique Projection-Based Interconnect Reductions

    Chia-Chi CHU  Ming-Hong LAI  Wu-Shiung FENG  

     
    LETTER

      Page(s):
    3573-3576

    An order selection scheme for two-sided oblique projection-based interconnect reduction is investigated. It provides a guideline for terminating the conventional nonsymmetric Padé via Lanczos (PVL) iteration. By exploring the relationship between the system Gramians of the original network and those of the reduced network, it is shown that the system matrix of the reduced-order system generated by the two-sided oblique projection can also be expressed as that of the original interconnect model with some additive perturbations. The perturbation matrix involves only the bi-orthogonal vectors at the previous step of the nonsymmetric Lanczos algorithm. This perturbation matrix provides the stopping criterion for the order selection scheme to achieve the desired accuracy of the approximate transfer function.

  • A Simplified Illustration of Arbitrary DAC Waveform Effects in Continuous Time Delta-Sigma Modulators

    Hossein SHAMSI  Omid SHOAEI  

     
    LETTER

      Page(s):
    3577-3579

    In this paper, a straightforward approach is presented for extracting, in the z-domain, the equivalent loop gain of a continuous-time Delta-Sigma modulator with an arbitrary DAC waveform. In this approach, the arbitrary DAC waveform is approximated by an infinite number of rectangular pulses. Then, using the transformations available in the literature for a rectangular DAC pulse and applying superposition over the rectangular pulses, the loop gain of the system is derived in the z-domain.

  • Frequency-Scaling Approach for Managing Power Consumption in NOCs

    Chun-Lung HSU  Wen-Tso WANG  Ying-Fu HONG  

     
    LETTER

      Page(s):
    3580-3583

    This work presents a frequency-scaling low-power (FSLP) design methodology for managing the power consumption of cores in a tile-based network-on-chip (NOC) architecture. A moving picture experts group (MPEG) core is tested using a field-programmable gate array (FPGA) implementation to verify the feasibility of the proposed method. Measurement results show that about 30% of the power consumption of the MPEG core can be saved, suggesting that the proposed FSLP design method is suitable for cores in tile-based NOC applications.

  • Regular Section
  • Development of Sound Localization System with Tube Earphone Using Human Head Model with Ear Canal

    Marie NAKAZAWA  Atsuhiro NISHIKATA  

     
    PAPER-Engineering Acoustics

      Page(s):
    3584-3592

    In this study, we propose a new acoustic model including the human ear canal and a thin tube earphone. The use of a tube earphone enables simultaneous listening to both virtual and real surrounding sounds. First, we perform acoustic FDTD (finite difference time domain) simulations using an MRI head model with ear canals. The calculated external impedance viewed from the eardrum shows numerically that the influence of the inserted tube is small. A listening experiment with six subjects also confirms the effectiveness of a tube earphone. Second, we calculate HRTFs (head-related transfer functions) for eight directions in the horizontal plane to realize sound localization with a tube earphone. We also design inverse filters based on propagation calculations that include the characteristics of tube earphones. Finally, we evaluate the localization system in another listening experiment with six subjects. The results reveal the applicability of a system with tube earphones and inverse filters, particularly for the front directions.

  • Subband-Based Blind Separation for Convolutive Mixtures of Speech

    Shoko ARAKI  Shoji MAKINO  Robert AICHNER  Tsuyoki NISHIKAWA  Hiroshi SARUWATARI  

     
    PAPER-Engineering Acoustics

      Page(s):
    3593-3603

    We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.

  • Global Asymptotic Stabilization of a Class of Nonlinear Time-Delay Systems by Output Feedback

    Ho-Lim CHOI  Jong-Tae LIM  

     
    PAPER-Systems and Control

      Page(s):
    3604-3609

    We consider chains of integrators with nonlinear terms that allow state and input delays. We provide an output feedback controller that globally asymptotically stabilizes the given system under certain sufficient conditions. The obtained result includes several existing results as particular cases, as shown through two applications of the main result. Industrial processes are also presented to illustrate the practicability of our result.

  • A Hardware Algorithm for Modular Multiplication/Division Based on the Extended Euclidean Algorithm

    Marcelo E. KAIHARA  Naofumi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

      Page(s):
    3610-3617

    A hardware algorithm for modular multiplication/division that performs modular division, Montgomery multiplication, and ordinary modular multiplication is proposed. The modular division in our algorithm is based on the extended Euclidean algorithm. To calculate Montgomery multiplication, we employ our newly proposed method, which processes the multiplier starting from the most significant digit. Finally, the ordinary modular multiplication is based on shift-and-add multiplication. Each of these three operations is carried out by iterating simple operations such as shifts and additions/subtractions. To avoid carry propagation in all additions and subtractions, the radix-2 signed-digit representation is employed. A modular multiplier/divider based on the algorithm has a linear array structure with a bit-slice feature and carries out n-bit modular multiplication/division in O(n) clock cycles, where the length of the clock cycle is constant and independent of n. This multiplier/divider can be implemented with a hardware amount only slightly larger than that of the modular divider.
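As a software reference for two of the three operations, the sketch below computes modular division with the extended Euclidean algorithm (carrying the dividend y in the cofactor so that the loop yields y/x mod m directly) and Montgomery multiplication in the classic radix-2, least-significant-bit-first form. It deliberately uses ordinary Python integers; the paper's signed-digit, most-significant-digit-first hardware algorithm is not reproduced, and the function names are hypothetical.

```python
def mod_div(y, x, m):
    """Return y / x mod m (y * x^-1 mod m) via the extended Euclidean algorithm.

    Invariant: r_i * y == t_i * x (mod m) for both tracked rows, so when the
    remainder reaches gcd(x, m) == 1 the cofactor t equals y / x mod m.
    """
    r0, r1 = m, x % m
    t0, t1 = 0, y
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if r0 != 1:
        raise ValueError("x is not invertible modulo m")
    return t0 % m


def montgomery_mul(a, b, m, n):
    """Return a * b * 2^-n mod m for odd m, 0 <= a, b < m, and 2^n > m.

    Bits of a are consumed least significant first; adding m when the partial
    result is odd makes it divisible by 2, so each step is a shift, not a division.
    """
    r = 0
    for i in range(n):
        r += ((a >> i) & 1) * b
        if r & 1:
            r += m
        r >>= 1
    return r - m if r >= m else r  # r < 2m here, so one subtraction suffices
```

For example, `mod_div(7, 3, 23)` returns 10, since 3 × 10 = 30 ≡ 7 (mod 23). Both routines iterate only shifts, additions/subtractions, and comparisons (the quotient `q` aside), which is the kind of structure the abstract maps onto a bit-slice array.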

  • Bayesian Approach to Optimal Release Policy of Software System

    HeeSoo KIM  Shigeru YAMADA  DongHo PARK  

     
    PAPER-Reliability, Maintainability and Safety Analysis

      Page(s):
    3618-3626

    In this paper, we propose a new software reliability growth model that is a mixture of two exponential reliability growth models: one exhibits reliability growth after the software is released upon completion of the testing phase, and the other does not. The mixture is characterized by a weighting factor p, the proportion of the reliability growth part within the model. First, this paper discusses an optimal software release problem with regard to the expected total software cost incurred during the warranty period under the proposed model, which generalizes the model of Kimura, Toyota and Yamada (1999) by introducing the weighting factor. The second main purpose of this paper is to apply the Bayesian approach to the optimal software release policy by assuming prior distributions for the unknown parameters of the proposed model. Numerical examples are presented to compare the optimal software release policies obtained by the non-Bayesian and Bayesian methods for different choices of parameters.

  • Some Classes of Quasi-Cyclic LDPC Codes: Properties and Efficient Encoding Method

    Hachiro FUJITA  Kohichi SAKANIWA  

     
    PAPER-Coding Theory

      Page(s):
    3627-3635

    Low-density parity-check (LDPC) codes are among the most promising next-generation error-correcting codes. For practical use, efficient encoding methods for LDPC codes are needed. However, no general encoding method suitable for hardware implementation appears to have been proposed so far, and for randomly constructed LDPC codes the only available method is the simple one using generator matrices. In this paper, we show that some classes of quasi-cyclic LDPC codes based on circulant permutation matrices, specifically LDPC codes based on array codes and a special class of Sridhara-Fuja-Tanner codes and Fossorier codes, can be encoded by division circuits, as cyclic codes are, which makes them very easy to implement. We also show some properties of these codes.
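Encoding a cyclic code by division amounts to computing the remainder of $m(x)\,x^{n-k}$ modulo the generator polynomial $g(x)$, which a linear feedback shift register does one bit per clock. The sketch below shows this generic systematic procedure on the (7,4) cyclic Hamming code with $g(x)=x^3+x+1$, chosen only as a small example. The circulant-based QC-LDPC codes and their specific division circuits from the paper are not reproduced, and all names are hypothetical.

```python
def gf2_mod(a, g):
    """Remainder of GF(2) polynomial a modulo g (bit i = coefficient of x^i)."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)  # cancel the current leading term
    return a


def cyclic_encode(msg, g, n, k):
    """Systematic cyclic encoding: c(x) = m(x) x^(n-k) + (m(x) x^(n-k) mod g(x))."""
    if msg >> k:
        raise ValueError("message longer than k bits")
    shifted = msg << (n - k)
    return shifted | gf2_mod(shifted, g)


def lfsr_parity(msg_bits, g, r):
    """Shift-register division: feed message bits MSB first, get r parity bits.

    Models the classic encoder circuit: the feedback bit is the incoming bit
    XOR the register's top stage, and it is added back at the taps of g(x).
    """
    reg = 0
    mask = (1 << r) - 1
    for bit in msg_bits:
        fb = bit ^ ((reg >> (r - 1)) & 1)
        reg = (reg << 1) & mask
        if fb:
            reg ^= g & mask  # taps = g(x) without its leading x^r term
    return reg
```

For instance, `cyclic_encode(0b1000, 0b1011, 7, 4)` gives `0b1000101`: the message bits followed by the parity `101`. `lfsr_parity` produces the same parity bits with nothing but a 3-stage register and XOR gates, which is why division-based encoding is cheap in hardware.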

  • Bounds on Aperiodic Autocorrelation and Crosscorrelation of Binary LCZ/ZCZ Sequences

    Daiyuan PENG  Pingzhi FAN  Naoki SUEHIRO  

     
    PAPER-Spread Spectrum Technologies and Applications

      Page(s):
    3636-3644

    To eliminate the co-channel and multipath interference of quasi-synchronous code division multiple access (QS-CDMA) systems, spreading sequences with a low or zero correlation zone (LCZ or ZCZ) can be used. The significance of LCZ/ZCZ sequences for QS-CDMA systems is that, even if there are relative delays between the transmitted spreading sequences due to inaccurate access synchronization and multipath propagation, the orthogonality (or quasi-orthogonality) between the transmitted signals can still be maintained as long as the relative delay does not exceed a certain limit. In this paper, several lower bounds on the aperiodic autocorrelation and crosscorrelation of binary LCZ/ZCZ sequence sets, with respect to the family size, the sequence length, and the aperiodic low or zero correlation zone, are derived. The results show that the new bounds are tighter than previous bounds for LCZ/ZCZ sequences.
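The bounds concern aperiodic correlation values over a window of shifts. The sketch below computes the aperiodic cross-correlation of two ±1 sequences and the largest out-of-phase magnitude within a zone $|\tau| < Z$; treating the zone as $|\tau| < Z$ is one common convention and an assumption here, and the function names are hypothetical. It only measures a given set; it does not evaluate the paper's bounds.

```python
def aperiodic_corr(a, b, tau):
    """Aperiodic cross-correlation C_{a,b}(tau) of equal-length +/-1 sequences."""
    n = len(a)
    if len(b) != n or not -n < tau < n:
        raise ValueError("need equal lengths and |tau| < n")
    if tau >= 0:
        return sum(a[i] * b[i + tau] for i in range(n - tau))
    return sum(a[i - tau] * b[i] for i in range(n + tau))


def max_in_zone(seqs, zone):
    """Largest |C| over auto-correlations (tau != 0) and cross-correlations
    (all tau) with |tau| < zone, across every ordered pair in seqs."""
    worst = 0
    for p, a in enumerate(seqs):
        for q, b in enumerate(seqs):
            for tau in range(-zone + 1, zone):
                if p == q and tau == 0:
                    continue  # skip the in-phase autocorrelation peak
                worst = max(worst, abs(aperiodic_corr(a, b, tau)))
    return worst
```

As a sanity check, the length-13 Barker sequence has an in-phase autocorrelation of 13 and aperiodic sidelobes of magnitude at most 1, so `max_in_zone([b13], 13)` returns 1. For a ZCZ set, the corresponding value over the zone would be 0.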

  • Avoiding the Local Minima Problem in Backpropagation Algorithm with Modified Error Function

    Weixing BI  Xugang WANG  Zheng TANG  Hiroki TAMURA  

     
    PAPER-Neural Networks and Bioengineering

      Page(s):
    3645-3653

    One critical "drawback" of the backpropagation algorithm is the local minima problem. We have noted that the local minima problem in the backpropagation algorithm is usually caused by update disharmony between weights connected to the hidden layer and the output layer. To solve this kind of local minima problem, we propose a modified error function with two terms. By adding one term to the conventional error function, the modified error function can harmonize the update of weights connected to the hidden layer and those connected to the output layer. Thus, it can avoid the local minima problem caused by such disharmony. Simulations on some benchmark problems and a real classification task have been performed to test the validity of the modified error function.

  • On Linear Least Squares Approach for Phase Estimation of Real Sinusoidal Signals

    Hing-Cheung SO  

     
    LETTER-Digital Signal Processing

      Page(s):
    3654-3657

    In this Letter, linear least squares (LLS) techniques for phase estimation of real sinusoidal signals with known or unknown amplitudes are studied. It is proved that the asymptotic performance of the LLS approach attains the Cramér-Rao lower bound. For the case of a single tone, a novel LLS algorithm with a unit-norm constraint is derived. Simulation results are also included to evaluate the algorithms.
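For a single real tone with known frequency, the linear least squares idea is to rewrite $A\cos(\omega n+\phi)$ as $a\cos\omega n + b\sin\omega n$ with $a = A\cos\phi$ and $b = -A\sin\phi$, solve the resulting linear problem for $(a, b)$, and read off $\phi = \operatorname{atan2}(-b, a)$. The sketch below does this with the 2×2 normal equations, assuming a known frequency and an unknown amplitude. It is the plain unconstrained LLS fit, not the unit-norm-constrained algorithm derived in the letter, and the function name is hypothetical.

```python
import math


def lls_phase(y, omega):
    """Least-squares fit y[n] ~ a*cos(omega*n) + b*sin(omega*n), omega known.

    Returns (amplitude, phase) using A = hypot(a, b) and phi = atan2(-b, a),
    because A*cos(w*n + phi) = A*cos(phi)*cos(w*n) - A*sin(phi)*sin(w*n).
    """
    scc = scs = sss = syc = sys_ = 0.0
    for n, yn in enumerate(y):
        c, s = math.cos(omega * n), math.sin(omega * n)
        scc += c * c
        scs += c * s
        sss += s * s
        syc += yn * c
        sys_ += yn * s
    # Solve [[scc, scs], [scs, sss]] @ [a, b] = [syc, sys_] by Cramer's rule.
    det = scc * sss - scs * scs  # near zero only for degenerate omega (e.g. 0 or pi)
    a = (syc * sss - sys_ * scs) / det
    b = (sys_ * scc - syc * scs) / det
    return math.hypot(a, b), math.atan2(-b, a)
```

On noiseless samples of $2\cos(0.3n + 0.7)$ for $n = 0,\ldots,49$, the fit returns amplitude 2 and phase 0.7 up to rounding. With noise, the same fit is the estimator whose asymptotic variance the letter compares against the Cramér-Rao bound.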

  • On the Property of a Discrete Impulse Response Gramian with Application to Model Reduction

    Younseok CHOO  

     
    LETTER-Systems and Control

      Page(s):
    3658-3660

    It has been observed in the literature that the characteristic polynomial of a discrete system can be computed from the characteristic impulse response Gramian. In this letter it is shown that a given characteristic impulse response Gramian, in fact, contains information on two characteristic polynomials. The importance of this result is illustrated through an application to model reduction of discrete systems.

  • A Note on the Implementation of de Bruijn Networks by the Optical Transpose Interconnection System

    Kohsuke OGATA  Toshinori YAMADA  Shuichi UENO  

     
    LETTER-Graphs and Networks

      Page(s):
    3661-3662

    This note shows an efficient implementation of de Bruijn networks by the Optical Transpose Interconnection System (OTIS) extending previous results by Coudert, Ferreira, and Perennes [2].

  • Stego-Encoding with Error Correction Capability

    Xinpeng ZHANG  Shuozhong WANG  

     
    LETTER-Information Security

      Page(s):
    3663-3667

    Although a previously proposed steganographic encoding scheme can reduce the distortion caused by data hiding, it makes the system susceptible to active-warden attacks due to error spreading. Meanwhile, straightforward application of error correction encoding inevitably increases the number of bit alterations required, which raises the risk of detection. To overcome the drawbacks in both cases, an integrated approach is introduced that combines stego-encoding and error correction encoding to provide enhanced robustness against active attacks and channel noise while maintaining good imperceptibility.
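The abstract does not name the underlying stego-encoding scheme. Assuming it is of the Hamming-code matrix-embedding type (a common distortion-reducing scheme, used here purely as an illustration), the sketch below embeds $p$ message bits into $2^p-1$ cover bits by changing at most one bit. It also shows the error-spreading weakness: flipping a single stego bit always corrupts the extracted block. The names are hypothetical, and the authors' combined error-correcting scheme is not reproduced.

```python
def syndrome(bits):
    """Hamming syndrome: XOR of the 1-based positions of the set bits."""
    s = 0
    for pos, b in enumerate(bits, start=1):
        if b:
            s ^= pos
    return s


def embed(cover, msg, p):
    """Embed a p-bit integer msg into 2**p - 1 cover bits, flipping at most one."""
    if len(cover) != 2 ** p - 1 or not 0 <= msg < 2 ** p:
        raise ValueError("need 2**p - 1 cover bits and a p-bit message")
    stego = list(cover)
    j = syndrome(cover) ^ msg  # position whose flip moves the syndrome to msg
    if j:
        stego[j - 1] ^= 1
    return stego


def extract(stego):
    """The hidden message is simply the syndrome of the stego bits."""
    return syndrome(stego)
```

With $p = 3$, three message bits ride on seven cover bits at the cost of at most one change. However, any single bit error introduced by an active warden or a noisy channel shifts the syndrome by a nonzero position, so the whole 3-bit block is extracted wrongly. This is the error spreading that the letter sets out to counter.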

  • Complexity of Differential Attacks on SHA-0 with Various Message Schedules

    Mitsuhiro HATTORI  Shoichi HIROSE  Susumu YOSHIDA  

     
    LETTER-Information Security

      Page(s):
    3668-3671

    The security of SHA-0 with various message schedules is discussed in this letter. SHA-0 employs a primitive polynomial of degree 16 over GF(2) in its message schedule. For each primitive polynomial, a SHA-0 variant can be constructed. The collision resistance and the near-collision resistance of SHA-0 variants to the Chabaud-Joux attack are evaluated. Moreover, the near-collision resistance of a variant to the Biham-Chen attack is evaluated. It is shown that the selection of primitive polynomials highly affects the resistance. However, it is concluded that these SHA-0 variants are not appropriate for making SHA-0 secure.
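SHA-0's message schedule is the linear recurrence $W_t = W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16}$ for $16 \le t < 80$, applied independently to each bit position (SHA-1 differs only by an added 1-bit left rotation). Its characteristic polynomial over GF(2) is $x^{16}+x^{13}+x^{8}+x^{2}+1$, so choosing a different degree-16 feedback polynomial amounts to changing the tap set. The sketch below implements the expansion with the taps as a parameter. The `taps` argument for variants is a hypothetical illustration, and no attack is attempted.

```python
MASK32 = 0xFFFFFFFF


def sha0_expand(block_words, taps=(3, 8, 14, 16)):
    """Expand 16 message words to 80 with W[t] = XOR of W[t - d] for d in taps.

    The default taps give the SHA-0 schedule; other tap sets (each must
    include 16) model hypothetical variants with a different feedback polynomial.
    """
    if len(block_words) != 16:
        raise ValueError("expected 16 words")
    if max(taps) != 16 or min(taps) < 1:
        raise ValueError("taps must lie in 1..16 and include 16")
    w = [x & MASK32 for x in block_words]
    for t in range(16, 80):
        v = 0
        for d in taps:
            v ^= w[t - d]
        w.append(v)  # no rotation: each bit position evolves independently
    return w
```

Two properties that differential attacks rely on follow directly: the expansion is linear over GF(2) (expanding `a XOR b` equals XORing the two expansions), and, without a rotation, a difference confined to one bit position stays in that position for all 80 words.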

  • A Step-by-Step Implementation Method of the Bit-Serial Reed-Solomon Encoder

    Jinsoo BAE  Hiroyuki MORIKAWA  

     
    LETTER-Coding Theory

      Page(s):
    3672-3674

    The Reed-Solomon code is a versatile channel code widely used in communication and storage systems. The bit-serial Reed-Solomon encoder has a simple structure, but its algorithm is difficult to understand without considerable theoretical background. Some professionals and students who cannot fully understand the algorithm may nevertheless need to implement a bit-serial encoder themselves. In this letter, a step-by-step method is presented for implementing the bit-serial encoder without understanding the internal algorithm, which should be helpful for VHDL, DSP, and simulation programming.

  • Properties of m-Sequence and Construction of Constant Weight Codes

    Fanxin ZENG  

     
    LETTER-Coding Theory

      Page(s):
    3675-3676

    In this letter, properties of m-sequences are derived. Based on these properties, a family of binary nonlinear constant weight codes is presented; these codes can be applied as error-detecting codes in automatic repeat request (ARQ) communication systems.
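A quick way to see the constant-weight property is to generate an m-sequence with a linear feedback shift register and look at its cyclic shifts: every shift of a length-$2^n-1$ m-sequence has weight $2^{n-1}$, and any two distinct shifts differ in exactly $2^{n-1}$ positions. The sketch below uses the primitive polynomial $x^4+x+1$ as an example. It only checks these classical properties and is not the code construction from the letter; the names are hypothetical.

```python
def lfsr_sequence(taps, n, seed=1):
    """First 2**n - 1 terms of s[t+n] = XOR of s[t+d] for d in taps (0 <= d < n).

    This is one full period when the feedback polynomial is primitive.
    For x^4 + x + 1 the recurrence is s[t+4] = s[t+1] ^ s[t], i.e. taps=(0, 1).
    seed supplies the initial state s[0..n-1] as the low bits of an integer.
    """
    seq = [(seed >> i) & 1 for i in range(n)]
    if not any(seq):
        raise ValueError("seed must be nonzero")
    while len(seq) < 2 ** n - 1:
        t = len(seq) - n
        seq.append(sum(seq[t + d] for d in taps) % 2)
    return seq


def cyclic_shifts(seq):
    """All cyclic shifts of seq, one per starting position."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]
```

With `taps=(0, 1)` and `n=4`, the sequence is `1 0 0 0 1 0 0 1 1 0 1 0 1 1 1`. Its 15 cyclic shifts form a binary constant-weight code of weight 8 and minimum distance 8, so any pattern of up to 7 bit errors is detected.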

  • An Efficient Software-Defined Radio Architecture for Multi-Mode WCDMA Applications

    Jaesang LIM  Yongchul SONG  Jeongpyo KIM  Beomsup KIM  

     
    LETTER-General Fundamentals and Boundaries

      Page(s):
    3677-3680

    This letter describes an efficient architecture for a Software Defined Radio (SDR) Wideband Code Division Multiple Access (WCDMA) receiver for high-performance wireless communication systems. The architecture is composed of a Radio Frequency (RF) front-end, an Analog-to-Digital Converter (ADC), and a Quadrature Amplitude Modulation (QAM) demodulator. A coherent demodulator with a complete digital synchronization scheme achieves a bit-error rate (BER) of $10^{-6}$ with an implementation loss of 0.5 dB for a raw Quadrature Phase Shift Keying (QPSK) signal.