IEICE TRANSACTIONS on Fundamentals

  • Impact Factor: 0.48
  • Eigenfactor: 0.003
  • Article Influence: 0.1
  • Cite Score: 1.1

Volume E86-A No.12  (Publication Date:2003/12/01)

    Special Section on VLSI Design and CAD Algorithms
  • FOREWORD

    Masaharu IMAI  

     
    FOREWORD

      Page(s):
    2913-2913
  • Statistical Gate-Delay Modeling with Intra-Gate Variability

    Kenichi OKADA  Kento YAMAOKA  Hidetoshi ONODERA  

     
    PAPER-Parasitics and Noise

      Page(s):
    2914-2922

    This paper proposes a model for calculating the statistical gate-delay variation caused by intra-chip and inter-chip variability. The variation of each gate delay directly influences the circuit-delay variation, so it is important to characterize each gate-delay variation accurately. Every transistor in a gate affects the transient characteristics of the gate, so intra-gate variability must be considered when modeling gate-delay variation. This effect has not been captured by the statistical delay analyses reported so far. Our model accounts for intra-gate variability by means of sensitivity constants. We evaluate the accuracy of our model and present simulation results for circuit-delay variation.

  • Parasitic Capacitance Modeling for Non-Planar Interconnects in Liquid Crystal Displays

    Sadahiro TANI  Yoshihiro UCHIDA  Makoto FURUIE  Shuji TSUKIYAMA  BuYeol LEE  Shuji NISHI  Yasushi KUBOTA  Isao SHIRAKAWA  Shigeki IMAI  

     
    PAPER-Parasitics and Noise

      Page(s):
    2923-2932

    The problem of calculating parasitic capacitances between two interconnects is investigated specifically for liquid crystal displays, with the main focus on approximate expressions for the capacitances that arise where two interconnects intersect and where they run in parallel. To derive simple and accurate approximate expressions, the interconnects in these structures are divided into a few basic coupling regions in such a way that the electro-magnetic field in each region can be calculated by a 2-D capacitance model. The capacitance in each region is then represented by a simple expression fitted to the results computed by an electro-magnetic field solver. The total capacitance obtained by summing the capacitances of all regions is compared with that obtained by a 3-D field solver; the relative error is less than 5%.

  • Approximation Formula Approach for the Efficient Extraction of On-Chip Mutual Inductances

    Atsushi KUROKAWA  Takashi SATO  Hiroo MASUDA  

     
    PAPER-Parasitics and Noise

      Page(s):
    2933-2941

    We present a new and efficient approach for extracting on-chip mutual inductances of VLSI interconnects by applying approximation formulae. The equations assume filaments or bars of finite width and zero thickness and are derived through Taylor expansion of the exact formula for mutual inductance between filaments. Despite the assumption of uniform current density in each bar, the model is sufficiently accurate for the interconnections of current and future LSIs because the skin and proximity effects do not affect most wires. Expressing the equations in polynomial form provides a balance between accuracy and computational complexity. The equations are mapped to the geometric structures for which they are most suitable, minimizing the runtime of inductance calculation while retaining the required accuracy. Within geometrical constraints, the wires may have arbitrary specifications. A comprehensive evaluation based on the ITRS-specified global wiring structure for 2003 shows that the inductance values extracted by the proposed approach are within several percent of the values obtained by commercial three-dimensional (3-D) field solvers. The efficiency of the proposed approach is also demonstrated by extraction from a real layout design that has 300 k interconnect segments.

  • Representative Frequency for Interconnect R(f)L(f)C Extraction

    Akira TSUCHIYA  Masanori HASHIMOTO  Hidetoshi ONODERA  

     
    PAPER-Parasitics and Noise

      Page(s):
    2942-2951

    This paper discusses the frequency at which RLC values should be extracted from interconnects. In circuit design, a frequency-independent equivalent circuit is widely used, and many design and analysis techniques based on this equivalent circuit have been proposed. In reality, however, the characteristics of interconnects are frequency-dependent, and pulse waveforms in digital circuits contain multiple frequency components. The frequency used for RLC extraction affects the accuracy of interconnect characterization, so careful determination of the extraction frequency is critical. We propose a representative frequency for RLC extraction. The proposed method determines the representative frequency from the interconnect length, whereas conventional representative frequencies are determined by the input pulse shape, period, and patterns. We verify that extraction at the proposed frequency provides the most accurate transition waveform for various input signals and interconnect structures in digital circuits.

  • Moment Computations of Lumped Coupled RLC Trees with Applications to Estimating Crosstalk Noise

    Herng-Jer LEE  Chia-Chi CHU  Wu-Shiung FENG  

     
    PAPER-Parasitics and Noise

      Page(s):
    2952-2964

    A novel method is presented to compute moments of high-speed VLSI interconnects, which are modeled as coupled RLC trees. Recursive formulae for the moments of coupled RC trees are extended to coupled RLC trees by considering both self inductances and mutual inductances. Analytical formulae for the voltage moments at each node are derived explicitly. The formulae can be used efficiently for estimating delay and crosstalk noise. The inductive crosstalk noise waveform can be accurately and efficiently estimated using the moment computation technique in conjunction with the projection-based order reduction method. Fundamental aspects of the proposed approach are described in detail. Experimental results show the increased accuracy of the proposed method over traditional ones.

  • Crosstalk Noise Estimation for Generic RC Trees

    Masanori HASHIMOTO  Masao TAKAHASHI  Hidetoshi ONODERA  

     
    PAPER-Parasitics and Noise

      Page(s):
    2965-2973

    We propose a method for estimating crosstalk noise in generic RC trees. The proposed method derives an analytic waveform of crosstalk noise in a 2-π equivalent circuit. The peak voltage is calculated from the closed-form expression. We also develop a method for transforming generic RC trees with branches into the 2-π model circuit. The proposed method can hence estimate crosstalk noise for any RC tree. Our estimation method is evaluated in a 0.13 µm technology. The peak noise of two partially-coupled interconnects is estimated with an average error of 11%. Our method transforms generic RC interconnects with branches into the 2-π model with 14% error on average.

  • A Design Methodology for Low EMI Noise LSI with Fast and Accurate Estimation

    Hiroyuki TSUJIKAWA  Shozo HIRANO  Kenji SHIMAZAKI  

     
    PAPER-Parasitics and Noise

      Page(s):
    2974-2982

    Large-scale integration (LSI) microchips are widely used in many types of modern electronic products, including electric appliances, cellular phones, toys, electronic games, and automobiles. The electromagnetic interference (EMI) noise produced by these micro devices can cause significant operational problems in other devices in the system. Some methods that have been proposed for such analysis estimate the EMI noise characteristics through transistor-level power simulation. In these methods, however, transistor-level circuit simulation is performed by combining the power-supply impedance model and the power-supply source model, and transistor-level simulators are in general too slow for practical application-specific integrated circuit (ASIC) design. In this paper, a total solution for reducing EMI noise in LSI microchips is presented. The proposed design methodology integrates fast and accurate estimation, reduction, and verification. The method was successfully applied to the design of a 32-bit microprocessor, achieving a 2-dB noise reduction in the FM frequency band and a 10-dB reduction at 1 GHz. The proposed design methodology gives LSI designers a powerful tool for minimizing EMI noise and achieving higher levels of reliability in microelectronic products.

  • Variable Pipeline Depth Processor for Energy Efficient Systems

    Akihiko HYODO  Masanori MUROYAMA  Hiroto YASUURA  

     
    PAPER-Power Optimization

      Page(s):
    2983-2990

    This paper presents a variable pipeline depth processor that can dynamically adjust its pipeline depth and operating voltage at run-time, a technique we call dynamic pipeline and voltage scaling (DPVS), depending on the workload characteristics under timing constraints. The advantage of adjusting the pipeline depth is that it can eliminate the useless energy dissipation of additional stalls, or NOPs, and wrong-path instructions, which increase as the pipeline grows deeper than the inherent parallelism warrants. Although dynamic voltage scaling (DVS) is in itself a very effective technique for reducing energy dissipation, lowering the supply voltage also degrades performance. Combining DVS with dynamic pipeline scaling (DPS) makes it possible to retain performance at the required level while reducing energy dissipation much further. Experimental results show the effectiveness of our DPVS approach for a variety of benchmarks, reducing total energy dissipation by up to 64.90% with an average of 27.42% without any effect on performance, compared with a processor using only DVS.

  • A Low Power Embedded DRAM Macro for Battery-Operated LSIs

    Takeshi FUJINO  Akira YAMAZAKI  Yasuhiko TAITO  Mitsuya KINOSHITA  Fukashi MORISHITA  Teruhiko AMANO  Masaru HARAGUCHI  Makoto HATAKENAKA  Atsushi AMO  Atsushi HACHISUKA  Kazutami ARIMOTO  Hideyuki OZAKI  

     
    PAPER-Power Optimization

      Page(s):
    2991-3000

    A low power 16 Mb embedded DRAM (eDRAM) macro is fabricated using a 0.15 µm logic-based embedded DRAM process technology. A 0.5 µm² CUB (Capacitor Under Bit-line) DRAM cell is newly developed for this process. Novel start-up and dynamic fuse-data loading circuits are developed to allow easy customization of memory capacity with minimum area penalty. A new write-mask control circuit using a write-gate sense amplifier is adopted in order to apply a column shift-redundancy circuit. Various low power technologies, including a unique "non-precharge read-data bus" method, are applied. In the test chip, which adopts the new process technology and three original circuit-design techniques, random column operation at 166 MHz and a data retention power of 123 µW are demonstrated at a 1.5 V power supply.

  • Irredundant Low Power Address Bus Encoding Techniques Based on Adaptive Codebooks

    Satoshi KOMATSU  Masahiro FUJITA  

     
    PAPER-Power Optimization

      Page(s):
    3001-3008

    The power dissipation at the off-chip bus has become a significant part of the overall power dissipation in microprocessor-based digital systems. This paper presents irredundant address bus encoding methods that reduce signal transitions on instruction address buses by using adaptive codebooks. These methods are based on the temporal and spatial locality of instruction addresses. Since applications tend to JUMP/BRANCH to limited sets of addresses, the proposed encoding methods assign the codes with the fewest signal transitions to the addresses of past JUMP/BRANCH operations. In addition, because they are irredundant encoding methods, they are easily applicable to conventional digital systems. Our encoding methods reduce the signal transitions on the instruction address buses, which reduces the total power dissipation of digital systems. Experimental results show that our methods can reduce signal transitions by an average of 88%.
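
    As a rough illustration of the metric these methods target, the sketch below counts signal transitions (bit flips between consecutive bus words) for an address trace. The adaptive codebook scheme itself is not reproduced here; a classic Gray code is swapped in as a stand-in irredundant encoding, and the function names and the sample trace are assumptions made for illustration.

```python
def count_transitions(words):
    """Total number of bus lines that toggle between consecutive words."""
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(words, words[1:]))


def gray_encode(value):
    """Binary-reflected Gray code: adjacent integers differ in exactly one bit."""
    return value ^ (value >> 1)


# Hypothetical trace: eight sequential instruction addresses (word indices 0..7).
trace = list(range(8))
binary_cost = count_transitions(trace)
gray_cost = count_transitions([gray_encode(a) for a in trace])
print(binary_cost, gray_cost)
```

    Both encodings use the same number of bus lines, which is what "irredundant" means here; only the mapping from address to bus word changes.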

  • Counter Tree Diagrams: A Unified Framework for Analyzing Fast Addition Algorithms

    Jun SAKIYAMA  Naofumi HOMMA  Takafumi AOKI  Tatsuo HIGUCHI  

     
    PAPER-IP Design

      Page(s):
    3009-3019

    This paper presents a unified representation of fast addition algorithms based on Counter Tree Diagrams (CTDs). By using CTDs, we can describe and analyze various adder architectures in a systematic way without using specific knowledge about underlying arithmetic algorithms. Examples of adder architectures that can be handled by CTDs include Redundant-Binary (RB) adders, Signed-Digit (SD) adders, Positive-Digit (PD) adders, carry-save adders, parallel counters (e.g., 3-2 counters and 4-2 counters) and networks of such basic adders/counters. This paper also discusses the CTD-based analysis of carry-propagation-free adders using various number representations.
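
    To make the 3-2 counter mentioned above concrete, here is a minimal sketch of one as a bitwise carry-save adder on non-negative integers. It does not use the CTD notation of the paper; the function name is an assumption for illustration.

```python
def counter_3_2(a, b, c):
    """Bitwise 3-2 counter: reduce three operands to a sum word and a carry word.

    Each bit position adds three input bits independently, so no carry
    propagates between positions inside this step.
    """
    partial_sum = a ^ b ^ c                      # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, shifted up
    return partial_sum, carry


s, carry = counter_3_2(5, 6, 7)
print(s, carry, s + carry)
```

    A single conventional (carry-propagating) addition of the two outputs yields the full result, which is why such counters are chained in multi-operand adders.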

  • Development of an IP Library of IEEE-754-Standard Single-Precision Floating-Point Dividers

    Hiroyuki OCHI  Tatsuya SUZUKI  Sayaka MATSUNAGA  Yoichi KAWANO  Takao TSUDA  

     
    PAPER-IP Design

      Page(s):
    3020-3027

    Floating-point units (FPUs) are indispensable in processors, 3D-graphic engines, etc. To improve the design productivity of these LSIs, FPU IPs are strongly desired. However, it is impossible to cover a wide range of needs with a single FPU IP, because there are various kinds of options in specifications (e.g., operating frequency, latency, and ability of pipeline operation) and implementations (e.g., hardware algorithms). Thus, multiple IPs are needed even for the same functionality. In this paper, we propose to build an IP Library that consists of a large number of FPU IPs with various kinds of specifications and implementations, and that has catalogue data showing not only the specifications but also the post-layout area and power dissipation of each IP. As the first step of the project, we have developed an IP Library targeted to the Rohm 0.35 µm triple-metal process, which consists of 20 IPs for IEEE-754-standard single-precision floating-point division with 5 operating frequencies (50 MHz, 75 MHz, 100 MHz, 125 MHz, and 150 MHz), with two options (pipelined or not), and with two hardware algorithms (the restoring method and the SRT method). We have also developed a catalogue for the IP Library, which shows the post-layout area and power dissipation as well as the specification of each IP. We have introduced two metrics, "performance-area ratio (MFLOPS/mm²)" and "performance-power ratio (MFLOPS/W)", to give good insight into the efficiency of implementations. According to the catalogue data, the restoring method is, on average, 1.4 times and 2.3 times better than the SRT method in terms of performance-area ratio and performance-power ratio, respectively. The developed catalogue is usable not only for selecting the optimal IP for a specific application, but also for quantitative analysis at the early stage of architecture design. The catalogue data, being based on an actual process technology, is also expected to be valuable for education.
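
    For readers unfamiliar with the restoring method named above, the following is a textbook sketch of restoring division on unsigned integers, producing one quotient bit per iteration. It is not the IEEE-754 divider IP described in the abstract (no exponent handling, rounding, or normalization); the function name and the `bits` parameter are assumptions for illustration.

```python
def restoring_divide(dividend, divisor, bits):
    """Unsigned restoring division, one quotient bit per step.

    Returns (quotient, remainder) for a `bits`-wide dividend.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift in the next dividend bit, then try subtracting the divisor.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor
        if remainder < 0:
            remainder += divisor      # restore: the subtraction did not fit
        else:
            quotient |= 1 << i        # subtraction fit: quotient bit is 1
    return quotient, remainder


print(restoring_divide(13, 3, 4))
```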

  • Synthesis of Serial Local Clock Controllers for Asynchronous Circuit Design

    Nattha SRETASEREEKUL  Hiroshi SAITO  Euiseok KIM  Metehan OZCAN  Masashi IMAI  Hiroshi NAKAMURA  Takashi NANYA  

     
    PAPER-IP Design

      Page(s):
    3028-3037

    Asynchronous controllers can effectively control highly concurrent datapath operations for high speed. Signal Transition Graphs (STGs) can effectively represent these concurrent events. However, highly concurrent STGs cause the state explosion problem in asynchronous synthesis tools, and many small but highly concurrent STGs cannot be synthesized into control circuits. Moreover, STGs also lead to some control-time overhead from the four-phase handshake protocol. In this paper, we propose a method for deriving serial control nodes from Control Data Flow Graphs (CDFGs) such that the concurrency of datapath operations is still preserved. The STGs derived from the serialized control nodes are serial STGs, which are simpler to synthesize than concurrent STGs. We also propose an implementation that uses these serialized controllers to generate local clocks whenever necessary. The implementation results in very small control-time overhead. The experimental results show that the number of synthesis states is proportional to the number of control signals, and that circuits with satisfactorily small control-time overhead are obtained.

  • Critical Path Selection for Deep Sub-Micron Delay Test and Timing Validation

    Jing-Jia LIOU  Li-C. WANG  Angela KRSTIĆ  Kwang-Ting (Tim) CHENG  

     
    PAPER-Timing Verification and Test Generation

      Page(s):
    3038-3048

    Critical path selection is an indispensable step for AC delay test and timing validation. Traditionally, this step relies on the construction of a set of worst-case paths based upon discrete timing models. However, the assumption of discrete timing models can be invalidated by timing defects and process variation in the deep sub-micron domain, which are often continuous in nature. As a result, critical paths defined in a traditional timing analysis approach may not be truly critical in reality. In this paper, we propose using a statistical delay evaluation framework for estimating the quality of a path set. Based upon the new framework, we demonstrate how the traditional definition of a critical path set may deviate from the true critical path set in the deep sub-micron domain. To remedy the problem, we discuss improvements to the existing path selection strategies by including new objectives. We then compare statistical approaches with traditional approaches based upon experimental analysis of both defect-free and defect-injected cases.

  • DFT Timing Design Methodology for Logic BIST

    Yasuo SATO  Motoyuki SATO  Koki TSUTSUMIDA  Kazumi HATAYAMA  Kazuyuki NOMOTO  

     
    PAPER-Timing Verification and Test Generation

      Page(s):
    3049-3055

    We analyze the timing design methodology for testing chips using a multiple-clock domain scheme. We especially focus on the layout design of the design-for-test (DFT) circuits and the clock network. First, we demonstrate the built-in self-test (BIST) scheme for multiple-clock domains. Then, we discuss a layout method that achieves a low clock skew between different clock domains with only a small modification of the original user logic layout. Finally, we evaluate the fault coverage of our large ASIC chips designed using our new methodology. The short design period and high fault coverage of our methodology are confirmed using actual industrial designs. Our approach is viable for industrial designs because designers do not have to pay much attention to DFT. It also provides designers with an easy method for LSI debugging and diagnostics.

  • A Built-in Reseeding Technique for LFSR-Based Test Pattern Generation

    Youhua SHI  Zhe ZHANG  Shinji KIMURA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Timing Verification and Test Generation

      Page(s):
    3056-3062

    Reseeding techniques have been proposed to improve the fault coverage in pseudo-random testing. However, most previous work on reseeding is based on storing the seeds in an external tester or in a ROM. In this paper we present a built-in reseeding technique for LFSR-based test pattern generation. The proposed structure can run both in pseudorandom mode and in reseeding mode. Moreover, our method requires no storage for the seeds, since in reseeding mode the seeds are generated automatically in hardware. We also propose an efficient grouping algorithm based on simulated annealing to optimize test vector grouping. Experimental results for benchmark circuits indicate the superiority of our technique over other reseeding methods with respect to test length and area overhead. Furthermore, since the theoretical properties of LFSRs are preserved, our method could be beneficially used in conjunction with any other technique proposed so far.

  • Seed Selection Procedure for LFSR-Based Random Pattern Generators

    Kenichi ICHINO  Ko-ichi WATANABE  Masayuki ARAI  Satoshi FUKUMOTO  Kazuhiko IWASAKI  

     
    PAPER-Timing Verification and Test Generation

      Page(s):
    3063-3071

    We propose a technique for selecting seeds for the LFSR-based test pattern generators that are used in VLSI BISTs. By setting the computed seed as the initial value, a target fault coverage, for example 100%, can be accomplished with minimum test length. We can also maximize fault coverage for a given test length. Our method can be used for both test-per-clock and test-per-scan BISTs. The procedure is based on vector representations over GF(2^m), where m is the number of LFSR stages. The results indicate that test lengths derived through selected seeds are about sixty percent shorter than those derived by simple seeds, i.e. 0001, for a given fault coverage. We also show that seeds obtained through this technique accomplish higher fault coverage than the conventional selection procedure. For the c7552 benchmark, taking a test-per-scan architecture with a 20-bit LFSR as an example, the number of undetected faults can be decreased from 304 to 227 for 10,000 LFSR patterns using our proposed technique.
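
    To show why the seed matters, the sketch below steps a small 4-stage LFSR from two different seeds. Every non-zero seed visits the same cycle of 15 states, but each seed starts at a different point in it, so a fixed-length test applies a different subset of patterns. The feedback taps, register width, and function names are assumptions chosen for illustration; the seed selection procedure of the paper is not reproduced.

```python
WIDTH = 4  # number of LFSR stages (assumed for this toy example)


def lfsr_step(state):
    """One shift of a 4-bit Fibonacci LFSR; feedback is the XOR of the two low bits."""
    feedback = (state ^ (state >> 1)) & 1
    return (state >> 1) | (feedback << (WIDTH - 1))


def lfsr_sequence(seed, count):
    """First `count` states, starting with the seed itself."""
    states = []
    state = seed
    for _ in range(count):
        states.append(state)
        state = lfsr_step(state)
    return states


def period(seed):
    """Number of steps until the LFSR returns to the seed."""
    state = lfsr_step(seed)
    steps = 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


print(lfsr_sequence(0b0001, 4))
print(lfsr_sequence(0b1001, 3))
print(period(0b0001))
```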

  • A Method of Test Generation for Acyclic Sequential Circuits Using Single Stuck-at Fault Combinational ATPG

    Hideyuki ICHIHARA  Tomoo INOUE  

     
    PAPER-Timing Verification and Test Generation

      Page(s):
    3072-3078

    A test generation method using a time-expansion model can achieve high fault efficiency for acyclic sequential circuits, which can be obtained by partial scan design. This method, however, requires a combinational test pattern generation algorithm that can deal with multiple stuck-at faults, even if the target faults are single stuck-at faults. In this paper, we propose a test generation method for acyclic sequential circuits that uses a circuit model, called the MS-model, which can express multiple stuck-at faults in the time-expansion model as single stuck-at faults. Our procedure can generate test sequences for acyclic sequential circuits with just a combinational test pattern generation algorithm for single stuck-at faults. Experimental results show that test sequences with high fault efficiency are generated for acyclic sequential circuits with small computational effort.

  • Implementation of Java Accelerator for High-Performance Embedded Systems

    Motoki KIMURA  Morgan Hirosuke MIKI  Takao ONOYE  Isao SHIRAKAWA  

     
    PAPER-Simulation Accelerator

      Page(s):
    3079-3088

    A Java execution environment is implemented in which a hardware engine operates in parallel with an embedded processor. This pair of hardware facilities, together with an additional software kernel, is devised for existing embedded systems so as to execute Java applications more efficiently; 39 instructions are added to the original Java Virtual Machine to implement the software kernel. The exploration of design parameters is also attempted to attain a low hardware cost and high performance. The proposed hardware engine, with a 6-stage pipeline, can be integrated in a single chip using 30 k gates together with the instruction and data cache memories. The proposed approach improves the execution speed by a factor of 5 in comparison with the J2ME software implementation.

  • Top-Down Retargetable Framework with Token-Level Design for Accelerating Simulation Speed of Processor Architecture

    Jun Kyoung KIM  Ho Young KIM  Tag Gon KIM  

     
    PAPER-Simulation Accelerator

      Page(s):
    3089-3098

    This paper proposes a retargetable framework for rapid evaluation of processor architectures, which represents the abstraction levels of an architecture in a hierarchical manner. The basis for such a framework is a hierarchical architecture description language, called XR2, which describes an architecture at three abstraction levels: instruction set architecture, pipeline architecture, and micro-architecture. In addition, a token-level computational model for fast pipeline simulation is proposed, which considers only the minimal information required for the given performance measurement of the pipeline. Experimental results show that token-level simulation is faster than traditional cycle-accurate simulation by 50% to 80% in pipeline architecture evaluation.

  • A Retargetable Simulator Generator for DSP Processor Cores with Packed SIMD-type Instructions

    Nozomu TOGAWA  Kyosuke KASAHARA  Yuichiro MIYAOKA  Jinku CHOI  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Simulation Accelerator

      Page(s):
    3099-3109

    A packed SIMD type operation, or simply a SIMD operation, consists of n parallel b/n-bit sub-operations executed by the modified n-bit functional unit. Such a functional unit is called a SIMD functional unit, and a processor core that can execute SIMD operations is called a SIMD processor core. SIMD operations can be effectively applied to image processing applications. This paper focuses on hardware/software cosynthesis of SIMD processor cores and, in particular, proposes a new simulator generator which simulates pipelined instructions for a SIMD processor. Generally, a SIMD functional unit has many options, which leads to a large number of different SIMD functional unit instances. However, since our hardware/software cosynthesis system synthesizes a special-purpose processor core for an input application program, it uses only a very limited set of SIMD functional unit instances. In the proposed approach, we consider a SIMD operation to be a set of SIMD sub-operations. By combining the appropriate SIMD sub-operations, we construct a single SIMD operation. The behavior of a SIMD functional unit can then be characterized by a collection of SIMD operations. The advantage of this approach is that, given a small number of behavior libraries for SIMD sub-operations, we can instantiate a particular SIMD functional unit behavior. Experimental results demonstrate the effectiveness of the proposed approach.
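
    The idea of building one packed SIMD operation out of independent sub-operations can be sketched as follows: a packed addition on a `width`-bit word is assembled from `lanes` narrower additions whose carries never cross lane boundaries. This is only a behavioral illustration under assumed names and parameters (`packed_add`, `width`, `lanes`); it is not the simulator generator of the paper.

```python
def packed_add(a, b, width=32, lanes=4):
    """Packed addition: `lanes` independent (width // lanes)-bit wraparound adds."""
    lane_bits = width // lanes
    mask = (1 << lane_bits) - 1
    result = 0
    for i in range(lanes):
        shift = i * lane_bits
        # One sub-operation: add the two lane values and discard the carry-out.
        lane_sum = (((a >> shift) & mask) + ((b >> shift) & mask)) & mask
        result |= lane_sum << shift
    return result


# Four 8-bit lanes: the 0xFF + 0x01 lane wraps to 0x00 without disturbing its neighbor.
print(hex(packed_add(0x01FF7F10, 0x01010101, width=32, lanes=4)))
# Two 16-bit lanes on the same operands: the carry now stays inside each 16-bit lane.
print(hex(packed_add(0x01FF7F10, 0x01010101, width=32, lanes=2)))
```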

  • Efficient DDD-Based Interpretable Symbolic Characterization of Large Analog Circuits

    Sheldon X.-D. TAN  C.-J. Richard SHI  

     
    PAPER-Analog Design

      Page(s):
    3110-3118

    A systematic and efficient approach is presented for generating simple yet accurate symbolic expressions for transfer functions and characteristics of large linear analog circuits. The approach is based on a compact determinant decision diagram (DDD) representation of exact transfer functions and characteristics. Several key tasks of generating interpretable symbolic expressions--DDD graph simplification, term de-cancellation, and dominant-term generation--are shown to be performable in linear time by means of DDD graph operations. An efficient algorithm for generating dominant terms is presented, based on the concept of finding the k shortest paths in a DDD graph. Experimental results show that our approach outperforms other state-of-the-art approaches and is capable of generating interpretable expressions for typical analog blocks in minutes on modern computer workstations.

  • A Fully Independently Adjustable, Integrable Simple Current Controlled Oscillator and Derivative PWM Signal Generator

    Montree SIRIPRUCHYANUN  Paramote WARDKEIN  

     
    PAPER-Analog Design

      Page(s):
    3119-3126

    A simple circuit scheme able to generate square and triangular waves is proposed. Its advantages are that the oscillation frequency and amplitudes exhibit only a small temperature drift, and that both can be adjusted electronically over a wide sweep range, with DC offset adjustment also available. In addition, the proposed scheme can produce a constant-frequency derivative PWM signal without requiring an additional device. Because of its simple circuit details, it is well suited not only to circuit implementation but also to monolithic fabrication. PSPICE simulation results using bipolar technology are given; they show good performance of the proposed circuit.

  • Red-Black Interval Trees in Device-Level Analog Placement

    Sarat C. MARUVADA  Karthik KRISHNAMOORTHY  Florin BALASA  Lucian M. IONESCU  

     
    PAPER-Analog Design

      Page(s):
    3127-3135

    The traditional way of approaching device-level placement problems for analog layout is to explore a huge search space of absolute placement representations, where cells are allowed to illegally overlap during their moves. This paper presents a novel exploration technique for analog placement, operating on a subset of tree representations of the layout, where the typical presence of an arbitrary number of symmetry groups of devices is directly taken into account during the search of the solution space. The efficiency of the novel approach is due to the use of red-black interval trees, data structures employed to support operations on dynamic sets of intervals.

  • VLSI Module Placement with Pre-Placed Modules and with Consideration of Congestion Using Solution Space Smoothing

    Sheqin DONG  Xianlong HONG  Song CHEN  Xin QI  Ruijie WANG  Jun GU  

     
    PAPER-Place and Routing

      Page(s):
    3136-3147

    Solution space smoothing allows a local search heuristic to escape from a poor local minimum. In this paper, we propose a technique that can smooth the rugged terrain surface of the solution space of a placement problem. We test the smoothing heuristics on MCNC benchmarks, on VLSI placement with pre-placed modules, and on placement with consideration of congestion. Experimental results demonstrate that solution space smoothing is very efficient for VLSI module placement and that it can be applied to all floorplanning representations proposed so far.

  • An Improved Method of Convex Rectilinear Block Packing Based on Sequence-Pair

    Kazuya WAKATA  Hiroaki SAITO  Kunihiro FUJIYOSHI  Keishi SAKANUSHI  Takayuki OBATA  Chikaaki KODAMA  

     
    PAPER-Place and Routing

      Page(s):
    3148-3157

    In this paper, for the convex rectilinear block packing problem, we propose 1) a novel algorithm that obtains a packing from a given sequence-pair in O(n^2) time (the conventional method needs O(n^3) time), where n is the number of rectangular sub-blocks made from the convex blocks, 2) a move operation for Simulated Annealing that is symmetric and, for the first time, guarantees reachability, and 3) a method to generate a random adjacent sequence-pair in O(n^2) time. By using 1), 2) and 3) together, the time complexity of the inner loop in Simulated Annealing is guaranteed to be O(n^2). Experimental results show that the proposed algorithm is faster than the conventional ones in practice. Wire length as well as packing area is taken into consideration in the proposed method.
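
    As background, the sketch below evaluates a sequence-pair for plain rectangles in O(n^2) time using the standard rules: a block is to the left of another if it precedes it in both sequences, and below it if it follows it in the first sequence but precedes it in the second. This is the basic rectangular case only, not the convex rectilinear algorithm of the paper; the function name and the sample blocks are assumptions for illustration.

```python
def pack_sequence_pair(gamma_plus, gamma_minus, size):
    """Lower-left coordinates of each rectangle for a sequence-pair.

    `size` maps a block name to (width, height). Runs in O(n^2) time.
    """
    pos_plus = {name: i for i, name in enumerate(gamma_plus)}
    pos_minus = {name: i for i, name in enumerate(gamma_minus)}
    x, y = {}, {}
    # Left-of: a precedes b in both sequences, so gamma_plus order is topological.
    for b in gamma_plus:
        x[b] = max((x[a] + size[a][0]
                    for a in gamma_plus[:pos_plus[b]]
                    if pos_minus[a] < pos_minus[b]), default=0)
    # Below: a follows b in gamma_plus but precedes b in gamma_minus.
    for b in gamma_minus:
        y[b] = max((y[a] + size[a][1]
                    for a in gamma_minus[:pos_minus[b]]
                    if pos_plus[a] > pos_plus[b]), default=0)
    return x, y


# Hypothetical blocks: name -> (width, height).
size = {"a": (4, 2), "b": (3, 3), "c": (2, 5)}
x, y = pack_sequence_pair(["b", "a", "c"], ["a", "b", "c"], size)
print(x, y)
```

    Here block a lies below b, and both lie to the left of c, giving a bounding box 6 wide and 5 high.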

  • A Novel Timing-Driven Global Routing Algorithm Considering Coupling Effects for High Performance Circuit Design

    Jingyu XU  Xianlong HONG  Tong JING  Yici CAI  Jun GU  

     
    PAPER-Place and Routing

      Page(s):
    3158-3167

    As CMOS technology enters the very deep submicron era, inter-wire coupling capacitance becomes the dominant part of load capacitance. The coupling effects have brought new challenges to routing algorithms in both delay estimation and optimization. In this paper, we propose a timing-driven global routing algorithm that takes coupling effects into consideration. Our two-phase algorithm, based on a timing-relax method, includes a heuristic Steiner tree algorithm to guarantee the timing performance of the initial solution and an optimization algorithm based on coupling-effect-transference. Experimental results are given to demonstrate the efficiency and accuracy of the algorithm.

  • Compact Representations of Logic Functions Using Heterogeneous MDDs

    Shinobu NAGAYAMA  Tsutomu SASAO  

     
    PAPER-Logic and High Level Synthesis

      Page(s):
    3168-3175

    In this paper, we propose a compact representation of logic functions using Multi-valued Decision Diagrams (MDDs) called heterogeneous MDDs. In a heterogeneous MDD, each variable may take a different domain. By partitioning binary input variables and representing each partition as a single multi-valued variable, we can produce a heterogeneous MDD with 16% smaller memory size than a Reduced Ordered Binary Decision Diagram (ROBDD), and with comparable memory size to Free Binary Decision Diagrams (FBDDs). In addition, heterogeneous MDDs have shorter Average Path Length (APL) than ROBDDs and FBDDs. We minimized a large number of benchmark functions to show the compactness of heterogeneous MDDs.

  • Multi-Cycle Path Detection for Sequential Circuits and Its Application to Real Designs

    Hiroyuki HIGUCHI  

     
    PAPER-Logic and High Level Synthesis

      Page(s):
    3176-3183

    This paper proposes a fast multi-cycle path detection method for large sequential circuits. The proposed method is based on ATPG techniques, especially on implication techniques, to use circuit structures and multi-cycle path conditions directly. The method also checks whether or not a multi-cycle path may be invalidated by static hazards at the inputs of flip-flops. Then we explain how to apply the proposed algorithm to real industrial designs. Experimental results show that our method is much faster than conventional ones and that it is efficient enough to handle large industrial designs.

  • Bit Length Optimization of Fractional Part on Floating to Fixed Point Conversion for High-Level Synthesis

    Nobuhiro DOI  Takashi HORIYAMA  Masaki NAKANISHI  Shinji KIMURA  Katsumasa WATANABE  

     
    PAPER-Logic and High Level Synthesis

      Page(s):
    3184-3191

    In the hardware synthesis from a high-level language such as C, the bit length of variables is one of the key issues for the area and speed optimization. Usually, designers are required to optimize the bit-length of each variable manually using the time-consuming simulation on huge-data. In this paper, we propose an optimization method of the fractional bit length in the conversion from floating-point variables to fixed-point variables. The method is based on error propagation and the backward propagation of the accuracy limitation. The method is fully analytical and fast compared to simulation based methods.
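
    The link between fractional bit length and accuracy can be illustrated with a minimal sketch. It assumes round-to-nearest quantization, whose worst-case error with f fractional bits is 2^-(f+1), and handles a single value only; it does not reproduce the paper's error propagation analysis. The helper names `min_fraction_bits` and `to_fixed` are hypothetical.

```python
import math


def min_fraction_bits(tolerance):
    """Smallest f such that round-to-nearest with f fractional bits
    has worst-case error 2**-(f+1) <= tolerance."""
    f = 0
    while 2.0 ** -(f + 1) > tolerance:
        f += 1
    return f


def to_fixed(value, f):
    """Quantize value to the nearest multiple of 2**-f."""
    return round(value * 2 ** f) / 2 ** f


f = min_fraction_bits(0.001)   # accuracy limit chosen for illustration
q = to_fixed(math.pi, f)
print(f, q, abs(q - math.pi))
```

    An analytical method replaces this kind of per-value check by propagating such error bounds through the operations of the program.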

  • Verification of Synchronization in SpecC Description with the Use of Difference Decision Diagrams

    Thanyapat SAKUNKONCHAK  Satoshi KOMATSU  Masahiro FUJITA  

     
    PAPER-Logic and High Level Synthesis

      Page(s):
    3192-3199

    The SpecC language is designed to handle the design of an entire system from specification to implementation, including hardware/software co-design. Concurrency is one of the features of SpecC, which expresses the parallel execution of processes. Systems that contain concurrent behaviors usually exchange or transfer data among them. Therefore, the synchronization semantics (notify/wait) of events should be incorporated. An actual design, which is usually sophisticated in its characteristics and functionality, may contain many pieces of event synchronization code. This makes the design difficult and time-consuming to verify. In this paper, we introduce a technique which helps to verify the synchronization of events in SpecC. The original SpecC code containing synchronization semantics is parsed and translated into Boolean SpecC code. Difference decision diagrams (DDDs) are used to verify event synchronization on the Boolean SpecC code. Counterexamples for tracing back to the original source are given when the verification results turn out to be unsatisfied. We also introduce an idea for automatic refinement when the results are unsatisfied and present some preliminary results.

  • Leakage Power Reduction for Battery-Operated Portable Systems

    Yun CAO  Hiroto YASUURA  

     
    LETTER-Power Optimization

      Page(s):
    3200-3203

    This paper addresses bitwidth optimization focusing on leakage power reduction for system-level low-power design. By tuning a design parameter, the bitwidth, to a given application's requirements, the datapath width of processors and the size of memories are optimized, resulting in significant leakage power reduction in addition to dynamic power reduction. Experimental results for several real embedded applications show power reductions, without performance penalty, ranging from about 21.5% to 66.2% of leakage power and 14.5% to 59.2% of dynamic power.

  • Experimental Study on Cell-Base High-Performance Datapath Design

    Masanori HASHIMOTO  Yoshiteru HAYASHI  Hidetoshi ONODERA  

     
    LETTER-IP Design

      Page(s):
    3204-3207

    This paper experimentally investigates the effectiveness of regularly-placed bit-slice layout and transistor-level optimization on datapath circuit performance. We focus on cell-base design flows with transistor-level circuit optimization. We examine the effectiveness through design experiments on a 32-bit carry select adder and a 16-bit tree-style multiplier in a 0.35 µm technology. In the experimental results, we observe hardly any contribution of manual cell placement to circuit performance. On the other hand, transistor-level circuit optimization is so effective that circuit delay is reduced by 11-20% and power dissipation decreases to 42-62%. We can see that transistor-level optimization is as important in cell-base design as in custom design, whereas cell-base bit-slice layout is less important to circuit performance.

  • Evaluation of Delay Testing Based on Path Selection

    Masayasu FUKUNAGA  Seiji KAJIHARA  Sadami TAKEOKA  Shinichi YOSHIMURA  

     
    LETTER-Timing Verification and Test Generation

      Page(s):
    3208-3210

    Since a logic circuit often has too many paths to test the delay of every path, path delay testing must limit the number of paths to be tested. The paths to be tested should have large delay because such paths are more likely to cause a fault. Additionally, a test set for the paths is required to detect as many faults of other models as possible. In this paper, we investigate two typical criteria of path selection for path delay testing. From our experiments, we observe that test patterns for the longest paths cannot cover many local delay defects such as transition faults.
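
    Selecting the paths with the largest delay can be sketched on a toy circuit as below. The sketch assumes a small acyclic netlist with one fixed delay per gate and enumerates paths exhaustively, which is only feasible for tiny examples; the gate names, delays, and the function `longest_paths` are hypothetical and are not taken from the letter.

```python
def longest_paths(edges, delay, k):
    """Return the k largest-delay input-to-output paths of a small DAG.

    edges maps a gate to its fan-out gates; delay maps a gate to its delay.
    Exhaustive enumeration: fine for a toy circuit, not for real designs.
    """
    targets = {t for outs in edges.values() for t in outs}
    sources = [g for g in delay if g not in targets]
    found = []

    def walk(gate, path, total):
        path = path + [gate]
        total += delay[gate]
        if not edges.get(gate):          # no fan-out: the path ends here
            found.append((total, path))
            return
        for nxt in edges[gate]:
            walk(nxt, path, total)

    for s in sources:
        walk(s, [], 0)
    found.sort(key=lambda item: -item[0])
    return found[:k]


# Hypothetical four-gate circuit
delay = {"g1": 2, "g2": 3, "g3": 1, "g4": 4}
edges = {"g1": ["g3", "g4"], "g2": ["g4"]}
top = longest_paths(edges, delay, 2)
print(top)
```

    In this toy case the two selected paths both end at g4, so gate g3 is never exercised, which hints at why a test set for the longest paths may miss local defects elsewhere.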

  • Design of High-Performance Charge-Pump Circuit for PLL Applications

    Chun-Lung HSU  Wu-Hung LU  

     
    LETTER-Analog Design

      Page(s):
    3211-3213

    This work proposed a high-performance charge-pump circuit for phase-locked-loop (PLL) applications. The proposed charge-pump circuit is composed of a pair of wide-swing current mirror and symmetric pump circuits which can provide wide output range and have no jump phenomenon. The proposed charge-pump circuit has been designed and simulated by using the TSMC 0.35 µm 1P4M CMOS technology. Simulation results show the feasibility of proposed structure for low-voltage high-frequency applications.

  • Application of Error Diagnosis Technique to Incremental Synthesis

    Hiroshi INOUE  Takahiro IWASAKI  Toshifumi SUGANE  Masahiro NUMA  Keisuke YAMAMOTO  

     
    LETTER-Design Methodology

      Page(s):
    3214-3217

    In an LSI design process, Engineering Change Orders (ECO's) are often given even after the layout process. This letter presents an approach to change the design to satisfy the new specification with ECO's by employing an error diagnosis technique. Our approach performs incremental synthesis using spare cells embedded on the original layout. Experimental results show that applying the error diagnosis technique to incremental synthesis is effective to suppress increase in delay time caused by ECO's.

  • A Hardware/Software Partitioning Algorithm for Processor Cores with Packed SIMD-Type Instructions

    Nozomu TOGAWA  Koichi TACHIKAKE  Yuichiro MIYAOKA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    LETTER-Design Methodology

      Page(s):
    3218-3224

    This letter proposes a new hardware/software partitioning algorithm for processor cores with SIMD instructions. Given a compiled assembly code including SIMD instructions and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with a new assembly code. Firstly, we assume for each operation type a super SIMD functional unit which can execute all the SIMD instructions. Secondly we reduce a SIMD instruction or "sub-function" of each super functional unit, one by one, while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new processor configuration. By repeating this process, we finally find SIMD functional unit configuration as well as a processor core architecture. The promising experimental results are also shown.

  • Design of sfl2vl: SFL to Verilog Converter Based on an LR-Parser

    Naohiko SHIMIZU  

     
    LETTER-Design Methodology

      Page(s):
    3225-3229

    This paper presents the implementation of sfl2vl, a new free tool for SFL to Verilog conversion. Also this paper will discuss the performance of the conversion and the logic simulation of the sfl2vl+Icarus Verilog (free-ware compiler) versus PARTHENON with some MPU designs.

    Regular Section
  • 3.3 V 35 mW Second-Order Three-Bit Quadrature Band-Pass ΔΣ Modulator for Digital Radio

    Hack-Soo OH  Chang-Gene WOO  Pyung CHOI  Geunbae LIM  Jang-Kyoo SHIN  Jong-Hyun LEE  

     
    PAPER-Analog Signal Processing

      Page(s):
    3230-3239

    Delta-sigma modulators (DSMs) are commonly used in high-resolution analog-to-digital converters, and band-pass delta-sigma modulators have recently been used to convert IF signals into digital signals. In particular, a quadrature band-pass delta-sigma modulator can achieve a lower total order, higher signal-to-noise ratio (SNR), and higher bandwidth when compared with conventional band-pass modulators. The current paper proposes a second-order three-bit quadrature band-pass delta-sigma modulator that can achieve a lower power consumption and better performance with a similar die size to a conventional fourth-order quadrature band-pass delta-sigma modulator (QBPDSM). The proposed system is integrated using CMOS 0.35 µm, double-poly, four-metal technology. The system operates at 13 MHz and can digitize a 200 kHz bandwidth signal centered at 4.875 MHz with an SNR of 85 dB. The power consumption is 35 mW at 3.3 V and 38 mW at 5 V, and the die size is 21.9 mm².

  • Computation of the Peak of Time Response in the Form of Formal Power Series

    Takuya KITAMOTO  

     
    PAPER-Systems and Control

      Page(s):
    3240-3250

    Suppose that we need to design a controller for the system ẋ(t) = A x(t) + B u, u = -K x(t), y(t) = C x(t), where matrices A, B and C are given and K is the matrix to be determined. It is required to determine K so that y(t) does not exceed a prescribed value (i.e., the peak of the output y(t) is limited). This kind of specification is, in general, difficult to satisfy, since the peak y_max of y(t) (we define y_max to be max_{0 ≤ t} |y(t)|) is a non-trivial function of the design parameter K, which generally cannot be expressed explicitly. Therefore, a controller design with such specifications often requires a trial-and-error process. In this paper, we approximate y_max in the form of a formal power series and give an efficient algorithm to compute the series. We also give a design example of a control system as an application of the algorithm.
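
    To make the dependence of the peak on K concrete, the following sketch estimates the peak by brute-force simulation, i.e. the kind of numerical trial that the power series approach is meant to avoid. All specifics are assumptions for illustration: a double integrator (A = [[0, 1], [0, 0]], B = [0, 1]ᵀ, C = [1, 0]), feedback gain K = [k1, k2], initial state x(0) = (0, 1), and a fixed-step fourth-order Runge-Kutta integrator. The function name `peak_output` is hypothetical.

```python
import math


def peak_output(k1, k2, t_end=20.0, dt=1e-3):
    """Approximate max |y(t)| for x1' = x2, x2' = u with
    u = -k1*x1 - k2*x2, y = x1 and x(0) = (0, 1), using RK4."""
    def f(x1, x2):
        return x2, -k1 * x1 - k2 * x2

    x1, x2 = 0.0, 1.0
    peak = abs(x1)
    for _ in range(int(t_end / dt)):
        a1, a2 = f(x1, x2)
        b1, b2 = f(x1 + 0.5 * dt * a1, x2 + 0.5 * dt * a2)
        c1, c2 = f(x1 + 0.5 * dt * b1, x2 + 0.5 * dt * b2)
        d1, d2 = f(x1 + dt * c1, x2 + dt * c2)
        x1 += dt * (a1 + 2 * b1 + 2 * c1 + d1) / 6
        x2 += dt * (a2 + 2 * b2 + 2 * c2 + d2) / 6
        peak = max(peak, abs(x1))
    return peak


# For K = [1, 2] the closed loop has a double pole at -1, so y(t) = t*exp(-t)
peak = peak_output(1.0, 2.0)
print(peak, math.exp(-1))
```

    For this particular gain the output is y(t) = t·e^(-t), whose maximum 1/e at t = 1 gives a closed-form check on the simulation; for a general K no such expression is available.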

  • Approximability of the Minimum Maximal Matching Problem in Planar Graphs

    Hiroshi NAGAMOCHI  Yukihiro NISHIDA  Toshihide IBARAKI  

     
    PAPER-Graphs and Networks

      Page(s):
    3251-3258

    Given an edge-weighted graph G, the minimum maximal matching problem asks to find a minimum weight maximal matching. The problem is known to be NP-hard even if the graph is planar and unweighted. In this paper, we consider the problem in planar graphs. First, we prove a strong inapproximability for the problem in weighted planar graphs. Second, in contrast with the first result, we show that a polynomial time approximation scheme (PTAS) for the problem in unweighted planar graphs can be obtained by a divide-and-conquer method based on the planar separator theorem. For a given ε > 0, our scheme delivers in time a solution with size at most (1 + ε) times the optimal value, where n is the number of vertices in G and α is a constant number.
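
    The notion of a maximal matching, and the fact that maximal matchings of the same graph can differ in size, can be illustrated with a simple greedy scan. This is a sketch for the unweighted case only and is not the paper's PTAS; the function names and the four-vertex path are hypothetical.

```python
def greedy_maximal_matching(edges):
    """Scan edges once, keeping each edge whose endpoints are both unmatched."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


def is_maximal(edges, matching):
    """A matching is maximal if every edge touches some matched vertex."""
    matched = {w for e in matching for w in e}
    return all(u in matched or v in matched for u, v in edges)


# Path a-b-c-d: the scan order decides which maximal matching is found.
path = [("a", "b"), ("b", "c"), ("c", "d")]
m1 = greedy_maximal_matching(path)
m2 = greedy_maximal_matching([("b", "c"), ("a", "b"), ("c", "d")])
print(m1, m2)
```

    Both results are maximal, but the second uses a single edge, which is the minimum for this path. In the unweighted case any maximal matching has at most twice the minimum size, so an approximation scheme has to improve on that factor.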

  • Constructing c-Secure CRT Codes Using Polynomials over Finite Fields

    Mira KIM  Junji SHIKATA  Hirofumi MURATANI  Hideki IMAI  

     
    PAPER-Information Security

      Page(s):
    3259-3266

    In this paper, we deal with c-secure codes in a fingerprinting scheme, which encode the user ID to be embedded into the contents. If a pirate copy appears, c-secure codes allow the owner of the contents to trace the source of the illegal redistribution under collusion attacks. However, in practical applications, most previously proposed codes fail to achieve good efficiency, i.e., their codeword lengths are too large to be embedded into digital contents. In this paper, we propose a construction method of c-secure CRT codes based on polynomials over finite fields, and it is shown that the codeword length in our construction is shorter than that of Muratani's scheme. We compare the codeword length of our construction and that of Muratani's scheme by numerical experiments and present some theoretical results which support the results obtained by numerical experiments. As a result, we show that our construction is especially efficient with respect to a large coalition size c. Furthermore, we discuss the influence of random errors on the traceability and formally define the Weak IDs with respect to our construction.

  • Video Watermarking of Which Embedded Information Depends on the Distance between Two Signal Positions

    Minoru KURIBAYASHI  Hatsukazu TANAKA  

     
    PAPER-Image

      Page(s):
    3267-3275

    One of the important topics of watermarking technique is a robustness against geometrical transformations. In the previous schemes, a template matching is performed or an additional signal is embedded for the recovery of a synchronization loss. However, the former requires the original template, and the latter degrades the quality of image because both a watermark and a synchronization signal must be embedded. In the proposed scheme only a synchronization signal is embedded for the recovery of both a watermark and a synchronization loss. Then the embedded information depends on the distance between two embedded signal positions. The distance is not changed seriously by random geometrical transformations like StirMark attack unless the embedded signal is disturbed. Therefore, a watermark can be extracted correctly from such geometrically transformed image if the synchronization signal can be recovered.

  • A Log-Normal Distribution Model for Electron Multiplying Detector Signals in Charged Particle Beam Equipments

    Mitsuru YAMADA  Akinori NISHIHARA  

     
    PAPER-General Fundamentals and Boundaries

      Page(s):
    3276-3282

    We propose a stochastic model for signals generated through the electron multiplying effect of detectors in charged particle beam equipments. This model is based on a stochastic variable characterized by a log-normal type distribution. The model is simple and can be used to represent a wide dynamic range of signals from pulse-like signals when the primary beam current is small to continuous signals when the primary beam current is large. For the model base reference a normalization of actual signal detectors is presented. This base reference yields the unique stochastic parameter used in our model. The proposed model better approximates the actual signals in the power spectrum distribution as compared to the filtered Poisson method presented elsewhere.

  • Measurement of Early Reflections in a Room with Five Microphone System

    Chulmin CHOI  Lae-Hoon KIM  Yangki OH  Sejin DOO  Koeng-Mo SUNG  

     
    LETTER-Engineering Acoustics

      Page(s):
    3283-3287

    The measurement of the 3-dimensional behavior of early reflections in a sound field has been an important issue in auditorium acoustics since the reflection profile has been found to be strongly correlated with the subjective responsiveness of a listener. In order to detect the incidence angle and relative amplitude of reflections, a 4-point microphone system has conventionally been used. A new measurement system with 5 microphones is proposed in this paper. The microphones are located at each of the four apexes of a tetrahedron and at its center of gravity. Early reflections, including simultaneously incident reflections, which the previous 4-point microphone system could not discriminate as individual wavefronts, were successfully found with the new system. In order to calculate accurate image source positions, it is necessary to determine the exact peak positions from measured impulse responses composed of highly deformed and overlapped impulse trains. For this purpose, a peak-detecting algorithm, which finds dominant peaks in the impulse response by an iteration method, is introduced. In this paper, the theoretical background and features of the 5-microphone system are described. Also, some results of experiments using this system are described.

  • Performance Comparison of Single and Multi-Stage Algebraic Codebooks

    Sung-Kyo JUNG  Hong-Goo KANG  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

      Page(s):
    3288-3290

    This letter presents the advantages of a cascaded algebraic codebook structure at relatively high bit-rates. The cascaded structure that consists of two stages provides flexible pulse combinations due to an additional gain term in the second stage. The perceptual quality of the cascaded structure can be further improved by using a gain re-estimation scheme. Experiments confirm that the cascaded structure has a big advantage in terms of quality and complexity as the bit-rate becomes higher.

  • Constrained Location Algorithm Using TDOA Measurements

    Hing Cheung SO  Shun Ping HUI  

     
    LETTER-Digital Signal Processing

      Page(s):
    3291-3293

    One conventional technique for source localization is to utilize the time-difference-of-arrival (TDOA) measurements of a signal received at spatially separated sensors. A simple TDOA-based location algorithm that combines the advantages of two efficient positioning methods is developed. It is demonstrated that the proposed approach can give optimum performance in geolocation via satellites at different noise conditions.

  • Wide-Input Range Variable Resistor Circuit Using an FG-MOSFET

    Muneo KUSHIMA  Koichi TANNO  Okihiko ISHIZUKA  

     
    LETTER-Analog Signal Processing

      Page(s):
    3294-3296

    In this letter, a linear variable resistor circuit using an FG-MOSFET (floating-gate MOSFET) is proposed. This is based on Schlarmann's variable resistor and is very simple. The advantage of the proposed circuit is a wide-input range. The utility of the proposed circuit was confirmed by HSPICE simulation with 1.2 µm CMOS process parameters. The simulation results are reported in this letter.

  • An Efficient Method for System-Level Exploration of Global Optimum in a Parameterized ASIP Design

    Yeong-Geol KIM  Tag-Gon KIM  

     
    LETTER-VLSI Design Technology and CAD

      Page(s):
    3297-3302

    This paper proposes an efficient method for design space exploration of the global optimum configuration for parameterized ASIPs. The method not only guarantees the optimum configuration, but also provides robust speedup for a wide range of processor architectures such as SoC, ASIC as well as ASIP. The optimization procedure within this method takes a two-step approach. Firstly, design parameters are partitioned into clusters of inter-dependent parameters using parameter dependency information. Secondly, parameters are optimized for each cluster, the results of which are merged for the global optimum. In such optimization, inferior configurations are extensively pruned with a detailed optimality mapping between dependent parameters. Experimental results with mediabench applications show an average optimization speedup of 4.1 times over the previous work, which is a significant improvement for practical use.

  • A Robust Audio Watermarking Scheme Using Wavelet Modulation

    Bing JI  Fei YAN  De ZHANG  

     
    LETTER-Information Security

      Page(s):
    3303-3305

    A novel audio watermarking based on wavelet modulation is presented. The watermark signals are constructed by M-band wavelet modulation that can increase redundancy to improve the detection performance. In order to maximize the watermarking strength within the perceptual constraints, the watermark signals synthesized from different subbands are separately masked using a frequency auditory model. CDMA technique is implemented to achieve watermarking capacity. Experimental results show that this method is very robust.

  • A Call-by-Need Recursive Algorithm for the LogMAP Decoding of a Binary Linear Block Code

    Toshiyuki ISHIDA  Yuichi KAJI  

     
    LETTER-Information Theory

      Page(s):
    3306-3309

    A new algorithm for the LogMAP decoding of linear block codes is considered. The decoding complexity is evaluated analytically and by computer simulation. The proposed algorithm is an improvement of the recursive LogMAP algorithm proposed by the authors. The recursive LogMAP algorithm is more efficient than the BCJR algorithm for low-rate codes, but the complexity grows considerably large for high-rate codes. The aim of the proposed algorithm is to solve the complexity explosion of the recursive LogMAP algorithm for high-rate codes. The proposed algorithm is more efficient than the BCJR algorithm for well-known linear block codes.
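
    LogMAP decoders work on log-domain metrics, where a sum of probabilities becomes the Jacobian logarithm, often written max*. A minimal sketch of that one operation is given below; it illustrates the arithmetic common to LogMAP decoding in general and is not the recursive algorithm of the letter. The names `max_star` and `log_sum` are hypothetical.

```python
import math


def max_star(a, b):
    """Jacobian logarithm: log(exp(a) + exp(b)), computed without overflow."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def log_sum(values):
    """Fold max_star over a non-empty list of log-domain metrics."""
    total = values[0]
    for v in values[1:]:
        total = max_star(total, v)
    return total


print(max_star(0.0, 0.0))
print(log_sum([-1000.0, -1001.0]))   # exp(-1000) alone would underflow to 0
```

    Dropping the correction term `log1p(...)` gives the cheaper Max-Log-MAP approximation; the count of such operations is one natural measure of decoding complexity.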

  • OFDM-CDMA with Low PAPR Using Cyclic-Shifted Sequence Mapping

    Young-Hwan YOU  Won-Gi JEON  Jeong-Wook SEO  Byoung-Chul SONG  Hyeok-Koo JUNG  

     
    LETTER-Communication Theory and Signals

      Page(s):
    3310-3313

    In this letter, a simple peak-to-average power ratio (PAPR) reduction scheme using cyclic-shifted sequence mapping is addressed for OFDM-CDMA systems. The PAPR reduction approach is very simple because it requires no additional complexity and no side information. Also, this simple approach can be easily combined with a modified selective mapping (SLM) approach, which outperforms the original SLM approach at the expense of one additional piece of side information, while guaranteeing approximately the same transmitter complexity.
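
    For readers unfamiliar with the metric, the PAPR of one multicarrier block can be computed as in the sketch below. It assumes Nyquist-rate sampling (no oversampling) and a direct inverse DFT, and it illustrates the definition of PAPR only, not the letter's mapping scheme. The function name `papr` and the two 8-subcarrier blocks are hypothetical.

```python
import cmath


def papr(symbols):
    """PAPR (linear scale) of the inverse DFT of one block of subcarrier symbols."""
    n = len(symbols)
    samples = [
        sum(s * cmath.exp(2j * cmath.pi * k * t / n) for k, s in enumerate(symbols)) / n
        for t in range(n)
    ]
    powers = [abs(x) ** 2 for x in samples]
    return max(powers) / (sum(powers) / n)


# Identical symbols on all 8 subcarriers add coherently into one time-domain peak
flat = papr([1, 1, 1, 1, 1, 1, 1, 1])
# A single active subcarrier yields a constant-envelope signal
single = papr([1, 0, 0, 0, 0, 0, 0, 0])
print(flat, single)
```

    The first block reaches the worst case of N = 8 for this block length, while the second has the minimum PAPR of 1; reduction schemes aim to choose subcarrier sequences away from the first extreme.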

  • A Basic A/D Converter with Trapping Window

    Toshimichi SAITO  Hiroshi IMAMURA  Masaaki NAKA  

     
    LETTER-Neural Networks and Bioengineering

      Page(s):
    3314-3317

    This letter presents a simple A/D converter based on the circle map. The converter encodes a dc input into a binary output sequence and has the trapping window that extracts an available part of the output sequence. Using the available part, the decoder provides an estimation by a fraction with variable denominator: it can realize higher resolution. Theoretical evidences for the estimation characteristics are given.