
IEICE TRANSACTIONS on Fundamentals

  • Impact Factor

    0.48

  • Eigenfactor

    0.003

  • article influence

    0.1

  • Cite Score

    1.1


Volume E89-A No.12  (Publication Date:2006/12/01)

    Special Section on VLSI Design and CAD Algorithms
  • FOREWORD

    Hidetoshi ONODERA  

     
    FOREWORD

      Page(s):
    3377-3377
  • Memory Size Computation for Real-Time Multimedia Applications Based on Polyhedral Decomposition

    Hongwei ZHU  Ilie I. LUICAN  Florin BALASA  

     
    PAPER-System Level Design

      Page(s):
    3378-3386

    In real-time multimedia processing systems a very large part of the power consumption is due to the data storage and data transfer. Moreover, the area cost is often largely dominated by the memory modules. In deriving an optimized (for area and/or power) memory architecture, memory size computation is an important step in the exploration of the possible algorithmic specifications of multimedia applications. This paper presents a novel non-scalar approach for computing exactly the memory size in real-time multimedia algorithms. This methodology uses both algebraic techniques specific to the data-flow analysis used in modern compilers and, also, more recent advances in the theory of polyhedra. In contrast with all the previous works which are only estimation methods, this approach performs exact memory computations even for applications significantly large in terms of the code size, number of scalars, and number of array references.

  • Synchronization Verification in System-Level Design with ILP Solvers

    Thanyapat SAKUNKONCHAK  Satoshi KOMATSU  Masahiro FUJITA  

     
    PAPER-System Level Design

      Page(s):
    3387-3396

    Concurrency is one of the most important issues in system-level design. Interleaving among parallel processes can cause an extremely large number of different behaviors, making design and verification difficult tasks. In this work, we propose a synchronization verification method for system-level designs described in the SpecC language. Instead of modeling the design with timed FSMs and using a model checker for timed automata (such as UPPAAL or KRONOS), we formulate the timing constraints with equalities/inequalities that can be solved by integer linear programming (ILP) tools. Verification is conducted in two steps. First, similar to other software model checkers, we compute the reachability of an error state in the absence of timing constraints. Then, if a path to an error state exists, its feasibility is checked by using the ILP solver to evaluate the timing constraints along the path. This approach can drastically increase the sizes of the designs that can be verified. Abstraction and abstraction refinement techniques based on the Counterexample-Guided Abstraction Refinement (CEGAR) paradigm are applied.
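
    The second verification step, checking whether the timing constraints along a candidate error path can all hold, can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes every constraint has the difference form t[v] - t[u] <= c, and it swaps the ILP solver for a Bellman-Ford relaxation, which is sufficient for that restricted form. The function name and the event numbering are hypothetical.

```python
def feasible_schedule(num_events, constraints):
    """Check difference constraints t[v] - t[u] <= c along a path.

    constraints is a list of (u, v, c) tuples with integer c.
    Returns one satisfying assignment of integer times, or None when
    the constraints contradict each other (the path is infeasible).
    """
    # Equivalent to a virtual source with zero-weight edges to every event.
    times = [0] * num_events
    for _ in range(num_events):
        changed = False
        for u, v, c in constraints:
            if times[u] + c < times[v]:
                times[v] = times[u] + c
                changed = True
        if not changed:
            return times
    # Still relaxing after num_events passes: a negative cycle exists.
    return None


# Event 1 must occur 5 to 8 time units after event 0: feasible.
ok = feasible_schedule(2, [(1, 0, -5), (0, 1, 8)])
# Event 1 must occur at least 5 but at most 3 units after event 0: infeasible.
bad = feasible_schedule(2, [(1, 0, -5), (0, 1, 3)])
```

    An infeasible result corresponds to a spurious error path, which is the case where abstraction refinement would be applied.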

  • The AMS Extension to System Level Design Language--SpecC

    Yu LIU  Satoshi KOMATSU  Masahiro FUJITA  

     
    PAPER-System Level Design

      Page(s):
    3397-3407

    Recently, system-level design languages (SLDLs), which can describe both hardware and software aspects of a design, have been receiving attention. Analog mixed-signal (AMS) extensions to SLDLs enable current discrete-oriented SLDLs to describe and simulate not only digital systems but also digital-analog mixed-signal systems. In this paper, we present our work on an AMS extension to one such system-level design language, SpecC. The extended language lets designers describe all the analog, digital, and software aspects of a design in one universal language.

  • Unified Representation for Speculative Scheduling: Generalized Condition Vector

    Kazutoshi WAKABAYASHI  

     
    PAPER-System Level Design

      Page(s):
    3408-3415

    A unified representation for various kinds of speculations and global scheduling algorithms is presented. After introducing several types of local and global speculations, reviewing our conventional method called conditional vector-based list scheduling, and discussing some of its limitations, we introduce the unique notion of generalized condition vectors (GCVs), which can represent most varieties of speculations and multiple branches as a single vector. The unification of parallel branches and partially unresolved nested conditional branches is discussed. Then, a scheduling algorithm using GCVs is proposed. Experimental results show the effectiveness of the GCV-based scheduling method.

  • An Efficient and Effective Algorithm for Online Task Placement with I/O Communications in Partially Reconfigurable FPGAs

    Mitsuru TOMONO  Masaki NAKANISHI  Shigeru YAMASHITA  Kazuo NAKAJIMA  Katsumasa WATANABE  

     
    PAPER-System Level Design

      Page(s):
    3416-3426

    In a partially reconfigurable FPGA of the future, arbitrary portions of its logic resources and interconnection networks will be reconfigured without affecting the other parts. Multiple tasks will be mapped and executed concurrently in such an FPGA. Efficient execution of the tasks using the limited resources of the FPGA will necessitate effective resource management. A number of online FPGA placement methods have recently been proposed for such an FPGA. However, they cannot handle I/O communications of the tasks. Taking such I/O communications into consideration, we introduce a new approach to online FPGA placement. We present an algorithm for placing each arriving task in an empty area so as to complete all the tasks efficiently. We develop two fitting strategies to effectively handle I/O communications of the tasks. Our experimental results show that properly weighted combinations of these and two other previously proposed strategies enable this algorithm to run very fast and make an effective placement of the tasks. In fact, we show that the overhead associated with the use of this algorithm is negligible as compared to the total execution time of the tasks.

  • Bit-Length Optimization Method for High-Level Synthesis Based on Non-linear Programming Technique

    Nobuhiro DOI  Takashi HORIYAMA  Masaki NAKANISHI  Shinji KIMURA  

     
    PAPER-System Level Design

      Page(s):
    3427-3434

    High-level synthesis is a novel method for automatically generating an RT-level hardware description from a high-level language such as C, and is used in recent digital circuit design. Floating-point to fixed-point conversion with bit-length optimization is one of the key issues for area and speed optimization in high-level synthesis. However, the conversion is rather tedious work for designers. This paper introduces an automatic bit-length optimization method for floating-point to fixed-point conversion in high-level synthesis. The method estimates computational errors statistically and formulates the optimization as a non-linear programming (NLP) problem. Applying an NLP technique improves the balance between computational accuracy and total hardware cost. Various constraints, such as unit sharing and the maximum bit-length of function units, can also be modeled easily. Experimental results show that our method is fast compared with a typical one and reduces the hardware area.

  • Multi-Clock Cycle Paths and Clock Scheduling for Reducing the Area of Pipelined Circuits

    Bakhtiar Affendi ROSDI  Atsushi TAKAHASHI  

     
    PAPER-System Level Design

      Page(s):
    3435-3442

    A new algorithm is proposed to reduce the number of intermediate registers of a pipelined circuit using a combination of multi-clock cycle paths and clock scheduling. The algorithm analyzes the pipelined circuit and determines the intermediate registers that can be removed. An efficient subsidiary algorithm is presented that computes the minimum feasible clock period of a circuit containing multi-clock cycle paths. Experiments with a pipelined adder and multiplier verify that the proposed algorithm can reduce the number of intermediate registers without degrading performance, even when delay variations exist.

  • Efficient Computation of Canonical Form under Variable Permutation and Negation for Boolean Matching in Large Libraries

    Debatosh DEBNATH  Tsutomu SASAO  

     
    PAPER-Logic Synthesis

      Page(s):
    3443-3450

    This paper presents an efficient technique for solving a Boolean matching problem in cell-library binding, where the number of cells in the library is large. As a basis of the Boolean matching, we use the notion of an NP-representative (NPR): two functions have the same NPR if one can be obtained from the other by a permutation and/or complementation(s) of the variables. By using a table look-up and a tree-based breadth-first search strategy, our method quickly computes the NPR for a given function. Boolean matching of the given function against the whole library is determined by checking the presence of its NPR in a hash table, which stores NPRs for all the library functions and their complements. The effectiveness of our method is demonstrated through experimental results, which show that it is more than two orders of magnitude faster than the Hinsberger-Kolla algorithm.
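
    To make the NPR idea concrete, here is a brute-force sketch for small functions. It is not the table look-up and breadth-first search method of the paper: it simply enumerates every variable permutation and complementation and takes the smallest resulting truth table as the representative, which is an assumed choice of canonical form. The function name and the truth-table encoding (bit m of the integer is the function value at minterm m) are hypothetical.

```python
from itertools import permutations


def np_representative(truth_table, n):
    """Smallest truth table reachable by permuting/complementing inputs.

    truth_table is an int; bit m holds f at minterm m, where bit i of m
    is the value of variable x_i. Exhaustive, so only usable for small n.
    """
    best = None
    for perm in permutations(range(n)):
        for neg_mask in range(1 << n):
            candidate = 0
            for m in range(1 << n):
                # Permute the input bits, then complement the selected ones.
                src = 0
                for i in range(n):
                    src |= ((m >> perm[i]) & 1) << i
                src ^= neg_mask
                if (truth_table >> src) & 1:
                    candidate |= 1 << m
            if best is None or candidate < best:
                best = candidate
    return best


AND2 = 0b1000         # x0 AND x1
AND_NOT = 0b0010      # x0 AND (NOT x1)
OR2 = 0b1110          # x0 OR x1
NOR2 = 0b0001         # complement of OR2

# Library table storing the NPRs of each cell and of its complement.
library = {np_representative(OR2, 2), np_representative(NOR2, 2)}
matches = np_representative(AND_NOT, 2) in library
```

    Here `matches` is true because the complement of OR is NP-equivalent to AND with one complemented input, which is why the hash table described above also stores the complements.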

  • Coverage Estimation Using Transition Perturbation for Symbolic Model Checking in Hardware Verification

    Xingwen XU  Shinji KIMURA  Kazunari HORIKAWA  Takehiko TSUCHIYA  

     
    PAPER-Simulation and Verification

      Page(s):
    3451-3457

    Lack of complete formal specification is one of the major obstacles to the deployment of model checking. Coverage estimation addresses this issue by revealing the unverified part of the design according to the specified properties. In this paper we propose a new transition-based coverage metric to evaluate the completeness of properties for symbolic model checking. Our coverage metric pinpoints the transitions through which the values of signals are checked. An efficient symbolic algorithm is presented for computing the transition coverage for a subset of ACTL. Our coverage estimator has been applied to the model checking of a cache coherence protocol. We uncovered several coverage holes including one that eventually led to the discovery of a design bug.

  • Hierarchical-Analysis-Based Fast Chip-Scale Power Estimation Method for Large and Complex LSIs

    Yuichi NAKAMURA  Takeshi YOSHIMURA  

     
    PAPER-Simulation and Verification

      Page(s):
    3458-3463

    This paper presents a novel power estimation method for large and complex LSIs. The proposed method is based on simulation and is used for analyzing the ways in which chip-scale gate-level circuits, including processors and memory, are affected by gated-clock power reduction and the voltage drop due to electrical resistance. Chip-scale power estimation based on simulation patterns generally takes an enormous amount of time. In order to reduce the time to obtain accurate estimation results based on simulation patterns, we introduce three approaches: "partitioning of target LSIs and simulation pattern," "memory modeling," and "processor modeling." After placing and routing, the target LSIs are partitioned into hierarchical blocks, memory, and processors. The power consumption of each hierarchical block is calculated by using the partitioned patterns generated from chip-scale simulation patterns. The power consumption of the processor and memory blocks is estimated by a method that considers the static power consumption and the LSI activity ratio. Experimental results for a commercial 0.18 µm-technology media processing chip show that the proposed method is 23 times faster than the conventional method without partitioning and that the two results are almost the same.

  • Fast FPGA-Emulation-Based Simulation Environment for Custom Processors

    Yuichi NAKAMURA  Kouhei HOSOKAWA  

     
    PAPER-Simulation and Verification

      Page(s):
    3464-3470

    This paper describes a new method for constructing the simulation environment for a custom processor. It is generally very hard to develop an accurate simulator for a custom processor rapidly, even a simple instruction-set-level simulator (ISS). The proposed method uses a field-programmable-gate-array emulator with a PCI interface and debugging GUI software on a PC. Since the emulator implements the processor design at the register-transfer or net-list level, the emulation results are almost the same as the results obtained with the actual processor. To support rich debugging functions like those provided by the conventional software simulator, we use a debugging buffer and break-control circuits. Experimental results show that a simulator based on the proposed method can be constructed within several hours and that it can break the processor operation at any specified point and observe the internal signals when the emulated system is running at 1-30 MHz. The accuracy of the constructed simulator is the same as that of RTL simulation and much higher than that of software ISS simulation. We show that we can provide a fast, accurate, and useful simulator for any processor design specified at the register-transfer level.

  • A PC-Based Logic Simulator Using a Look-Up Table Cascade Emulator

    Hiroki NAKAHARA  Tsutomu SASAO  Munehiro MATSUURA  

     
    PAPER-Simulation and Verification

      Page(s):
    3471-3481

    This paper presents a cycle-based logic simulation method using an LUT cascade emulator, where an LUT cascade consists of multiple-output LUTs (cells) connected in series. The LUT cascade emulator is an architecture that emulates LUT cascades. It has a control part, a memory for logic, and registers. It connects the memory to registers through a programmable interconnection circuit, and evaluates the given circuit stored in the memory. The LUT cascade emulator runs on an ordinary PC. This paper also compares the method with a Levelized Compiled Code (LCC) simulator and a simulator using a Quasi-Reduced Multi-valued Decision Diagram (QRMDD). Our simulator is 3.5 to 10.6 times faster than the LCC, and 1.1 to 3.9 times faster than the one using a QRMDD. The simulation setup time is 2.0 to 9.8 times shorter than that of the LCC. The necessary amount of memory is 1/1.8 to 1/5.5 of that of the one using a QRMDD.

  • Delay Modeling and Critical-Path Delay Calculation for MTCMOS Circuits

    Naoaki OHKUBO  Kimiyoshi USAMI  

     
    PAPER-Simulation and Verification

      Page(s):
    3482-3490

    One of the critical issues in MTCMOS design is how to estimate a circuit delay quickly. In an MTCMOS circuit, the voltage on the virtual ground fluctuates due to the discharge current of a logic cell. This fluctuation affects the cell delay and makes static timing analysis (STA) difficult. In this paper, we propose a delay modeling and STA methodology targeting MTCMOS circuits. In the proposed method, we prepare a delay look-up table (LUT) consisting of the input slew, the output load capacitance, the virtual ground length, and the power-switch size. Using this LUT, we compute the delay of each logic cell by linear interpolation. This technique makes it possible to calculate the cell delay while accounting for the delay increase caused by the voltage fluctuation of the virtual ground line. Experimental results show that the proposed methodology can estimate the cell delay and the critical path delay within 8% error compared with SPICE simulation.
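
    The table look-up with linear interpolation can be sketched as below. This is a simplified illustration under stated assumptions: it interpolates over only two of the four LUT axes named above (input slew and output load), the function name is hypothetical, and the table values are made-up numbers rather than data from the paper.

```python
from bisect import bisect_right


def interp_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table.

    slews and loads are ascending axis values; table[i][j] is the delay
    at (slews[i], loads[j]). Points outside the axes are extrapolated
    linearly from the nearest interval.
    """
    def bracket(axis, x):
        # Index of the interval containing x, clamped to a valid interval.
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        frac = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, frac

    i, fs = bracket(slews, slew)
    j, fl = bracket(loads, load)
    low = table[i][j] * (1 - fl) + table[i][j + 1] * fl
    high = table[i + 1][j] * (1 - fl) + table[i + 1][j + 1] * fl
    return low * (1 - fs) + high * fs


# Hypothetical 2x2 table (arbitrary units).
slews = [0.0, 1.0]
loads = [0.0, 4.0]
table = [[10.0, 20.0],
         [30.0, 40.0]]
mid_delay = interp_delay(slews, loads, table, 0.5, 2.0)
```

    Extending the sketch to the remaining two axes would repeat the same bracketing step once per additional dimension.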

  • On-Chip Thermal Gradient Analysis Considering Interdependence between Leakage Power and Temperature

    Takashi SATO  Junji ICHIMIYA  Nobuto ONO  Masanori HASHIMOTO  

     
    PAPER-Simulation and Verification

      Page(s):
    3491-3499

    In this paper, we propose a methodology for calculating on-chip temperature gradient and leakage power distributions. It considers the interdependence between leakage power and local temperature using a general circuit simulator as a differential equation solver. The proposed methodology can be utilized in the early stages of the design cycle as well as in the final verification phase. Simulation results proved that consideration of the temperature dependence of the leakage power is critically important for achieving reliable physical designs since the conventional temperature analysis that ignores the interdependence underestimates leakage power considerably and may overlook potential thermal runaway.
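
    The interdependence described above can be illustrated with a deliberately small model. The sketch below is not the authors' circuit-simulator-based method: it assumes a single lumped thermal resistance, an exponential leakage model, and a plain fixed-point iteration, and all names and parameter values are hypothetical.

```python
import math


def steady_state_temperature(t_amb, r_th, p_dyn, p_leak_ref, k,
                             t_ref=25.0, t_max=150.0,
                             tol=1e-6, max_iter=1000):
    """Iterate T = t_amb + r_th * (p_dyn + p_leak(T)) to a fixed point.

    p_leak(T) = p_leak_ref * exp(k * (T - t_ref)) is an assumed model.
    Returns the converged temperature, or None if the iterate exceeds
    t_max (treated as thermal runaway) or fails to converge.
    """
    temp = t_amb
    for _ in range(max_iter):
        p_leak = p_leak_ref * math.exp(k * (temp - t_ref))
        new_temp = t_amb + r_th * (p_dyn + p_leak)
        if new_temp > t_max:
            return None
        if abs(new_temp - temp) < tol:
            return new_temp
        temp = new_temp
    return None


# Analysis that ignores the dependence uses the leakage at t_ref only.
naive = 25.0 + 2.0 * (10.0 + 1.0)
coupled = steady_state_temperature(25.0, 2.0, 10.0, 1.0, 0.02)
runaway = steady_state_temperature(25.0, 20.0, 10.0, 1.0, 0.02)
```

    With these assumed numbers the coupled temperature comes out above the naive estimate, and the case with the larger thermal resistance has no steady state below the limit, mirroring the underestimation and runaway risks the abstract mentions.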

  • Formal Design of Arithmetic Circuits Based on Arithmetic Description Language

    Naofumi HOMMA  Yuki WATANABE  Takafumi AOKI  Tatsuo HIGUCHI  

     
    PAPER-Circuit Synthesis

      Page(s):
    3500-3509

    This paper presents a formal design of arithmetic circuits using an arithmetic description language called ARITH. The key idea in ARITH is to describe arithmetic algorithms directly with high-level mathematical objects (i.e., number representation systems and arithmetic operations/formulae). Using ARITH, we can provide formal description of arithmetic algorithms including those using unconventional number systems. In addition, the described arithmetic algorithms can be formally verified by equivalence checking with formula manipulations. The verified ARITH descriptions are easily translated into the equivalent HDL descriptions. In this paper, we also present an application of ARITH to an arithmetic module generator, which supports a variety of hardware algorithms for 2-operand adders, multi-operand adders, multipliers, constant-coefficient multipliers and multiply accumulators. The language processing system of ARITH incorporated in the generator verifies the correctness of ARITH descriptions in a formal method. As a result, we can obtain highly-reliable arithmetic modules whose functions are completely verified at the algorithm level.

  • Compact Numerical Function Generators Based on Quadratic Approximation: Architecture and Synthesis Method

    Shinobu NAGAYAMA  Tsutomu SASAO  Jon T. BUTLER  

     
    PAPER-Circuit Synthesis

      Page(s):
    3510-3518

    This paper presents an architecture and a synthesis method for compact numerical function generators (NFGs) for trigonometric, logarithmic, square root, reciprocal, and combinations of these functions. Our NFG partitions a given domain of the function into non-uniform segments using an LUT cascade, and approximates the given function by a quadratic polynomial for each segment. Thus, we can implement fast and compact NFGs for a wide range of functions. Experimental results show that: 1) our NFGs require, on average, only 4% of the memory needed by NFGs based on the linear approximation with non-uniform segmentation; 2) our NFG for 2^x-1 requires only 22% of the memory needed by the NFG based on a 5th-order approximation with uniform segmentation; and 3) our NFGs achieve about 70% of the throughput of the existing table-based NFGs using only a few percent of the memory. Thus, our NFGs can be implemented with more compact FPGAs than needed for the existing NFGs. Our automatic synthesis system generates such compact NFGs quickly.
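
    A software sketch of piecewise quadratic approximation follows. It makes two simplifying assumptions that differ from the paper: the segments are uniform rather than non-uniform (so no LUT cascade is needed to find the segment), and each quadratic is obtained by interpolating three points instead of by an optimized fit. The function names are hypothetical.

```python
def build_segments(f, lo, hi, num_segments):
    """Fit one quadratic per uniform segment of [lo, hi].

    Returns (table, h) where table[s] = (a, c0, c1, c2) and
    f(x) is approximated by c0 + c1*(x - a) + c2*(x - a)**2
    on the segment starting at a with width h.
    """
    h = (hi - lo) / num_segments
    half = h / 2
    table = []
    for s in range(num_segments):
        a = lo + s * h
        y0, y1, y2 = f(a), f(a + half), f(a + h)
        # Newton divided differences through three equally spaced points.
        d1 = (y1 - y0) / half
        d2 = (y2 - 2 * y1 + y0) / (2 * half * half)
        # Expand y0 + d1*t + d2*t*(t - half) in powers of t = x - a.
        table.append((a, y0, d1 - d2 * half, d2))
    return table, h


def evaluate(table, lo, h, x):
    """Select the segment for x and evaluate its quadratic (Horner form)."""
    s = min(int((x - lo) / h), len(table) - 1)
    a, c0, c1, c2 = table[s]
    t = x - a
    return c0 + t * (c1 + t * c2)


def target(x):
    return 2.0 ** x - 1.0


table, h = build_segments(target, 0.0, 1.0, 8)
max_error = max(abs(evaluate(table, 0.0, h, i / 1000) - target(i / 1000))
                for i in range(1000))
```

    Each table entry holds only three coefficients, which is the source of the memory savings relative to storing function values directly; non-uniform segmentation would reduce the number of entries further.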

  • Design Method of High Performance and Low Power Functional Units Considering Delay Variations

    Kouichi WATANABE  Masashi IMAI  Masaaki KONDO  Hiroshi NAKAMURA  Takashi NANYA  

     
    PAPER-Circuit Synthesis

      Page(s):
    3519-3528

    As VLSI technology advances, delay variations will become more serious. Delay-insensitive asynchronous dual-rail circuits tolerate any delay variation, but their energy consumption is more than double that of single-rail circuits because signal transitions occur every cycle in all bits regardless of the input bit pattern. However, in functional units, a significant number of input bits often do not change from the previous input. In such a situation, calculation of these bits is not required. Thus, we propose a method, called unflip-bits control, that exploits this situation to reduce energy consumption. We evaluate the energy consumption and performance penalty of the method using HSPICE and the Verilog-XL simulator, and compare the method with the conventional dual-rail circuit and a synchronous circuit. Our evaluation results reveal that the proposed asynchronous dual-rail circuit has 12-60% lower energy consumption than a conventional asynchronous dual-rail circuit.

  • A Structural Approach for Transistor Circuit Synthesis

    Hiroaki YOSHIDA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-Circuit Synthesis

      Page(s):
    3529-3537

    This paper presents a structural approach for synthesizing arbitrary multi-output multi-stage static CMOS circuits at the transistor level, targeting the reduction of transistor counts. To make the problem tractable, the solution space is restricted to the circuit structures which can be obtained by performing algebraic transformations on an arbitrary prime-and-irredundant two-level circuit. The proposed algorithm is guaranteed to find the optimal solution within the solution space. The circuit structures are implicitly enumerated via structural transformations on a single graph structure, then a dynamic-programming based algorithm efficiently finds the minimum solution among them. Experimental results on a benchmark suite targeting standard cell implementations demonstrate the feasibility and effectiveness of the proposed approach. We also demonstrated the efficiency of the proposed algorithm by a numerical analysis on randomly-generated problems.

  • A Sampling Switch Design Procedure for Active Matrix Liquid Crystal Displays

    Shingo TAKAHASHI  Shuji TSUKIYAMA  Masanori HASHIMOTO  Isao SHIRAKAWA  

     
    PAPER-Circuit Synthesis

      Page(s):
    3538-3545

    In the design of an active matrix LCD (Liquid Crystal Display), the ratio of the pixel voltage to the video voltage (RPV) of a pixel is an important factor in the performance of the LCD, since the pixel voltage of each pixel determines its transmitted luminance. Thus, of practical importance is the issue of how to keep the RPV of each pixel within a prescribed narrow range. This constraint on RPV is analyzed in terms of circuit parameters associated with the sampling switch and sampling pulse of a column driver in the LCD. With the use of a minimal set of such circuit parameters, a dedicated design procedure for the sampling switch is described, which seeks an optimal sampling switch as well as an optimal sampling pulse waveform. A number of experimental results show that an optimal sampling switch attained by the proposed procedure yields a source driver with almost 18% less power consumption than the one obtained by manual design. Moreover, the percentage of the RPVs within 100±1% among 270 cases of fluctuations is 88.1% for the optimal sampling switch, but 46.7% for the manual design.

  • LSI Design Flow for Shot Reduction of Character Projection Electron Beam Direct Writing Using Combined Cell Stencil

    Taisuke KAZAMA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-Physical Design

      Page(s):
    3546-3550

    We propose a shot reduction technique of character projection (CP) Electron Beam Direct Writing (EBDW) using combined cell stencil (CCS) or the advanced process technology. CP EBDW is expected both to reduce mask costs and to realize quick turnaround time. One major issue of conventional CP EBDW, however, is lithography throughput. The throughput is determined by the number of shots, which is proportional to the number of cell instances in LSIs. Conventional shot reduction techniques focus on optimizing cell stencil extraction, without any modification of the designed LSI mask patterns. The proposed technique employs the combined cell stencil together with a modified design flow for further shot reduction. We demonstrate a 22.4% shot reduction within a 4.3% area increase for a microprocessor and a 28.6% shot reduction for IWLS benchmarks compared with the conventional technique.

  • Routing of Monotonic Parallel and Orthogonal Netlists for Single-Layer Ball Grid Array Packages

    Yoichi TOMIOKA  Atsushi TAKAHASHI  

     
    PAPER-Physical Design

      Page(s):
    3551-3559

    Ball Grid Array packages, in which I/O pins are arranged in a grid array pattern, realize a number of connections between chips and PCBs, but routing them manually takes much time, so the demand for automation of package routing is increasing. In this paper, we give the necessary and sufficient condition that all nets can be connected by monotonic routes when a net consists of a finger and a ball and fingers are on the two parallel boundaries of the Ball Grid Array package, and propose a monotonic routing method based on this condition. Moreover, we give a necessary condition and a sufficient condition when fingers are on the two orthogonal boundaries, and propose a monotonic routing method based on the necessary condition.

  • Si-Substrate Modeling toward Substrate-Aware Interconnect Resistance and Inductance Extraction in SoC Design

    Toshiki KANAMOTO  Tatsuhiko IKEDA  Akira TSUCHIYA  Hidetoshi ONODERA  Masanori HASHIMOTO  

     
    PAPER-Interconnect

      Page(s):
    3560-3568

    This paper proposes a simple yet sufficient Si-substrate modeling for interconnect resistance and inductance extraction. The proposed modeling expresses Si-substrate as four filaments in a filament-based extractor. Although the number of filaments is small, extracted loop inductances and resistances show accurate frequency dependence resulting from the proximity effect. We experimentally prove the accuracy using FEM (Finite Element Method) based simulations of electromagnetic fields. We also show a method to determine optimal size of the four filaments. The proposed model realizes substrate-aware extraction in SoC design flow.

  • Simple Waveform Model of Inductive Interconnects by Delayed Quadratic Transfer Function with Application to Scaling Trend of Inductive Effects in VLSI's

    Danardono Dwi ANTONO  Kenichi INAGAKI  Hiroshi KAWAGUCHI  Takayasu SAKURAI  

     
    PAPER-Interconnect

      Page(s):
    3569-3578

    A simple analytical model based on Delayed Quadratic (DQ) Transfer Function approximation is proposed for estimating waveforms of inductive single-line interconnects in VLSI's. An expression for overshoot voltage is derived by the model within 17% error for the line width less than 10 times the minimum line width and typical input signal. A delay expression is also proposed with error within 15% under the same condition. The strength of the inductive effect is shown to be expressed by a closed-form expression, $A = 2\sqrt{L(C_T+0.5C)}\,/\,(R_T(C_T+C_J)+R_T C+R C_T+0.4RC)$. Using this criterion, a scaling trend of inductive effects in VLSI's is discussed. It is shown that the inductive effect of single-line, minimum-width VLSI interconnect peaks at 90 nm based on the ITRS predicted parameters.

  • Statistical Modeling of a Via Distribution for Yield Estimation

    Takumi UEZONO  Kenichi OKADA  Kazuya MASU  

     
    PAPER-Interconnect

      Page(s):
    3579-3584

    In this paper, we propose a via distribution model for yield estimation. This model expresses a relationship between the number of vias and wire length. We also provide an estimate for the total number of vias in a circuit, derived from the via distribution and the wire-length distribution. The via distribution is modeled as a function of track utilization, and the wire-length distribution can be derived from the gate-level netlist and the layout area. We extract model parameters from the commercial chips designed for 0.18-µm and 0.13-µm CMOS processes, and demonstrate the yield degradation caused by vias.

  • Interconnect RL Extraction Based on Transfer Characteristics of Transmission-Line

    Akira TSUCHIYA  Masanori HASHIMOTO  Hidetoshi ONODERA  

     
    PAPER-Interconnect

      Page(s):
    3585-3593

    This paper proposes a method to determine a single frequency for interconnect RL extraction. Resistance and inductance of interconnects depend on frequency, and hence the extraction frequency strongly affects the modeling accuracy of interconnects. The proposed method determines an extraction frequency based on the transfer characteristic of interconnects. By choosing the frequency where the transfer characteristic becomes maximum, the extracted RL values achieve the accurate modeling of the waveform. Experimental results show that the proposed method provides accurate transition waveforms over various interconnect topologies.

  • A VLSI Architecture for Variable Block Size Motion Estimation in H.264/AVC with Low Cost Memory Organization

    Yang SONG  Zhenyu LIU  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER-VLSI Architecture

      Page(s):
    3594-3601

    A one-dimensional (1-D) full search variable block size motion estimation (VBSME) architecture is presented in this paper. By properly choosing the partial sum of absolute differences (SAD) registers and scheduling the addition operations, the architecture can be implemented with simple control logic and regular workflow. Moreover, only one single-port SRAM is used to store the search area data. The design is realized in TSMC 0.18 µm 1P6M technology with a hardware cost of 67.6K gates. In typical working conditions (1.8 V, 25 °C), a clock frequency of 266 MHz can be achieved.
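
    The role of partial SADs in variable block size motion estimation can be shown in software. The sketch below is not the 1-D hardware architecture of the paper: it only illustrates the underlying reuse, where the sixteen 4x4 SADs of a 16x16 macroblock are computed once for a given candidate position and then summed to obtain the SADs of the larger H.264 partitions. The function and table names are hypothetical.

```python
# Partition name -> (height, width) measured in 4x4 sub-blocks.
# Names follow the width x height convention (16x8 is 16 wide, 8 high).
MODES = {
    "16x16": (4, 4), "16x8": (2, 4), "8x16": (4, 2), "8x8": (2, 2),
    "8x4": (1, 2), "4x8": (2, 1), "4x4": (1, 1),
}


def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid holding the SAD of each 4x4 sub-block.

    cur and ref are 16x16 lists of pixel values for the current
    macroblock and one candidate reference block.
    """
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def all_block_sads(cur, ref):
    """Derive the SADs of every partition from the shared 4x4 SADs."""
    grid = sad_4x4_grid(cur, ref)
    result = {}
    for name, (h, w) in MODES.items():
        result[name] = [
            sum(grid[y][x]
                for y in range(by, by + h)
                for x in range(bx, bx + w))
            for by in range(0, 4, h)
            for bx in range(0, 4, w)
        ]
    return result


# Synthetic test pattern, not real video data.
cur = [[(3 * y + 5 * x) % 256 for x in range(16)] for y in range(16)]
ref = [[(7 * y + x) % 256 for x in range(16)] for y in range(16)]
sads = all_block_sads(cur, ref)
```

    Pixel differences are computed only once per candidate position; everything above the 4x4 level is additions, which is what makes the scheduling of those additions the central design question in a hardware implementation.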

  • Power-Efficient LDPC Decoder Architecture Based on Accelerated Message-Passing Schedule

    Kazunori SHIMIZU  Tatsuyuki ISHIKAWA  Nozomu TOGAWA  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER-VLSI Architecture

      Page(s):
    3602-3612

    In this paper, we propose a power-efficient LDPC decoder architecture based on an accelerated message-passing schedule. The proposed decoder architecture is characterized as follows: (i) Partitioning a pipelined operation so that intermediate messages are not read and written simultaneously enables the accelerated message-passing schedule to be implemented with single-port SRAMs. (ii) FIFO-based buffering reduces the number of SRAM banks and words of the LDPC decoder based on the accelerated message-passing schedule. The proposed LDPC decoder keeps a single message for each non-zero bit in a parity check matrix, as a classical schedule does, while achieving the accelerated message-passing schedule. Implementation results in 0.18-µm CMOS technology show that the proposed decoder architecture reduces the area of the LDPC decoder by 43% and the power dissipation by 29% compared to the conventional architecture based on the accelerated message-passing schedule.

  • VLSI Implementation of a Modified Efficient SPIHT Encoder

    Win-Bin HUANG  Alvin W. Y. SU  Yau-Hwang KUO  

     
    PAPER-VLSI Architecture

      Page(s):
    3613-3622

    Set Partitioning in Hierarchical Trees (SPIHT) is a highly efficient technique for compressing Discrete Wavelet Transform (DWT) decomposed images. Though its compression efficiency is a little less famous than Embedded Block Coding with Optimized Truncation (EBCOT) adopted by JPEG2000, SPIHT has a straightforward coding procedure and requires no tables. These properties make SPIHT a more appropriate algorithm for lower cost hardware implementation. In this paper, a modified SPIHT algorithm is presented. The modifications include a simplification of the coefficient scanning process, a 1-D addressing method instead of the original 2-D arrangement of wavelet coefficients, and a fixed memory allocation for the data lists instead of the dynamic allocation approach required in the original SPIHT. Although the distortion is slightly increased, it facilitates an extremely fast throughput and easier hardware implementation. The VLSI implementation demonstrates that the proposed design can encode a CIF (352×288) 4:2:0 image sequence with at least 30 frames per second at 100-MHz working frequency.

  • A Sub-mW H.264 Baseline-Profile Motion Estimation Processor Core with a VLSI-Oriented Block Partitioning Strategy and SIMD/Systolic-Array Architecture

    Junichi MIYAKOSHI  Yuichiro MURACHI  Tetsuro MATSUNO  Masaki HAMAMOTO  Takahiro IINUMA  Tomokazu ISHIHARA  Hiroshi KAWAGUCHI  Masayuki MIYAMA  Masahiko YOSHIMOTO  

     
    PAPER-VLSI Architecture

      Page(s):
    3623-3633

    We propose a sub-mW H.264 baseline-profile motion estimation processor for portable video applications. It features a VLSI-oriented block partitioning strategy and a low-power SIMD/systolic-array datapath architecture, where the datapath can be switched between an SIMD and a systolic array depending on the processing flow. The processor supports all seven block modes, and can handle three reference frames for CIF (352×288) 30-fps to QCIF (176×144) 15-fps sequences with quarter-pixel accuracy. It integrates 3.3 million transistors and occupies 2.8×3.1 mm² in a 130-nm CMOS technology. The proposed processor consumes 800 µW for a QCIF 15-fps sequence with one reference picture.

  • A 0.3-V Operating, Vth-Variation-Tolerant SRAM under DVS Environment for Memory-Rich SoC in 90-nm Technology Era and Beyond

    Yasuhiro MORITA  Hidehiro FUJIWARA  Hiroki NOGUCHI  Kentaro KAWAKAMI  Junichi MIYAKOSHI  Shinji MIKAMI  Koji NII  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER-VLSI Architecture

      Page(s):
    3634-3641

    We propose a voltage control scheme for 6T SRAM cells that lowers the minimum operating voltage to 0.3 V under a DVS environment. The supply voltage to the memory cells and wordline drivers, the bitline voltage, and the body bias voltage of the load pMOSFETs are controlled according to read and write operations, which secures operation margins even at a low operating voltage. A self-aligned timing control with a dummy wordline and its feedback is also introduced to guarantee stable operation over a wide range of supply voltages. Measurement results for a 64-kb SRAM in a 90-nm process technology show that a power reduction of 30% can be achieved at 100 MHz. In a 65-nm 64-Mb SRAM, a 74% power saving is expected at 1/6 of the maximum operating frequency. The performance penalty of the proposed scheme is less than 1%, and the area overhead is 5.6%.

  • A 50% Power Reduction in H.264/AVC HDTV Video Decoder LSI by Dynamic Voltage Scaling in Elastic Pipeline

    Kentaro KAWAKAMI  Jun TAKEMURA  Mitsuhiko KURODA  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER-VLSI Architecture

      Page(s):
    3642-3651

    We propose an elastic pipeline that can apply dynamic voltage scaling (DVS) to hardwired logic circuits. To demonstrate its feasibility, a hardwired H.264/AVC HDTV decoder is designed as a real-time application. The entropy decoding process is divided into context-based adaptive binary arithmetic coding (CABAC) and syntax element decoding (SED), which has the advantages of smoothing the workload for CABAC and maintaining the efficiency of the elastic pipeline. The operating frequency and supply voltage are dynamically modulated every slot depending on the workload of H.264 decoding to minimize power. We optimize the number of slots per frame to enhance the power reduction. The proposed decoder achieves a power reduction of 50% in a 90-nm process technology, compared to the conventional clock-gating scheme.

  • Fault Tolerant Dynamic Reconfigurable Device Based on EDAC with Rollback

    Kentaro NAKAHARA  Shin'ichi KOUYAMA  Tomonori IZUMI  Hiroyuki OCHI  Yukihiro NAKAMURA  

     
    PAPER-VLSI Architecture

      Page(s):
    3652-3658

    Reconfigurable devices are expected to be utilized in such mission-critical fields as space development and undersea cables, because system updates and pseudo-repair can be achieved remotely by reconfiguring. However, conventional reconfigurable devices suffer from memory-bit upsets caused by charged particles in space, which result in fatal system problems. In this paper, we propose an architecture for a fault-tolerant reconfigurable device. The proposed device is divided into "autonomous-repair cells" with embedded control circuits. The autonomous-repair cell proposed in this paper is based on error detection and correction (EDAC) and uses hardware and time redundancy. The evaluation shows that the proposed architecture achieves sufficient reliability against configuration memory upsets. Trade-offs between performance and cost are also analyzed.

  • A Parallel-In Folding Technique for High-Order FIR Filter Implementation

    Lan-Rong DUNG  Hsueh-Chih YANG  

     
    PAPER-VLSI Architecture

      Page(s):
    3659-3665

    This paper presents a hardware-efficient folding technique for high-order FIR filtering that considers the tradeoff between the number of processing elements and the throughput rate. Given the throughput rate, one can always employ the minimum number of processing elements to save implementation cost and derive a folded architecture. However, applying inefficient folding techniques may result in costly switches and registers. Therefore, our work evaluates the efficiency of folding techniques in terms of the number of registers and the power dissipation of registers. As shown in the estimation results, compared with the published folded architectures under the same throughput rate, the proposed folding technique achieves lower power dissipation and lower hardware complexity than the others. The proposed design has been implemented using TSMC 0.18 µm 1P6M technology. As seen in the post-layout simulation, our design can meet the requirements of the IS-95 WCDMA pulse shaping FIR filter while the power consumption can be as low as 16.66 mW.

  • Impact of Intrinsic Parasitic Extraction Errors on Timing and Noise Estimation

    Toshiki KANAMOTO  Shigekiyo AKUTSU  Tamiyo NAKABAYASHI  Takahiro ICHINOMIYA  Koutaro HACHIYA  Atsushi KUROKAWA  Hiroshi ISHIKAWA  Sakae MUROMOTO  Hiroyuki KOBAYASHI  Masanori HASHIMOTO  

     
    LETTER-Interconnect

      Page(s):
    3666-3670

    In this letter, we discuss the impact of intrinsic error in parasitic capacitance extraction programs, which are commonly used in today's SoC design flows. Most of the extraction programs use pattern-matching methods, which introduce an improvable error factor due to the pattern interpolation, and an intrinsically inescapable error factor from the difference of boundary conditions in the electro-magnetic field solver. Here, we study the impact of the intrinsic error on timing and crosstalk noise estimation. We experimentally show that the resulting delay and noise estimation errors show a scatter which is normally distributed. Values of the standard deviations will help designers weigh the intrinsic error against other variation factors.

    Regular Section
  • Target-Oriented Acoustic Radiation Generation Technique for Sound Field Control

    Yuan WEN  Jun YANG  Woon-Seng GAN  

     
    PAPER-Engineering Acoustics

      Page(s):
    3671-3677

    A multiple-source system for rendering the sound pressure distribution in a target region can be modeled as a multi-input-multi-output (MIMO) system with the inputs being the source strengths and the outputs being the pressures on multiple measuring points/sensors. In this paper, we propose a target-oriented acoustic radiation generation technique (TARGET) for sound field control. For the MIMO system of a given geometry, a series of basic radiation modes, namely, target-oriented radiation modes (TORMs), can be derived using eigenvector analysis. Different TORMs make different contributions to the system control gain, which is defined as the ratio of the acoustic energy generated in the target zone to the transmitter output power. TARGET can be effectively applied to sound reproduction and suppression, which correspond to the generation of bright and dark zones, respectively. In acoustically bright zone generation and sound beamforming, the highest-gain TORM can be employed to determine the optimal source strengths. In active noise control, the strengths of the secondary sources can be derived using low-gain TORMs. Simulation results show that the performance of the proposed method is better than or comparable to that of the traditional techniques.

  • Estimation of Color Images by Box Splines from Their Observation through Honeycomb Color Filter

    Tomoko YOKOKAWA  Masaru KAMADA  Yasuhiro OHTAKI  Tatsuhiro YONEKURA  

     
    PAPER-Digital Signal Processing

      Page(s):
    3678-3684

    An experimental test has been done on the suitability of box splines for the estimation of color images from their observation through the honeycomb color filter. The estimation is made by following the framework of consistent sampling by Unser and Aldroubi. Numerical evaluation of the estimation errors has shown that the best estimation may be made by choosing the box splines which are twice locally averaged along each of the three axes consisting of the unilateral triangular mesh. A close look at estimated images has indicated that those box splines are almost free from directional jaggy noises that the traditional B-splines suffer from.

  • Miller Capacitor with Wide Input Range and Its Application to PLL Loop Filter

    Masahiro YOSHIOKA  Nobuo FUJII  

     
    PAPER-Analog Signal Processing

      Page(s):
    3685-3692

    This paper proposes a Miller capacitor which has a wide input signal range. By discharging the capacitor connected between the input and output terminals of an amplifier before the output voltage of the amplifier exceeds its maximum range, the amplifier always operates in the active region and the Miller operation can be guaranteed. Thus a large-value capacitor with a wide dynamic operation range can be realized using a small-value capacitor. The Miller capacitor proposed in this paper is applied to the loop filter of a phase locked loop (PLL) circuit that requires a large-value capacitor to realize a low cutoff frequency. SPICE simulation of the PLL circuit using the Miller capacitor confirms the operation of the Miller capacitor and shows good performance, similar to that obtained using a passive capacitor of a large value.

  • Necessary and Sufficient Conditions for One-Dimensional Discrete-Time Autonomous Binary Cellular Neural Networks to Be Stable

    Tetsuo NISHI  Norikazu TAKAHASHI  Hajime HARA  

     
    PAPER-Nonlinear Problems

      Page(s):
    3693-3698

    We give the necessary and sufficient conditions for a one-dimensional discrete-time autonomous binary cellular neural network to be stable in the case of a fixed boundary. The results are a complete generalization of our previous results [16], in which symmetrical connections were assumed. The conditions are compared with some previously known stability conditions.

  • On the Expected Prediction Error of Orthogonal Regression with Variable Components

    Katsuyuki HAGIWARA  Hiroshi ISHITANI  

     
    PAPER-Algorithms and Data Structures

      Page(s):
    3699-3709

    In this article, we considered the asymptotic expectations of the prediction error and the fitting error of a regression model in which the component functions are chosen from a finite set of orthogonal functions. Under least squares estimation, we showed that the asymptotic bias in estimating the prediction error based on the fitting error includes the true number of components, which is essentially unknown in practical applications. On the other hand, under a suitable shrinkage method, we showed that an asymptotically unbiased estimate of the prediction error is given by the fitting error plus a term that is known except for the noise variance.

  • Properties of a Word-Valued Source with a Non-prefix-free Word Set

    Takashi ISHIDA  Masayuki GOTO  Toshiyasu MATSUSHIMA  Shigeichi HIRASAWA  

     
    PAPER-Information Theory

      Page(s):
    3710-3723

    Recently, a word-valued source has been proposed as a new class of information source models. A word-valued source is regarded as a source with a probability distribution over a word set. Although a word-valued source is a nonstationary source in general, it has been proved that an entropy rate of the source exists and the Asymptotic Equipartition Property (AEP) holds when the word set of the source is prefix-free. However, when the word set is not prefix-free (non-prefix-free), only an upper bound on the entropy density rate for an i.i.d. word-valued source has been derived so far. In this paper, we newly derive a lower bound on the entropy density rate for an i.i.d. word-valued source with a finite non-prefix-free word set. Then some numerical examples are given in order to investigate the behavior of the bounds.

  • Steady-State Properties of a CORDIC-Based Adaptive ARMA Lattice Filter

    Shin'ichi SHIRAISHI  Miki HASEYAMA  Hideo KITAJIMA  

     
    LETTER-Digital Signal Processing

      Page(s):
    3724-3729

    This paper analyzes the steady-state properties of a CORDIC-based adaptive ARMA lattice filter. In our previous study, the convergence properties of the filter in the non-steady state were clarified; however, its behavior in the steady state was not discussed. Therefore, we develop a distinct analysis technique based on a Markov chain in order to investigate the steady-state properties of the filter. By using the proposed technique, the relationship between step size and coefficient estimation error is revealed.

  • Grounded-Capacitor First-Order Filter Using Minimum Components

    Hua-Pin CHEN  Kuo-Hsiung WU  

     
    LETTER-Circuit Theory

      Page(s):
    3730-3731

    Despite the extensive literature on current conveyor-based voltage-mode first-order all-pass filters, no filter circuits have been reported to date that simultaneously achieve all of the advantageous features: (i) the employment of only one current conveyor, (ii) the employment of only one grounded capacitor, (iii) the employment of only one resistor, (iv) no need to impose component choice conditions, and (v) low active and passive sensitivities. In this letter, we describe such a filter structure with all of the above features simultaneously present, without trade-offs. H-Spice simulation results using the TSMC025 process and 1.25 V supply voltages validate the theoretical predictions.

  • New Digital Fingerprint Code Construction Scheme Using Group-Divisible Design

    InKoo KANG  Kishore SINHA  Heung-Kyu LEE  

     
    LETTER-Information Security

      Page(s):
    3732-3735

    Combinatorial designs have been used to construct digital fingerprint codes. Here, a new constructive algorithm for an anticollusion fingerprint code based on group-divisible designs is presented. These codes are easy to construct and available for a large number of individuals, which is important from a business point of view. Group-divisible designs have not been used previously as a tool for fingerprint code construction.

  • First Derivatives Estimation of Nonlinear Parameters in Hybrid System

    Jung-Wook PARK  Byoung-Kon CHOI  Kyung-Bin SONG  

     
    LETTER-Concurrent Systems

      Page(s):
    3736-3738

    This letter describes the first derivatives estimation of nonlinear parameters through an embedded identifier in the hybrid system by using a feed-forward neural network (FFNN). The hybrid systems are modelled by the differential-algebraic-impulsive-switched (DAIS) structure. The FFNN is used to identify the full dynamics of the hybrid system. Moreover, the partial derivatives of an objective function J with respect to the parameters are estimated by the proposed identifier. Then, it is applied for the identification and estimation of the non-smooth nonlinear dynamic behaviors due to a saturation limiter in a practical engineering system.