
IEICE TRANSACTIONS on Fundamentals

  • Impact Factor

    0.48

  • Eigenfactor

    0.003

  • article influence

    0.1

  • Cite Score

    1.1


Volume E83-A No.12  (Publication Date:2000/12/25)

    Special Section on VLSI Design and CAD Algorithms
  • FOREWORD

    Yukihiro NAKAMURA  Takashi KAMBE  

     
    FOREWORD

      Page(s):
    2399-2399
  • Architecture and Performance Evaluation of a New Functional Memory: Functional Memory for Addition

    Kazutoshi KOBAYASHI  Masanao YAMAOKA  Yukifumi KOBAYASHI  Hidetoshi ONODERA  Keikichi TAMARU  

     
    PAPER-VLSI Architecture

      Page(s):
    2400-2408

    We propose a functional memory for addition (FMA), which is a memory-merged logic LSI. It is a memory as well as a SIMD parallel processor. To minimize the area, a processing element (PE) consists of several DRAM words and a bit-serial ALU. The ALU performs addition bit by bit. This paper describes two FMA experimental LSIs. One is for general-purpose use, and the other is for full-search block matching in image compression. We estimate that a 0.18 µm process realizes 57,000 PEs in a 50 mm² die, achieving 205 GOPS at 1.36 W.
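
    The bit-by-bit addition performed by each PE can be illustrated with a minimal sketch. The function names, the word width, and the list-based SIMD loop below are assumptions made for illustration, not details taken from the paper.

```python
def bit_serial_add(a: int, b: int, width: int) -> int:
    """Add two unsigned integers one bit per step, least significant bit first."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i        # sum bit of a full adder
        carry = (x & y) | (carry & (x ^ y))   # carry into the next bit position
    return result  # the final carry is dropped, so the sum wraps modulo 2**width


def simd_add(words_a, words_b, width):
    """Apply the same bit-serial addition to every PE's operand pair (hypothetical SIMD view)."""
    return [bit_serial_add(a, b, width) for a, b in zip(words_a, words_b)]
```

    In hardware, all PEs would step through bit position i at the same time; the Python loop only models the result of that lockstep operation.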

  • Programmable Dataflow Computing on PCA

    Norbert IMLIG  Tsunemichi SHIOZAWA  Ryusuke KONISHI  Kiyoshi OGURI  Kouichi NAGAMI  Hideyuki ITO  Minoru INAMORI  Hiroshi NAKADA  

     
    PAPER-VLSI Architecture

      Page(s):
    2409-2416

    This paper introduces a flexible, stream-oriented dataflow processing model based on the "Communicating Logic (CL)" framework. As the target architecture, we adopt the dual-layered "Plastic Cell Architecture (PCA)." Datapath processing functionality is encapsulated in asynchronous hardware objects of variable granularity and implemented using look-up tables. Communication (i.e., connectivity and control) between the distributed processing objects is achieved by means of inter-object message passing. The key point of the CL approach is that it combines the scalable performance and low power of a hardware implementation with the user-friendly compilation and linking capabilities unique to software.

  • Dynamic Fast Issue (DFI) Mechanism for Dynamic Scheduled Processors

    Abderazek BEN ABDALLAH  Mudar SAREM  Masahiro SOWA  

     
    PAPER-VLSI Architecture

      Page(s):
    2417-2425

    Superscalar processors can achieve increased performance by issuing instructions out of order (OoO) from the original instruction stream. Implementing an OoO instruction scheme requires a hardware mechanism to prevent incorrectly executed instructions from updating register values. In addition, performance decreases if data dependencies, a branch, or a trap appear among instructions. To this end, we propose a new mechanism named Dynamic Fast Issue (DFI) to issue instructions in an OoO fashion to multiple parallel functional units without considerable hardware complexity. The system, which will be implemented in our superscalar Functional Assignments Register Microprocessor (FARM), resolves data dependencies and supports precise interrupts and branch prediction, which are the main problems associated with the dynamic scheduling of instructions in superscalar machines. Results are written only once (write-once), directly into the register file (RF). To ensure that results are written in order to their appropriate output registers, a record of instruction order and state is maintained by a status buffer (STB). A 64-entry integrated register file is implemented to hold both renamed and logical registers. To recover the processor state from an interrupt or a branch misprediction, the STB and a recovery list table (RLT) are implemented. Novel aspects of the system architecture, as well as the principle underlying this process and the constraints that must be met, are presented. Performance is evaluated with a full-pipeline-level architectural simulator and SPECint95 benchmark programs.

  • A New Algorithm for the Configuration of Fast Adder Trees

    Alberto PALACIOS-PAWLOVSKY  

     
    PAPER-VLSI Architecture

      Page(s):
    2426-2430

    This paper describes a new algorithm for configuring the array of adders used to add the partial products in a multiplier circuit. The new algorithm reduces not only the number of half adders in an adder tree, but also the number of operands passed to the block generating the final product in a multiplier. The arrays obtained with this algorithm are smaller than Wallace's and have fewer outputs than Dadda's arrays. We show some evaluation figures and preliminary simulation results for 4-, 8-, and 16-bit tree configurations.

  • A New Method for Constructing IP Level Power Model Based on Power Sensitivity

    Heng-Liang HUANG  Jiing-Yuan LIN  Wen-Zen SHEN  Jing-Yang JOU  

     
    PAPER-VLSI Design Methodology

      Page(s):
    2431-2438

    As system functions become more complex, IP (Intellectual Property) reuse is becoming the prevailing system design style. Designers need to evaluate the performance and features of every candidate IP block that can be used in their design, while IP providers hope to keep the structure of their IP blocks a secret. An IP level power model is a model that takes only the primary input statistics as parameters and does not reveal any information about the sizes of the transistors or the structure of the circuit. This paper proposes a new method for constructing a power model that is suitable for IP level circuit blocks. It is a nominal point selection method for power models based on power sensitivities. By analyzing the relationship between the dynamic power consumption of CMOS circuits and their input signal statistics, a guideline for selecting the nominal point is proposed. From our analysis, the first nominal point is selected to minimize the average estimation error and two other nominal points are selected to minimize the maximum estimation error. Our experimental results on a number of benchmark circuits show the effectiveness of the proposed method. Average estimation accuracy within 5.78% of transistor level simulations is achieved. The proposed method can be applied to build a system level power estimation environment without revealing the contents of the IP blocks inside. Therefore, it is a promising method for IP level power model construction.

  • A Practical Method for System-Level Bus Architecture Validation

    Kazuyoshi TAKEMURA  Masanobu MIZUNO  Akira MOTOHARA  

     
    PAPER-VLSI Design Methodology

      Page(s):
    2439-2445

    This paper presents a system-level bus architecture validation technique and shows its application to a consumer product design. This technique enables the entire system to be validated with bus cycle accuracy using bus architecture level models derived from their corresponding behavioral level models. Experimental results from a digital still camera (DSC) system design show that our approach offers much faster simulation speed than register transfer level (RTL) simulators. Using this fast and accurate validation technique, bus architecture design, validation, and optimization can be carried out effectively at the system level, and the total turnaround time of system designs can be reduced dramatically.

  • A Specification Style of Four-Phase Handshaking Asynchronous Controllers and the Optimization of Its Return-to-Zero Phase

    Rafael K. MORIZAWA  Takashi NANYA  

     
    PAPER-VLSI Design Methodology

      Page(s):
    2446-2455

    A known problem of the four-phase handshaking protocol is that a return-to-zero phase of the signals involved in the handshake, in which usually no useful work is done, is necessary before starting another cycle. In this paper we first define an easy-to-write specification style for four-phase handshaking asynchronous controllers that can be translated to an STG to obtain a gate-level implementation using existing synthesis methods. Then, we propose an algorithm that takes a specification written in our style and finds an optimized timing in which the idle-phase overhead of its gate-level implementation is reduced.

  • Thread Composition Method for Hardware Compiler Bach Maximizing Resource Sharing among Processes

    Mizuki TAKAHASHI  Nagisa ISHIURA  Akihisa YAMADA  Takashi KAMBE  

     
    PAPER-Co-design and High-level Synthesis

      Page(s):
    2456-2463

    This paper presents a method of thread composition in the hardware compiler Bach. Bach synthesizes RT level circuits from a system description written in the Bach-C language, where a system is modeled as communicating processes running in parallel. The system description is decomposed into threads, i.e., strings of sequential processes, by grouping processes which are not executed in parallel. The set of threads is then converted into behavioral VHDL models and passed to a behavioral synthesizer. The proposed method attempts to find a thread configuration that maximizes resource sharing among processes in the threads. Experiments on two real designs show that the circuit sizes were reduced by 3.7% and 14.7%. We also show detailed statistics and an analysis of the size of the resulting gate level circuits.

  • CAM Processor Synthesis Based on Behavioral Descriptions

    Nozomu TOGAWA  Tatsuhiko WAKUI  Tatsuhiko YODEN  Makoto TERAJIMA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Co-design and High-level Synthesis

      Page(s):
    2464-2473

    CAM (Content Addressable Memory) units are generally designed so that they can be applied to a variety of application programs. However, if a particular application runs on CAM units, some functions in the CAM units may be used often and other functions may never be used. We consider that an appropriate design for CAM units is required depending on the requirements of a given application program. This paper proposes a CAM processor synthesis system based on behavioral descriptions. The input of the system is an application program written in C including CAM functions, and its output is hardware descriptions of a synthesized processor and a binary code executed on it. Since the system determines functions in CAM units and synthesizes a CAM processor depending on the requirements of an application program, we expect that a synthesized CAM processor can execute the application program with small processor area and delay. Experimental results demonstrate its efficiency and effectiveness.

  • Multicriteria Codesign Optimization for Embedded Multimedia Communication System

    I-Horng JENG  Feipei LAI  

     
    PAPER-Co-design and High-level Synthesis

      Page(s):
    2474-2487

    In the beginning of the new century, many information appliance (IA) products will replace traditional electronic appliances to help people in smart, efficient, and low-cost ways. These successful products must be capable of communicating multimedia information, which is embedded into the electronic appliances with high integration, innovation, and power-throughput tradeoff. In this paper, we develop a codesign procedure to analyze, compare, and emulate the multimedia communication applications to find the candidate implementations under different criteria. The experimental results demonstrate that in general, memory technology dominates the optimal tradeoff and ALU improvements impact greatly on particular applications. The results also show that the proposed procedure is effective and quite efficient.

  • Intrinsic Evolution for Synthesis of Fault-Recoverable Circuit

    Tae-Suh PARK  Chong-Ho LEE  Duck-Jin CHUNG  

     
    PAPER-Co-design and High-level Synthesis

      Page(s):
    2488-2497

    This paper presents an evolutionary technique to build and maintain fault-recoverable digital circuits. Because the synthesis of a circuit by a genetic algorithm proceeds according to the circuit's behavioral objectives and its interactions with the environment, knowledge regarding the architecture as well as the placement and routing processes is not the major concern of the proposed method. The evolutionary behavior of the circuit also protects the circuit from stuck-at faults by continuously modifying the neighboring circuit blocks accordingly. Because of the evolutionary nature, this is done without prior knowledge of where and how the faults occur. Thus, the overhead circuit blocks for fault diagnosis and redundancy are minimized with this design. Fault-recoverable evolvable hardware circuits are synthesized by evolution to build a few combinational logic circuits, and the fault recovery capabilities are shown with a reconfigurable FPGA.

  • Heuristics to Minimize Multiple-Valued Decision Diagrams

    Hafiz Md. HASAN BABU  Tsutomu SASAO  

     
    PAPER-Logic Synthesis

      Page(s):
    2498-2504

    In this paper, we propose a method to minimize multiple-valued decision diagrams (MDDs) for multiple-output functions. We consider the following: (1) a heuristic for encoding the 2-valued inputs; and (2) a heuristic for ordering the multiple-valued input variables based on sampling, where each sample is a group of outputs. We first generate a 4-valued input 2-valued multiple-output function from the given 2-valued input 2-valued functions. Then, we construct an MDD for each sample and find a good variable ordering. Finally, we generate a variable ordering from the orderings of MDDs representing the samples, and minimize the entire MDDs. Experimental results show that the proposed method is much faster than sifting and, for many benchmark functions, produces MDDs with fewer nodes. In particular, the proposed method generates much smaller MDDs in a short time for benchmark functions when several 2-valued input variables are grouped to form multiple-valued variables.

  • An Algorithm for Generating Generic BDDs

    Tetsushi KATAYAMA  Hiroyuki OCHI  Takao TSUDA  

     
    PAPER-Logic Synthesis

      Page(s):
    2505-2512

    Binary Decision Diagrams (BDDs) are graph representations of Boolean functions. In particular, Ordered BDDs (OBDDs) are useful in many situations, because they provide a canonical representation and can be manipulated efficiently. BDD packages which automatically generate OBDDs have been developed, and they are now widely used in the logic design area, including formal verification and logic synthesis. Synthesis of pass-transistor circuits is one of the successful applications of such BDD packages. Pass-transistor circuits are generated from BDDs by mapping each node to a selector which consists of two or four pass transistors. If circuits are generated from smaller BDDs, the generated circuits have fewer transistors and hence reduce chip area and power consumption. In this paper, more generic BDDs which have no restrictions on variable ordering or on the number of times a variable appears on their paths are called Generic BDDs (GBDDs), and an algorithm for generating GBDDs is proposed for the purpose of synthesizing pass-transistor circuits. The proposed algorithm consists of two steps. In the first step, parse trees (PTs) for given Boolean formulas are generated, where a PT is a directed tree representation of Boolean formula(s) consisting of literal nodes and operation nodes. In this step, our algorithm attempts to reduce the number of literal nodes of PTs. In the second step, a GBDD is generated for the PTs using the Concatenation Method, which generates a GBDD by connecting GBDDs vertically. In this step, our algorithm attempts to share isomorphic subgraphs. In experiments on ISCAS'89 and MCNC benchmark circuits, our program successfully generated 32 GBDDs out of 680 single-output functions and 4 GBDDs out of 49 multi-output functions whose sizes are smaller than the corresponding OBDDs. GBDD size is reduced by 23.1% in the best case compared with OBDD.

  • Efficient Kernel Generation Based on Implicit Cube Set Representations and Its Applications

    Hiroshi SAWADA  Shigeru YAMASHITA  Akira NAGOYA  

     
    PAPER-Logic Synthesis

      Page(s):
    2513-2519

    This paper presents a new method that efficiently generates all of the kernels of a sum-of-products expression. Its main feature is the memorization of the kernel generation process by using a graph structure and implicit cube set representations. We also show its applications for common logic extraction. With several extensions, our extraction method produces smaller circuits than the extraction method based on two-cube divisors, which is known as the best to date.

  • Robust Heuristics for Multi-Level Logic Simplification Considering Local Circuit Structure

    Qiang ZHU  Yusuke MATSUNAGA  Shinji KIMURA  Katsumasa WATANABE  

     
    PAPER-Logic Synthesis

      Page(s):
    2520-2527

    Combinational logic circuits are usually implemented as multi-level networks of logic nodes. Multi-level logic simplification using the don't cares on each node is widely used. Large don't cares give good simplification results, but require huge memory area and computation time. Extracting useful don't cares and reducing the size of the don't cares are important problems in simplification using don't cares. In this paper, we propose a new robust heuristic method for the selection of don't cares. We consider an adaptive subnetwork for each simplified node in the network and introduce a stepwise enhancement method of the subnetwork considering the memory area and the network structure. The don't cares extracted from the adaptive subnetworks are called the local don't cares. We have implemented our method for satisfiability don't cares and observability don't cares. We have applied the method to MCNC89 benchmarks, and compared the experimental results with those of the SIS system. The results demonstrate the superiority of our method in the quality of the results and in the size of applicable circuits.

  • Synthesis of Minimum-Cost Multilevel Logic Networks via Genetic Algorithm

    Barry SHACKLEFORD  Etsuko OKUSHI  Mitsuhiro YASUDA  Hisao KOIZUMI  Katsuhiko SEO  Hiroto YASUURA  

     
    PAPER-Logic Synthesis

      Page(s):
    2528-2537

    The problem of synthesizing a minimum-cost logic network is formulated for a genetic algorithm (GA). When benchmarked against a commercial logic synthesis tool, an odd parity circuit required 24 basic cells (BCs) versus 28 BCs for the design produced by the commercial system. A magnitude comparator required 20 BCs versus 21 BCs for the commercial system's design. Poor temporal performance, however, is the main disadvantage of the GA-based approach. The design of a hardware-based cost function that would accelerate the GA by several thousand times is described.

  • Array-Based Mapping Algorithm of Logic Functions into Plastic Cell Architecture

    Tomonori IZUMI  Ryuji KAN  Yukihiro NAKAMURA  

     
    PAPER-Logic Synthesis

      Page(s):
    2538-2544

    Recently, Plastic Cell Architecture (PCA) has been proposed as a hard-wired general-purpose autonomously reconfigurable processor. PCA consists of two layers, the plastic part on which sequential logic circuits are implemented, and the built-in part which induces the plastic part to dynamically reconfigure the circuits and transports messages among the circuits. The plastic part consists of an array of LUT-based reconfigurable logic primitives, each of which is connected only to adjacent ones. Combining logic and layout synthesis, we propose a new array-based algorithm to map logic functions into the PCA plastic part. This algorithm produces a folded array of sum-of-multi-input-complex-terms, especially for the PCA plastic part.

  • Delay-Optimal Technology Mapping for Hard-Wired Non-Homogeneous FPGAs

    Hsien-Ho CHUANG  Jing-Yang JOU  C. Bernard SHUNG  

     
    PAPER-Performance Optimization

      Page(s):
    2545-2551

    A delay-optimal technology mapping algorithm is developed on a general model of FPGAs with hard-wired non-homogeneous logic block architectures, which are composed of look-up tables (LUTs) of different sizes hard-wired together. This architecture has the advantages of the short delay of hard-wired connections and the area efficiency of a non-homogeneous structure. The Xilinx XC4000 is one commercial example, where two 4-LUTs are hard-wired to one 3-LUT. In this paper, we present a two-dimensional labeling approach and a level-2 node cut algorithm to handle the hard-wired feature. The experimental results show that our algorithm generates favorable results for Xilinx XC4000 CLBs. Over a set of MCNC benchmarks, our algorithm produces results with 17% less CLB depth than FlowMap in similar CPU time on average, and with 4% less CLB depth than PDDMAP on average, while PDDMAP needs 15 times more CPU time.

  • Clock Schedule Design for Minimum Realization Cost

    Tomoyuki YODA  Atsushi TAKAHASHI  

     
    PAPER-Performance Optimization

      Page(s):
    2552-2557

    A semi-synchronous circuit is a circuit in which the clock is assumed to be distributed periodically to each individual register, though not necessarily to all registers simultaneously. In this paper, we propose an algorithm to achieve the target clock period by modifying a given target clock schedule as little as possible, where the realization cost of the target clock schedule is assumed to be minimum. The proposed algorithm iteratively improves a feasible clock schedule. The algorithm finds a set of registers that can reduce the cost by changing their clock timings by the same amount, and changes the clock timings by the optimal amount. Experiments show that the algorithm achieves the target clock period with fewer modifications.

  • A Performance Optimization Method by Gate Resizing Based on Statistical Static Timing Analysis

    Masanori HASHIMOTO  Hidetoshi ONODERA  

     
    PAPER-Performance Optimization

      Page(s):
    2558-2568

    This paper discusses a gate resizing method for performance enhancement based on statistical static timing analysis. The proposed method focuses on timing uncertainties caused by local random fluctuation. Our method aims to remove both over-design and under-design of a circuit, and to realize high-performance and high-reliability LSI design. The effectiveness of our method is examined on six benchmark circuits. We verify that our method can further reduce the delay time of circuits that were already optimized for minimum delay without considering delay fluctuation.

  • An Iterative Improvement Circuit Partitioning Algorithm under Path Delay Constraints

    Jun'ichiro MINAMI  Tetsushi KOIDE  Shin'ichi WAKABAYASHI  

     
    PAPER-Layout Synthesis

      Page(s):
    2569-2576

    This paper presents a timing-driven iterative improvement circuit partitioning algorithm under path delay constraints for the general delay model. The proposed algorithm is an extension of the Fiduccia & Mattheyses (FM) method so as to handle path delay constraints and consists of the clustering and iterative improvement phases. In the first phase, we reduce the size of a given circuit, with a new clustering algorithm to obtain a partition in a short computation time. Next, the iterative improvement phase based on the FM method is applied, and then a new path-based timing violation removal algorithm is also performed so as to remove all the timing violations. From experimental results for ISCAS89 benchmarks, we have demonstrated that the proposed algorithm can produce the partitions which mostly satisfy the timing constraints.

  • A Cell Synthesis Method for Salicide Process Using Assignment Graph

    Kazuhisa OKADA  Takayuki YAMANOUCHI  Takashi KAMBE  

     
    PAPER-Layout Synthesis

      Page(s):
    2577-2583

    In this paper, we propose a cell synthesis method for a Salicide process. Our method utilizes the local interconnect between adjacent transistors, which is available in some Salicide processes, and optimizes the transistor placement of a cell considering both area and the number of local interconnects. In this way we reduce the number of metal wires and contacts. The circuit model is not restricted to conventional series-parallel CMOS logic, and our method enables us to synthesize CMOS pass-transistor circuits. Experimental results show that our method uses the local interconnect effectively, and optimizes both cell area and metal wire length.

  • WSSA: A High Performance Simulated Annealing and Its Application to Transistor Placement

    Shunji SAIKA  Masahiro FUKUI  Masahiko TOYONAGA  Toshiro AKINO  

     
    PAPER-Layout Synthesis

      Page(s):
    2584-2591

    We propose another high-performance simulated annealing method, which we call widely stepping simulated annealing (WSSA). It moves from a high starting temperature to a low finishing temperature, staying at only twenty or so temperatures to approach thermal equilibrium. We survey the phase transition in the simulated annealing process and estimate the major cost variation (dEc) at the critical temperature. The WSSA uses a function H(t) that represents the probability that a hill-climbing move with a cost increase of dEc is accepted in Metropolis' Monte Carlo simulation at temperature t. We have applied the first version of WSSA to one-dimensional transistor placement optimization for several industrial standard cells, and compared its performance with simulated annealing with a geometrically scheduled cooling. The solutions by the WSSA are as good as, and sometimes much better than, the solutions by the simulated annealing, while the time consumed by the WSSA is well under one-thirtieth of that consumed by the simulated annealing.
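
    For orientation, the standard Metropolis criterion accepts a move that raises the cost by dE with probability exp(-dE/t). The sketch below assumes that H(t) takes this standard form with dE fixed at dEc; the abstract does not give the exact definition, and the function name and sample values are hypothetical.

```python
import math


def hill_climb_acceptance(d_ec: float, t: float) -> float:
    """Metropolis acceptance probability for a cost increase d_ec at temperature t."""
    if t <= 0:
        raise ValueError("temperature must be positive")
    if d_ec <= 0:
        return 1.0  # moves that do not increase the cost are always accepted
    return math.exp(-d_ec / t)


# Hypothetical values: the acceptance probability rises toward 1 as t grows.
for t in (0.5, 1.0, 2.0, 10.0):
    print(t, round(hill_climb_acceptance(1.0, t), 4))
```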

  • A Method for Linking Process-Level Variability to System Performances

    Tomohiro FUJITA  Hidetoshi ONODERA  

     
    PAPER-Simulation

      Page(s):
    2592-2599

    In this paper we present a case study of a hierarchical statistical analysis. The method which we use here bridges the statistical information between the process level and the system level, and enables us to know the effect of process variation on system performance. We use two modeling techniques (an intermediate model and a response surface model) in order to link the statistical information between adjacent design levels. We show an experiment in which the hierarchical statistical analysis is applied to a Phase Locked Loop (PLL) circuit, and indicate that the hierarchical statistical analysis is practical with respect to both accuracy and simulation cost. The following three applications are also presented in order to show the advantage of this linking method: Monte Carlo analysis, worst-case analysis, and sensitivity analysis. The results of the Monte Carlo and worst-case analyses indicate that this is a realistic statistical method. The result of the sensitivity analysis enables us to evaluate the effect of process variation at the system level. Also, we can derive constraints on the process variation from a performance requirement.

  • Multi-Cycle Path Detection Based on Propositional Satisfiability with CNF Simplification Using Adaptive Variable Insertion

    Kazuhiro NAKAMURA  Shinji MARUOKA  Shinji KIMURA  Katsumasa WATANABE  

     
    PAPER-Test

      Page(s):
    2600-2607

    Multi-cycle paths are paths between registers where 2 or more clock cycles are allowed to propagate signals, and the detection of multi-cycle paths is important for deciding a proper clock period, timing verification, and logic optimization. This paper presents a satisfiability-based multi-cycle path detection method, where the detection problems are reduced to CNF formulae and the satisfiability is checked using SAT provers. We also show heuristics for the conversion from multi-level circuits into CNF formulae. We have applied our method to ISCAS'89 benchmarks and other sample circuits. Experimental results show remarkable improvements in the size of circuits that can be handled.
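
    As a small illustration of reducing a circuit question to CNF satisfiability, the sketch below encodes a single AND gate with the standard Tseitin clauses and checks satisfiability by brute force. It does not reproduce the paper's multi-cycle path formulation or its adaptive variable insertion; the function names are assumptions for illustration, and the exhaustive checker merely stands in for a real SAT prover.

```python
from itertools import product


def and_gate_cnf(z: int, x: int, y: int):
    """Tseitin clauses for z = x AND y; literals are signed variable indices."""
    return [[-z, x], [-z, y], [z, -x, -y]]


def satisfiable(clauses, num_vars: int) -> bool:
    """Exhaustively test every assignment (only practical for tiny formulae)."""
    for bits in product([False, True], repeat=num_vars):
        # A literal l is true when variable |l| has the polarity given by its sign.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            return True
    return False


gate = and_gate_cnf(3, 1, 2)               # variable 3 is the gate output
print(satisfiable(gate + [[3]], 3))         # can the output be 1?
print(satisfiable(gate + [[3], [-1]], 3))   # output 1 while input 1 is 0?
```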

  • Testability Analysis of Analog Circuits via Determinant Decision Diagrams

    Tao PI  Chuan-Jin Richard SHI  

     
    PAPER-Test

      Page(s):
    2608-2615

    The use of the column-rank of the system sensitivity matrix as a testability measure for parametric faults in linear analog circuits was pioneered by Sen and Saeks in the 1970s, and later re-introduced by several others. Its practical use has been limited by how it can be calculated. Numerical algorithms suffer from inevitable round-off errors, while traditional symbolic techniques can only handle very small circuits. In this paper, an efficient method is introduced for the analysis of Sen and Saeks' analog testability. The method employs symbolic circuit analysis based on determinant decision diagrams. Experimental results demonstrate that the new method is capable of handling much larger analog circuits.

  • High-Speed Wide-Locking Range VCO with Frequency Calibration

    Takeo YASUDA  

     
    PAPER-Analog Circuit Design

      Page(s):
    2616-2622

    High-speed systems require a wide-frequency-range clock system for data processing. A phase-locked loop (PLL) is used for such systems, which require a wide-range variable-frequency clock. A frequency calibration method enables the voltage-controlled oscillator (VCO) in a PLL to cover the expected frequency range for high-speed applications that require a wide locking range. Frequency range adjustment is implemented by means of a current digital-to-analog converter (DAC), which controls the performance curves of a VCO and a bias circuit. This method adjusts the VCO's frequency-voltage performance curves before functional operation so that a PLL can cover the requested frequency range under its best condition. Both the limit of the control voltage and its target reference voltage are given by the same voltage reference. This ensures correct performance after frequency adjustment even under temperature fluctuation. It eliminates post-production physical adjustment such as fuse trimming, which increases the cost and turnaround time (TAT) in manufacturing and testing. A high-speed wide-locking-range VCO with an automatic frequency performance calibration circuit is implemented within a small space in a high-speed hard disk drive channel with 0.25-µm 2.5 V CMOS four-layer metal technology.

  • A 3.3 V CMOS PLL with a Self-Feedback VCO

    Yeon Kug MOON  Kwang Sub YOON  

     
    LETTER-Analog Circuit Design

      Page(s):
    2623-2626

    A 3.3 V CMOS PLL (Phase-Locked Loop) with a self-feedback VCO (Voltage-Controlled Oscillator) is designed for high-frequency, low-voltage, and low-power applications. This paper proposes a new PLL architecture to improve the voltage-to-frequency linearity of the VCO with a new delay cell. The proposed VCO with a self-feedback path operates over a wide frequency range of 30 MHz-1 GHz with good linearity. A DC-DC voltage up/down converter is newly designed to regulate the control voltage of the two-stage VCO. The designed PLL architecture is implemented in a 0.6 µm n-well CMOS process. The simulation results show a locking time of 2.6 µs at 1 GHz, a lock-in range of 100 MHz-1 GHz, and a power dissipation of 112 mW.

  • Design of a Low Power Consumption Pulse-Shaping 1:4 Interpolation FIR Filter for W-CDMA Applications

    Keun-Jang RYOO  Jong-Wha CHONG  

     
    LETTER-Analog Circuit Design

      Page(s):
    2627-2630

    This paper presents the design and simulation of a power-efficient 1:4 interpolation FIR filter with a partitioned Look-Up Table (LUT) structure. Using the symmetry of the filter coefficients and the contents of the LUT, the area of the proposed filter is minimized. The two filters share the partitioned LUT and activate the LUT selectively to realize low-power operation. Experimental results suggest that the proposed filter reduces the power consumption by 25% and simultaneously reduces the gate area by 7% compared to the previously proposed single-architecture dual-channel filter.

  • High Level Analysis of Clock Regions in a C++ System Description

    Luc RYNDERS  Patrick SCHAUMONT  Serge VERNALDE  Ivo BOLSENS  

     
    LETTER-High-level Synthesis

      Page(s):
    2631-2632

    Timing verification of digital synchronous designs is a complex process that is traditionally carried out deep in the design cycle, at the gate level. A method, embodied in a C++ based design system, is presented that allows modeling and verification of clock regions at a higher level. By combining event-driven, clock-cycle true and behavioral simulation, we are able to perform static and dynamic timing analysis of the clock regions.

    Regular Section
  • Subjective Assessment of the Desired Echo Return Loss for Subband Acoustic Echo Cancellers

    Sumitaka SAKAUCHI  Yoichi HANEDA  Shoji MAKINO  Masashi TANAKA  Yutaka KANEDA  

     
    PAPER-Engineering Acoustics

      Page(s):
    2633-2639

    We investigated the dependence of the desired echo return loss on frequency for various hands-free telecommunication conditions by subjective assessment. The desired echo return loss as a function of frequency (DERLf) is an important factor in the design and performance evaluation of a subband echo canceller, and it is a measure of what is considered an acceptable echo caused by electrical loss in the transmission line. The DERLf during single-talk was obtained as attenuated band-limited echo levels that subjects did not find objectionable when listening to the near-end speech and its band-limited echo under various hands-free telecommunication conditions. When we investigated the DERLf during double-talk, subjects also heard the speech in the far-end room from a loudspeaker. The echo was limited to a 250-Hz bandwidth assuming the use of a subband echo canceller. The test results showed that: (1) when the transmission delay was short (30 ms), the echo component around 2 to 3 kHz was the most objectionable to listeners; (2) as the transmission delay rose to 300 ms, the echo component around 1 kHz became the most objectionable; (3) when the room reverberation time was relatively long (about 500 ms), the echo component around 1 kHz was the most objectionable, even if the transmission delay was short; and (4) the DERLf during double-talk was about 5 to 10 dB lower than that during single-talk. Use of these DERLf values will enable the design of more efficient subband echo cancellers.

  • A Basic Study of Cough Signal Detection for a Life-Support System

    Shoichi TAKEDA  Shuichi KATO  Koki TORIUMI  

     
    PAPER-Digital Signal Processing

      Page(s):
    2640-2648

    Aged people who live alone are in particular need of a daily health check, medication, and warm communication with family and friends. The authors have been developing a life-support computer system with such functions. Among them, a daily health check function capable of measuring blood pressure, detecting diseases from coughing, and so on would be particularly powerful for primary care. As a first step toward quick daily health check services on a personal computer, the utilization of cough information is considered. Features of cough data are analyzed with the aim of developing an automatic cough data detection method. This paper proposes a novel method for extracting cough signals from other types of signals. The differential coefficient of a low-pass filtered waveform is first shown to be an effective parameter for discriminating between vowel and cough signals, and the relationship between cut-off frequency and cough detection rate is clarified. This parameter is then applied to cough signals mixed with vowel signals or white noise to evaluate robustness. The evaluation tests show that the cough feature can be perfectly detected at a 20 dB S/N ratio when the cut-off frequency is set to 24 Hz. The experimental results suggest that the proposed cough detection method can be a useful primary care tool for aged people with forms of bronchitis such as asthmatic bronchitis, or with bronchopneumonia.

  • Design and Implementation of a Fourth-Order Quadrature Band-Pass Delta-Sigma Modulator for Low-IF Receivers

    Sung-Wook JUNG  Chang-Gene WOO  Sang-Won OH  Hae-Moon SEO  Pyung CHOI  

     
    PAPER-Analog Signal Processing

      Page(s):
    2649-2656

    The delta-sigma modulator (DSM) is an excellent choice for high-resolution analog-to-digital converters. Recently, the band-pass DSM has become a desirable choice for direct conversion of an IF signal into a digital bit stream. This paper proposes a quadrature band-pass DSM for digitizing a narrow-band IF signal. This modulator can achieve a lower total order, higher signal-to-noise ratio (SNR), and higher bandwidth when compared with conventional band-pass modulators. An experimental prototype employing the quadrature topology has been integrated in 0.6 µm, double-poly, double-metal CMOS technology with capacitors synthesized from a stacked poly structure. The system was clocked at 13 MHz and digitized a 200 kHz bandwidth signal centered at 4.875 MHz with 100 dB of dynamic range. Power consumption is 190 mW at 5 V.

  • Computation of $AB^2$ Multiplier in $GF(2^m)$ Using an Efficient Low-Complexity Cellular Architecture

    Chung-Hsin LIU  Nen-Fu HUANG  Chiou-Yng LEE  

     
    PAPER-VLSI Design Technology and CAD

      Page(s):
    2657-2663

    This study presents two new bit-parallel cellular multipliers based on an irreducible all-one polynomial (AOP) over the finite field $GF(2^m)$. Using the property of the AOP, this work also proposes an efficient inner-product multiplication algorithm for computing $AB^2$ multiplications, with a structure that reduces the time and space complexity of hardware implementations. The first structure employs the new inner-product multiplication algorithm to construct the bit-parallel cellular architecture. The designed multiplier requires a computational delay of only $(m+1)(T_{AND}+T_{XOR})$. The second proposed structure is a modification of the first, and it requires $(m+2)\,T_{XOR}$ delays. Moreover, the proposed multipliers can perform $A^{2^i}B^{2^j}$ computations, for integers i and j, by shuffling the coefficients. For computing multiplication in $GF(2^m)$, the novel multipliers turn out to be efficient, as they simplify the architecture and accelerate computation. The two novel architectures are highly regular and simpler, and they have shorter computation delays than the conventional cellular multipliers.
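
    The AOP property mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' cellular architecture: it assumes field elements are stored as integer bitmasks (bit i is the coefficient of x^i) and relies on the standard fact that, for the AOP p(x) = x^m + ... + x + 1, (x + 1)p(x) = x^(m+1) + 1, so x^(m+1) = 1 modulo p(x). The function names `aop_mul` and `ab2` are hypothetical.

```python
def aop_mul(a: int, b: int, m: int) -> int:
    """Multiply a and b in GF(2^m) defined by the degree-m all-one polynomial.

    Elements are bitmasks with m bits; the AOP must be irreducible for this m.
    """
    n = m + 1
    mask = (1 << n) - 1              # also the bit pattern of the AOP itself
    acc = 0
    for i in range(m):
        if (b >> i) & 1:
            shifted = a << i
            # x**n == 1 modulo the AOP, so bits at or above n wrap around
            acc ^= (shifted & mask) ^ (shifted >> n)
    if (acc >> m) & 1:
        acc ^= mask                  # add the AOP once to clear the x**m term
    return acc


def ab2(a: int, b: int, m: int) -> int:
    """Compute A * B^2 with two plain multiplications (illustration only)."""
    return aop_mul(a, aop_mul(b, b, m), m)
```

    For example, with m = 4 the element x is `0b10`, and multiplying it by itself five times returns to 1, reflecting x^5 = 1 in this field.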

  • Numerical Calculation of Cylindrical Functions of Complex Order Using Debye's Asymptotic Series

    Mohd Abdur RASHID  Masao KODAMA  

     
    PAPER-Numerical Analysis and Optimization

      Page(s):
    2664-2671

    Debye's asymptotic series is frequently used for the calculation of cylindrical functions. However, it seems that until now this series has not been used in all-purpose programs for the numerical calculation of cylindrical functions. The authors attempt to develop such all-purpose programs and present some improvements for the numerical calculation. As a result, Debye's series can be used in all-purpose programs, and it is found that the series gives sufficient accuracy if some conditions are satisfied.

  • An Algorithm for Finding Two Edge-Disjoint Paths in Tournaments

    Shin-ichi NAKAYAMA  Shigeru MASUYAMA  

     
    PAPER-Algorithms and Data Structures

      Page(s):
    2672-2678

    This paper presents an $O(n^2)$-time algorithm for constructing two edge-disjoint paths connecting two given pairs of vertices in a given tournament graph. It improves the time complexity of a previously known $O(n^4)$-time algorithm.

  • Simple and Secure Coin (SAS-Coin)--A Practical Micropayment System

    Manjula SANDIRIGAMA  Akihiro SHIMIZU  Matu-Tarow NODA  

     
    PAPER-Information Security

      Page(s):
    2679-2688

    In this paper we propose SAS-Coin, a very practical micropayment scheme based on a hash chain and a simple one-time password authentication protocol called SAS. While it has many desirable features of a coin (anonymity, etc.), it has no public key operations at any stage and has very little overhead. Moreover, authentication is also available, and a session key can be generated for encrypted information supply at no additional cost. Since there are no public key operations, the scheme is extremely useful for mobile telephone applications. It has sufficient security even for larger payments. A comparative analysis with some previously proposed systems is also presented.
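
    The hash-chain idea underlying such schemes can be sketched as follows. This is a generic illustration of hash-chain coins, not the SAS-Coin protocol itself, whose message flow is not given in the abstract. The choice of SHA-256 and the names `make_chain` and `verify_coin` are assumptions made for the example.

```python
import hashlib


def make_chain(seed: bytes, length: int) -> list:
    """Return [w_0, ..., w_length] with w_{i-1} = H(w_i) and w_length = H(seed)."""
    chain = [hashlib.sha256(seed).digest()]
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()   # chain[0] is the anchor that the payee learns first
    return chain


def verify_coin(anchor: bytes, coin: bytes, index: int) -> bool:
    """Accept `coin` as the index-th coin if hashing it `index` times gives the anchor."""
    value = coin
    for _ in range(index):
        value = hashlib.sha256(value).digest()
    return value == anchor
```

    Verification needs only hash evaluations, which is consistent with the abstract's point that no public key operations are required. A payer would release `chain[1]`, `chain[2]`, and so on as successive payments against the anchor `chain[0]`.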

  • Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting

    Kunihiko SADAKANE  Hiroshi IMAI  

     
    PAPER-Information Theory

      Page(s):
    2689-2698

    Two new algorithms for improving the speed of the LZ77 compression are proposed. One is based on a new hashing algorithm named two-level hashing that enables fast longest match searching from a sliding dictionary, and the other uses suffix sorting. The former is suitable for small dictionaries and it significantly improves the speed of gzip, which uses a naive hashing algorithm. The latter is suitable for large dictionaries which improve compression ratio for large files. We also experiment on the compression ratio and the speed of block sorting compression, which uses suffix sorting in its compression algorithm. The results show that the LZ77 using the two-level hash is suitable for small dictionaries, the LZ77 using suffix sorting is good for large dictionaries when fast decompression speed and efficient use of memory are necessary, and block sorting is good for large dictionaries.
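
    For orientation, the baseline that hashing-based LZ77 encoders build on can be sketched as below. This is a simple single-level hash table keyed on 3-byte prefixes, in the spirit of the plain hashing the abstract attributes to gzip. It is not the two-level hashing or suffix-sorting method proposed in the paper, and the window size, match limits, and function names are assumptions.

```python
from collections import defaultdict

MIN_MATCH = 3


def lz77_tokens(data: bytes, window: int = 4096, max_len: int = 258):
    """Greedy LZ77 parse: returns a list of ('lit', byte) or ('match', distance, length)."""
    table = defaultdict(list)   # 3-byte prefix -> earlier positions, oldest first
    tokens = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        key = data[i:i + MIN_MATCH]
        if len(key) == MIN_MATCH:
            for j in reversed(table[key]):      # try the newest candidates first
                if i - j > window:
                    break                       # older entries are even farther away
                length = 0
                while (length < max_len and i + length < n
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            tokens.append(("match", best_dist, best_len))
            step = best_len
        else:
            tokens.append(("lit", data[i]))
            step = 1
        for k in range(i, i + step):            # index every position consumed
            if k + MIN_MATCH <= n:
                table[data[k:k + MIN_MATCH]].append(k)
        i += step
    return tokens


def lz77_decode(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])          # byte-wise copy allows overlapping matches
    return bytes(out)
```

    The inner loop over candidate positions is the longest-match search whose cost the paper's two algorithms aim to reduce.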

  • Systematic Binary Deletion/Insertion Error Correcting Codes Capable of Correcting Random Bit Errors

    Kiattichai SAOWAPA  Haruhiko KANEKO  Eiji FUJIWARA  

     
    PAPER-Coding Theory

      Page(s):
    2699-2705

    This paper presents a class of binary block codes capable of correcting single synchronization errors and single reversal errors with 3 fewer check bits than the existing codes. It also shows a decoding circuit and analyzes its complexity.

  • Trellis, Multilevel, and Turbo Codes with DC-Free Characteristic

    Chang Ki JEONG  Eon Kyeong JOO  

     
    PAPER-Coding Theory

      Page(s):
    2706-2714

    DC-free error-correcting codes based on a partition chain are presented in this paper. The partition chain can be constructed from the code partition chain of Reed-Muller codes. The line coding parameters of the partition chain, such as maximum runlength and running digital sum, are obtained. The trellis and multilevel code structures can be used to design the DC-free error-correcting codes. In particular, by adopting DC-free trellis codes as constituent codes, DC-free turbo codes can be designed. As a result, the presented DC-free error-correcting codes have good coding characteristics.

  • Natural Gradient Learning for Spatio-Temporal Decorrelation: Recurrent Network

    Seungjin CHOI  Shunichi AMARI  Andrzej CICHOCKI  

     
    PAPER-Neural Networks and Bioengineering

      Page(s):
    2715-2722

    Spatio-temporal decorrelation is the task of eliminating correlations between associated signals in the spatial domain as well as in the time domain. In this paper, we present a simple but efficient adaptive algorithm for spatio-temporal decorrelation. For this task, we consider a dynamic recurrent network and calculate the associated natural gradient for the minimization of an appropriate optimization function. The natural-gradient-based spatio-temporal decorrelation algorithm is applied to the task of blind deconvolution of a linear single-input multiple-output (SIMO) system, and its performance is compared to that of the spatio-temporal anti-Hebbian learning rule.

  • Computational Complexity of Finding Highly Co-occurrent Itemsets in Market Basket Databases

    Yeon-Dae KWON  Yasunori ISHIHARA  Shougo SHIMIZU  Minoru ITO  

     
    PAPER-General Fundamentals and Boundaries

      Page(s):
    2723-2735

    Data mining analyzes all the data in a huge database to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database, which contains massive amounts of sales transactions. Mining meaningful association rules involves first finding all the large itemsets and then constructing meaningful association rules from them. In our previous work, we showed that it is NP-complete to decide whether there exists a large itemset of a given size. We also proposed a subclass of databases, called k-sparse databases, for which all the large itemsets can be found efficiently. Intuitively, k-sparsity of a database means that the supports of itemsets of size k or more are sufficiently low in the database. In this paper, we introduce the notion of (k,c)-sparsity, which is strictly weaker than the k-sparsity of our previous work. The value of c represents a degree of sparsity. Using (k,c)-sparsity, we propose a larger subclass of databases for which all the large itemsets can still be found efficiently. Next, we propose alternative measures to the support. For each measure, an itemset is called highly co-occurrent if the value indicating the correlation among the items exceeds a given threshold. We define the highly co-occurrent itemset problem formally as deciding whether there exists a highly co-occurrent itemset of a given size, and show that the problem is NP-complete under every one of these measures. Furthermore, based on the notion of (k,c)-sparsity, we propose subclasses of databases for which all the highly co-occurrent itemsets can be found efficiently.
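
    To make the terms concrete, the sketch below computes the support of an itemset and enumerates large itemsets of a fixed size by brute force. It assumes the common convention that an itemset is large when its support is at least a given threshold, which may differ in detail from the paper's definition. The function names are hypothetical, and the exhaustive enumeration is exactly the kind of search that the sparsity conditions are meant to avoid.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def large_itemsets(transactions, size, min_support):
    """Brute force: all itemsets of the given size with support >= min_support."""
    universe = sorted(set().union(*transactions))
    return [c for c in combinations(universe, size)
            if support(c, transactions) >= min_support]


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
pairs = large_itemsets(baskets, size=2, min_support=0.5)
# pairs == [("bread", "butter"), ("bread", "milk")]
```

    The number of candidate itemsets grows combinatorially with the size parameter, which gives some intuition for why the decision problem is hard in general.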

  • New Efficient Designs of Discrete and Differentiating FIR Hilbert Transformers

    Ishtiaq Rasool KHAN  Ryoji OHBA  

     
    LETTER-Digital Signal Processing

      Page(s):
    2736-2738

    New designs of MAXFLAT discrete and differentiating Hilbert transformers are presented using their interrelationships with digital differentiators. The new designs have explicit formulas for their tap coefficients, which are further modified to obtain a new class of narrow-transition-band filters with a performance comparable to that of Chebyshev filters.

  • Convergence Property of Tri-Quantized-x NLMS Algorithm

    Kensaku FUJII  Yoshinori TANAKA  

     
    LETTER-Digital Signal Processing

      Page(s):
    2739-2742

    The signed regressor algorithm, a variation of the least mean square (LMS) algorithm, is characterized by estimation using the clipped reference signal, namely, its sign, sign(x). This clipping, equivalent to quantizing the reference signal to ±1, increases the estimation error by only about 2 dB. This paper proposes increasing the number of quantization steps to three, namely, ±1 and 0, and shows that the 'tri-quantized-x' normalized least mean square (NLMS) algorithm with three quantization steps improves the convergence property.

  • Generalization of the Cyclic Convolution and Its Fast Computational Systems

    Hideo MURAKAMI  

     
    LETTER-Digital Signal Processing

      Page(s):
    2743-2746

    This paper introduces a generalized cyclic convolution which can be implemented via the conventional cyclic convolution system by the discrete Fourier transform (DFT) with pre-multiplication for the input and post-multiplication for the output. The generalized cyclic convolution is applied for computing a negacyclic convolution. Comparison shows that the proposed implementation is more efficient and simpler in structure than other methods. The modified Fermat number transform (MFNT) is known to be useful for computing a linear convolution of integer-valued sequences. The generalized cyclic convolution is also applied for generalizing the linear convolution system by MFNT, and easing the signal length restriction imposed by the system.
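
    The pre- and post-multiplication idea can be illustrated for the negacyclic case. The sketch below uses the standard construction with a complex factor alpha satisfying alpha^N = -1. It is an illustration under that assumption rather than the paper's generalized system, it uses a naive O(N^2) DFT for self-containment, and the function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def negacyclic_convolution(x, h):
    """Negacyclic convolution of two real sequences of equal length N."""
    n = len(x)
    a = [cmath.exp(1j * cmath.pi * k / n) for k in range(n)]  # a[k] = alpha**k, alpha**n = -1
    X = dft([xk * ak for xk, ak in zip(x, a)])                # pre-multiply, then DFT
    H = dft([hk * ak for hk, ak in zip(h, a)])
    y = dft([Xm * Hm for Xm, Hm in zip(X, H)], inverse=True)  # ordinary cyclic convolution
    return [(yk / ak).real for yk, ak in zip(y, a)]           # post-multiply by alpha**(-k)


def negacyclic_direct(x, h):
    """Reference definition: wrapped-around terms enter with a minus sign."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                y[i + j] += x[i] * h[j]
            else:
                y[i + j - n] -= x[i] * h[j]
    return y
```

    The wrapped terms of the cyclic convolution pick up an extra factor alpha^N = -1, which is what turns the cyclic result into the negacyclic one after the post-multiplication.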

  • Improved Fundamental Frequency Estimation Using Parametric Cubic Convolution

    Hee-Suk PANG  SeongJoon BAEK  Koeng-Mo SUNG  

     
    LETTER-Digital Signal Processing

      Page(s):
    2747-2750

    A simple but effective fundamental frequency estimation method is proposed using parametric cubic convolution. The method is shown to perform well not only for stationary signals but also for signals whose fundamental frequency changes with time. In the simulations, comparisons with other high-accuracy methods are also shown. Owing to its accuracy and simplicity, the proposed method is practically useful.

  • On Input-State Linearization of Nonlinear Systems with Uncertainty

    Ho-Lim CHOI  Jong-Tae LIM  

     
    LETTER-Systems and Control

      Page(s):
    2751-2755

    We present a result on the robust stabilization of uncertain nonlinear systems by applying feedback linearization. The allowable size of the uncertainty is derived for stability. Based on this, we propose a technique that allows us to handle nonlinear systems that are not input-state linearizable. The usefulness of the technique is illustrated by numerical examples.

  • An Accurate Offset- and Gain-Compensated Sample/Hold Circuit

    Xiaojing SHI  Hiroki MATSUMOTO  Kenji MURAO  

     
    LETTER-Circuit Theory

      Page(s):
    2756-2757

    A novel SC (switched-capacitor) offset- and gain-compensated sample/hold circuit is presented. It is implemented with a new topology that reduces the effects of op-amp imperfections. Simulation results indicate that the circuit achieves high accuracy without requiring high-quality components.

  • Finding All Solutions of Weakly Nonlinear Equations Using Linear Programming

    Kiyotaka YAMAMURA  Yoshii HATA  

     
    LETTER-Numerical Analysis and Optimization

      Page(s):
    2758-2761

    Recently, an efficient algorithm has been proposed for finding all solutions of systems of nonlinear equations using linear programming. In this algorithm, linear programming problems are formulated by surrounding component nonlinear functions by rectangles. In this letter, it is shown that weakly nonlinear functions can be surrounded by smaller rectangles, which makes the algorithm very efficient.

  • A Practical (t,n) Multi-Secret Sharing Scheme

    Hung-Yu CHIEN  Jinn-Ke JAN  Yuh-Min TSENG  

     
    LETTER-Information Security

      Page(s):
    2762-2765

    Based on systematic block codes, we propose a (t,n) multi-secret sharing scheme. Compared with previous works, our scheme has the advantages of smaller communication overhead, easy generator matrix construction, and non-disclosure of users' secret shares after multiple secret reconstruction operations. These advantages make the practical implementation of our scheme very attractive.

  • Remarks on the Unknown Key Share Attacks

    Joonsang BAEK  Kwangjo KIM  

     
    LETTER-Information Security

      Page(s):
    2766-2769

    This letter points out some flaws in previous works on UKS (unknown key-share) attacks. We show that Blake-Wilson and Menezes' revised STS-MAC (Station-to-Station Message Authentication Code) protocol, which was proposed to prevent UKS attacks, is still vulnerable to a new UKS attack. Also, Hirose and Yoshida's key agreement protocol presented at PKC'98 is shown to be insecure against public key substitution UKS attacks. Finally, we discuss countermeasures for such UKS attacks.

  • Competitive Learning Algorithms Founded on Adaptivity and Sensitivity Deletion Methods

    Michiharu MAEDA  Hiromi MIYAJIMA  

     
    LETTER-Neural Networks and Bioengineering

      Page(s):
    2770-2774

    This paper describes two competitive learning algorithms from the viewpoint of mechanisms for deleting weight (reference) vectors. The techniques, termed adaptivity deletion and sensitivity deletion, are based on the criteria of partition error and distortion error, respectively. Experimental results show the effectiveness of the proposed approaches in terms of average distortion.