The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] LSI design(30hit)

1-20hit(30hit)

  • Libretto: An Open Cell Timing Characterizer for Open Source VLSI Design

    Shinichi NISHIZAWA  Toru NAKURA  

     
    PAPER

      Pubricized:
    2022/09/13
      Vol:
    E106-A No:3
      Page(s):
    551-559

    We propose an open source cell library characterizer. Recently, free and open-sourced silicon design communities are attracted by hobby designers, academies and industries. These open-sourced silicon designs are supported by free and open sourced EDAs, however, in our knowledge, tool-chain lacks cell library characterizer to use original standard cells into digital circuit design. This paper proposes an open source cell library characterizer which can generate timing models and power models of standard cell library.

  • Register Minimization and its Application in Schedule Exploration for Area Minimization for Double Modular Redundancy LSI Design

    Yuya KITAZAWA  Kazuhito ITO  

     
    PAPER

      Pubricized:
    2021/09/01
      Vol:
    E105-A No:3
      Page(s):
    530-539

    Double modular redundancy (DMR) is to execute an operation twice and detect a soft error by comparing the duplicated operation results. The soft error is corrected by re-executing necessary operations. The re-execution requires error-free input data and registers are needed to store such necessary error-free data. In this paper, a method to minimize the required number of registers is proposed where an appropriate subgraph partitioning of operation nodes are searched. In addition, using the proposed register minimization method, a minimization of the area of functional units and registers required to implement the DMR design is proposed.

  • Exploiting Sparse Activation for Low-Power Design of Synchronous Neuromorphic Systems

    Jaeyong CHUNG  Woochul KANG  

     
    BRIEF PAPER-Integrated Electronics

      Vol:
    E100-C No:11
      Page(s):
    1073-1076

    Massive amounts of computation involved in real-time evaluation of deep neural networks pose a serious challenge in battery-powered systems, and neuromorphic systems specialized in neural networks have been developed. This paper first shows the portion of active neurons at a time dwindles as going toward the output layer in recent large-scale deep convolutional neural networks. Spike-based, asynchronous neuromorphic systems take advantage of the sparse activation and reduce dynamic power consumption, while synchronous systems may waste much dynamic power even for the sparse activation due to clocks. We thus propose a clock gating-based dynamic power reduction method that exploits the sparse activation for synchronous neuromorphic systems. We apply the proposed method to a building block of a recently proposed synchronous neuromorphic computing system and demonstrate up to 79% dynamic power saving at a negligible overhead.

  • Energy-Scalable 4KB LDPC Decoding Architecture for NAND-Flash-Based Storage Systems

    Youngjoo LEE  Jaehwan JUNG  In-Cheol PARK  

     
    PAPER-Electronic Circuits

      Vol:
    E99-C No:2
      Page(s):
    293-301

    This paper presents a novel low-power decoder architecture for the (36420, 32778) binary LDPC code targeting energy-efficient NAND-flash-based mobile devices. The proposed energy-scalable decoding algorithm reduces the operating bit-width of decoding function units at the early-use stage where the channel condition is good enough to lower the precision of computation. Based on a flexible adder structure, the decoding energy of the proposed LDPC decoder can be reduced by freezing the unnecessary parts of hardware resources. A prototype 4KB LDPC decoder is designed in a 65nm CMOS technology, which achieves an average decoding throughput of 8.13Gb/s with 1.2M equivalent gates. The power consumption of the decoder ranges from 397mW to 563mW depending on operating conditions.

  • Achieving Maximum Performance for Bus-Invert Coding with Time-Splitting Transmitter Circuit

    Myungchul YOON  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E95-A No:12
      Page(s):
    2357-2363

    An analytical performance evaluation model is presented in this paper. A time-splitting transmitter circuit employing a selectively activated flip-driver (SAFD) is presented and its performance is estimated by the new model. The optimal partitioning method which maximizes the performance of a given bus-invert (BI) coding circuit is also presented. When a bus is optimally partitioned, an ordinary BI circuit can reduce the number of bus transitions by about 25%, while an SAFD circuit can remove about 35% of them. The newly developed method is verified by simulations whose results correspond very well to the values predicted by the model.

  • Low-Complexity Memory Access Architectures for Quasi-Cyclic LDPC Decoders

    Ming-Der SHIEH  Shih-Hao FANG  Shing-Chung TANG  Der-Wei YANG  

     
    PAPER-Computer System

      Vol:
    E95-D No:2
      Page(s):
    549-557

    Partially parallel decoding architectures are widely used in the design of low-density parity-check (LDPC) decoders, especially for quasi-cyclic (QC) LDPC codes. To comply with the code structure of parity-check matrices of QC-LDPC codes, many small memory blocks are conventionally employed in this architecture. The total memory area usually dominates the area requirement of LDPC decoders. This paper proposes a low-complexity memory access architecture that merges small memory blocks into memory groups to relax the effect of peripherals in small memory blocks. A simple but efficient algorithm is also presented to handle the additional delay elements introduced in the memory merging method. Experiment results on a rate-1/2 parity-check matrix defined in the IEEE 802.16e standard show that the LDPC decoder designed using the proposed memory access architecture has the lowest area complexity among related studies. Compared to a design with the same specifications, the decoder implemented using the proposed architecture requires 33% fewer gates and is more power-efficient. The proposed new memory access architecture is thus suitable for the design of low-complexity LDPC decoders.

  • Efficient VLSI Design of Residue-to-Binary Converter for the Moduli Set (2n, 2n+1 - 1, 2n - 1)

    Su-Hon LIN  Ming-Hwa SHEU  Chao-Hsiang WANG  

     
    LETTER-Computer Systems

      Vol:
    E91-D No:7
      Page(s):
    2058-2060

    The moduli set (2n, 2n+1-1, 2n-1) which is free of (2n+1)-type modulus is profitable to construct a high-performance residue number system (RNS). In this paper, we derive a reduced-complexity residue-to-binary conversion algorithm for the moduli set (2n, 2n+1-1, 2n-1) by using New Chinese Remainder Theorem (CRT). The resulting converter architecture mainly consists of simple adder and multiplexer (MUX) which is suitable to realize an efficient VLSI implementation. For the various dynamic range (DR) requirements, the experimental results show that the proposed converter can significantly achieve at least 23.3% average Area-Time (AT) saving when comparing with the latest designs. Based on UMC 0.18 µm CMOS cell-based technology, the chip area for 16-bit residue-to-binary converter is 931931 µm2 and its working frequency is about 135 MHz including I/O pad.

  • Low Cost SoC Design of H.264/AVC Decoder for Handheld Video Player

    Sumek WISAYATAKSIN  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E91-A No:4
      Page(s):
    1197-1205

    We propose a low cost and stand-alone platform-based SoC for H.264/AVC decoder, whose target is practical mobile applications such as a handheld video player. Both low cost and stand-alone solutions are particularly emphasized. The SoC, consisting of RISC core and decoder core, has advantages in terms of flexibility, testability and various I/O interfaces. For decoder core design, the proposed H.264/AVC coprocessor in the SoC employs a new block pipelining scheme instead of a conventional macroblock or a hybrid one, which greatly contribute to reducing drastically the size of the core and its pipelining buffer. In addition, the decoder schedule is optimized to block level which is easy to be programmed. Actually, the core size is reduced to 138 KGate with 3.5 kbyte memory. In our practical development, a single external SDRAM is sufficient for both reference frame buffer and display buffer. Various peripheral interfaces such as a compact flash, a digital broadcast receiver and a LCD driver are also provided on a chip.

  • A Novel Expression of Spatial Correlation by a Random Curved Surface Model and Its Application to LSI Design

    Shin-ichi OHKAWA  Hiroo MASUDA  Yasuaki INOUE  

     
    PAPER

      Vol:
    E91-A No:4
      Page(s):
    1062-1070

    We have proposed a random curved surface model as a new mathematical concept which enables the expression of spatial correlation. The model gives us an appropriate methodology to deal with the systematic components of device variation in an LSI chip. The key idea of the model is the fitting of a polynomial to an array of Gaussian random numbers. The curved surface is expressed by a new extension from the Legendre polynomials to form two-dimensional formulas. The formulas were proven to be suitable to express the spatial correlation with reasonable computational complexity. In this paper, we show that this approach is useful in analyzing characteristics of device variation of actual chips by using experimental data.

  • A Fast Elliptic Curve Cryptosystem LSI Embedding Word-Based Montgomery Multiplier

    Jumpei UCHIDA  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-System LSIs and Microprocessors

      Vol:
    E89-C No:3
      Page(s):
    243-249

    Elliptic curve cryptosystems are expected to be a next standard of public-key cryptosystems. A security level of elliptic curve cryptosystems depends on a difficulty of a discrete logarithm problem on elliptic curves. The security level of a elliptic curve cryptosystem which has a public-key of 160-bit is equivalent to that of a RSA system which has a public-key of 1024-bit. We propose an elliptic curve cryptosystem LSI architecture embedding word-based Montgomery multipliers. A Montgomery multiplication is an efficient method for a finite field multiplication. We can design a scalable architecture for an elliptic curve cryptosystem by selecting structure of word-based Montgomery multipliers. Experimental results demonstrate effectiveness and efficiency of the proposed architecture. In the hardware evaluation using 0.18 µm CMOS library, the high-speed design using 126 Kgates with 208-bit multipliers achieved operation times of 3.6 ms for a 160-bit point multiplication.

  • Boundary Scan Test Scheme for IP Core Identification via Watermarking

    Yu-Cheng FAN  Hen-Wai TSAO  

     
    LETTER-Programmable Logic, VLSI, CAD and Layout

      Vol:
    E88-D No:7
      Page(s):
    1397-1400

    This paper proposes a novel boundary scan test scheme for intellectual property (IP) core identification via watermarking. The core concept is embedding a watermark identification circuit (WIC) and a test circuit into the IP core at the behavior design level. The procedure depends on current IP-based design flow. This scheme can detect the identification of the IP provider without the need to examine the microphotograph after the chip has been manufactured and packaged. This scheme can successfully survive synthesis, placement, and routing and identify the IP core at various design levels. Experimental results have demonstrated that the proposed approach has the potential to solve the IP identification problem.

  • A Lower-Power Register File Based on Complementary Pass-Transistor Adiabatic Logic

    Jianping HU  Tiefeng XU  Hong LI  

     
    PAPER-Digital Circuits and Computer Arithmetic

      Vol:
    E88-D No:7
      Page(s):
    1479-1485

    This paper presents a novel low-power register file based on adiabatic logic. The register file consists of a storage-cell array, address decoders, read/write control circuits, sense amplifiers, and read/write drivers. The storage-cell array is based on the conventional memory cell. All the circuits except the storage-cell array employ CPAL (complementary pass-transistor adiabatic logic) to recover the charge of large node capacitance on address decoders, bit-lines and word-lines in fully adiabatic manner. The minimization of energy consumption was investigated by choosing the optimal size of CPAL circuits for large load capacitance. The power consumption of the proposed adiabatic register file is significantly reduced because the energy transferred to the large capacitance buses is mostly recovered. The energy and functional simulations are performed using the net-list extracted from the layout. HSPICE simulation results indicate that the proposed register file attains energy savings of 65% to 85% as compared to the conventional CMOS implementation for clock rates ranging from 25 to 200 MHz.

  • A Low-Power Architecture for Extended Finite State Machines Using Input Gating

    Shi-Yu HUANG  Chien-Jyh LIU  

     
    PAPER-Logic Synthesis

      Vol:
    E87-A No:12
      Page(s):
    3109-3115

    In this paper, we investigate a low-power architecture for designs modeled as an Extended Finite State Machine (EFSM). It is based on the general dynamic power management concept, in which the redundant computation can be dynamically disabled to reduce the overall power dissipation. The contribution of this paper is mainly a systematic procedure to identify almost maximal amount of redundant computation in a design given as an EFSM. There are two levels of redundant computation to be exploited--one is based on the machine state information, while the other is based on the transition information. After the extraction of the redundant computation, a low-power architecture using input gating is proposed to synthesize the final circuit. We tested the technique on a design computing a number's modulo inverse. Experimental results show that 31% power reduction can be achieved at the costs of 2% timing penalty and 16% area overhead.

  • Efficient Architectures for the Biorthogonal Wavelet Transform by Filter Bank and Lifting Scheme

    Yeu-Horng SHIAU  Jer Min JOU  Chin-Chi LIU  

     
    PAPER-VLSI Systems

      Vol:
    E87-D No:7
      Page(s):
    1867-1877

    In this paper, two efficient VLSI architectures for biorthogonal wavelet transform are proposed. One is constructed by the filter bank implementation and another is constructed by the lifting scheme. In the filter bank implementation, due to the symmetric property of biorthogonal wavelet transform, the proposed architecture uses fewer multipliers than the orthogonal wavelet transform. Besides, the polyphase decomposition is adopted to speed up the processing by a factor of 2. In the lifting scheme implementation, the pipeline-scheduling technique is employed to optimize the architecture. Both two architectures are with advantages of lower implementation complexity and higher throughput rate. Moreover, they can also be applied to realize the inverse DWT efficiently. Based on the above properties, the two architectures can be applied to time-critical image compressions, such as JPEG2000. Finally, the architecture constructed by the lifting scheme is implemented into a single chip on 0.35 µm 1P4M CMOS technology, and its area and working performance are 5.005 5.005 mm2 and 50 MHz, respectively.

  • Mixed Signal SoC Era

    Akira MATSUZAWA  

     
    INVITED PAPER

      Vol:
    E87-C No:6
      Page(s):
    867-877

    Application area of mixed signal technology is currently expanded to digital communication, networking, and digital storage systems from conventional digital audio and video systems. Digital consumer electronics are emerged and their markets are extremely increased. Rapid progress of integrated circuit technology has enabled a system level integration on a SoC. Thus mixed signal SoC becomes a majority in LSI industry. Almost all the analog functions should be realized by CMOS technology on SoC, yet some difficulties such as a low transconductance, a large mismatch voltage, and a large 1/f noise should be solved. CMOS device has been considered as a poor device for the analog use, however in reality, it has attained a remarkable progress for analog applications. CMOS device has a variety of circuit techniques to address its own issues and also has an analog performance that increases rapidly with technology scaling. The mixed signal SoC needs a new development strategy and design methodology that covers from system level to device level for addressing tough needs for a shorter development time, a lower cost, and a higher design quality. The optimizations over analog and digital and over system to device must be established for the development success. Difficulty of low voltage operation of further scaled CMOS in analog circuits will be the most serious issue. This results in the saturation of performance and increase of cost. The system level optimization over analog and digital, digital calibration and compensation, and the use of sigma-delta modulation method will give us the solution.

  • A High-Performance Tree-Block Pipelining Architecture for Separable 2-D Inverse Discrete Wavelet Transform

    Yeu-Horng SHIAU  Jer Min JOU  

     
    PAPER

      Vol:
    E86-D No:10
      Page(s):
    1966-1975

    In this paper, a high-performance pipelining architecture for 2-D inverse discrete wavelet transform (IDWT) is proposed. We use a tree-block pipeline-scheduling scheme to increase computation performance and reduce temporary buffers. The scheme divides the input subbands into several wavelet blocks and processes these blocks one by one, so the size of buffers for storing temporal subbands is greatly reduced. After scheduling the data flow, we fold the computations of all wavelet blocks into the same low-pass and high-pass filters to achieve higher hardware utilization and minimize hardware cost, and pipeline these two filters efficiently to reach higher throughput rate. For the computations of N N-sample 2-D IDWT with filter length of size K, our architecture takes at most (2/3)N2 cycles and requires 2N(K-2) registers. In addition, each filter is designed regularly and modularly, so it is easily scalable for different filter lengths and different levels. Because of its small storage, regularity, and high performance, the architecture can be applied to time-critical image decompression.

  • A Micro-Power Analog IC for Battery-Operated Systems

    Silvio BOLLIRI  Luigi RAFFO  

     
    PAPER-Integrated Electronics

      Vol:
    E86-C No:7
      Page(s):
    1385-1389

    The design of the analog part of a mixed analog-digital IC for a commercial wireless burglar alarm system is presented as an example of a very low-power VLSI design for battery-operated systems. The main constraint is battery life, which must be at least five years (with standard camera-battery). An operational amplifier, a power supply monitor and an oscillator are the core of the design. The operational amplifier absorbs 1.5 µA while the entire analog part absorbs 4 µA. Measures on each single part show compliance with specification. Test on working environment show its full functionality. Even though the example is application specific, the design solutions and each single element can also be utilized in many other battery-operated low-frequency devices (e.g. environmental parameter monitoring).

  • Quick Delay Calculation Model for Logic Circuit Optimization in Early Stages of LSI Design

    Norio OHKUBO  Takeo YAMASHITA  

     
    PAPER-Design Methods and Implementation

      Vol:
    E86-C No:4
      Page(s):
    618-623

    An accurate, fast delay calculation method suitable for high-performance, low-power LSI design is proposed. The delay calculation is composed of two steps: (1) the gate delay is calculated by using an effective capacitance obtained from a simple model we propose; and (2) the interconnect delay is also calculated from the effective capacitance and modified by using the gate-output transition time. The proposed delay calculation halves the error of a conventional rough calculation, achieving a computational error within 10% per gate stage. The mathematical models are simple enough that the method is suitable for quick delay calculation and logic circuit optimization in the early stages of LSI design. A delay optimization tool using this delay calculation method reduced the worst path delay of a multiply-add module by 11.2% and decreased the sizes of 58.1% of the gates.

  • VLSI Implementation of Lifting Discrete Wavelet Transform Using the 5/3 Filter

    Pei-Yin CHEN  

     
    PAPER-VLSI Systems

      Vol:
    E85-D No:12
      Page(s):
    1893-1897

    In this paper, a VLSI architecture for lifting-based discrete wavelet transform (LDWT) is presented. Our architecture folds the computations of all resolution levels into the same low-pass and high-pass units to achieve higher hardware utilization. Due to the regular and flexible structure of the design, its area is independent of the length of the 1-D input sequence, and its latency is independent of the number of resolution levels. For the computations of analysis process of N-sample 1-D 3-level LDWT, our design takes about N clock cycles and requires 2 multipliers, 4 adders, and 22 registers. It is fabricated with TSMC 0.35-µm cell library and has a die size of 1.21.2 mm2. The power dissipation of the chip is about 0.4 W at the clock rate of 80 MHz.

  • System-MSPA Design of H.263+ Video Encoder/Decoder LSI for Videotelephony Applications

    Chawalit HONSAWEK  Kazuhito ITO  Tomohiko OHTSUKA  Trio ADIONO  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-VLSI Design

      Vol:
    E84-A No:11
      Page(s):
    2614-2622

    In this paper, a LSI design for video encoder and decoder for H.263+ video compression is presented. LSI operates under clock frequency of 27 MHz to compress QCIF (176144 pixels) at the frame rate of 30 frame per second. The core size is 4.6 4.6 mm2 in a 0.35 µm process. The architecture is based on bus connected heterogeneous dedicated modules, named as System-MSPA architecture. It employs the fast and small-chip-area dedicated modules in lower level and controls them by employing the slow and flexible programmable device and an external DRAM. Design results in success to achieve real time encoder in quite compact size without losing flexibility and expand ability. Real time emulation and easy test capability with external PC is also implemented.

1-20hit(30hit)