The search functionality is under construction.

Keyword Search Result

[Keyword] pipelined(42hit)

21-40hit(42hit)

  • A Design of AES Encryption Circuit with 128-bit Keys Using Look-Up Table Ring on FPGA

    Hui QIN  Tsutomu SASAO  Yukihiro IGUCHI  

     
    PAPER-Computer Components

      Vol:
    E89-D No:3
      Page(s):
    1139-1147

    This paper addresses a pipelined partial rolling (PPR) architecture for the AES encryption. The key technique is the PPR architecture. With the proposed architecture on the Altera Stratix FPGA, two PPR implementations achieve 6.45 Gbps throughput and 12.78 Gbps throughput, respectively. Compared with the unrolling implementation that achieves a throughput of 22.75 Gbps on the same FPGA, the two PPR implementations improve the memory efficiency (i.e., throughput divided by the size of memory for core) by 13.4% and 12.3%, respectively, and reduce the amount of the memory by 75% and 50%, respectively. Also, the PPR implementation has up to 9.83% higher memory efficiency than the fastest previous FPGA implementation known to date. In terms of resource efficiency (i.e., throughput divided by the equivalent logic element or slice), one PPR implementation offers almost the same value as the rolling implementation, and the other PPR implementation offers a medium value between the rolling implementation and the unrolling implementation that has the highest resource efficiency. However, the two PPR implementations can be implemented on the minimum-sized Stratix FPGA while the unrolling implementation cannot. The PPR architecture fills the gap between unrolling and rolling architectures and is suitable for small and medium-sized FPGAs.
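
The memory-efficiency figures quoted in this abstract can be cross-checked from the stated throughputs and memory reductions alone. The sketch below is only an arithmetic check, not the authors' evaluation method; the labels `PPR-A` and `PPR-B` are hypothetical names for the two PPR implementations, and absolute memory sizes are not needed because only ratios against the unrolled design enter the calculation.

```python
# Figures quoted in the abstract; the labels "PPR-A"/"PPR-B" are hypothetical.
UNROLLED_GBPS = 22.75
PPR = {
    "PPR-A": {"gbps": 6.45, "memory_reduction": 0.75},
    "PPR-B": {"gbps": 12.78, "memory_reduction": 0.50},
}


def efficiency_gain_percent(gbps, memory_reduction, baseline_gbps=UNROLLED_GBPS):
    """Relative change in throughput-per-memory versus the unrolled design."""
    throughput_ratio = gbps / baseline_gbps   # fraction of baseline throughput
    memory_ratio = 1.0 - memory_reduction     # fraction of baseline memory kept
    return (throughput_ratio / memory_ratio - 1.0) * 100.0


gains = {name: efficiency_gain_percent(**cfg) for name, cfg in PPR.items()}
for name, gain in gains.items():
    print(f"{name}: about {gain:.1f}% higher memory efficiency")
```

Both results land close to the 13.4% and 12.3% stated in the abstract; small differences in the last digit are expected because the quoted throughputs are themselves rounded.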

  • Power-Aware Scalable Pipelined Booth Multiplier

    Hanho LEE  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E88-A No:11
      Page(s):
    3230-3234

    An energy-efficient power-aware design is highly desirable for DSP functions that encounter a wide diversity of operating scenarios in battery-powered wireless sensor network systems. Addressing this issue, this letter presents a low-power, power-aware, scalable pipelined Booth multiplier for DSP applications that makes use of a dynamic-range detection unit, shared common functional units, an ensemble of optimized Wallace trees and a 4-bit array-based adder tree.

  • Optimum Solution of On-Chip A/D Converter for Cooled Type Infrared Focal Plane Array

    Sang Gu KANG  Doo Hyung WOO  Hee Chul LEE  

     
    PAPER-Electronic Circuits

      Vol:
    E88-C No:3
      Page(s):
    413-419

    Transferring image information in analog form between the focal plane array (FPA) and the external electronics exposes it to outside noise. Integrating an on-chip analog-to-digital (A/D) converter into the readout integrated circuit (ROIC) can eliminate the possibility of noise cross-talk. Also, the information can be transported more power-efficiently in the digital domain than in the analog domain. In designing an on-chip A/D converter for a cooled-type high-density infrared detector array, the most stringent requirements are power dissipation, number of bits, die area and throughput. In this study, a pipelined A/D converter was adopted because it offers high operation speed with medium power consumption. A capacitor averaging technique and digital error correction for high resolution were used to eliminate the error caused by device mismatch. The readout circuit was fabricated using a 0.6 µm CMOS process for a 128 × 128 mid-wavelength infrared (MWIR) HgCdTe detector array. The fabricated circuit used a direct injection type input stage, so the S/N ratio could be maximized by increasing the integration capacitor. The measured performance of the 14 b A/D converter exhibited 0.2 LSB differential non-linearity (DNL) and 4 LSB integral non-linearity (INL). The A/D converter had a 1 MHz operation speed with 75 mW power dissipation at 5 V and occupied a die area of 5.6 mm². These results show that it is suitable for cooled-type high-density infrared detector arrays.

  • Pipelined Wake-Up Scheme to Reduce Power Line Noise for Block-Wise Shutdown of Low-Power VLSI Systems

    Jin-Hyeok CHOI  Yong-Ju KIM  Jae-Kyung WEE  Seongsoo LEE  

     
    LETTER

      Vol:
    E87-C No:4
      Page(s):
    629-633

    Block-wise shutdown of idle functional blocks in VLSI systems is a promising approach to reduce power consumption. Especially, multi-threshold voltage CMOS (MTCMOS) is widely accepted to save leakage power during idle time. As operating frequency increases, it requires short wake-up time to use the shutdown block in time. However, short wake-up time of a large block causes large current surge during wake-up process. This often leads to system malfunction due to severe power line noise. This is one of the serious problems for practical implementation of MTCMOS block-wise shutdown. This letter proposes an effective wake-up scheme for block-wise shutdown of low-power VLSI systems. It exploits pipelined wake-up strategy that reduces current surge during wake-up process. In this letter, the proposed scheme was analyzed and simulated from the viewpoint of power distribution network. To verify its validity, it was applied to a multiplier block in Compact Flash controller chip on a test board. According to the simulation results of equivalent R, L, and C modeling, the proposed scheme achieved significant improvement over conventional concurrent shutdown schemes.

  • Design and Evaluation of a High Speed Routing Lookup Architecture

    Jun ZHANG  JeoungChill SHIM  Hiroyuki KURINO  Mitsumasa KOYANAGI  

     
    PAPER-Implementation and Operation

      Vol:
    E87-B No:3
      Page(s):
    406-412

    The IP routing lookup problem is equivalent to finding the longest prefix of a packet's destination address in a routing table. Designing a high-performance IP routing lookup architecture is challenging because of increasing traffic, higher link speeds, frequent updates and growing routing table size. First, increasing traffic and higher link speeds require that IP routing be executed at wire speed. Second, frequent routing table updates require insertion and deletion operations that are simple and have low delay. Finally, growing routing tables call for lower memory use in order to reduce cost. Although many schemes to achieve fast lookup exist, less attention has been paid to the latter two factors. This paper proposes a novel pipelined IP routing lookup architecture using selective binary search on hash tables organized by prefix length. The evaluation results show that it can perform IP lookup operations at a maximum rate of one lookup per cycle. The hash operation ratio for one lookup can be reduced to about 1%, fewer than two hash operations are needed for one table update, and only 512 kbytes of SRAM is needed for a routing table with about 43000 prefixes. It proves to have higher performance than the existing schemes.
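
To make the longest-prefix-match problem concrete, here is a minimal software sketch that keeps one hash table per prefix length and probes the lengths from longest to shortest. This is a simplified baseline and an assumption for illustration only: it is not the paper's pipelined hardware architecture, and it does not implement the selective binary search over prefix lengths that the abstract describes. The class `PrefixTable`, the helper `ipv4_bits`, and the next-hop labels are hypothetical names.

```python
from collections import defaultdict


def ipv4_bits(addr):
    """Convert a dotted-quad IPv4 address into a 32-character bit string."""
    return "".join(f"{int(octet):08b}" for octet in addr.split("."))


class PrefixTable:
    """Routing table stored as one hash table (dict) per prefix length."""

    def __init__(self):
        self.by_length = defaultdict(dict)

    def insert(self, prefix_bits, next_hop):
        # An update touches exactly one hash table.
        self.by_length[len(prefix_bits)][prefix_bits] = next_hop

    def delete(self, prefix_bits):
        table = self.by_length.get(len(prefix_bits))
        if table is not None:
            table.pop(prefix_bits, None)
            if not table:
                del self.by_length[len(prefix_bits)]

    def lookup(self, address_bits):
        # Linear probe from the longest stored length down to the shortest;
        # the first hit is the longest matching prefix.
        for length in sorted(self.by_length, reverse=True):
            hop = self.by_length[length].get(address_bits[:length])
            if hop is not None:
                return hop
        return None


table = PrefixTable()
table.insert(ipv4_bits("10.0.0.0")[:8], "A")    # 10.0.0.0/8
table.insert(ipv4_bits("10.1.0.0")[:16], "B")   # 10.1.0.0/16
print(table.lookup(ipv4_bits("10.1.2.3")))      # matches both; /16 wins
print(table.lookup(ipv4_bits("10.2.0.1")))      # only the /8 matches
```

The linear probe costs one hash operation per distinct prefix length in the worst case, which is the cost that a binary search over prefix lengths is meant to cut down.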

  • A High Throughput Pipelined Architecture for Blind Adaptive Equalizer with Minimum Latency

    Masashi MIZUNO  James OKELLO  Hiroshi OCHI  

     
    PAPER

      Vol:
    E86-A No:8
      Page(s):
    2011-2019

    In this paper, we propose a pipelined architecture for an equalizer based on the Multilevel Modified Constant Modulus Algorithm (MMCMA). We also provide the correction factor that mathematically converts the proposed pipelined adaptive equalizer into an equivalent non-pipelined conventional MMCMA based equalizer. The proposed method of pipelining uses modules with 6 filter coefficients, resulting in an overall latency of a single sampling period, along the main transmission line. The basic concept of the proposed architecture is to implement the Finite Impulse Response (FIR) filter and the algorithm portion of the adaptive equalizer, such that the critical path of the whole circuit has a maximum of three complex multipliers and three adders.

  • A Digital Calibration Technique of Capacitor Mismatch for Pipelined Analog-to-Digital Converters

    Masanori FURUTA  Shoji KAWAHITO  Daisuke MIYAZAKI  

     
    PAPER

      Vol:
    E85-C No:8
      Page(s):
    1562-1568

    A digital calibration technique, which corrects errors due to capacitor mismatch in pipelined ADC and directly measures the error coefficients using the ADC INL plot, is described. The proposed technique can be applied for various types of pipelined ADC architectures. Test results using an implemented 10-bit pipelined ADC show that the ADC achieves a peak signal-to-noise-and-distortion ratio of 56.5 dB, a peak integral non-linearity of 0.3 LSB, and a peak differential non-linearity of 0.3 LSB using the digital calibration.

  • A Systolic Array RLS Processor

    Takahiro ASAI  Tadashi MATSUMOTO  

     
    PAPER-Terrestrial Radio Communications

      Vol:
    E84-B No:5
      Page(s):
    1356-1361

    This paper presents the outline of the systolic array recursive least-squares (RLS) processor prototyped primarily with the aim of broadband mobile communication applications. To execute the RLS algorithm effectively, this processor uses an orthogonal triangularization technique known in matrix algebra as QR decomposition for parallel pipelined processing. The processor board comprises 19 application-specific integrated circuit chips, each with approximately one million gates. Thirty-two bit fixed-point signal processing takes place in the processor, with which one cycle of internal cell signal processing requires approximately 500 nsec, and boundary cell signal processing requires approximately 80 nsec. The processor board can estimate up to 10 parameters. It takes approximately 35 µs to estimate 10 parameters using 41 known symbols. To evaluate signal processing performance of the prototyped systolic array processor board, processing time required to estimate a certain number of parameters using the prototyped board was compared with using a digital signal processing (DSP) board. The DSP board performed a standard form of the RLS algorithm. Additionally, we conducted minimum mean-squared error adaptive array in-lab experiments using a complex baseband fading/array response simulator. In terms of parameter estimation accuracy, the processor is found to produce virtually the same results as a conventional software engine using floating-point operations.

  • Synthesizable HDL Generation for Pipelined Processors from a Micro-Operation Description

    Makiko ITOH  Yoshinori TAKEUCHI  Masaharu IMAI  Akichika SHIOMI  

     
    PAPER

      Vol:
    E83-A No:3
      Page(s):
    394-400

    A synthesizable HDL generation method for pipelined processors is proposed. With the proposed method, data-path and control logic descriptions of a target processor are generated from a clock-based instruction set specification. Experimental results demonstrate the feasibility of the proposed method and show that processor design time is drastically reduced compared with conventional RT-level manual design in HDL.

  • A Performance Optimization Method for Pipelined ASIPs in Consideration of Clock Frequency

    Katsuya SHINOHARA  Norimasa OHTSUKI  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER

      Vol:
    E82-A No:11
      Page(s):
    2356-2365

    This paper proposes an ASIP performance optimization method taking clock frequency into account. The performance of an instruction set processor can be measured using the execution time of an application program, which can be determined by the clock cycles to perform the application program divided by the applied clock frequency. Therefore, the clock frequency should also be tuned in order to maximize the performance of the processor under the given design constraints. Experimental results show that the proposed method determines an optimal combination of FUs considering clock frequency.
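
The relation stated in this abstract, execution time equals clock cycles divided by clock frequency, explains why fewer cycles alone does not guarantee a faster processor. The sketch below illustrates this with two made-up candidate configurations; the configuration names, cycle counts and frequencies are hypothetical and are not taken from the paper.

```python
def execution_time_s(cycles, clock_hz):
    """Execution time in seconds: cycle count divided by clock frequency."""
    return cycles / clock_hz


# Hypothetical candidates: adding a functional unit cuts the cycle count
# but is assumed to lengthen the critical path and lower the clock.
candidates = {
    "one multiplier": {"cycles": 1_200_000, "clock_hz": 100e6},
    "two multipliers": {"cycles": 900_000, "clock_hz": 70e6},
}

best = min(candidates, key=lambda name: execution_time_s(**candidates[name]))
for name, cfg in candidates.items():
    print(f"{name}: {execution_time_s(**cfg) * 1e3:.2f} ms")
print("fastest:", best)
```

In this made-up case the configuration with more cycles still finishes sooner because its clock is faster, which is the trade-off the abstract argues must be tuned jointly.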

  • A Scalable Pipelined Memory Architecture for Fast ATM Packet Switching

    Gab Joong JEONG  MoonKey LEE  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E82-A No:9
      Page(s):
    1937-1944

    This paper describes the design of a scalable pipelined memory buffer for a shared scalable buffer ATM switch. The memory architecture provides high speed and scalability, and eliminates the restriction of memory cycle time in a shared buffer ATM switch. It provides versatile performance in a shared buffer ATM switch using its scalability. The architecture consists of a 2-D array configuration of small memory banks. Increasing the array configuration enlarges the entire memory capacity. Maximum cycle time of a designed scalable memory is 4 ns. The designed memory is embedded in the prototype chip of a shared scalable buffer ATM switch with 4 × 4 configuration of 4160-bit SRAM memory banks. It is integrated in 0.6 µm double-metal single-poly CMOS technology.

  • High-Speed Low-Power CMOS Pipelined Analog-to-Digital Converter

    Ri-A JU  Dong-Ho LEE  Sang-Dae YU  

     
    PAPER

      Vol:
    E82-A No:6
      Page(s):
    981-986

    This paper describes a 10-bit 40-MS/s pipelined A/D converter implemented in a 0.8-µm double-poly, double-metal CMOS process. This A/D converter achieves low power dissipation of 36-mW at 5-V power supply. A 1.5-bit/stage pipelined architecture allows large correction range for comparator offset, and performs fast interstage signal processing. For high speed and low power operation, the sample-and-hold amplifier is designed using op-amp sharing technique and dynamic comparator. In addition, fully-differential folded-cascode op amp with gain-boosting stage is designed by an automatic design tool. When 10-MHz input signal is applied, SNDR is 55.0 dB, and SNR is 56.7 dB. The DNL and INL exhibit 0.6 LSB, +1/-0.75 LSB respectively.
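
The claim that a 1.5-bit/stage architecture tolerates comparator offset can be illustrated with an idealized behavioral model. The sketch below is a textbook-style model under stated assumptions, not the circuit described in the paper: each stage compares its input against nominal thresholds of ±Vref/4, outputs a decision d in {-1, 0, +1}, and passes on the residue 2v - d·Vref. The function names, the choice of 9 stages, and the offset values are hypothetical.

```python
def stage(v, vref=1.0, offset=0.0):
    """One idealized 1.5-bit stage: returns (decision, residue)."""
    if v > vref / 4 + offset:
        d = 1
    elif v < -vref / 4 + offset:
        d = -1
    else:
        d = 0
    # Residue amplification by two, minus the DAC contribution.
    return d, 2.0 * v - d * vref


def convert(v, n_stages=9, vref=1.0, offsets=None):
    """Cascade the stages and sum the binary-weighted decisions."""
    if offsets is None:
        offsets = [0.0] * n_stages
    estimate = 0.0
    for i, off in enumerate(offsets):
        d, v = stage(v, vref, off)
        estimate += d * vref / 2 ** (i + 1)
    return estimate


# Large comparator offsets (80% of the +/-Vref/4 redundancy margin).
bad_offsets = [0.2 if i % 2 == 0 else -0.2 for i in range(9)]
worst = max(
    abs(convert(k / 500.0, offsets=bad_offsets) - k / 500.0)
    for k in range(-500, 501)
)
print(f"worst-case error with offsets: {worst:.6f} (bound {1 / 2 ** 9:.6f})")
```

As long as each offset stays below Vref/4 in magnitude, the residue never leaves the range ±Vref, so the reconstruction error remains bounded by Vref/2^N regardless of where the comparators actually trip.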

  • Low-Power Area-Efficient Pipelined A/D Converter Design Using a Single-Ended Amplifier

    Daisuke MIYAZAKI  Shoji KAWAHITO  Yoshiaki TADOKORO  

     
    PAPER

      Vol:
    E82-A No:2
      Page(s):
    293-300

    This paper presents a new scheme of a low-power area-efficient pipelined A/D converter using a single-ended amplifier. The proposed multiply-by-two single-ended amplifier using switched capacitor circuits has smaller DC bias current compared to the conventional fully-differential scheme, and has a small capacitor mismatch sensitivity, allowing us to use a smaller capacitance. The simple high-gain dynamic-biased regulated cascode amplifier also has an excellent switching response. These properties lead to the low-power area-efficient design of high-speed A/D converters. The estimated power dissipation of the 10-b pipelined A/D converter is less than 12 mW at 20 MSample/s.

  • A Systolic Pipelined NTSC/PAL Digital Video Encoder

    Seung Ho OH  Han Jun CHOI  Moon Key LEE  

     
    PAPER-Digital Signal Processing

      Vol:
    E81-A No:6
      Page(s):
    1021-1028

    This paper describes the design of a multistandard video encoder. The proposed encoder accepts conventional NTSC/PAL video signals. The encoder consists of four major building functions which are color space converter, digital filters, color modulator, and timing generator. In order to support multistandard video signals, a programmable systolic architecture is adopted in designing various digital filters. Interpolation digital filters are also used to enhance SNR of encoded video signals. The input to the encoder can be either YCbCr signal or RGB signal. The outputs are luminance (Y), chrominance (C), and composite video baseband (Y+C) signals. The architecture of the encoder is defined by using Matlab program and is modelled by using Verilog-HDL language. The overall operation is verified by using various video signals, such as color bar patterns, ramp signals, and so on. The encoder contains 36 k gates and is implemented by using 0.65 µm CMOS process.

  • Bipartition and Synthesis in Low Power Pipelined Circuits

    Shyh-Jong CHEN  Rung-Ji SHANG  Xian-June HUANG  Shang-Jang RUAN  Feipei LAI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E81-A No:4
      Page(s):
    664-671

    By treating each different output pattern as a state, we propose a low power architecture for pipelined circuits using bipartition. The output of a pipelined circuit may transit mainly among only some of the different states. If a few states dominate most of the time, we can partition the combinational portion of a pipelined circuit into two blocks: a small one that contains the few states with high activity, and a big one that contains the remainder with low activity. The original pipelined circuit is thus bipartitioned into two individual pipelined circuits. An additional combinational logic block is introduced to control which of the two partitioned blocks works. Power reduction is based on the observation that most of the time the small block is at work and the big one is idle. In order to minimize the power consumption of this architecture, we present an algorithm that can improve the efficiency of this additional control block. Experiments with MCNC benchmarks show a high percentage of power saving by using our new architecture for low power pipelined circuit design.

  • A 10-bit 50 MS/s 300 mW A/D Converter Using Reference Feed-Forward Architecture

    Takashi OKUDA  Osamu MATSUMOTO  Toshio KUMAMOTO  Masao ITO  Hiroyuki MOMONO  Takahiro MIKI  Takeshi TOKUDA  

     
    PAPER

      Vol:
    E80-C No:12
      Page(s):
    1553-1559

    This paper describes the 10-bit 50 MS/s pipelined CMOS A/D Converter using a "reference feed-forward architecture." In this architecture, reference voltage generated in a reference generator block and residual voltage from a DA/subtractor block are fed to the next stage. The reference generator block and DA/subtractor block are constructed using resistive-load, low-gain differential amplifiers. The high-gain, high-speed amplifiers consuming much power are not used. Therefore, the power consumption of this ADC is reduced. The gain matching of the reference voltage with the internal signal range is achieved through the introduction of the reference generator block having the same characteristics as a DA/subtractor block. Each offset voltage of the differential amplifier in the reference generator block and the DA/subtractor block is canceled by the offset cancellation technique, individually. In addition, the front-end sample/hold circuit is eliminated to reduce power consumption. Because of the introduction of high-speed comparators based on the source follower and latch circuit into the first stage A/D subconverter, analog bandwidth is not degraded. This ADC has been fabricated in double-polysilicon, double-metal, 0.5µm CMOS technology, and it operates at 50 MS/s with a 300-mW (Vdd=3.0 V) power consumption. The differential linearity error of less than +/-1 LSB is obtained.

  • ASAver.1: An FPGA-Based Education Board for Computer Architecture/System Design

    Hiroyuki OCHI  Yoko KAMIDOI  Hideyuki KAWABATA  

     
    PAPER

      Vol:
    E80-A No:10
      Page(s):
    1826-1833

    This paper proposes a new approach that makes it possible for every undergraduate student to perform experiments of developing a pipelined RISC processor within the limited time available for the course. The approach consists of 4 steps. At the first step, every student implements by himself/herself a pipelined RISC processor which is based on a given, very simple model; it has separate buses for instruction and data memory ("Harvard architecture") to avoid structural hazard, while it completely ignores data and control hazards to make implementation easy. Although it is such a "defective" processor, we can test its functionality by giving object code containing a sufficient amount of NOP instructions to avoid hazards. At the second step, NOP instructions are deleted and behavior of the developed processor is observed carefully to understand data and control hazards. At the third step, benchmark problems are provided, and every student is challenged to improve its performance. Finally every student is requested to present how he/she improved the processor. This paper also describes a new educational FPGA board ASAver.1 which is useful for experiments from introductory class to computer architecture/system class. As a feasibility study, a 16-bit pipelined RISC processor "ASAP-O" has been developed which has eight 16-bit general purpose registers, a 16-bit program counter, and a zero flag, with 10 essential instructions.

  • Achieving Fault Tolerance in Pipelined Multiprocessor Systems

    Jeng-Ping LIN  Sy-Yen KUO  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E80-D No:6
      Page(s):
    665-671

    This paper focuses on recovering from processor transient faults in pipelined multiprocessor systems. A pipelined machine may employ out of order execution and branch prediction techniques to increase performance, thus a precise computation state would not be available. We propose an efficient scheme to maintain the precise computation state in a pipelined machine. The goal of this paper is to implement checkpointing and rollback recovery utilizing the technique of precise interrupt in a pipelined system. Detailed analysis is included to demonstrate the effectiveness of this method.

  • Address Addition and Decoding without Carry Propagation

    Yung-Hei LEE  Seung Ho HWANG  

     
    LETTER-Algorithm and Computational Complexity

      Vol:
    E80-D No:1
      Page(s):
    98-100

    The response time of adders is mainly determined by the carry propagation delay. This letter deals with a scheme which combines the address addition and decoding together. Although addition is involved in the process, we show that it can be computed without carry propagation. Memory latency is one of the most performance limiting factors. The authors present a new decoder logic named fused add-decoder (FADEC), which performs address addition and decoding in a single process. FADEC can reduce memory latency by eliminating separate address addition cycle.

  • High-Speed CMOS SRAM Technologies for Cache Applications

    Koichiro ISHIBASHI  

     
    INVITED PAPER-Static RAMs

      Vol:
    E79-C No:6
      Page(s):
    724-734

    This paper describes high-speed CMOS SRAM circuit technologies used in cache memories. In recent years, high-speed SRAM technology has led to higher cycle frequencies, but the rate of increase in the SRAM density has slowed. Operating modes of high-speed SRAMs are compared and the advantage of wave-pipelined SRAMs in terms of cycle frequency is shown. Three types of sense amplifiers used in SRAMs are also compared from the viewpoint of speed and power dissipation. Current sense amplifiers provide high-speed operation with low power dissipation, while latch-type sense amplifiers appear most suitable for ultra-low-power SRAMs. Low voltage operation and size reduction of full CMOS cells are now the most pressing issues in the development of SRAMs for cache memories.
