
Keyword Search Result

[Keyword] fpga(330hit)

221-240hit(330hit)

  • A 90 nm LUT Array for Speed and Yield Enhancement by Utilizing Within-Die Delay Variations

    Kazuya KATSUKI  Manabu KOTANI  Kazutoshi KOBAYASHI  Hidetoshi ONODERA  

     
    PAPER-Digital

    Vol: E90-C No:4  Page(s): 699-707

    In this paper, we show that speed and yield of reconfigurable devices can be enhanced by utilizing within-die (WID) delay variations. An LUT Array LSI is fabricated to confirm whether FPGAs have clear WID variations to be utilized. We can measure delay variations by counting the number of LUTs a signal propagates within a certain time. Clear die-to-die (D2D) and WID variations are observed. We propose a variation model from the measurement results. Adequacy of the model is discussed from randomness of the random component. Effect of the speed and yield enhancement is confirmed using the proposed model. Yield increases from 80.0% to 100.0% by optimizing configurations.

  • A Systolic FPGA Architecture of Two-Level Dynamic Programming for Connected Speech Recognition

    Yong KIM  Hong JEONG  

     
    PAPER-Speech and Hearing

    Vol: E90-D No:2  Page(s): 562-568

    In this paper, we present an efficient architecture for connected word recognition that can be implemented with a field programmable gate array (FPGA). The architecture consists of a newly derived two-level dynamic programming (TLDP) scheme that uses only bit addition and shift operations. The advantages of this architecture are the spatial efficiency to accommodate more words with limited space and the absence of multiplications to increase computational speed by reducing propagation delays. The architecture is highly regular, consisting of identical and simple processing elements with only nearest-neighbor communication, and external communication occurs with the end processing elements. In order to verify the proposed architecture, we have also designed and implemented it, prototyping with Xilinx FPGAs running at 33 MHz.

  • Joint Hardware-Software Implementation of Adaptive Array Antenna for ISDB-T Reception

    Dang Hai PHAM  Takanobu TABATA  Hirokazu ASATO  Satoshi HORI  Tomohisa WADA  

     
    PAPER

    Vol: E89-B No:12  Page(s): 3215-3224

    In this paper, an adaptive array antenna is implemented to enhance the performance of digital TV ISDB-T reception. Issues of realizing the proposed array antenna and its implementation by a joint hardware-software solution are also presented in this paper. Instead of using known reference signals, the proposed method utilizes the GI (Guard Interval) and a periodic property of OFDM signal as a constraint to realize MRC (Maximum Ratio Combining) and SMI (Sample Matrix Inversion) adaptive beam-forming algorithms. Experimental results show that the proposed system drastically improves the quality of reception. Moreover, the proposed system can achieve excellent performance under the conditions of strong interferences.

  • Fast FPGA-Emulation-Based Simulation Environment for Custom Processors

    Yuichi NAKAMURA  Kouhei HOSOKAWA  

     
    PAPER-Simulation and Verification

    Vol: E89-A No:12  Page(s): 3464-3470

    This paper describes a new method for building a simulation environment for a custom processor. It is generally very hard to develop an accurate simulator for a custom processor rapidly, even a simple instruction-set-level simulator (ISS). The proposed method uses a field-programmable-gate-array emulator with a PCI interface and debugging GUI software on a PC. Since the emulator implements the processor design at the register-transfer or net-list level, the emulation results are almost the same as the results obtained with the actual processor. To support rich debugging functions like those provided by the conventional software simulator, we use a debugging buffer and break-control circuits. Experimental results show that a simulator can be constructed by the proposed method within several hours and that it can break the processor operation at any specified point and observe the internal signals when the emulated system is running at 1-30 MHz. The accuracy of the constructed simulator is the same as that of RTL simulation and much higher than that of software ISS simulation. We show that we can provide a fast, accurate, and useful simulator for any processor design specified at the register-transfer level.

  • Compact Numerical Function Generators Based on Quadratic Approximation: Architecture and Synthesis Method

    Shinobu NAGAYAMA  Tsutomu SASAO  Jon T. BUTLER  

     
    PAPER-Circuit Synthesis

    Vol: E89-A No:12  Page(s): 3510-3518

    This paper presents an architecture and a synthesis method for compact numerical function generators (NFGs) for trigonometric, logarithmic, square root, reciprocal, and combinations of these functions. Our NFG partitions a given domain of the function into non-uniform segments using an LUT cascade, and approximates the given function by a quadratic polynomial for each segment. Thus, we can implement fast and compact NFGs for a wide range of functions. Experimental results show that: 1) our NFGs require, on average, only 4% of the memory needed by NFGs based on the linear approximation with non-uniform segmentation; 2) our NFG for 2^x-1 requires only 22% of the memory needed by the NFG based on a 5th-order approximation with uniform segmentation; and 3) our NFGs achieve about 70% of the throughput of the existing table-based NFGs using only a few percent of the memory. Thus, our NFGs can be implemented with more compact FPGAs than needed for the existing NFGs. Our automatic synthesis system generates such compact NFGs quickly.
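
The idea of approximating a function by one quadratic per non-uniform segment can be illustrated in software. The following is only a minimal sketch under stated assumptions: the breakpoints are hand-picked, the coefficients come from a second-order Taylor expansion at each segment midpoint, and a binary search stands in for the LUT cascade. None of these choices is taken from the paper, and the names `build_table`, `evaluate`, and `max_abs_error` are hypothetical.

```python
import bisect
import math

LN2 = math.log(2.0)

# Hand-picked, non-uniform segment boundaries on [0, 1] (an assumption).
BREAKPOINTS = [0.0, 0.2, 0.4, 0.55, 0.7, 0.8, 0.9, 1.0]


def build_table(breakpoints):
    """Per-segment (midpoint, c0, c1, c2) for f(x) = 2**x - 1.

    Coefficients are a second-order Taylor expansion at the midpoint,
    which is an illustrative choice, not the paper's synthesis method.
    """
    table = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        m = 0.5 * (lo + hi)
        p = 2.0 ** m
        table.append((m, p - 1.0, LN2 * p, 0.5 * LN2 * LN2 * p))
    return table


def evaluate(x, breakpoints, table):
    """Select the segment containing x, then evaluate its quadratic."""
    seg = bisect.bisect_right(breakpoints, x) - 1
    seg = min(max(seg, 0), len(table) - 1)  # clamp x == last breakpoint
    m, c0, c1, c2 = table[seg]
    dx = x - m
    return c0 + dx * (c1 + dx * c2)  # Horner form: two multiplies, two adds


TABLE = build_table(BREAKPOINTS)


def max_abs_error(samples=1000):
    """Largest absolute error over an evenly spaced grid on [0, 1]."""
    worst = 0.0
    for i in range(samples + 1):
        x = i / samples
        worst = max(worst, abs(evaluate(x, BREAKPOINTS, TABLE) - (2.0 ** x - 1.0)))
    return worst
```

Only three coefficients and a midpoint are stored per segment, which is the kind of memory saving the abstract attributes to quadratic approximation; the actual figures reported there depend on the authors' own segmentation and fixed-point design.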

  • Implementation of Multi-Channel Modem for DSRC System on Signal Processing Platform for Software Defined Radio

    Akihisa YOKOYAMA  Hiroshi HARADA  

     
    PAPER

    Vol: E89-B No:12  Page(s): 3225-3232

    We previously proposed an architecture for software defined radio called the reconfigurable packet routing-oriented signal processing platform (RPPP). This architecture was suited to wireless signal processing applications, which require radio functions to be selected in real time depending on the transmitted signal. A number of radio standards are used in DSRC systems for vehicle communication, and vehicle equipment is required to transmit and receive the radio signals used on each particular occasion. An implementation of RPPP is described in this paper that enables the dynamic handling of two ARIB standards for DSRC. After an explanation of the basic architecture and an analysis of RPPP, the implementation of a reconfigurable DSRC transceiver for ASK and π/4 shift-QPSK is described. The implementation is then discussed and evaluated in terms of the number of logic units needed. We concluded that our platform is 27.6% more efficient in utilizing logic than a fixed design.

  • An Efficient and Effective Algorithm for Online Task Placement with I/O Communications in Partially Reconfigurable FPGAs

    Mitsuru TOMONO  Masaki NAKANISHI  Shigeru YAMASHITA  Kazuo NAKAJIMA  Katsumasa WATANABE  

     
    PAPER-System Level Design

    Vol: E89-A No:12  Page(s): 3416-3426

    In a partially reconfigurable FPGA of the future, arbitrary portions of its logic resources and interconnection networks will be reconfigured without affecting the other parts. Multiple tasks will be mapped and executed concurrently in such an FPGA. Efficient execution of the tasks using the limited resources of the FPGA will necessitate effective resource management. A number of online FPGA placement methods have recently been proposed for such an FPGA. However, they cannot handle I/O communications of the tasks. Taking such I/O communications into consideration, we introduce a new approach to online FPGA placement. We present an algorithm for placing each arriving task in an empty area so as to complete all the tasks efficiently. We develop two fitting strategies to effectively handle I/O communications of the tasks. Our experimental results show that properly weighted combinations of these and two other previously proposed strategies enable this algorithm to run very fast and make an effective placement of the tasks. In fact, we show that the overhead associated with the use of this algorithm is negligible as compared to the total execution time of the tasks.

  • A Multi-Context FPGA Using Floating-Gate-MOS Functional Pass-Gates

    Masanori HARIYAMA  Sho OGATA  Michitaka KAMEYAMA  

     
    PAPER

    Vol: E89-C No:11  Page(s): 1655-1661

    Multi-context FPGAs (MC-FPGAs) have multiple memory bits per configuration bit forming configuration planes for fast switching between contexts. The additional memory planes cause a large overhead in area when a number of contexts are used. To overcome the overhead, a fine-grained MC-FPGA architecture using a floating-gate-MOS functional pass gate (FGFP) is presented which merges threshold operation and storage function on a single floating-gate MOS transistor. The test chip is designed using a 0.35 µm CMOS-EPROM technology. The transistor count of the proposed multi-context switch (MC-switch) is reduced to 13% of that of an SRAM-based one. The total area of the proposed MC-FPGA is reduced to about 56% of that of a conventional SRAM-based MC-FPGA.

  • Hardware Implementation of an Inverse Function Delayed Neural Network Using Stochastic Logic

    Hongge LI  Yoshihiro HAYAKAWA  Shigeo SATO  Koji NAKAJIMA  

     
    PAPER-Biocybernetics, Neurocomputing

    Vol: E89-D No:9  Page(s): 2572-2578

    In this paper, the authors present a new digital circuit of neuron hardware using a field programmable gate array (FPGA). A new Inverse function Delayed (ID) neuron model is implemented. The Inverse function Delayed model, which includes the BVP model, has superior associative properties thanks to negative resistance. An associative memory based on the ID model with self-connections has possibilities of improving its basin sizes and memory capacity. In order to decrease circuit area, we employ stochastic logic. The proposed neuron circuit completes the stimulus response output, and its retrieval property with negative resistance is superior to a conventional nonlinear model in basin size of an associative memory.

  • Design and Evaluation of Data-Dependent Hardware for AES Encryption Algorithm

    Ryoichiro ATONO  Shuichi ICHIKAWA  

     
    LETTER-VLSI Systems

    Vol: E89-D No:7  Page(s): 2301-2305

    If a logic circuit was specialized to a specific input, the derived circuit would be faster and smaller than the original. This study presents various designs of a key-specific AES encryption circuit. In our iterative design, 41% of the logic gates and 20% of RAM were reduced, while 24% more performance was derived. In our pipelined design, 54% of the logic gates and 20% of RAM were reduced, while 74% higher performance was achieved. The results on DES encryption circuits are also presented for comparison.

  • Proposal of Testable Multi-Context FPGA Architecture

    Kazuteru NAMBA  Hideo ITO  

     
    PAPER-Dependable Computing

    Vol: E89-D No:5  Page(s): 1687-1693

    Multi-context FPGAs allow very quick reconfiguration by storing multiple configuration data at the same time. While testing for FPGAs with single-context memories has already been studied by many researchers, testing for multi-context FPGAs has not been proposed yet. This paper presents an architecture of testable multi-context FPGAs. In the proposed multi-context FPGA, configuration data stored in a context can be copied into another context. This paper also shows testing of the proposed multi-context FPGA. The proposed testing uses the testing for the traditional FPGAs with single-context. The testing is capable of detecting single stuck-at faults and single open faults which affect normal operations. The number of test configurations for the proposed testing is at most two more than that for the testing of FPGAs with single-context memories. The area overhead of the proposed architecture is 7% and 4% of the area of a multi-context FPGA without the proposed architecture when the number of contexts in a configuration memory is 8 and 16, respectively.

  • Partially-Parallel LDPC Decoder Achieving High-Efficiency Message-Passing Schedule

    Kazunori SHIMIZU  Tatsuyuki ISHIKAWA  Nozomu TOGAWA  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

    Vol: E89-A No:4  Page(s): 969-978

    In this paper, we propose a partially-parallel LDPC decoder which achieves a high-efficiency message-passing schedule. The proposed LDPC decoder is characterized as follows: (i) The column operations follow the row operations in a pipelined architecture to ensure that the row and column operations are performed concurrently. (ii) The proposed parallel pipelined bit functional unit enables the column operation module to compute every message in each bit node which is updated by the row operations. These column operations can be performed without extending the single iterative decoding delay when the row and column operations are performed concurrently. Therefore, the proposed decoder performs the column operations more frequently in a single iterative decoding, and achieves a high-efficiency message-passing schedule within the limited decoding delay time. Hardware implementation on an FPGA and simulation results show that the proposed partially-parallel LDPC decoder improves the decoding throughput and bit error performance with a small hardware overhead.

  • Low-Power Low-Leakage FPGA Design Using Zigzag Power Gating, Dual-VTH/VDD and Micro-VDD-Hopping

    Canh Quang TRAN  Hiroshi KAWAGUCHI  Takayasu SAKURAI  

     
    PAPER-Low Power Techniques

    Vol: E89-C No:3  Page(s): 280-286

    A low-power FPGA design approach is proposed based on a fine-grain VDD control scheme called micro-VDD-hopping. Four configurable logic blocks (CLBs) are grouped into one block where VDD is shared. In the micro-VDD-hopping scheme, VDD in each block is changed between VDDH (high VDD) and VDDL (low VDD) spatially and temporally in order to achieve lower power without degrading performance. A low-power level shifter that has less contention is also proposed for low-swing inter-block signals. The FPGA incorporates the Zigzag power-gating scheme, in which special care has been taken to cope with a sneak leakage-path problem. A test chip was fabricated using a 0.35-µm CMOS technology, together with the conventional fixed-VDD FPGA for comparison. Measurement results show that dynamic power in the proposed scheme can be reduced by 86% when the frequency is half of the maximum one. Simulation using a 90-nm CMOS technology shows that leakage power can be reduced by 97% when the proposed method is used. The area overhead of the proposed FPGA is 2%.

  • A Design of AES Encryption Circuit with 128-bit Keys Using Look-Up Table Ring on FPGA

    Hui QIN  Tsutomu SASAO  Yukihiro IGUCHI  

     
    PAPER-Computer Components

    Vol: E89-D No:3  Page(s): 1139-1147

    This paper addresses a pipelined partial rolling (PPR) architecture for the AES encryption. The key technique is the PPR architecture. With the proposed architecture on the Altera Stratix FPGA, two PPR implementations achieve 6.45 Gbps throughput and 12.78 Gbps throughput, respectively. Compared with the unrolling implementation that achieves a throughput of 22.75 Gbps on the same FPGA, the two PPR implementations improve the memory efficiency (i.e., throughput divided by the size of memory for core) by 13.4% and 12.3%, respectively, and reduce the amount of the memory by 75% and 50%, respectively. Also, the PPR implementation has up to 9.83% higher memory efficiency than the fastest previous FPGA implementation known to date. In terms of resource efficiency (i.e., throughput divided by the equivalent logic element or slice), one PPR implementation offers almost the same as the rolling implementation, and the other PPR implementation offers a medium value between the rolling implementation and the unrolling implementation that has the highest resource efficiency. However, the two PPR implementations can be implemented on the minimum-sized Stratix FPGA while the unrolling implementation cannot. The PPR architecture fills the gap between unrolling and rolling architectures and is suitable for small and medium-sized FPGAs.

  • Prototype Implementation of Real-Time ML Detectors for Spatial Multiplexing Transmission

    Toshiaki KOIKE  Yukinaga SEKI  Hidekazu MURATA  Susumu YOSHIDA  Kiyomichi ARAKI  

     
    PAPER-Wireless Communication Technologies

    Vol: E89-B No:3  Page(s): 845-852

    We developed two types of practical maximum-likelihood detectors (MLD) for multiple-input multiple-output (MIMO) systems, using a field programmable gate array (FPGA) device. For implementations, we introduced two simplified metrics called a Manhattan metric and a correlation metric. Using the Manhattan metric, the detector needs no multiplication operations, at the cost of a slight performance degradation within 1 dB. Using the correlation metric, the MIMO-MLD can significantly reduce the complexity in both multiplications and additions without any performance degradation. This paper demonstrates the bit-error-rate performance of these MLD prototypes at a 1 Gbps-order real-time processing speed, through the use of an all-digital baseband 4×4 MIMO testbed integrated on the same FPGA chip.
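
To make the metric substitution concrete, the sketch below runs an exhaustive maximum-likelihood search over candidate symbol vectors and lets the caller choose between a squared Euclidean metric and a Manhattan-style metric. It rests on assumptions: the abstract does not define its Manhattan metric, so the sum of absolute real and imaginary parts of the residual is used here, the QPSK alphabet and all function names are hypothetical, and the products inside `residual` remain in this software version, so it shows only the change of metric and not the multiplier-free hardware datapath.

```python
from itertools import product

# Unnormalized QPSK alphabet (an assumption for illustration).
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]


def residual(y, H, s):
    """Residual y - H*s for received vector y, channel H, candidate s."""
    return [y_i - sum(h * s_j for h, s_j in zip(row, s))
            for y_i, row in zip(y, H)]


def euclidean_metric(r):
    """Squared Euclidean norm of the residual (needs squaring)."""
    return sum(abs(e) ** 2 for e in r)


def manhattan_metric(r):
    """L1-style metric: |Re| + |Im| per element, no squaring needed."""
    return sum(abs(e.real) + abs(e.imag) for e in r)


def ml_detect(y, H, metric, constellation=QPSK):
    """Exhaustive search: return the candidate with the smallest metric."""
    n_tx = len(H[0])
    return min(product(constellation, repeat=n_tx),
               key=lambda s: metric(residual(y, H, s)))
```

With a noiseless received vector both metrics select the transmitted symbols; with noise the two can disagree, which is consistent with the small performance loss the abstract reports for the Manhattan metric.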

  • A Coarse-Grain Hierarchical Technique for 2-Dimensional FFT on Configurable Parallel Computers

    Xizhen XU  Sotirios G. ZIAVRAS  

     
    PAPER-Parallel/Distributed Algorithms

    Vol: E89-D No:2  Page(s): 639-646

    FPGAs (Field-Programmable Gate Arrays) have been widely used as coprocessors to boost the performance of data-intensive applications [1],[2]. However, there are several challenges to further boost FPGA performance: the communication overhead between the host workstation and the FPGAs can be substantial; large-scale applications cannot fit in a single FPGA because of its limited capacity; mapping an application algorithm to FPGAs still remains a daunting job in configurable system design. To circumvent these problems, we propose in this paper the FPGA-based Hierarchical-SIMD (H-SIMD) machine with its codesign of the Pyramidal Instruction Set Architecture (PISA). PISA comprises high-level instructions implemented as FPGA functions of coarse-grain SIMD (Single-Instruction, Multiple-Data) tasks to facilitate ease of program development, code portability across different H-SIMD implementations and high performance. We assume a multi-FPGA board where each FPGA is configured as a separate SIMD machine. Multiple FPGA chips can work in unison at a higher SIMD level, if needed, controlled by the host. Additionally, by using a memory switching scheme and the high-level PISA to partition applications into coarse-grain tasks, host-FPGA communication overheads can be hidden. We enlist the two-dimensional Fast Fourier Transform (2D FFT) to test the effectiveness of H-SIMD. The test results show sustained high performance for this problem. The H-SIMD machine even outperforms a Xeon processor for this problem.

  • Low-Power Field-Programmable VLSI Using Multiple Supply Voltages

    Weisheng CHONG  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Low Power Methodology

    Vol: E88-A No:12  Page(s): 3298-3305

    A low-power field-programmable VLSI (FPVLSI) is presented to overcome the problem of large power consumption in field-programmable gate arrays (FPGAs). To reduce power consumption in routing networks, the FPVLSI consists of cells that are based on a bit-serial pipeline architecture which reduces routing block complexity. Moreover, a level-converter-less multiple-supply-voltage scheme using dynamic circuits is proposed, where the cells in non-critical paths use a low supply voltage for low power under a speed constraint. The FPVLSI is evaluated based on a 0.18-µm CMOS design rule. The power consumption of the FPVLSI using multiple supply voltages is reduced to 17% or less compared to that of the static-circuit-based FPVLSI using multiple supply voltages.

  • FPGA Implementation of a Stereo Matching Processor Based on Window-Parallel-and-Pixel-Parallel Architecture

    Masanori HARIYAMA  Yasuhiro KOBAYASHI  Haruka SASAKI  Michitaka KAMEYAMA  

     
    PAPER-VLSI Architecture

    Vol: E88-A No:12  Page(s): 3516-3522

    This paper presents a processor architecture for high-speed and reliable stereo matching based on adaptive window-size control of SAD (Sum of Absolute Differences) computation. To reduce its computational complexity, SADs are computed using images divided into non-overlapping regions, and the matching result is iteratively refined by reducing the window size. A window-parallel-and-pixel-parallel architecture is also proposed to fully exploit the potential parallelism of the algorithm. The architecture also reduces the complexity of an interconnection network between memory and functional units based on the regularity of reference pixels. The stereo matching processor is implemented on an FPGA. Its performance is 80 times higher than that of a microprocessor (Pentium4@2 GHz), and is enough to generate a 3-D depth image at the video rate of 33 MHz.
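
The SAD cost that this architecture builds on can be sketched as plain block matching. The example below uses a fixed square window and a linear disparity search; the adaptive window-size control, the region division, and the iterative refinement described in the abstract are deliberately left out, and the function names and the convention that a left-image pixel at column x matches the right-image pixel at column x - d are assumptions made for illustration.

```python
def sad(left, right, x, y, d, half):
    """Sum of absolute differences between the window centred at (x, y)
    in the left image and the window shifted left by d in the right image.

    Images are lists of rows of integers. The caller must keep both
    windows inside the image bounds.
    """
    total = 0
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            total += abs(left[y + j][x + i] - right[y + j][x + i - d])
    return total


def best_disparity(left, right, x, y, max_d, half=1):
    """Return the disparity in 0..max_d with the smallest SAD
    (ties resolve to the smallest disparity)."""
    return min(range(max_d + 1),
               key=lambda d: sad(left, right, x, y, d, half))
```

Every window and every pixel difference in these loops is independent of the others, which is the parallelism a window-parallel and pixel-parallel design can exploit.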

  • Design and Evaluation of Hardware Pseudo-Random Number Generator MT19937

    Shiro KONUMA  Shuichi ICHIKAWA  

     
    LETTER-VLSI Systems

    Vol: E88-D No:12  Page(s): 2876-2879

    MT19937 is a kind of Mersenne Twister, which is a pseudo-random number generator. This study presents new designs for a MT19937 circuit suitable for custom computing machinery for high-performance scientific simulations. Our designs can generate multiple random numbers per cycle (multi-port design). The estimated throughput of a 52-port design was 262 Gbps, which is 115 times higher than the software on a Pentium 4 (2.53 GHz) processor. Multi-port designs were proven to be more cost-effective than using multiple single-port designs. The initialization circuit can be included without performance loss in exchange for a slight increase of logic scale.
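
For readers unfamiliar with the generator, here is a compact software reference of the standard 32-bit MT19937 algorithm. It is a sketch of the published algorithm only, not of the hardware designs in this letter: it produces one number per call, and the class and method names are hypothetical. The default seed 5489 is the one used by the reference implementation.

```python
class MT19937:
    """Software reference of the 32-bit Mersenne Twister MT19937."""

    N, M = 624, 397

    def __init__(self, seed=5489):
        # Linear-recurrence initialization of the 624-word state.
        self.mt = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N  # force a twist before the first output

    def _twist(self):
        # Regenerate the whole state; word i depends on words i, i+1, i+M.
        for i in range(self.N):
            y = (self.mt[i] & 0x80000000) | (self.mt[(i + 1) % self.N] & 0x7FFFFFFF)
            mag = 0x9908B0DF if y & 1 else 0
            self.mt[i] = self.mt[(i + self.M) % self.N] ^ (y >> 1) ^ mag
        self.index = 0

    def next_u32(self):
        """Return the next tempered 32-bit output."""
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF
```

Because each state word is updated from a small, fixed set of other words, several words can in principle be updated in the same step, which is the property a multi-port design would rely on to emit multiple numbers per cycle.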

  • Frequency-Scaling Approach for Managing Power Consumption in NOCs

    Chun-Lung HSU  Wen-Tso WANG  Ying-Fu HONG  

     
    LETTER

    Vol: E88-A No:12  Page(s): 3580-3583

    This work presents a frequency-scaling low-power (FSLP) design methodology for managing power consumption of cores in the tile-based network-on-chip (NOC) architecture. A moving picture experts group (MPEG) core is tested using the field-programmable gate array (FPGA) implementation to verify the feasibility of the proposed method. Measurement results show that about 30% power consumption can be saved in the MPEG core and reveal that the proposed FSLP design method can be suitable for cores in the tile-based NOC applications.
