The search functionality is under construction.

Keyword Search Result

[Keyword] FPGAs(16hit)

1-16hit
  • A Design Method for Designing Asynchronous Circuits on Commercial FPGAs Using Placement Constraints

    Tatsuki OTAKE  Hiroshi SAITO  

     
    PAPER

      Vol:
    E103-A No:12
      Page(s):
    1427-1436

    In this paper, we propose a design method to design asynchronous circuits with bundled-data implementation on commercial Field Programmable Gate Arrays using placement constraints. The proposed method uses two types of placement constraints to reduce the number of delay adjustments to fix timing violations and to improve the performance of the bundled-data implementation. We also propose a floorplan algorithm to reduce the control-path delays specific to the bundled-data implementation. Using the proposed method, we could design the asynchronous circuits whose performance is close to and energy consumption is small compared to the synchronous counterparts with less delay adjustment.

  • Scalability Analysis of Deeply Pipelined Tsunami Simulation with Multiple FPGAs Open Access

    Antoniette MONDIGO  Tomohiro UENO  Kentaro SANO  Hiroyuki TAKIZAWA  

     
    PAPER-Applications

      Pubricized:
    2019/02/05
      Vol:
    E102-D No:5
      Page(s):
    1029-1036

    Since the hardware resource of a single FPGA is limited, one idea to scale the performance of FPGA-based HPC applications is to expand the design space with multiple FPGAs. This paper presents a scalable architecture of a deeply pipelined stream computing platform, where available parallelism and inter-FPGA link characteristics are investigated to achieve a scaled performance. For a practical exploration of this vast design space, a performance model is presented and verified with the evaluation of a tsunami simulation application implemented on Intel Arria 10 FPGAs. Finally, scalability analysis is performed, where speedup is achieved when increasing the computing pipeline over multiple FPGAs while maintaining the problem size of computation. Performance is scaled with multiple FPGAs; however, performance degradation occurs with insufficient available bandwidth and large pipeline overhead brought by inadequate data stream size. Tsunami simulation results show that the highest scaled performance for 8 cascaded Arria 10 FPGAs is achieved with a single pipeline of 5 stream processing elements (SPEs), which obtained a scaled performance of 2.5 TFlops and a parallel efficiency of 98%, indicating the strong scalability of the multi-FPGA stream computing platform.

  • A Fast MER Enumeration Algorithm for Online Task Placement on Reconfigurable FPGAs

    Tieyuan PAN  Lian ZENG  Yasuhiro TAKASHIMA  Takahiro WATANABE  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2412-2424

    In this paper, we propose a fast Maximal Empty Rectangle (MER) enumeration algorithm for online task placement on reconfigurable Field-Programmable Gate Arrays (FPGAs). On the assumption that each task utilizes rectangle-shaped resources, the proposed algorithm can manage the free space on FPGAs by an MER list. When assigning or removing a task, a series of MERs are selected and cut into segments according to the task and its assignment location. By processing these segments, the MER list can be updated quickly with low memory consumption. Under the proof of the upper limit of the number of the MERs on the FPGA, we analyze both the time and space complexity of the proposed algorithm. The efficiency of the proposed algorithm is verified by experiments.

  • TherWare: Thermal-Aware Placement and Routing Framework for 3D FPGAs with Location-Based Heat Balance

    Ya-Shih HUANG  Han-Yuan CHANG  Juinn-Dar HUANG  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E98-A No:8
      Page(s):
    1796-1805

    The emerging three-dimensional (3D) technology is considered as a promising solution for achieving better performance and easier heterogeneous integration. However, the thermal issue becomes exacerbated primarily due to larger power density and longer heat dissipation paths. The thermal issue would also be critical once FPGAs step into the 3D arena. In this article, we first construct a fine-grained thermal resistive model for 3D FPGAs. We show that merely reducing the total power consumption and/or minimizing the power density in vertical direction is not enough for a thermal-aware 3D FPGA backend (placement and routing) flow. Then, we propose our thermal-aware backend flow named TherWare considering location-based heat balance. In the placement stage, TherWare not only considers power distribution of logic tiles in both lateral and vertical directions but also minimizes the interconnect power. In the routing stage, TherWare concentrates on overall power minimization and evenness of power distribution at the same time. Experimental results show that TherWare can significantly reduce the maximum temperature, the maximum temperature gradient, and the temperature deviation only at the cost of a minor increase in delay and runtime as compared with present arts.

  • High-Speed Fully-Adaptable CRC Accelerators

    Amila AKAGIC  Hideharu AMANO  

     
    PAPER-Computer System

      Vol:
    E96-D No:6
      Page(s):
    1299-1308

    Cyclic Redundancy Check (CRC) is a well known error detection scheme used to detect corruption of digital content in digital networks and storage devices. Since it is a compute-intensive process which adversely affects performance, hardware acceleration using FPGAs has been tried and satisfactory performance has been achieved. However, recent extended usage of networks and storage systems require various correction capabilities for various CRC standards. Traditional hardware designs based on the LFSR (Linear Feedback Shift Register) tend to have fixed structure without such flexibility. Here, fully-adaptable CRC accelerator based on a table-based algorithm is proposed. The table-based algorithm is a flexible method commonly used in software implementations. It has been rarely implemented with the hardware, since it is believed that the operational speed is not enough. However, by using pipelined structure and efficient use of memory modules in FPGAs, it appeared that the table-based fixed CRC accelerators achieved better performance than traditional implementation. Based on the implementation, fully-adaptable CRC accelerator which eliminate the need for many non-adaptable CRC implementations is proposed. The accelerator has ability to process arbitrary number of input data and generates CRC for any known CRC standard, up to 65 bits of generator polynomial, during run-time. Further, we modify Table generation algorithm in order to decrease its space complexity from O(nm) to O(n). On Xilinx Virtex 6 LX550T board, the fully-adaptable accelerators occupy between 1 to 2% area to produce maximum of 289.8 Gbps at 283.1 MHz if BRAM is deployed, or between 1.6 - 14% of area for 418 Gbps at 408.9 MHz if tables are implemented in logic. Proposed architecture enables further expansion of throughput by increasing a number of input bits M processed at a time.

  • A Domain Partition Model Approach to the Online Fault Recovery of FPGA-Based Reconfigurable Systems

    Lihong SHANG  Mi ZHOU  Yu HU  Erfu YANG  

     
    PAPER-Nonlinear Problems

      Vol:
    E94-A No:1
      Page(s):
    290-299

    Field programmable gate arrays (FPGAs) are widely used in reliability-critical systems due to their reconfiguration ability. However, with the shrinking device feature size and increasing die area, nowadays FPGAs can be deeply affected by the errors induced by electromigration and radiation. To improve the reliability of FPGA-based reconfigurable systems, a permanent fault recovery approach using a domain partition model is proposed in this paper. In the proposed approach, the fault-tolerant FPGA recovery from faults is realized by reloading a proper configuration from a pool of multiple alternative configurations with overlaps. The overlaps are presented as a set of vectors in the domain partition model. To enhance the reliability, a technical procedure is also presented in which the set of vectors are heuristically filtered so that the corresponding small overlaps can be merged into big ones. Experimental results are provided to demonstrate the effectiveness of the proposed approach through applying it to several benchmark circuits. Compared with previous approaches, the proposed approach increased MTTF by up to 18.87%.

  • Evaluation of a Field-Programmable VLSI Based on an Asynchronous Bit-Serial Architecture

    Masanori HARIYAMA  Shota ISHIHARA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E91-C No:9
      Page(s):
    1419-1426

    This paper presents a novel asynchronous architecture of Field-programmable gate arrays (FPGAs) to reduce the power consumption. In the dynamic power consumption of the conventional FPGAs, the power consumed by the switch blocks and clock distribution is dominant since FPGAs have complex switch blocks and the large number of registers for high programmability. To reduce the power consumption of switch blocks and clock distribution, asynchronous bit-serial architecture is proposed. To ensure the correct operation independent of data-path lengths, we use the level-encoded dual-rail encoding and propose its area-efficient implementation. The proposed field-programmable VLSI is implemented in a 90 nm CMOS technology. The delay and the power consumption of the proposed FPVLSI are respectively 61% and 58% of those of 4-phase dual-rail encoding which is the most common encoding in delay insensitive encoding.

  • An Efficient and Effective Algorithm for Online Task Placement with I/O Communications in Partially Reconfigurable FPGAs

    Mitsuru TOMONO  Masaki NAKANISHI  Shigeru YAMASHITA  Kazuo NAKAJIMA  Katsumasa WATANABE  

     
    PAPER-System Level Design

      Vol:
    E89-A No:12
      Page(s):
    3416-3426

    In a partially reconfigurable FPGA of the future, arbitrary portions of its logic resources and interconnection networks will be reconfigured without affecting the other parts. Multiple tasks will be mapped and executed concurrently in such an FPGA. Efficient execution of the tasks using the limited resources of the FPGA will necessitate effective resource management. A number of online FPGA placement methods have recently been proposed for such an FPGA. However, they cannot handle I/O communications of the tasks. Taking such I/O communications into consideration, we introduce a new approach to online FPGA placement. We present an algorithm for placing each arriving task in an empty area so as to complete all the tasks efficiently. We develop two fitting strategies to effectively handle I/O communications of the tasks. Our experimental results show that properly weighted combinations of these and two other previously proposed strategies enable this algorithm to run very fast and make an effective placement of the tasks. In fact, we show that the overhead associated with the use of this algorithm is negligible as compared to the total execution time of the tasks.

  • Design of a Field-Programmable Digital Filter Chip Using Multiple-Valued Current-Mode Logic

    Katsuhiko DEGAWA  Takafumi AOKI  Tatsuo HIGUCHI  

     
    PAPER

      Vol:
    E86-A No:8
      Page(s):
    2001-2010

    This paper presents a Field-Programmable Digital Filter (FPDF) IC that employs carry-propagation-free redundant arithmetic algorithms for faster computation and multiple-valued current-mode circuit technology for high-density low-power implementation. The original contribution of this paper is to evaluate, through actual chip fabrication, the potential impact of multiple-valued current-mode circuit technology on the reduction of hardware complexity required for DSP-oriented programmable ICs. The prototype FPDF fabrication with 0.6 µm CMOS technology demonstrates that the chip area and power consumption can be reduced to 41% and 71%, respectively, compared with the standard binary logic implementation.

  • Accelerating the CKY Parsing Using FPGAs

    Jacir L. BORDIM  Yasuaki ITO  Koji NAKANO  

     
    PAPER

      Vol:
    E86-D No:5
      Page(s):
    803-810

    The main contribution of this paper is to present an FPGA-based implementation of an instance-specific hardware which accelerates the CKY (Cocke-Kasami-Younger) parsing for context-free grammars. Given a context-free grammar G and a string x, the CKY parsing determines whether G derives x. We have developed a hardware generator that creates a Verilog HDL source to perform the CKY parsing for any given context-free grammar G. The generated source is embedded in an FPGA using the design software provided by the FPGA vendor. We evaluated the instance-specific hardware, generated by our hardware generator, using a timing analyzer and tested it using the Altera FPGAs. The generated hardware attains a speed-up factor of approximately 750 over the software CKY parsing algorithm.

  • Functional Decomposition with Application to LUT-Based FPGA Synthesis

    Jian QIAO  Kunihiro ASADA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E84-A No:8
      Page(s):
    2004-2013

    In this paper, we deal with the problem of compatibility class encoding, and propose a novel algorithm for finding a good functional decomposition with application to LUT-based FPGA synthesis. Based on exploration of the design space, we concentrate on extracting a set of components, which can be merged into the minimum number of multiple-output CLBs or LUTs, such that the decomposition constructed from these components is also minimal. In particular, to explore more degrees of freedom, we introduce pliable encoding to take over the conventional rigid encoding when it fails to find a satisfactory decomposition by rigid encoding. Experimental results on a large set of MCNC91 logic synthesis benchmarks show that our method is quite promising.

  • A Technique for Modelling Dynamic Reconfiguration with Improved Simulation Accuracy

    Milan VASILKO  David CABANIS  

     
    PAPER

      Vol:
    E82-A No:11
      Page(s):
    2465-2474

    This paper presents a new approach to simulation of Dynamically Reconfigurable Logic (DRL) systems, which offers better accuracy of modelling dynamic reconfiguration than previously reported techniques. Our method, named Clock Morphing (CM), is based on modelling dynamic reconfiguration via a reconfigured module clock signal, while using a dedicated signal value to indicate dynamic reconfiguration. We discuss problems associated with the other approaches to DRL simulation and describe the main principles behind the proposed technique. We further demonstrate feasibility of a CM DRL simulation on its example implementation in VHDL.

  • Signed-Weight Arithmetic and Its Application to a Field-Programmable Digital Filter Architecture

    Takafumi AOKI  Yoshiki SAWADA  Tatsuo HIGUCHI  

     
    PAPER-Configurable Computing and Fault Tolerance

      Vol:
    E82-C No:9
      Page(s):
    1687-1698

    This paper presents a new number representation called the Signed-Weight (SW) number system, which is useful for designing configurable counter-tree architectures for digital signal processing applications. The SW number system allows the unified manipulation of positive and negative numbers in arithmetic circuits by adjusting the signs assigned to individual digit positions. This makes possible the construction of highly regular arithmetic circuits without introducing irregular arithmetic operations, such as negation and sign extension in the two's complement representation. This paper also presents the design of a Field-Programmable Digital Filter (FPDF) architecture--a special-purpose FPGA architecture for high-speed FIR filtering--using the proposed SW arithmetic system.

  • Reduction of the Number of FPGA Blocks by Maximizing Flexibility of Internal Functions

    Takenori KOUDA  Shigeru YAMASHITA  Yahiko KAMBAYASHI  

     
    PAPER-Logic Synthesis

      Vol:
    E81-A No:12
      Page(s):
    2554-2562

    In this paper, we will discuss circuit minimization techniques based on the multiple output capability of FPGA blocks. Since previous methods only consider two independent output functions, we will discuss a more complicated case when the two functions are mutually related. We also discuss a method to maximize flexibility of a specified cell output in the given FPGA block. If a set of possible functions for a cell which will not change the FPGA output function is large, we call that the flexibility of this cell is high. The concept of Sets of Pairs of Functions to be Distinguished (SPFDs) introduced by Yamashita et al. is a powerful tool to minimize a given FPGA circuits. In this paper, an extension of the concept, Priority based SPFDs (PSPFDs) is introduced to maximize the flexibility of output functions realized by such internal cells. By using PSPFDs for our new method, we can utilize the multiple output capability very well. Combination with the previous methods with PSPFDs is also shown to be important. We have implemented these methods and applied them to MCNC benchmarks mapped into 5-variable function blocks. To make a comparison with other methods, we have implemented methods using well-known merging algorithms utilizing the same multiple output capability. Experimental results show that our methods can reduce the number of blocks in the initial circuits by 40% on average. This reduction ratio is 16% higher than that of previous methods.

  • Plastic Cell Architecture: A Scalable Device Architecture for General-Purpose Reconfigurable Computing

    Kouichi NAGAMI  Kiyoshi OGURI  Tsunemichi SHIOZAWA  Hideyuki ITO  Ryusuke KONISHI  

     
    PAPER

      Vol:
    E81-C No:9
      Page(s):
    1431-1437

    We propose an architectural reference of programmable devices that we call Plastic Cell Architecture (PCA). PCA is a reference for implementing a device with autonomous reconfigurability, which we also introduce in this paper. This reconfigurability is a further step toward new reconfigurable computing, which introduces variable- and programmable-grained parallelism to wired logic computing. This computing follows the Object-Oriented paradigm: it regards configured circuits as objects. These objects will be described in a new hardware description language dealing with the semantics of dynamic module instantiation. PCA is the fusion of SRAM-based FPGAs and cellular automata (CA), where the CA are dedicated to support run time activities of objects. This paper mainly focus on autonomous reconfigurability and PCA. The following discussions examine a research direction towards general-purpose reconfigurable computing.

  • Prospects for Multiple-Valued Integrated Circuits

    Kenneth Carless SMITH  P.Glenn GULAK  

     
    INVITED PAPER

      Vol:
    E76-C No:3
      Page(s):
    372-382

    The evolution of Multiple-Valued Logic (MVL) circuits has been inexorably tied to the rapid technological changes induced by evolving needs and emerging developments in computing methodologies. Unfortunately for MVL, the numbers of designers of technologies and circuits whose lives are dedicated to the improvement of binary techniques, are large and overwhelming. Correspondingly, technological developments in MVL typically await the appearance of a problem or technique in the larger binary world to motivate and/or make possible some new advance. Such opportunities are inevitably quite transient since each such problem is simultaneously attacked by many others of a more conventional bent, and, as well, each technological change begets yet another, quickly. It is in the sensing of this reality that the present paper is written. Correspondingly, its thrust is two-fold: One target is the possibility of encouraging a leap ahead through modest technological projection. The other is the possibility of identifying application areas that already exist in this unbalanced competition, but which are specially suited to multiple-valued solutions. For example, it has been clear for decades that one such area is that of arithmetic. Correspondingly, we in MVL must strive quickly to concentrate our efforts on applications that exploit such demonstrable strengths. Some such applications are includes here; others are visible historically, many probably remain to be found: Search on!