Keyword Search Result

[Keyword] multi-FPGA system(4hit)

1-4hit
  • A Multi-FPGA Implementation of FM-Index Based Genomic Pattern Search

    Ullah IMDAD  Akram BEN AHMED  Kazuei HIRONAKA  Kensuke IIZUKA  Hideharu AMANO  

     
    PAPER-Computer System

      Publicized:
    2023/08/09
      Vol:
    E106-D No:11
      Page(s):
    1783-1795

    FPGA clusters that consist of multiple FPGA boards have been gaining interest in recent times. Massively parallel processing on a stand-alone heterogeneous FPGA cluster with SoC-style FPGAs and mid-scale FPGAs is promising in terms of cost-performance. Here, we propose such a heterogeneous FPGA cluster composed of a FiC cluster and an M-KUBOS cluster. FiC consists of multiple boards, mounting mid-scale Xilinx FPGAs and DRAMs, which are tightly coupled with high-speed serial links. In addition, M-KUBOS boards are connected to FiC to ensure high I/O data transfer bandwidth. As an example of massively parallel processing, we implement genomic pattern search. Next-generation sequencing (NGS) technology has revolutionized research on biological systems through its high speed, scalability, and massive throughput. To analyze the genomic data, a short-read mapping technique is used, in which short deoxyribonucleic acid (DNA) sequences are mapped relative to a known reference sequence. Although several pattern matching techniques are available, FM-index-based pattern search is well suited to this task because it provides the fastest mapping from known indices. Since matching can be done in parallel for different data, massively parallel computing, which distributes the data, executes in parallel, and gathers the results, can be applied. We also implement a data compression method that reduces the data size by a factor of about 10. We found that one M-KUBOS board matches four FiC boards, and that a system with six M-KUBOS boards and 24 FiC boards ran 30 times faster than the software-based implementation.
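
    For readers unfamiliar with FM-index pattern search, the following is a minimal software sketch of the general backward-search technique, not the FPGA implementation described in the abstract. The function names `build_fm_index` and `locate` are hypothetical, the suffix array is built naively, and the occurrence table is stored uncompressed; a real short-read mapper would sample or compress both.

```python
def build_fm_index(text):
    """Build a toy FM-index (suffix array, C table, Occ table) for `text`.

    Assumption: "$" does not occur in `text` and sorts before every symbol.
    """
    text += "$"
    # Naive suffix array: fine for a sketch, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def locate(pattern, sa, C, occ):
    """Return the sorted start positions of `pattern` using backward search."""
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, C, occ = build_fm_index("ACGTACGT")
hits = locate("ACG", sa, C, occ)
```

    Each short read is searched independently against the same index, which is why the workload distributes naturally across many boards.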

  • The Implementation of a Hybrid Router and Dynamic Switching Algorithm on a Multi-FPGA System

    Tomoki SHIMIZU  Kohei ITO  Kensuke IIZUKA  Kazuei HIRONAKA  Hideharu AMANO  

     
    PAPER

      Publicized:
    2022/06/30
      Vol:
    E105-D No:12
      Page(s):
    2008-2018

    The multi-FPGA system known as the Flow-in-Cloud (FiC) system is composed of mid-range FPGAs that are directly interconnected by high-speed serial links. FiC is currently being developed as a server for multi-access edge computing (MEC), which is one of the core technologies of 5G. Because the applications of MEC are sometimes timing-critical, a static time division multiplexing (STDM) network has been used on FiC. However, the STDM network has the disadvantage of decreasing link utilization, especially under light traffic. To solve this problem, we propose a hybrid router that combines packet switching for low-priority communication and STDM for high-priority communication. In our hybrid network, the packet switching uses slots that are unused by the STDM; therefore, best-effort communication by packet switching and QoS-guaranteed communication by the STDM can be used simultaneously. Furthermore, to improve the utilization of each link under a low network traffic load, we propose a dynamic communication switching algorithm. In our algorithm, each router monitors network load metrics, and timing-critical tasks select the STDM only when these metrics indicate that congestion is occurring. This achieves both a QoS guarantee and efficient utilization of each link with a small resource overhead. In our evaluation on a real multi-FPGA system with 24 boards, the dynamic algorithm reduced execution time under a high network load by up to 24.6% compared with packet switching.
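
    The slot-sharing idea can be illustrated with a small software model of a single link. This is a sketch under stated assumptions, not the router described in the abstract: the function `schedule_link`, the slot-table layout, and the rule that packets may also use a reserved slot whose owner has nothing to send are all hypothetical choices made for illustration.

```python
from collections import deque


def schedule_link(slot_table, stdm_traffic, packet_queue, num_cycles):
    """Simulate one link shared by STDM flows and best-effort packets.

    slot_table[i] is the STDM flow that owns slot i, or None if unreserved.
    stdm_traffic maps a flow id to the list of flits it wants to send.
    Returns a list of (kind, payload) tuples, one per cycle.
    """
    packets = deque(packet_queue)
    stdm = {flow: deque(flits) for flow, flits in stdm_traffic.items()}
    sent = []
    for cycle in range(num_cycles):
        owner = slot_table[cycle % len(slot_table)]
        if owner is not None and stdm.get(owner):
            # Reserved slot with pending data: STDM traffic always wins.
            sent.append(("stdm", stdm[owner].popleft()))
        elif packets:
            # Slot unused by STDM: hand it to best-effort packet switching.
            sent.append(("packet", packets.popleft()))
        else:
            sent.append(("idle", None))
    return sent


trace = schedule_link(
    slot_table=["A", None, "A", None],
    stdm_traffic={"A": ["a0", "a1"]},
    packet_queue=["p0", "p1", "p2"],
    num_cycles=6,
)
```

    In this model the STDM flow keeps its fixed slots, so its latency does not depend on packet traffic, while the packets fill whatever capacity would otherwise sit idle.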

  • Inter-FPGA Routing for Partially Time-Multiplexing Inter-FPGA Signals on Multi-FPGA Systems with Various Topologies

    Masato INAGI  Yuichi NAKAMURA  Yasuhiro TAKASHIMA  Shin'ichi WAKABAYASHI  

     
    PAPER-Physical Level Design

      Vol:
    E98-A No:12
      Page(s):
    2572-2583

    Multi-FPGA systems, which consist of multiple FPGAs and a printed circuit board connecting them, are useful and important tools for prototyping large-scale circuits, including SoCs. In this paper, we propose a method for optimizing inter-FPGA signal transmission to raise the system frequency of multi-FPGA prototyping systems and shorten prototyping time. The number of I/O signals between FPGAs usually becomes very large compared with the number of I/O pins of an FPGA. Thus, time-multiplexed I/Os are used to resolve the problem. However, they introduce large delays to inter-FPGA I/O signals and greatly lower the system frequency. To reduce the degradation of the system frequency, we have previously proposed a method for optimally selecting which signals are time-multiplexed and which are not. However, this method assumes that physical connections (i.e., wires on the printed circuit board) exist between every pair of FPGAs, and it cannot handle I/O signals between a pair of FPGAs that have no physical connection between them. Thus, in this paper, we propose a method for obtaining indirect inter-FPGA routes for such I/O signals, and then combine the indirect routing method and the time-multiplexed signal selection method to realize effective time-multiplexing of inter-FPGA I/O signals on systems with various topologies.

  • A Circuit Partitioning Algorithm with Path Delay Constraints for Multi-FPGA Systems

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E80-A No:3
      Page(s):
    494-505

    In this paper, we extend the circuit partitioning algorithm that we previously proposed for multi-FPGA systems and present a new algorithm in which the delay of each critical signal path is kept within a specified upper bound. The core of the presented algorithm is recursive bipartitioning of a circuit. The bipartitioning procedure consists of three stages: 0) detection of critical paths; 1) bipartitioning of a set of primary inputs and outputs; and 2) bipartitioning of a set of logic-blocks. In 0), the algorithm computes the lower bounds of delays for paths with path delay constraints and dynamically detects the critical paths in every bipartitioning procedure, based on the difference between the lower and upper bounds. The delays of the critical paths are reduced with higher priority. In 1), the algorithm attempts to assign the primary inputs and outputs on each critical path to one chip so that the critical path does not cross between chips. Finally, in 2), the algorithm not only decreases the number of crossings between chips but also assigns the logic-blocks on each critical path to one chip by exploiting a network flow technique. The algorithm has been implemented and applied to MCNC PARTITIONING 93 benchmark circuits. The experimental results demonstrate that it resolves almost all path delay constraints while keeping the maximum number of required I/O blocks per chip small compared with conventional algorithms.