The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] HLS(5hit)

1-5hit
  • RT-libSGM: FPGA-Oriented Real-Time Stereo Matching System with High Scalability

    Kaijie WEI  Yuki KUNO  Masatoshi ARAI  Hideharu AMANO  

     
    PAPER-Computer System

      Pubricized:
    2022/12/07
      Vol:
    E106-D No:3
      Page(s):
    337-348

    Stereo depth estimation has become an attractive topic in the computer vision field. Although various algorithms strive to optimize the speed and the precision of estimation, the energy cost of a system is also an essential metric for an embedded system. Among these various algorithms, Semi-Global Matching (SGM) has been a popular choice for some real-world applications because of its accuracy-and-speed balance. However, its power consumption makes it difficult to be applied to an embedded system. Thus, we propose a robust stereo matching system, RT-libSGM, working on the Xilinx Field-Programmable Gate Array (FPGA) platforms. The dedicated design of each module optimizes the speed of the entire system while ensuring the flexibility of the system structure. Through an evaluation on a Zynq FPGA board called M-KUBOS, RT-libSGM achieves state-of-the-art performance with lower power consumption. Compared with the benchmark design (libSGM) working on the Tegra X2 GPU, RT-libSGM runs more than 2× faster at a much lower energy cost.

  • A Ruby-Based Hardware/Software Co-Design Environment with Functional Reactive Programming: Mulvery

    Daichi TERUYA  Hironori NAKAJO  

     
    PAPER-Computer System

      Pubricized:
    2020/05/22
      Vol:
    E103-D No:9
      Page(s):
    1929-1938

    Computation methods using custom circuits are frequently employed to improve the throughput and power efficiency of computing systems. Hardware development, however, can incur significant development costs because designs at the register-transfer level (RTL) with a hardware description language (HDL) are time-consuming. This paper proposes a hardware and software co-design environment, named Mulvery, which is designed for non-professional hardware designer We focus on the similarities between functional reactive programming (FRP) and dataflow in computation. This study provides an idea to design hardware with a dynamic typing language, such as Ruby, using FRP and provides the proof-of-concept of the method. Mulvery, which is a hardware and software co-design tool based on our method, reduces development costs. Mulvery exhibited high performance compared with software processing techniques not equipped with hardware knowledge. According to the experiment, the method allows us to design hardware without degradation of performance. The sample application applied a Laplacian filter to an image with a size of 128×128 and processed a convolution operation within one clock.

  • VHDL vs. SystemC: Design of Highly Parameterizable Artificial Neural Networks

    David ALEDO  Benjamin CARRION SCHAFER  Félix MORENO  

     
    PAPER-Computer System

      Pubricized:
    2018/11/29
      Vol:
    E102-D No:3
      Page(s):
    512-521

    This paper describes the advantages and disadvantages observed when describing complex parameterizable Artificial Neural Networks (ANNs) at the behavioral level using SystemC and at the Register Transfer Level (RTL) using VHDL. ANNs are complex to parameterize because they have a configurable number of layers, and each one of them has a unique configuration. This kind of structure makes ANNs, a priori, challenging to parameterize using Hardware Description Languages (HDL). Thus, it seems intuitively that ANNs would benefit from the raise in level of abstraction from RTL to behavioral level. This paper presents the results of implementing an ANN using both levels of abstractions. Results surprisingly show that VHDL leads to better results and allows a much higher degree of parameterization than SystemC. The implementation of these parameterizable ANNs are made open source and are freely available online. Finally, at the end of the paper we make some recommendation for future HLS tools to improve their parameterization capabilities.

  • Interconnection-Delay and Clock-Skew Estimate Modelings for Floorplan-Driven High-Level Synthesis Targeting FPGA Designs

    Koichi FUJIWARA  Kazushi KAWAMURA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:7
      Page(s):
    1294-1310

    Recently, high-level synthesis techniques for FPGA designs (FPGA-HLS techniques) are strongly required in various applications. Both interconnection delays and clock skews have a large impact on circuit performance implemented onto FPGA, which indicates the need for floorplan-driven FPGA-HLS algorithms considering them. To appropriately estimate interconnection delays and clock skews at HLS phase, a reasonable model to estimate them becomes essential. In this paper, we demonstrate several experiments to characterize interconnection delays and clock skews in FPGA and propose novel estimate models called “IDEF” and “CSEF”. In order to evaluate our models, we integrate them into a conventional floorplan-driven FPGA-HLS algorithm. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the latency by up to 22% compared with conventional approaches.

  • A Floorplan-Driven High-Level Synthesis Algorithm for Multiplexer Reduction Targeting FPGA Designs

    Koichi FUJIWARA  Kazushi KAWAMURA  Shin-ya ABE  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1392-1405

    Recently, high-level synthesis (HLS) techniques for FPGA designs are required in various applications such as computerized stock tradings and reconfigurable network processings. In HLS for FPGA designs, we need to consider module floorplan and reduce multiplexer's cost concurrently. In this paper, we propose a floorplan-driven HLS algorithm for multiplexer reduction targeting FPGA designs. By utilizing distributed-register architectures called HDR, we can easily consider module floorplan in HLS. In order to reduce multiplexer's cost, we propose two novel binding methods called datapath-oriented scheduling/FU binding and datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the number of slices by up to 47% and latency by up to 22% compared with conventional approaches while the number of required control steps is almost the same.