The search functionality is under construction.

Keyword Search Result

[Keyword] interconnection delay(9hit)

1-9hit
  • A Bitwidth-Aware High-Level Synthesis Algorithm Using Operation Chainings for Tiled-DR Architectures

    Kotaro TERADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E100-A No:12
      Page(s):
    2911-2924

    As application hardware designs and implementations in a short term are required, high-level synthesis is more and more essential EDA technique nowadays. In deep-submicron era, interconnection delays are not negligible even in high-level synthesis thus distributed-register and -controller architectures (DR architectures) have been proposed in order to cope with this problem. It is also profitable to take data-bitwidth into account in high-level synthesis. In this paper, we propose a bitwidth-aware high-level synthesis algorithm using operation chainings targeting Tiled-DR architectures. Our proposed algorithm optimizes bitwidths of functional units and utilizes the vacant tiles by adding some extra functional units to realize effective operation chainings to generate high performance circuits without increasing the total area. Experimental results show that our proposed algorithm reduces the overall latency by up to 47% compared to the conventional approach without area overheads by eliminating unnecessary bitwidths and adding efficient extra FUs for Tiled-DR architectures.

  • A Floorplan Aware High-Level Synthesis Algorithm with Body Biasing for Delay Variation Compensation

    Koki IGAWA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E100-A No:7
      Page(s):
    1439-1451

    In this paper, we propose a floorplan aware high-level synthesis algorithm with body biasing for delay variation compensation, which minimizes the average leakage energy of manufactured chips. In order to realize floorplan-aware high-level synthesis, we utilize huddle-based distributed register architecture (HDR architecture). HDR architecture divides the chip area into small partitions called a huddle and we can control a body bias voltage for every huddle. During high-level synthesis, we iteratively obtain expected leakage energy for every huddle when applying a body bias voltage. A huddle with smaller expected leakage energy contributes to reducing expected leakage energy of the entire circuit more but can increase the latency. We assign control-data flow graph (CDFG) nodes in non-critical paths to the huddles with larger expected leakage energy and those in critical paths to the huddles with smaller expected leakage energy. We expect to minimize the entire leakage energy in a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 39.7% without latency and yield degradation compared with typical-case design with body biasing.

  • A Multi-Scenario High-Level Synthesis Algorithm for Variation-Tolerant Floorplan-Driven Design

    Koki IGAWA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:7
      Page(s):
    1278-1293

    In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a multi-scenario high-level synthesis algorithm for variation-tolerant floorplan-driven design targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. At that time, we can explicitly take into account interconnection delays by using distributed-register architectures. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.

  • Interconnection-Delay and Clock-Skew Estimate Modelings for Floorplan-Driven High-Level Synthesis Targeting FPGA Designs

    Koichi FUJIWARA  Kazushi KAWAMURA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:7
      Page(s):
    1294-1310

    Recently, high-level synthesis techniques for FPGA designs (FPGA-HLS techniques) are strongly required in various applications. Both interconnection delays and clock skews have a large impact on circuit performance implemented onto FPGA, which indicates the need for floorplan-driven FPGA-HLS algorithms considering them. To appropriately estimate interconnection delays and clock skews at HLS phase, a reasonable model to estimate them becomes essential. In this paper, we demonstrate several experiments to characterize interconnection delays and clock skews in FPGA and propose novel estimate models called “IDEF” and “CSEF”. In order to evaluate our models, we integrate them into a conventional floorplan-driven FPGA-HLS algorithm. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the latency by up to 22% compared with conventional approaches.

  • A High-Level Synthesis Algorithm with Inter-Island Distance Based Operation Chainings for RDR Architectures

    Kotaro TERADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1366-1375

    In deep-submicron era, interconnection delays are not negligible even in high-level synthesis and regular-distributed-register architectures (RDR architectures) have been proposed to cope with this problem. In this paper, we propose a high-level synthesis algorithm using operation chainings which reduces the overall latency targeting RDR architectures. Our algorithm consists of three steps: The first step enumerates candidate operations for chaining. The second step introduces maximal chaining distance (MCD), which gives the maximal allowable inter-island distance on RDR architecture between chaining candidate operations. The last step performs list-scheduling and binding simultaneously based on the results of the two preceding steps. Our algorithm enumerates feasible chaining candidates and selects the best ones for RDR architecture. Experimental results show that our proposed algorithm reduces the latency by up to 40.0% compared to the original approach, and by up to 25.0% compared to a conventional approach. Our algorithm also reduces the number of registers and the number of multiplexers compared to the conventional approaches in some cases.

  • A Floorplan-Driven High-Level Synthesis Algorithm for Multiplexer Reduction Targeting FPGA Designs

    Koichi FUJIWARA  Kazushi KAWAMURA  Shin-ya ABE  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1392-1405

    Recently, high-level synthesis (HLS) techniques for FPGA designs are required in various applications such as computerized stock tradings and reconfigurable network processings. In HLS for FPGA designs, we need to consider module floorplan and reduce multiplexer's cost concurrently. In this paper, we propose a floorplan-driven HLS algorithm for multiplexer reduction targeting FPGA designs. By utilizing distributed-register architectures called HDR, we can easily consider module floorplan in HLS. In order to reduce multiplexer's cost, we propose two novel binding methods called datapath-oriented scheduling/FU binding and datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the number of slices by up to 47% and latency by up to 22% compared with conventional approaches while the number of required control steps is almost the same.

  • An Energy-Efficient Floorplan Driven High-Level Synthesis Algorithm for Multiple Clock Domains Design

    Shin-ya ABE  Youhua SHI  Kimiyoshi USAMI  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1376-1391

    In this paper, we first propose an HDR-mcd architecture, which integrates periodically all-in-phase based multiple clock domains and multi-cycle interconnect communication into high-level synthesis. In HDR-mcd, an entire chip is divided into several huddles. Huddles can realize synchronization between different clock domains in which interconnection delay should be considered during high-level synthesis. Next, we propose a high-level synthesis algorithm for HDR-mcd, which can reduce energy consumption by optimizing configuration and placement of huddles. Experimental results show that the proposed method achieves 32.5% energy-saving compared with the existing single clock domain based methods.

  • Floorplan Driven Architecture and High-Level Synthesis Algorithm for Dynamic Multiple Supply Voltages

    Shin-ya ABE  Youhua SHI  Kimiyoshi USAMI  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2597-2611

    In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delay into high-level synthesis. In AVHDR architecture, voltages can be dynamically assigned for energy reduction. In other words, low supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Next, an AVHDR-based high-level synthesis algorithm is proposed. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, the modules in each huddle can be placed close to each other and the corresponding AVHDR architecture can be generated and optimized with floorplanning information. Experimental results show that on average our algorithm achieves 43.9% energy-saving compared with conventional algorithms.

  • Prospects of Multiple-Valued VLSI Processors

    Takahiro HANYU  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    INVITED PAPER

      Vol:
    E76-C No:3
      Page(s):
    383-392

    Rapid advances in integrated circuit technology based on binary logic have made possible the fabrication of digital circuits or digital VLSI systems with not only a very large number of devices on a single chip or wafer, but also high-speed processing capability. However, the advance of processing speeds and improvement in cost/performance ratio based on conventional binary logic will not always continue unabated in submicron geometry. Submicron integrated circuits can handle multiple-valued signals at high speed rather than binary signals, especially at data communication level because of the reduced interconnections. The use of nonbinary logic or discrete-analog signal processing will not be out of the question if the multiple-valued hardware algorithms are developed for fast parallel operations. Moreover, in VLSI or ULSI processors the delay time due to global communications between functional modules or chips instead of each functional module itself is the most important factors to determine the total performance. Locally computable hardware implementation and new parallel hardware algorithms natural to multiple-valued data representation and circuit technologies are the key properties to develop VLSI processors in submicron geometry. As a result, multiple-valued VLSI processors make it possible to improve the effective chip density together with the processing speed significantly. In this paper, we summarize several potential advantages of multiple-valued VLSI processors in submicron geometry due to great reduction of interconnection and due to the suitability to locally computable hardware implementation, and demonstrate that some examples of special-purpose multiple-valued VLSI processors, which are a signed-digit arithmetic VLSI processor, a residue arithmetic VLSI processor and a matching VLSI processor can achieve higher performance for real-world computing system.