IEICE global.ieice.org Site

Keyword Search Result

[Keyword] high level synthesis(8hit)


High Level Congestion Detection from C/C++ Source Code for High Level Synthesis Open Access
Masato TATSUOKA Mineo KANEKO

PAPER

Vol: E103-A No: 12
Page(s): 1437-1446
High level synthesis (HLS) is a source-code-driven Register Transfer Level (RTL) design tool, and the performance, power consumption, and area of the generated RTL are limited partly by how the HLS input source code is written. To overcome this limitation and obtain further optimized RTL, optimization of the input source code is indispensable. Routing congestion is one such problem that must be considered when refining HLS input source code. In this paper, we propose a novel HLS flow that improves the code by detecting congested parts directly from the HLS input source code, without using physical logic synthesis, and regenerating the input source code for HLS. In our approach, the origin of wire congestion is detected by applying pattern matching to a Program-Dependence Graph (PDG) constructed from the HLS input source code, and the possibility of wire congestion is reported.
Fixed Point Data Type Modeling for High Level Synthesis
Benjamin CARRION SCHAFER Yusuke IGUCHI Wataru TAKAHASHI Shingo NAGATANI Kazutoshi WAKABAYASHI

PAPER

Vol: E93-C No: 3
Page(s): 361-368
A methodology to automatically convert fixed point data type representations into integer data types for high level synthesis is presented in this work. Our method converts all major C operations using fixed point data types into integer data types, and models quantization and overflow modes, type conversion, and casting. The conversion rule for each operation is described in detail, as well as a regression test environment with 600 test cases to validate the method and to verify the correctness of each conversion compared to the same cases written in SystemC. The test environment converts each test case with fixed point data types into integer data types and synthesizes them with a high level synthesis tool to generate RTL. An RTL simulation is run and the results are in turn compared to those of SystemC's OSCI simulation. For all of the 600 test cases the RTL simulation results matched the SystemC results, showing that each conversion is accurately modeled. A larger real test case is also presented to validate the conversion method in a complex case.
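To make the idea of modeling fixed point arithmetic with plain integers concrete, here is a minimal sketch. It is not the conversion rule set from the paper: the function names, the choice of truncation toward minus infinity as the quantization mode, and the two overflow modes shown (saturation and wrap-around) are assumptions chosen for illustration.

```python
import math

def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize a real value to a scaled integer (truncate toward -inf)."""
    return math.floor(x * (1 << frac_bits))

def saturate(v: int, total_bits: int) -> int:
    """Clamp v to the signed range of a total_bits-wide word."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, v))

def wrap(v: int, total_bits: int) -> int:
    """Keep only the low total_bits bits, interpreted as two's complement."""
    v &= (1 << total_bits) - 1
    if v >= 1 << (total_bits - 1):
        v -= 1 << total_bits
    return v

def fixed_mul(a: int, b: int, frac_bits: int, total_bits: int,
              overflow: str = "saturate") -> int:
    """Multiply two scaled integers that share the same format.

    The raw product carries 2 * frac_bits fractional bits, so it is
    shifted right by frac_bits (an arithmetic shift, i.e. truncation
    toward -inf) before the overflow mode is applied.
    """
    prod = (a * b) >> frac_bits
    if overflow == "saturate":
        return saturate(prod, total_bits)
    return wrap(prod, total_bits)

# Hypothetical 8-bit format with 4 fractional bits.
a = to_fixed(1.5, 4)    # 24
b = to_fixed(2.25, 4)   # 36
print(fixed_mul(a, b, 4, 8) / 16)  # 3.375
```

With the same hypothetical format, multiplying 7.5 by 7.5 overflows the 8-bit word: saturation yields the largest representable value, while wrap-around yields a negative number, which is why the overflow mode has to be modeled explicitly.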
Interconnect-Aware Pipeline Synthesis for Array-Based Architectures
Shanghua GAO Hiroaki YOSHIDA Kenshu SETO Satoshi KOMATSU Masahiro FUJITA

PAPER-VLSI Design Technology and CAD

Vol: E92-A No: 6
Page(s): 1464-1475
In the deep-submicron era, interconnect delays are becoming one of the most important factors affecting performance in VLSI design. Many state-of-the-art studies in high level synthesis try to consider the effect of interconnect delays. These studies indeed achieve better performance than traditional ones, which ignore interconnect delays. When applications contain large loops, however, there is still much room to improve the performance by exploiting parallelism. In this paper, we propose, for the first time, a method that combines pipelining techniques with consideration of interconnect delays so as to improve the quality of high level synthesis. The proposed method has the following two characteristics: 1) it separates the consideration of interconnect delay from computation delay, and allows concurrent data transfer and computation; 2) it belongs to the modulo scheduling framework, in the sense that all iterations have identical schedules and are initiated periodically. We evaluate our method from two different points of view. Firstly, we compare our method with an existing interconnect-aware high level synthesis method that does not utilize pipelining techniques, and the experimental results show that our method obtains about 3.4 times performance improvement on average. Secondly, we compare our method with an existing pipeline synthesis method that does not consider interconnect delays, and the results show that our method obtains about 1.5 times performance improvement on average. In addition, we also evaluate our proposed architecture, and the experimental results demonstrate that it is better than the existing architecture in [1].
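The modulo scheduling property mentioned in characteristic 2) can be illustrated with a small sketch. This is not the authors' algorithm and does not model interconnect delays: it only checks, for one fixed per-iteration schedule, which initiation interval (II) lets identical iterations start every II cycles without two operations claiming the same resource in the same cycle. The schedule, the resource names, and the function names are hypothetical.

```python
from collections import Counter

def modulo_conflict_free(schedule, capacity, ii):
    """Check a per-iteration schedule against a modulo reservation table.

    schedule maps op -> (start_cycle, resource). Because every iteration
    reuses the same schedule shifted by a multiple of ii, an operation
    occupies its resource in slot start_cycle % ii.
    """
    usage = Counter()
    for start, resource in schedule.values():
        usage[(start % ii, resource)] += 1
    return all(n <= capacity[res] for (_, res), n in usage.items())

def smallest_ii(schedule, capacity):
    """Return the smallest conflict-free initiation interval, if any."""
    max_start = max(start for start, _ in schedule.values())
    for ii in range(1, max_start + 2):
        if modulo_conflict_free(schedule, capacity, ii):
            return ii
    raise ValueError("schedule violates capacity even without overlap")

# Hypothetical loop body: one memory port, one multiplier, one ALU.
schedule = {
    "load": (0, "mem"),
    "mul": (1, "mul"),
    "add": (2, "alu"),
    "store": (3, "mem"),
}
capacity = {"mem": 1, "mul": 1, "alu": 1}
print(smallest_ii(schedule, capacity))  # 2
```

In this example an II of 1 fails because the load and the store would need the single memory port in the same slot, whereas an II of 2 places them in different slots. A full modulo scheduler would also move operations to find a better schedule instead of keeping the start cycles fixed.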
Simultaneous Optimization of Skew and Control Step Assignments in RT-Datapath Synthesis
Takayuki OBATA Mineo KANEKO

PAPER-High-Level Synthesis and System-Level Design

Vol: E91-A No: 12
Page(s): 3585-3595
Just as the schedule affects system performance, the control skew, i.e., the arrival time difference of control signals between registers, can be utilized for improving system performance, enhancing robustness against delay variations, etc. Simultaneous optimization of the control step assignment and the control skew assignment is a more powerful technique for improving performance. In this paper, firstly, we prove that, even if the execution sequence of operations assigned to the same resource is fixed, the simultaneous optimization problem under a fixed clock period is NP-hard. Secondly, we propose a heuristic algorithm for simultaneous control step and skew optimization under a given clock period, and we show how much the simultaneous optimization improves system performance. This paper is the first to use intentional skew to shorten control steps under a specified clock period. The proposed algorithm has the potential to play a central role in various scenarios of skew-aware high level synthesis.
A Hierarchical Cost Estimation Technique for High Level Synthesis
Mahmoud MERIBOUT Masato MOTOMURA

PAPER-VLSI Design Technology and CAD

Vol: E86-A No: 2
Page(s): 444-461
The aim of this paper is to present a new cost estimation technique for synthesizing hardware from a high level circuit description. The scheduling and allocation processes are performed in an alternating manner, using realistic cost measurement models that account for Functional Units (FUs), registers, and multiplexers. This is an improvement over previous works, most of which use very simple cost models that focus primarily on FU resources alone. Such models, however, are not accurate enough to allow effective design space exploration, since the effects of storage and interconnect resources can indeed dominate the cost function. We tested our technique on several high-level synthesis benchmarks. The results indicate that the tool can generate near-optimal bus-based and multiplexer-based architectural models with a lower number of registers and buses, while providing high throughput.
Hardware Algorithm Optimization Using Bach C
Kazuhisa OKADA Akihisa YAMADA Takashi KAMBE

PAPER

Vol: E85-A No: 4
Page(s): 835-841
The Bach compiler is a behavioral synthesis tool that synthesizes RT-level circuits from behavioral descriptions written in the Bach C language. It shortens the LSI design period and helps designers concentrate on algorithm design and refinement. In this paper, we propose methods for optimizing the area and performance of algorithms described in Bach C. In our experiments, we optimized a Viterbi decoder algorithm using our proposed methods and synthesized the circuit using the Bach compiler. The circuit produced using Bach is both smaller and faster than the hand-coded register transfer level (RTL) design. This shows that the Bach compiler produces high-quality results and that the Bach C language is effective for describing the behavior of hardware at a high level.
Symbolic Scheduling Techniques
Ivan P. RADIVOJEVIĆ Forrest BREWER

PAPER-High-Level Synthesis

Vol: E78-D No: 3
Page(s): 224-230
This paper describes an exact symbolic formulation of resource-constrained scheduling which allows speculative operation execution in arbitrary forward-branching control/data paths. The technique provides a closed-form solution set in which all satisfying schedules are encapsulated in a compressed OBDD-based representation. An iterative construction method is presented along with benchmark results. The experiments demonstrate the ability of the proposed technique to efficiently extract parallelism not explicitly specified in the input description.
Throughput Optimization by Data Flow Graph Transformation
Katsumi HARASHIMA Miki YOSHIDA Hironori KOMI Kunio FUKUNAGA

LETTER

Vol: E77-A No: 11
Page(s): 1917-1921
We address the problem of maximizing the throughput of a pipelined data path containing loops by using graph transformations. The upper bound of the throughput, which corresponds to the lower bound of the iteration interval between the starts of two successive iterations, is limited by the length of a critical loop. Therefore we can maximize the throughput by minimizing the length of the critical loop. The proposed method first schedules an initial Data Flow Graph (DFG) under the initial iteration interval using as few resources as possible, then transforms the DFG into a flow graph with the minimal critical loop length by rescheduling the initial scheduling result. If any control steps violate the resource constraints owing to the transformations, the operations in those steps are adjusted so as to satisfy the given resource constraints. Finally, by rescheduling the transformed DFG, the method gives a schedule with maximum throughput. Experiments show the efficiency of our proposed approach.
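The relationship between the critical loop and the iteration interval can be sketched as follows. This is the standard recurrence bound, not the transformation method of the letter: each loop in the DFG forces the iteration interval to be at least its total operation delay divided by the number of iterations the loop spans. The graph encoding (edges carrying an iteration distance), the node delays, and the function names are assumptions for illustration.

```python
from math import ceil

def simple_cycles(edges):
    """Yield each simple cycle once, as a list of (u, v, distance) edges.

    A cycle is only reported from its smallest node, so rotations of the
    same cycle are not repeated. Node names must be mutually comparable.
    """
    adj = {}
    for u, v, d in edges:
        adj.setdefault(u, []).append((u, v, d))
    nodes = sorted(set(adj) | {v for _, v, _ in edges})

    def dfs(start, node, path, visited):
        for edge in adj.get(node, []):
            nxt = edge[1]
            if nxt == start:
                yield path + [edge]
            elif nxt not in visited and nxt > start:
                yield from dfs(start, nxt, path + [edge], visited | {nxt})

    for s in nodes:
        yield from dfs(s, s, [], {s})

def min_iteration_interval(delay, edges):
    """Lower bound on the iteration interval imposed by the loops."""
    bound = 1
    for cycle in simple_cycles(edges):
        total_delay = sum(delay[u] for u, _, _ in cycle)
        total_dist = sum(d for _, _, d in cycle)
        if total_dist == 0:
            raise ValueError("loop with no iteration distance")
        bound = max(bound, ceil(total_delay / total_dist))
    return bound

# Hypothetical DFG: delays in control steps, distances in iterations.
delay = {"a": 1, "b": 2, "c": 1}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 1), ("b", "a", 2)]
print(min_iteration_interval(delay, edges))  # 4
```

Here the loop a, b, c has total delay 4 and spans one iteration, so it is the critical loop and the throughput cannot exceed one iteration every 4 control steps. Shortening that loop, which is what the graph transformations aim at, is the only way to lower the bound.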