1-2hit |
Katsumi HARASHIMA Miki YOSHIDA Hironori KOMI Kunio FUKUNAGA
We propose an optimal throughput problem using graph transformations to maximize throughput of a pipelined data path with some loops. The upper bound of the throughput, equals to the lower bound of the iteration interval between the start of two successive iterations, is limited by the length of a critical loop. Therefore we can maximize the throughput by minimizing the length of the critical loop. The proposed method first schedules an initial Data Flow Graph (DFG) under the initial iteration interval as few as it can use resources, then it transforms the DFG into the flow graph with the minimal length of the critical loop by rescheduling the given initial scheduling result. If there are any control steps which violate the resource constraints owing to the transformations, then these operations are adjusted so as to satisfy given resource consrtraints. Finally by rescheduling the transformed DFG, it gives a schedule with maximum throughput. Experiments show the efficiency of our proposed approach.
Frederico Buchholz MACIEL Yoshikazu MIYANAGA Koji TOCHINAI
The throughput of a parallel execution of a Digital Signal Processing (DSP) algorithm is limited by the iteration bound, which is the minimum period between the start of consecutive iterations. It is given by T=max (Ti/Di), where Ti and Di are the total time of operations and the number of delays in loop i, respectively. A schedule is said rate-optimal if its iteration period is T. The throughput of a DSP algorithm execution can be increased by reducing the Ti's, which can be done by taking as many operations as possible out of loops without changing the semantic of the calculation. This paper presents an optimization technique, called Loop Shrinking, which reduces the iteration bound this way by using commutativity, associativity and distributivity. Also, this paper presents a scheduling method, called Period-Driven Scheduling, which gives rate-optimal schedules more efficiently than existing approaches. An implementation of both is then presented for a system in development by the authors. The system shows reduction in the iteration bound near or equal to careful hand-tunning, and hardware-optimal designs in most of the cases.