This paper presents unified VLSI architectures that efficiently realize several widely used one-dimensional (1-D) and two-dimensional (2-D) real discrete trigonometric transforms (RDTT), including the discrete Hartley transform (DHT), discrete sine transform (DST), and discrete cosine transform (DCT). First, the succinct and unrestrictive Clenshaw recurrence formula, together with the inherent symmetry of the trigonometric functions, is employed to derive efficient recurrences for computing these 1-D RDTT. By using an appropriate row-column decomposition, the same set of recurrences can also be used to compute both the row transform and the column transform of the 2-D RDTT. Array architectures based on the developed recurrences are then introduced to implement these 1-D and 2-D RDTT. Both architectures provide substantial hardware savings compared with previous works. In addition, they are applicable to 1-D and 2-D RDTT of arbitrary size, and they can be adapted to compute all of the aforementioned RDTT with only minor modifications. A complete set of input/output (I/O) buffers and a bidirectional circular shift matrix are also described; these enable the architectures to operate in a fully pipelined manner and to reorder the transformed results into natural order. Moreover, the resulting architectures are highly regular, modular, and locally connected, and are thus amenable to VLSI implementation.
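To make the role of Clenshaw's recurrence concrete, the following is a minimal software sketch, not the recurrences or hardware derived in the paper. It assumes the unnormalized DCT-II, X[k] = sum over n of x[n] cos((2n+1)k*pi/(2N)), and uses the textbook Clenshaw recurrence b_n = x[n] + 2 cos(theta) b_{n+1} - b_{n+2} with theta = k*pi/N, for which X[k] = (b_0 - b_1) cos(theta/2). The function names `dct2_clenshaw` and `dct2_direct` are hypothetical.

```python
import math


def dct2_clenshaw(x):
    """Unnormalized DCT-II, one coefficient at a time, via Clenshaw's recurrence.

    Assumption: this is the generic textbook recurrence, not the
    symmetry-exploiting variant the abstract refers to.
    """
    n = len(x)
    out = []
    for k in range(n):
        theta = math.pi * k / n
        two_cos = 2.0 * math.cos(theta)  # the only multiplier used inside the loop
        b_cur = b_next = 0.0             # b_N and b_{N+1} start at zero
        for sample in reversed(x):       # run the recurrence from n = N-1 down to 0
            b_cur, b_next = sample + two_cos * b_cur - b_next, b_cur
        # Here b_cur = b_0 and b_next = b_1.
        out.append((b_cur - b_next) * math.cos(theta / 2.0))
    return out


def dct2_direct(x):
    """Reference implementation: direct evaluation of the defining sum."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]
```

The inner loop needs one multiplication by a fixed constant and two additions per input sample, which suggests why a recurrence of this kind suits a simple, regular processing element, and it places no restriction on the transform length N.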
Yeong-Sheng CHEN Sheng-De WANG Kuo-Chun SU
This paper is concerned with synthesizing VLSI array processors from iterative algorithms. Our primary objective is to obtain the highest processor efficiency rather than the shortest completion time. Unlike most previous work, which assumes the index space of the given iterative algorithm to be unbounded, the proposed method takes into account the effects of the boundaries of the index space. Because of this consideration, pseudo-dependence relations are excluded, and most of the independent computations can therefore be uniformly grouped. With the method described in this paper, the index space is partitioned into equal-size blocks, and the corresponding computations are systematically and uniformly mapped onto processing elements. The synthesized VLSI array processors possess the attractive feature of very high processor efficiency, which, in general, is superior to that obtained with conventional linear transformation methods.
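The processor-efficiency metric can be illustrated with a small sketch. This is not the synthesis method of the paper: it only counts busy and idle processor-time slots for two hypothetical mappings of a bounded N x N index space. It assumes, for the sake of the count, that all index points are independent. The sizes `N` and `B`, the helper `efficiency`, and both mapping functions are assumptions chosen for illustration.

```python
from itertools import product


def efficiency(points, pe_of, time_of):
    """Fraction of processor-time slots that perform a computation."""
    slots = {(pe_of(p), time_of(p)) for p in points}
    # A valid mapping never gives one PE two computations in the same step.
    assert len(slots) == len(points), "two points collide on one PE and time step"
    pes = {pe for pe, _ in slots}
    times = [t for _, t in slots]
    span = max(times) - min(times) + 1
    return len(points) / (len(pes) * span)


N, B = 8, 4  # hypothetical sizes; B must divide N
points = list(product(range(N), repeat=2))

# Linear mapping: schedule t = i + j, projection onto i (one PE per row).
# PEs sit idle near the boundaries of the index space.
linear = efficiency(points, lambda p: p[0], lambda p: p[0] + p[1])

# Block mapping: one PE per B x B block, which visits its points in sequence.
blocked = efficiency(
    points,
    lambda p: (p[0] // B, p[1] // B),
    lambda p: (p[0] % B) * B + (p[1] % B),
)

print(f"linear:  {linear:.3f}")   # 64 / (8 * 15)
print(f"blocked: {blocked:.3f}")  # 64 / (4 * 16)
```

Under these assumptions the linear mapping keeps the processors busy only about half of the time, because the schedule spans 2N - 1 steps while each processor works for N of them, whereas equal-size blocks leave no idle slots. Real iterative algorithms carry dependences that constrain which points may be grouped, which is the part the paper's method addresses.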