Fast Fourier Transform (FFT) is an important algorithm in many digital signal processing applications, and it often requires parallel implementation for high throughput. In this paper, we first present the SmartCell coarse-grained reconfigurable architecture targeted for stream processing. A SmartCell prototype integrates 64 processing elements, configurable interconnections, and dedicated instruction and data memories into a single chip, which is able to provide high performance parallel processing while maintaining post-fabrication flexibility. Subsequently, we present a parallel FFT architecture targeted for multi-core platforms computing systems. This algorithm provides an optimized data flow pattern that reduces both communication and configuration overheads. The proposed parallel FFT algorithm is then mapped onto the SmartCell prototype device. Results show that the parallel FFT implementation on SmartCell is about 14.9 and 2.7 times faster than network-on-chip (NoC) and MorphoSys implementations, respectively. SmartCell also achieves the energy efficiency gains of 2.1 and 28.9 when compared with FPGA and DSP implementations.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Cao LIANG, Xinming HUANG, "Mapping Parallel FFT Algorithm onto SmartCell Coarse-Grained Reconfigurable Architecture" in IEICE TRANSACTIONS on Electronics,
vol. E93-C, no. 3, pp. 407-415, March 2010, doi: 10.1587/transele.E93.C.407.
Abstract: Fast Fourier Transform (FFT) is an important algorithm in many digital signal processing applications, and it often requires parallel implementation for high throughput. In this paper, we first present the SmartCell coarse-grained reconfigurable architecture targeted for stream processing. A SmartCell prototype integrates 64 processing elements, configurable interconnections, and dedicated instruction and data memories into a single chip, which is able to provide high performance parallel processing while maintaining post-fabrication flexibility. Subsequently, we present a parallel FFT architecture targeted for multi-core platforms computing systems. This algorithm provides an optimized data flow pattern that reduces both communication and configuration overheads. The proposed parallel FFT algorithm is then mapped onto the SmartCell prototype device. Results show that the parallel FFT implementation on SmartCell is about 14.9 and 2.7 times faster than network-on-chip (NoC) and MorphoSys implementations, respectively. SmartCell also achieves the energy efficiency gains of 2.1 and 28.9 when compared with FPGA and DSP implementations.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/transele.E93.C.407/_p
Copy
@ARTICLE{e93-c_3_407,
author={Cao LIANG, Xinming HUANG, },
journal={IEICE TRANSACTIONS on Electronics},
title={Mapping Parallel FFT Algorithm onto SmartCell Coarse-Grained Reconfigurable Architecture},
year={2010},
volume={E93-C},
number={3},
pages={407-415},
abstract={Fast Fourier Transform (FFT) is an important algorithm in many digital signal processing applications, and it often requires parallel implementation for high throughput. In this paper, we first present the SmartCell coarse-grained reconfigurable architecture targeted for stream processing. A SmartCell prototype integrates 64 processing elements, configurable interconnections, and dedicated instruction and data memories into a single chip, which is able to provide high performance parallel processing while maintaining post-fabrication flexibility. Subsequently, we present a parallel FFT architecture targeted for multi-core platforms computing systems. This algorithm provides an optimized data flow pattern that reduces both communication and configuration overheads. The proposed parallel FFT algorithm is then mapped onto the SmartCell prototype device. Results show that the parallel FFT implementation on SmartCell is about 14.9 and 2.7 times faster than network-on-chip (NoC) and MorphoSys implementations, respectively. SmartCell also achieves the energy efficiency gains of 2.1 and 28.9 when compared with FPGA and DSP implementations.},
keywords={},
doi={10.1587/transele.E93.C.407},
ISSN={1745-1353},
month={March},}
Copy
TY - JOUR
TI - Mapping Parallel FFT Algorithm onto SmartCell Coarse-Grained Reconfigurable Architecture
T2 - IEICE TRANSACTIONS on Electronics
SP - 407
EP - 415
AU - Cao LIANG
AU - Xinming HUANG
PY - 2010
DO - 10.1587/transele.E93.C.407
JO - IEICE TRANSACTIONS on Electronics
SN - 1745-1353
VL - E93-C
IS - 3
JA - IEICE TRANSACTIONS on Electronics
Y1 - March 2010
AB - Fast Fourier Transform (FFT) is an important algorithm in many digital signal processing applications, and it often requires parallel implementation for high throughput. In this paper, we first present the SmartCell coarse-grained reconfigurable architecture targeted for stream processing. A SmartCell prototype integrates 64 processing elements, configurable interconnections, and dedicated instruction and data memories into a single chip, which is able to provide high performance parallel processing while maintaining post-fabrication flexibility. Subsequently, we present a parallel FFT architecture targeted for multi-core platforms computing systems. This algorithm provides an optimized data flow pattern that reduces both communication and configuration overheads. The proposed parallel FFT algorithm is then mapped onto the SmartCell prototype device. Results show that the parallel FFT implementation on SmartCell is about 14.9 and 2.7 times faster than network-on-chip (NoC) and MorphoSys implementations, respectively. SmartCell also achieves the energy efficiency gains of 2.1 and 28.9 when compared with FPGA and DSP implementations.
ER -