Based on our previous work, this work presents a complete method for time-domain processing of frequency-domain data with evenly-spaced frequency indices, together with its application. The proposed method can be used to calculate the cross spectral and power spectral densities for the frequency indices of interest. A promising application for the time-domain processing of frequency-domain data, particularly for calculating the summation of frequency-domain cross- and auto-correlations in orthogonal frequency-division multiplexing (OFDM) systems, is studied. The advantages of the time-domain processing of frequency-domain data are 1) the ability to rapidly acquire the properties that are readily available in the frequency domain and 2) the reduced complexity. The proposed fast algorithm directly employs time-domain samples, and hence, does not need the fast Fourier transform (FFT) operation. The proposed algorithm has a lower complexity (required complex multiplications ∼ O(N)) than conventional techniques.
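As a rough illustration of why evenly spaced frequency indices can be handled in the time domain, the sketch below relies on a standard DFT identity: the bins X[0], X[M], X[2M], ... of an N-point DFT equal the (N/M)-point DFT of the signal folded into blocks of length N/M, so by Parseval's relation the summed power over those bins follows from the folded samples alone. This is not the authors' algorithm; the function names (`fold`, `power_sum_time_domain`) and the parameter `step` are hypothetical, and the naive `dft` is included only to check the result.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, used here only as a reference for checking."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fold(x, step):
    """Sum x over `step` consecutive blocks of length len(x) // step."""
    n = len(x)
    if n % step:
        raise ValueError("step must divide the signal length")
    block = n // step
    return [sum(x[r + q * block] for q in range(step)) for r in range(block)]


def power_sum_time_domain(x, step):
    """Sum of |X[k]|^2 over k = 0, step, 2*step, ... without any DFT."""
    y = fold(x, step)
    # Parseval: sum_m |Y[m]|^2 = len(y) * sum_r |y[r]|^2
    return len(y) * sum(abs(v) ** 2 for v in y)


def power_sum_frequency_domain(x, step):
    """Same quantity computed the conventional way, via the full DFT."""
    spectrum = dft(x)
    return sum(abs(spectrum[k]) ** 2 for k in range(0, len(x), step))
```

The time-domain route needs one squared magnitude per folded sample, which is consistent with the O(N) multiplication count quoted in the abstract, although the paper's actual derivation may differ.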
Motohiro TANABE Masahiro UMEHIRA
An OFDMA-based (Orthogonal Frequency Division Multiple Access-based) channel access scheme for dynamic spectrum access has the drawbacks of large PAPR (Peak to Average Power Ratio) and large ACI (Adjacent Channel Interference). To solve these problems, a flexible channel access scheme using an overlap FFT filter-bank was proposed based on single carrier modulation for dynamic spectrum access. In order to apply the overlap FFT filter-bank for dynamic spectrum access, it is necessary to clarify the performance of the overlap FFT filter-bank according to the design parameters since its frequency characteristics are critical for dynamic spectrum access applications. This paper analyzes the overlap FFT filter-bank and evaluates its performance such as frequency characteristics and ACI performance according to the design parameters.
Shingo YOSHIZAWA Yoshikazu MIYANAGA
We present area- and power-efficient pipeline 128- and 128/64-point fast Fourier transform (FFT) processors for 8x8 multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems based on the specification framework of IEEE 802.11ac WLANs. Our new FFT processors use a mixed-radix multipath delay commutator (MRMDC) architecture, chosen for its low complexity and high memory utilization. A conventional MRMDC architecture requires large circuits in its delay commutators, which change the order of data sequences for the butterfly units. The proposed architecture replaces delay elements with new commutators that cooperate with other MIMO-OFDM processing blocks. These commutators are inserted in the front and rear of the input and output memory units. Our FFT processors exhibit a 50–51% reduction in logic gates and a 70–72% reduction in power dissipation as compared with conventional ones.
Fahad QURESHI Oscar GUSTAFSSON
In this work we consider optimized twiddle factor multipliers based on shift-and-add-multiplication. We propose a low-complexity structure for twiddle factors with a resolution of 32 points. Furthermore, we propose a slightly modified version of a previously reported multiplier for a resolution of 16 points with lower round-off noise. For completeness we also include results on optimal coefficients for eight-points resolution. We perform finite word length analysis for both coefficients and round-off errors and derive optimized coefficients with minimum complexity for varying requirements.
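To make the idea of shift-and-add twiddle factor multiplication concrete, here is a minimal sketch for the eight-point resolution case, where the only non-trivial coefficient is cos(π/4) ≈ 0.7071. The approximation 181/256 and the function names are illustrative assumptions, not the optimized coefficients or structures derived in the paper.

```python
def mul_inv_sqrt2(x):
    """Approximate x * cos(pi/4) for an integer x using shifts and adds only.

    Uses the assumed coefficient 181/256 = 0.70703125,
    with 181 = 128 + 32 + 16 + 4 + 1.
    """
    acc = (x << 7) + (x << 5) + (x << 4) + (x << 2) + x  # x * 181
    return acc >> 8  # divide by 256, rounding toward minus infinity


def rotate_by_w8(re, im):
    """Multiply re + j*im by W_8 = exp(-j*pi/4) = (1 - j) / sqrt(2)."""
    # (re + j*im) * (1 - j) = (re + im) + j*(im - re)
    return mul_inv_sqrt2(re + im), mul_inv_sqrt2(im - re)
```

For example, `rotate_by_w8(1024, 0)` gives `(724, -724)`, compared with the exact value of about 724.08 for each component. Using more or fewer terms in the shift-and-add sum trades adder count against coefficient error, which is the kind of trade-off the finite word length analysis addresses.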
This paper presents a low-complexity multi-mode fast Fourier transform (FFT) processor for Digital Video Broadcasting-Terrestrial 2 (DVB-T2) systems. DVB-T2 operations need 1K/2K/4K/8K/16K/32K-point multiple mode FFT processors. The proposed design employs a pipelined shared-memory architecture in which radix-2/2²/2³/2⁴ FFT algorithms, a multi-path delay commutator (MDC), and a novel data scaling approach are exploited. Based on this architecture, a novel low-cost data scaling unit is proposed to increase area efficiency, and an elaborate memory configuration scheme is designed to allow the use of single-port SRAM without degrading the throughput rate. In addition, a new twiddle factor scheduling method is proposed to reduce the area. The SQNR performance of the 32K-point FFT mode is about 45.3 dB at an 11-bit internal word length for 256QAM modulation. The proposed FFT processor has a lower hardware complexity and memory size compared to conventional FFT processors.
Chin-Long WEY Shin-Yo LIN Pei-Yun TSAI Ming-Der SHIEH
Multi-core processors have been attracting a great deal of attention. In the domain of signal processing for communications, the current trends toward rapidly evolving standards and formats, and toward algorithms adaptive to dynamic factors in the environment, require programmable solutions that possess both algorithm flexibility and low implementation complexity. Reconfigurable architectures have demonstrated better tradeoffs between algorithm flexibility, implementation complexity, and energy efficiency. This paper presents a reconfigurable homogeneous memory-based FFT processor (MBFFT) architecture integrated in a single chip to support hybrid SISO/MIMO OFDM wireless communication systems. For example, a reconfigurable MBFFT processor with eight processing elements (PEs) can be configured for one DVB-T/H with N=8192 and two 802.11n with N=128. The reconfigurable processors are well suited to Software Defined Radio (SDR) applications, which require more hardware flexibility.
Amedeo CAPOZZOLI Claudio CURCIO Antonio DI VICO Angelo LISENO
We develop an effective algorithm, based on the filtered backprojection (FBP) approach, for the imaging of vegetation. Under the FBP scheme, the reconstruction amounts to a non-trivial Fourier inversion, since the data are Fourier samples arranged on a non-Cartesian grid. The computational issue is efficiently tackled by Non-Uniform Fast Fourier Transforms (NUFFTs), whose complexity grows asymptotically as that of a standard FFT. Furthermore, significant speed-ups, as compared to fast CPU implementations, are obtained by a parallel version of the NUFFT algorithm, purposely designed to run on Graphics Processing Units (GPUs) using the CUDA language. The performance of the parallel algorithm has been assessed against a CPU-multicore accelerated Matlab implementation of the same routine, against other CPU-multicore accelerated implementations based on the standard FFT and employing linear, cubic, spline and sinc interpolations, and against a different parallel algorithm exploiting a parallel linear interpolation stage. The proposed approach proved to be the most computationally convenient. Furthermore, an indoor polarimetric experimental setup is developed that is capable of isolating and introducing, one at a time, different non-idealities of a real acquisition, such as the sources (wind, rain) of temporal decorrelation. Experimental far-field polarimetric measurements on a Thuja plicata (western redcedar) tree demonstrate the performance of the developed algorithm, its robustness against data truncation and temporal decorrelation, as well as the possibility of discriminating scatterers with different features within the investigated scene.
In this paper, we propose a memory-efficient structure for a pulse Doppler radar in order to reduce the hardware complexity. The conventional pulse Doppler radar computes the fast Fourier transform (FFT) of all range cells in order to extract the velocity of targets. We observed that this method requires a huge amount of memory to perform the FFT processes for all of the range cells. Therefore, instead of detecting the velocity in all range cells, the proposed architecture extracts the velocity of the targets by using only the cells related to the moving targets. According to our simulations and experiments, the detection performance of the proposed architecture is 93.5%, and the proposed structure can reduce the hardware complexity by up to 66.2% compared with the conventional structure.
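The idea of running the Doppler FFT only on range cells that contain moving targets can be sketched as follows. The abstract does not say how such cells are identified, so the pulse-to-pulse difference energy test, the `threshold` parameter, and the function names below are assumptions made purely for illustration.

```python
import cmath
import math


def dft(x):
    """Naive DFT along slow time; a real design would use an FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def doppler_for_selected_cells(pulses, threshold):
    """Compute Doppler spectra only for range cells that appear to move.

    pulses[p][r] is the complex sample of pulse p at range cell r.
    Returns a dict mapping the selected range cell index to its spectrum.
    """
    n_pulses = len(pulses)
    n_cells = len(pulses[0])
    spectra = {}
    for r in range(n_cells):
        slow_time = [pulses[p][r] for p in range(n_pulses)]
        # Assumed motion test: a stationary echo is constant from pulse to
        # pulse, so the energy of successive differences stays near zero.
        change = sum(
            abs(slow_time[p + 1] - slow_time[p]) ** 2
            for p in range(n_pulses - 1)
        )
        if change > threshold:
            spectra[r] = dft(slow_time)  # only these cells are transformed
    return spectra
```

Only the selected cells have their spectra computed and stored, which is the source of the memory saving in this simplified picture. The reported 66.2% figure depends on the authors' actual architecture and is not reproduced by this sketch.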
Ryosuke NAKAMOTO Sakae SAKURABA Alexandre MARTINS Takeshi ONOMI Shigeo SATO Koji NAKAJIMA
We have designed and implemented a 4-bit Carry Look-ahead Adder (CLA) and 4-bit parallel multipliers to be used for the Fast Fourier Transform (FFT) system with the estimated clock frequency of 20 GHz. Through some high frequency functional tests, we have confirmed that the operation of the CLA has been successful. Through some low speed tests, we have also confirmed that the operation of multiplication has been successful. In addition, we have designed a 4-bit multiplier with a Booth encoder and with a 2-point-4-bit butterfly circuit.
Chin-Long WEY Shin-Yo LIN Hsu-Sheng WANG Hung-Lieh CHEN Chun-Ming HUANG
In UWB systems, data symbols are transmitted and received continuously. The Fast Fourier Transform (FFT) processor must be able to seamlessly process input/output data. This paper presents the design and implementation of a continuous data flow parallel memory-based FFT (CF-PMBFFT) processor without the use of an input buffer for pre-loading the input data. The processor uses a memory space of two N-words and multiple processing elements (PEs) to achieve the seamless data flow and meet the design requirement. The circuit has been fabricated in a TSMC 0.18 µm 1P6M CMOS process with a supply voltage of 1.8 V. Measurement results of the test chip show that the developed CF-PMBFFT processor takes a core area of 1.97 mm² with a power consumption of 62.12 mW for a throughput rate of 528 MS/s.
Kohsuke HARADA Haruka OBATA Hironori UCHIKAWA Kenji YOSHIDA Yuji SAKAI
In this paper, we consider the behavior of an autoregressive (AR) detector for partial-response (PR) signaling in an offtrack interference (OTI) environment in perpendicular magnetic recording. Based on this behavior, we derive the optimum branch metric for constructing the detector with the Viterbi algorithm. We propose an optimum AR detector for OTI that combines an optimum branch metric calculation with an estimation of the noise power due to OTI in order to calculate an accurate branch metric. To evaluate the reliability of the soft-output likelihood values calculated by our proposed AR detector, we demonstrate the bit error rate (BER) performance of low-density parity-check (LDPC) codes over a channel with OTI by computer simulation. Our simulation results show that the proposed AR detector can achieve a better LDPC-coded BER performance than the conventional AR detector. We also show that the BER performance of our proposal stays within 0.5 dB of the case in which perfect channel state information regarding OTI is used in the detector. In addition, we show that the partial-response maximum-likelihood (PRML) detector is robust against OTI even if OTI is not handled by the detector.
Zhen LI Atushi UEMURA Hitoshi KIYA
An FFT-based full-search block matching algorithm (BMA) is described that uses the sum of squared differences (SSD) criterion. The proposed method does not have to extend a real signal into a complex one. This reduces the computational load of FFT approaches. In addition, if two macroblocks share the same search window, they can be matched at the same time. In a simulation of motion estimation, the proposed method achieved the same performance as a direct SSD full search, and its processing speed was faster than that of other FFT-based BMAs.
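For readers unfamiliar with how the SSD criterion maps onto a transform-domain computation, the following one-dimensional sketch expands SSD(u) = Σ w[u+i]² − 2 Σ w[u+i] b[i] + Σ b[i]² and obtains the cross term from the correlation theorem. It does not implement the paper's specific contributions (avoiding the real-to-complex extension and matching two macroblocks at once); the function names are hypothetical and a naive DFT stands in for an FFT.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive forward or inverse DFT; stands in for an FFT in this sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def ssd_via_fft(window, block):
    """SSD between a real block and every offset of a real search window."""
    n, m = len(window), len(block)
    padded = list(block) + [0.0] * (n - m)  # zero-pad block to window length
    W, B = dft(window), dft(padded)
    # Correlation theorem: IDFT(W * conj(B))[u] = sum_i window[u + i] * block[i]
    corr = dft([w * b.conjugate() for w, b in zip(W, B)], inverse=True)
    block_energy = sum(v * v for v in block)
    result = []
    for u in range(n - m + 1):  # offsets with no circular wrap-around
        window_energy = sum(v * v for v in window[u:u + m])
        result.append(window_energy - 2 * corr[u].real + block_energy)
    return result


def ssd_direct(window, block):
    """Reference SSD computed directly in the signal domain."""
    m = len(block)
    return [
        sum((window[u + i] - block[i]) ** 2 for i in range(m))
        for u in range(len(window) - m + 1)
    ]
```

With `window = [1, 3, 2, 5, 4, 0, 7, 6]` and `block = [5, 4, 0]`, both functions agree up to rounding and the minimum SSD falls at offset 3, where the block matches exactly.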
Shuang ZHAO Wenqing LU Xiaofang ZHOU Dian ZHOU Gerald E. SOBELMAN
MIMO-OFDM systems aim to improve transmission quality and/or throughput but require significant signal processing capability and flexibility at reasonable cost. This paper proposes a reconfigurable architecture and associated algorithm optimizations for these types of systems based on the IEEE 802.11n and IEEE 802.16e standards. In particular, we describe the implementation of two key computations onto this architecture, namely Fast Fourier Transform (FFT) and Space-Time Block Decoding (STBD). The design has been evaluated at the post-layout stage using a UMC 0.18 micron technology at a clock rate of 100 MHz. Performance comparisons with other optimization methods and hardware implementations are given.
Fast Fourier Transform (FFT) is an important algorithm in many digital signal processing applications, and it often requires parallel implementation for high throughput. In this paper, we first present the SmartCell coarse-grained reconfigurable architecture targeted for stream processing. A SmartCell prototype integrates 64 processing elements, configurable interconnections, and dedicated instruction and data memories into a single chip, which is able to provide high performance parallel processing while maintaining post-fabrication flexibility. Subsequently, we present a parallel FFT algorithm targeted for multi-core computing platforms. This algorithm provides an optimized data flow pattern that reduces both communication and configuration overheads. The proposed parallel FFT algorithm is then mapped onto the SmartCell prototype device. Results show that the parallel FFT implementation on SmartCell is about 14.9 and 2.7 times faster than network-on-chip (NoC) and MorphoSys implementations, respectively. SmartCell also achieves energy efficiency gains of 2.1 and 28.9 when compared with FPGA and DSP implementations.
This paper proposes a novel robust audio watermarking algorithm to embed data and extract it in a bit-exact manner based on changing the magnitudes of the FFT spectrum. The key point is selecting a frequency band for embedding based on the comparison between the original and the MP3 compressed/decompressed signal and on a suitable scaling factor. The experimental results show that the method has a very high capacity (about 5 kbps), without significant perceptual distortion (ODG about -0.25) and provides robustness against common audio signal processing such as added noise, filtering and MPEG compression (MP3). Furthermore, the proposed method has a larger capacity (number of embedded bits to number of host bits rate) than recent image data hiding methods.
Motohiro TANABE Masahiro UMEHIRA Koichi ISHIHARA Yasushi TAKATORI
An OFDMA based channel access scheme is proposed for dynamic spectrum access to utilize frequency spectrum efficiently. Though the OFDMA based scheme is flexible enough to change the bandwidth and channel of the transmitted signals, the OFDMA signal has large PAPR (Peak to Average Power Ratio). In addition, if the OFDMA receiver does not use a filter to extract sub-carriers before FFT (Fast Fourier Transform) processing, the designated sub-carriers suffer large interference from the adjacent channel signals in the FFT processing on the receiving side. To solve the problems such as PAPR and adjacent channel interference encountered in the OFDMA based scheme, this paper proposes a novel dynamic channel access scheme using overlap FFT filter-bank based on single carrier modulation. It also shows performance evaluation results of the proposed scheme by computer simulation.
Tomoya TANDAI Takahiro KOBAYASHI
In this paper, a sidelobe suppression technique for orthogonal frequency division multiplexing (OFDM)-based cognitive radios (CR) is proposed. In OFDM-based CR systems, after the CR terminal executes spectrum sensing, it transmits a CR packet by activating the subcarriers in the frequency bands where no signals are detected (hereinafter, these subcarriers are called "active subcarriers") and by disabling (nulling) the subcarriers in the frequency bands where signals are detected. In this situation, a problem arises in that the signals that leak from the active subcarriers to the null subcarriers may interfere with the primary systems. Therefore, this signal leakage has to be minimized. In many OFDM-based wireless communication systems, one packet or frame consists of multiple OFDM symbols, and the discontinuity between consecutive OFDM symbols causes signal leakage to the null subcarriers. In the proposed method, signal leakage to the null subcarriers is suppressed by regenerating null subcarriers in the frequency-domain signal of the whole packet as follows. One CR packet consisting of multiple OFDM symbols having null subcarriers and a guard interval (GI) is buffered and oversampled, and then the oversampled signal is Fourier transformed at once to obtain the frequency-domain signal of the packet. The null subcarriers in the frequency-domain signal are zeroed again, and then the signal is inverse Fourier transformed and transmitted. The proposed method significantly suppresses the signal leakage. The power spectral density, the peak-to-average power ratio (PAPR) and the packet error rate (PER) performances of the proposed method are evaluated by computer simulations and the effectiveness of the proposed method is shown.
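The regeneration step described above can be sketched in a few lines: transform the whole buffered packet at once, zero the bins that fall in the null band, and transform back. The sketch assumes the packet is already oversampled, describes the null band as a range of normalized frequencies, and uses a naive DFT; the helper names and these conventions are hypothetical and are not taken from the paper.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive forward or inverse DFT; stands in for an FFT in this sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def ofdm_symbol(data, n_fft, gi):
    """Build one OFDM symbol; `data` maps subcarrier index to its symbol.

    Subcarriers not listed in `data` are null. A cyclic guard interval of
    `gi` samples is prepended.
    """
    freq = [data.get(k, 0) for k in range(n_fft)]
    time = dft(freq, inverse=True)
    return time[len(time) - gi:] + time


def null_band_power(packet, null_lo, null_hi):
    """Power in whole-packet DFT bins with null_lo <= k/len(packet) < null_hi."""
    spec = dft(packet)
    n = len(packet)
    return sum(abs(spec[k]) ** 2 for k in range(n) if null_lo <= k / n < null_hi)


def regenerate_nulls(packet, null_lo, null_hi):
    """Zero the null band in the packet's spectrum and return to time domain."""
    spec = dft(packet)
    n = len(packet)
    cleaned = [0 if null_lo <= k / n < null_hi else v for k, v in enumerate(spec)]
    return dft(cleaned, inverse=True)
```

In a two-symbol packet built with `ofdm_symbol`, the discontinuity at the symbol boundary puts measurable power into the null band, and `regenerate_nulls` removes it up to rounding error. The sketch says nothing about the resulting PAPR or PER, which the paper evaluates by simulation.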
Jae-Seong LEE Chang-Joon LEE Young-Cheol PARK Dae-Hee YOUN
This paper proposes an efficient FFT algorithm for the Psycho-Acoustic Model (PAM) of MPEG-4 AAC. The proposed algorithm synthesizes FFT coefficients using MDCT and MDST coefficients through circular convolution. The complexity of the MDCT and MDST coefficients is approximately half of the original FFT. We also design a new PAM based on the proposed FFT algorithm, which has 15% lower computational complexity than the original PAM without degradation of sound quality. Subjective as well as objective test results are presented to confirm the efficiency of the proposed FFT computation algorithm and the PAM.
Kiyoshi KOBAYASHI Fumihiro YAMASHITA Jun-ichi ABE Masazumi UEBA
This paper presents a prototype group modem for a hyper-multipoint data gathering satellite communication system. It can handle arbitrarily and dynamically assigned FDMA signals by employing a novel FFT-type block demultiplexer/multiplexer. We clarify its configuration and operational principle. Experiments show that the developed modem offers excellent performance.
Lilin DAN Yue XIAO Wei NI Shaoqian LI
In this letter, a low complexity transmitter is proposed for the downlinks of orthogonal frequency code division multiplexing (OFCDM) systems. The principle is based on a joint time-frequency spreading and inverse fast Fourier transform (TFS-IFFT), which combines the frequency spreading with partial stages of IFFT, so as to simplify the real-time processing. Compared with the conventional one, the proposed OFCDM transmitter is of lower real-time computational complexity, especially for those with large spreading factor or low modulation level. Furthermore, the proposed TFS-IFFT can also be applied to other frequency spreading systems, such as MC-CDMA, for complexity reduction.