Radomir S. STANKOVIĆ Milena STANKOVIĆ Claudio MORAGA Jaakko T. ASTOLA
Binary bent functions have a strictly specified number of non-zero values. In the same way, ternary bent functions satisfy certain requirements on the elements of their value vectors. These requirements can be used to specify six classes of ternary bent functions. Classes are mutually related by encoding of function values. Given a basic ternary bent function, other functions in the same class can be constructed by permutation matrices having a block structure similar to that of the factor matrices appearing in the Good-Thomas decomposition of Cooley-Tukey Fast Fourier transform and related algorithms.
Zhenyu WEI Wei WANG Ben WANG Ping LIU Linshu GONG
Sparse arrays can usually achieve larger array apertures than uniform linear arrays (ULA) with the same number of physical antennas. However, the conventional direction-of-arrival (DOA) estimation algorithms for sparse arrays usually require the spatial smoothing operation to recover the matrix rank which inevitably involves heavy computational complexity and leads to a reduction in the degrees-of-freedom (DOFs). In this paper, a low-complex DOA estimation algorithm by exploiting the discrete Fourier transform (DFT) is proposed. Firstly, the spatial spectrum of the virtual array constructed from the sparse array is established by exploiting the DFT operation. The initial DOA estimation can obtain directly by searching the peaks in the DFT spectrum. However, since the number of array antennas is finite, there exists spectrum power leakage which will cause the performance degradation. To further improve the angle resolution, an iterative process is developed to suppress the spectrum power leakage. Thus, the proposed algorithm does not require the spatial smoothing operation and the computational complexity is reduced effectively. In addition, due to the extention of DOF with the application of the sparse arrays, the proposed algorithm can resolve the underdetermined DOA estimation problems. The superiority of the proposed algorithm is demonstrated by simulation results.
Bei ZHAO Chen CHENG Zhenguo MA Feng YU
Cross correlation is a general way to estimate time delay of arrival (TDOA), with a computational complexity of O(n log n) using fast Fourier transform. However, since only one spike is required for time delay estimation, complexity can be further reduced. Guided by Chinese Remainder Theorem (CRT), this paper presents a new approach called Co-prime Aliased Sparse FFT (CASFFT) in O(n1-1/d log n) multiplications and O(mn) additions, where m is smooth factor and d is stage number. By adjusting these parameters, it can achieve a balance between runtime and noise robustness. Furthermore, it has clear advantage in parallelism and runtime for a large range of signal-to-noise ratio (SNR) conditions. The accuracy and feasibility of this algorithm is analyzed in theory and verified by experiment.
Kee-Hoon KIM Hyun-Seung JOO Jong-Seon NO Dong-Joon SHIN
Many selected mapping (SLM) schemes have been proposed to reduce the peak-to-average power ratio (PAPR) of orthogonal frequency division multiplexing (OFDM) signal sequences. In this paper, an efficient selection (ES) method of the OFDM signal sequence with minimum PAPR among many alternative OFDM signal sequences is proposed; it supports various SLM schemes. Utilizing the fact that OFDM signal components can be sequentially generated in many SLM schemes, the generation and PAPR observation of the OFDM signal sequence are processed concurrently. While the u-th alternative OFDM signal components are being generated, by applying the proposed ES method, the generation of that alternative OFDM signal components can be interrupted (or stopped) according to the selection criteria of the best OFDM signal sequence in the considered SLM scheme. Such interruption substantially reduces the average computational complexity of SLM schemes without degradation of PAPR reduction performance, which is confirmed by analytical and numerical results. Note that the proposed method is not an isolated SLM scheme but a subsidiary method which can be easily adopted in many SLM schemes in order to further reduce the computational complexity of considered SLM schemes.
GuoJian OU ShiZhong YANG JianXun DENG QingPing JIANG TianQi ZHANG
This paper describes a fast and effective algorithm for refining the parameter estimates of multicomponent third-order polynomial phase signals (PPSs). The efficiency of the proposed algorithm is accompanied by lower signal-to-noise ratio (SNR) threshold, and computational complexity. A two-step procedure is used to estimate the parameters of multicomponent third-order PPSs. In the first step, an initial estimate for the phase parameters can be obtained by using fast Fourier transformation (FFT), k-means algorithm and three time positions. In the second step, these initial estimates are refined by a simple moving average filter and singular value decomposition (SVD). The SNR threshold of the proposed algorithm is lower than those of the non-linear least square (NLS) method and the estimation refinement method even though it uses a simple moving average filter. In addition, the proposed method is characterized by significantly lower complexity than computationally intensive NLS methods. Simulations confirm the effectiveness of the proposed method.
We investigate the utilization of vector registers (VRs) on reducing memory references for single instruction multiple data fast Fourier transform calculation. We propose to group the butterfly computations in several consecutive stages to maximize utilization of the available VRs and take the advantage of the symmetries in twiddle factors. All the butterflies sharing identical twiddle factors are clustered and computed together to further improve performance. The relationship between the number of fused stages and the number of available VRs is then examined. Experimental results on different platforms show that the proposed method is effective.
Dongpei LIU Hengzhu LIU Botao ZHANG Jianfeng ZHANG Shixian WANG Zhengfa LIANG
High-performance FFT processor is indispensable for real-time OFDM communication systems. This paper presents a CORDIC based design of variable-length FFT processor which can perform various FFT lengths of 64/128/256/512/1024/2048/4096/8192-point. The proposed FFT processor employs memory based architecture in which mixed radix 4/2 algorithm, pipelined CORDIC, and conflict-free parallel memory access scheme are exploited. Besides, the CORDIC rotation angles are generated internally based on the transform of butterfly counter, which eliminates the need of ROM making it memory-efficient. The proposed architecture has a lower hardware complexity because it is ROM-free and with no dedicated complex multiplier. We implemented the proposed FFT processor and verified it on FPGA development platform. Additionally, the processor is also synthesized in 0.18 µm technology, the core area of the processor is 3.47 mm2 and the maximum operating frequency can be up to 500 MHz. The proposed FFT processor is better trade off performance and hardware overhead, and it can meet the speed requirement of most modern OFDM system, such as IEEE 802.11n, WiMax, 3GPP-LTE and DVB-T/H.
Based on our previous work, this work presents a complete method for time-domain processing of frequency-domain data with evenly-spaced frequency indices, together with its application. The proposed method can be used to calculate the cross spectral and power spectral densities for the frequency indices of interest. A promising application for the time-domain processing of frequency-domain data, particularly for calculating the summation of frequency-domain cross- and auto-correlations in orthogonal frequency-division multiplexing (OFDM) systems, is studied. The advantages of the time-domain processing of frequency-domain data are 1) the ability to rapidly acquire the properties that are readily available in the frequency domain and 2) the reduced complexity. The proposed fast algorithm directly employs time-domain samples, and hence, does not need the fast Fourier transform (FFT) operation. The proposed algorithm has a lower complexity (required complex multiplications ∼ O(N)) than conventional techniques.
Hamze Haidar ALAEDDINE El Houssaïn BAGHIOUS Gilles BUREL
This paper is about a new efficient method for the implementation of convolvers and correlators using the Fermat Number Transform (FNT) and the inverse (IFNT). The latter present advantages compared to Inverse Fast Fourier Transform (IFFT). An efficient state space method for implementing the Inverse FNT (IFNT) over rectangular windows is proposed for the cases where there is a large overlap between the consecutive input signals. This is called Inverse Generalized Sliding Fermat Number Transform (IGSFNT) and is useful for reducing the computational complexity of finite ring convolvers and correlators. This algorithm uses the technique of Generalized Sliding associated to matricial calculation in the Galois Field. The computational complexity of this method is compared with that of standard IFNT.
Chin-Long WEY Shin-Yo LIN Pei-Yun TSAI Ming-Der SHIEH
Multi-core processors have been attracting a great deal of attention. In the domain of signal processing for communications, the current trends toward rapidly evolving standards and formats, and toward algorithms adaptive to dynamic factors in the environment, require programmable solutions that possess both algorithm flexibility and low implementation complexity. Reconfigurable architectures have demonstrated better tradeoffs between algorithm flexibility, implementation complexity, and energy efficiency. This paper presents a reconfigurable homogeneous memory-based FFT processor (MBFFT) architecture integrated in a single chip to provide hybrid SISO/MIMO OFDM wireless communication systems. For example, a reconfigurable MBFFT processor with eight processing elements (PEs) can be configured for one DVB-T/H with N=8192 and two 802.11n with N=128. The reconfigurable processors can perfectly fit the applications of Software Defined Radio (SDR) which requires more hardware flexibility.
Chin-Long WEY Shin-Yo LIN Hsu-Sheng WANG Hung-Lieh CHEN Chun-Ming HUANG
In UWB systems, data symbols are transmitted and received continuously. The Fast Fourier Transform (FFT) processor must be able to seamlessly process input/output data. This paper presents the design and implementation of a continuous data flow parallel memory-based FFT (CF-PMBFFT) processor without the use of input buffer for pre-loading the input data. The processor realizes a memory space of two N-words and multiple processing elements (PEs) to achieve the seamless data flow and meet the design requirement. The circuit has been fabricated in TSMC 0.18 µm 1P6M CMOS process with the supply voltage of 1.8 V. Measurement results of the test chip shows that the developed CF-PMBFFT processor takes a core area of 1.97 mm2 with a power consumption of 62.12 mW for a throughput rate of 528 MS/s.
This paper proposes a novel robust audio watermarking algorithm to embed data and extract it in a bit-exact manner based on changing the magnitudes of the FFT spectrum. The key point is selecting a frequency band for embedding based on the comparison between the original and the MP3 compressed/decompressed signal and on a suitable scaling factor. The experimental results show that the method has a very high capacity (about 5 kbps), without significant perceptual distortion (ODG about -0.25) and provides robustness against common audio signal processing such as added noise, filtering and MPEG compression (MP3). Furthermore, the proposed method has a larger capacity (number of embedded bits to number of host bits rate) than recent image data hiding methods.
Chin-Liang WANG Yuan OUYANG Ming-Yen HSU
One major drawback of orthogonal frequency-division multiplexing is the high peak-to-average power ratio (PAPR) of the output signal. The selected mapping (SLM) and partial transmit sequences (PTS) methods are two promising techniques for PAPR reduction. However, to generate a set of candidate signals, these techniques need a bank of inverse fast Fourier transforms (IFFT's) and thus require high computational complexity. In this paper, we propose two low-complexity multiplication-free conversion processes to replace the IFFT's in the SLM method, where each conversion process for an N-point IFFT involves only 3N complex additions. Using these proposed conversions, we develop several new SLM schemes and a combined SLM & PTS method, in which at least half of the IFFT blocks are reduced. Computer simulation results show that, compared to the conventional methods, these new schemes have approximately the same PAPR reduction performance under the same number of candidate signals for transmission selection.
Abolfazl GHASSEMI T. Aaron GULLIVER
Partial transmit sequence (PTS) is a well known technique used to reduce the peak-to-average power ratio (PAPR) of an orthogonal frequency division multiplexing (OFDM) signal. However, it has relatively high complexity due to the computation of multiple inverse fast Fourier transforms (IFFTs). To reduce this complexity, we use intermediate signals within a decimation in frequency (DIF) radix IFFT and propose a new PTS subblocking technique which requires the computation of only partial IFFTs. Performance results are presented which show a PAPR reduction similar to that with other techniques such as original PTS (O-PTS). Further, we show that complexity reduction can be achieved with either low or high radix IFFT algorithms.
Approximate pattern matching plays an important role in various applications. In this paper we focus on (δ, γ)-matching, where a character can differ at most δ and the sum of these errors is smaller than γ. We show how to find these matches when the pattern is transformed by y=αx + β, without knowing α and β in advance.
This paper presents a novel high-speed, low-complexity two-parallel 128-point radix-24 FFT/IFFT processor for MB-OFDM ultrawideband (UWB) systems. The proposed high-speed, low-complexity FFT architecture can provide a higher throughput rate and low hardware complexity by using a two-parallel data-path scheme and a single-path delay-feedback (SDF) structure. The radix-24 FFT algorithm is also realized in our processor to reduce the number of complex multiplications. The proposed FFT/IFFT processor has been designed and implemented with 0.18-µm CMOS technology in a supply voltage of 1.8 V. The proposed two-parallel FFT/IFFT processor has a throughput rate of up to 900 Msample/s at 450 MHz while requiring much smaller hardware complexity and low power consumption.
In this paper, we present a new fast Fourier transform (FFT) algorithm to reduce the table size of twiddle factors required in pipelined FFT processing. The table size is large enough to occupy significant area and power consumption in long-point FFT processing. The proposed algorithm can reduce the table size to half, compared to the radix-22 algorithm, while retaining the simple structure. To verify the proposed algorithm, a 2048-point pipelined FFT processor is designed using a 0.18 µm CMOS process. By combining the proposed algorithm and the radix-22 algorithm, the table size is reduced to 34% and 51% compared to the radix-2 and radix-22 algorithms, respectively. The FFT processor occupies 1.28 mm2 and achieves a signal-to-quantization-noise ratio (SQNR) of more than 50 dB.
Huiqing ZHAI Qiaowei YUAN Qiang CHEN Kunio SAWAYA
In this research, a sub-array preconditioner is applied to improve the convergence of conjugate gradient (CG) iterative solver in the fast multipole method and fast Fourier transform (FMM-FFT) implementation on a large-scale finite periodic array antenna with arbitrary geometry elements. The performance of the sub-array preconditioner is compared with the near-group preconditioner in the array antenna analysis. It is found that the near-group preconditioner achieves a little better convergence, while the sub-array preconditioner can be easily constructed and programmed with less CPU-time. The efficiency of the CG-FMM-FFT with high efficient preconditioner has been demonstrated in numerical analysis of a finite periodic array antenna.
Huiqing ZHAI Qiang CHEN Qiaowei YUAN Kunio SAWAYA Changhong LIANG
This paper presents method that offers the fast and accurate analysis of large-scale periodic array antennas by conjugate-gradient fast Fourier transform (CG-FFT) combined with an equivalent sub-array preconditioner. Method of moments (MoM) is used to discretize the electric field integral equation (EFIE) and form the impedance matrix equation. By properly dividing a large array into equivalent sub-blocks level by level, the impedance matrix becomes a structure of Three-level Block Toeplitz Matrices. The Three-level Block Toeplitz Matrices are further transformed to Circulant Matrix, whose multiplication with a vector can be rapidly implemented by one-dimension (1-D) fast Fourier transform (FFT). Thus, the conjugate-gradient fast Fourier transform (CG-FFT) is successfully applied to the analysis of a large-scale periodic dipole array by speeding up the matrix-vector multiplication in the iterative solver. Furthermore, an equivalent sub-array preconditioner is proposed to combine with the CG-FFT analysis to reduce iterative steps and the whole CPU-time of the iteration. Some numerical results are given to illustrate the high efficiency and accuracy of the present method.
Zhenyu LIU Yang SONG Takeshi IKENAGA Satoshi GOTO
Many parallel Fast Fourier Transform (FFT) algorithms adopt multiple stages architecture to increase performance. However, data permutation between stages consumes volume memory and processing time. One FFT array processing mapping algorithm is proposed in this paper to overcome this demerit. In this algorithm, arbitrary 2k butterfly units (BUs) could be scheduled to work in parallel on n=2s data (k=0,1,..., s-1). Because no inter stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, with the increasing of BUs, not only does throughput increase linearly, system latency also decreases linearly. This array processing orientated architecture provides flexible tradeoff between hardware cost and system performance. In theory, the system latency is (s2s-k)tclk and the throughput is n/(s2s-ktclk), where tclk is the system clock period. Based on this mapping algorithm, several 18-bit word-length 1024-point FFT processors implemented with TSMC0.18 µm CMOS technology are given to demonstrate its scalability and high performance. The core area of 4-BU design is 2.9911.121 mm2 and clock frequency is 326 MHz in typical condition (1.8 V,25). This processor completes 1024 FFT calculation in 7.839 µs.