1-16hit |
Feng YU Lei WANG Dan GAO Yingguan WANG Xiaolin ZHANG
In this paper, a real-time medium access control (MAC) protocol based on a coding-black-burst mechanism with low latency and high energy efficiency is proposed for wireless sensor networks. The Black-Burst (BB) mechanism is used to provide real-time access. However, when the traffic load is heavy, BB will cause a lot of energy loss and latency due to its large length. A binary coding mechanism is applied to BB in our coding-black-burst-based protocol to reduce the energy consumption and latency by at least (L-2(log2 L+1)) for L-length BB. The new mechanism also gives priority to the real-time traffic with longer waiting delays to access the channel. The theoretical analysis and simulation results indicate that our protocol provides low end-to-end delay and high energy efficiency for real-time communication.
Min YUAN Qianjian XING Zhenguo MA Feng YU Yingke XU
In this letter, we present a novel single-precision floating-point multiply-accumulator (FNA-MAC) to achieve lower hardware resource, reduced computing latency and improved computing accuracy for continuous dot product operations. By further fusing the normalization and alignment in the traditional FMA algorithm, the proposed architecture eliminates the first N-1 normalization and rounding operations for an N-point dot product, and preserves the precision of interim results in a significant bit size that is twice of that in the traditional methods. The normalization and rounding of the final result is processed at the cost of consuming an additional multiply-add operation. The simulation results show that the improvement in computational accuracy is significant. Meanwhile, when comparing to a recently published FMA design, the proposed FNA-MAC can reduce the slice look-up table/flip-flop resource and computing latency by a fact of 18%, 33.3%, respectively.
Meiting XUE Huan ZHANG Weijun LI Feng YU
Sorting is one of the most fundamental problems in mathematics and computer science. Because high-throughput and flexible sorting is a key requirement in modern databases, this paper presents efficient techniques for designing a high-throughput sorting matrix that supports continuous data sequences. There have been numerous studies on the optimization of sorting circuits on FPGA (field-programmable gate array) platforms. These studies focused on attaining high throughput for a single command with fixed data width. However, the architectures proposed do not meet the requirement of diversity for database data types. A sorting matrix architecture is thus proposed to overcome this problem. Our design consists of a matrix of identical basic sorting cells. The sorting cells work in a pipeline and in parallel, and the matrix can simultaneously process multiple data streams, which can be combined into a high-width single-channel data stream or low-width multiple-channel data streams. It can handle continuous sequences and allows for sorting variable-length data sequences. Its maximum throughput is approximately 1.4 GB/s for 32-bit sequences and approximately 2.5 GB/s for 64-bit sequences on our platform.
We investigate the utilization of vector registers (VRs) on reducing memory references for single instruction multiple data fast Fourier transform calculation. We propose to group the butterfly computations in several consecutive stages to maximize utilization of the available VRs and take the advantage of the symmetries in twiddle factors. All the butterflies sharing identical twiddle factors are clustered and computed together to further improve performance. The relationship between the number of fused stages and the number of available VRs is then examined. Experimental results on different platforms show that the proposed method is effective.
Qianjian XING Feng YU Xiaobo YIN Bei ZHAO
In this letter, we present a radix-R regular interconnection pattern family of factorizations for the WHT-FFT with identical stage-to-stage interconnection pattern in a unified form, where R is any power of 2. This family of algorithms has identical sparse matrix factorization in each stage and can be implemented in a merged butterfly structure, which conduce to regular and efficient memory managing scalable to high radices. And in each stage, the butterflies with same twiddle factor set are aggregated together, which can reduce the twiddle factor evaluations or accesses to the lookup table. The kinds of factorization can also be extended to FFT, WHT and SCHT with identical stage-to-stage interconnection pattern.
In this paper we apply angle recoding to the CORDIC-based processing elements in a scalable architecture for complex matrix inversion. We extend the processing elements from the scalable real matrix inversion architecture to the complex domain and obtain the novel scalable complex matrix inversion architecture, which can significantly reduce computational complexity. We rearrange the CORDIC elements to make one half of the processing elements simple and compact. For the other half of the processing elements, the efficient use of angler recoding reduces the number of microrotation steps of the CORDIC elements to 3/4. Consequently, only 3 CORDIC elements are required for the processing elements with full utilization.
Qianjian XING Zhenguo MA Feng YU
This letter presents a novel memory-based architecture for radix-2 fast Walsh-Hadamard-Fourier transform (FWFT) based on the constant geometry FWFT algorithm. It is composed of a multi-function Processing Engine, a conflict-free memory addressing scheme and an efficient twiddle factor generator. The address for memory access and the control signals for stride permutation are formulated in detail and the methods can be applied to other memory-based FFT-like architectures.
Tingting CHEN Weijun LI Feng YU Qianjian XING
A modular serial pipelined sorting architecture for continuous input sequences is presented. It supports continuous sequences, whose lengths can be dynamically changed, and does so using a very simple control strategy. It consists of identical serial cascaded sorting cells, and lends itself to high frequency implementation with any number of sorting cells, because both data and control signals are pipelined. With L cascaded sorting cells, it produces a fully sorted result for sequences whose length N is equal to or less than L+1; for longer sequences, the largest L elements are sorted out. Being modularly designed, several independent smaller sorters can be dynamically configured to form a larger sorter.
Binhao HE Meiting XUE Shubiao LIU Feng YU Weijie CHEN
The top-K sorting is a variant of sorting used heavily in applications such as database management systems. Recently, the use of field programmable gate arrays (FPGAs) to accelerate sorting operation has attracted the interest of researchers. However, existing hardware top-K sorting algorithms are either resource-intensive or of low throughput. In this paper, we present a resource-efficient top-K sorting architecture that is composed of L cascading sorting units, and each sorting unit is composed of P sorting cells. K=PL largest elements are produced when a variable length input sequence is processed. This architecture can operate at a high frequency while consuming fewer resources. The experimental results show that our architecture achieved a maximum 1.2x throughput-to-resource improvement compared to previous studies.
Katsumi MORISHITA Shi Feng YUAN Yoshihiro MIYAKE Takahiro FUJIHARA
It is shown that the glass structure change is a simple and widely applicable method to modify refractive index locally in various glass fibers. A small part of a glass fiber is heated immediately to above its melt temperature by arc discharge, and then the molten fiber undergoes rapid cooling, which freezes the change of the glass structure. Therefore the refractive index of the fiber is decreased partially by the glass structure change induced by rapid solidification. The index reduction in a fiber fabricated from multicomponent glasses is estimated to be more than 0.006. To clarify that rapid solidification works for various glasses including silica glasses, long-period gratings are written in a standard telecommunication fiber with various discharge currents and times. The peak loss of more than 25 dB is obtained within only 6 periods. The index change can be adjusted by the discharge conditions. The gratings are not degraded by heating the whole gratings at 700C for 2 hours, and are highly temperature-stable. It is shown that resonance wavelengths can be tuned by controlling the heating temperature and heating time.
Spectral compressive sensing is a novel approach that enables extraction of spectral information from a spectral-sparse signal, exclusively from its compressed measurements. Thus, the approach has received considerable attention from various fields. However, standard compressive sensing algorithms always require a sparse signal to be on the grid, whose spacing is the standard resolution limit. Thus, these algorithms severely degenerate while handling spectral compressive sensing, owing to the off-the-grid issue. Some off-the-grid algorithms were recently proposed to solve this problem, but they are either inaccurate or computationally expensive. In this paper, we propose a novel algorithm named parameterized ℓ1-minimization (PL1), which can efficiently solves the off-the-grid spectral estimation problem with relatively low computational complexity.
Yizhi REN Zelong LI Lifeng YUAN Zhen ZHANG Chunhua SU Yujuan WANG Guohua WU
The recommend system has been widely used in many web application areas such as e-commerce services. With the development of the recommend system, the HIN modeling method replaces the traditional bipartite graph modeling method to represent the recommend system. But several studies have already showed that recommend system is vulnerable to shilling attack (injecting attack). However, the effectiveness of how traditional shilling attack has rarely been studied directly in the HIN model. Moreover, no study has focused on how to enhance shilling attacks against HIN recommend system by using the high-level semantic information. This work analyzes the relationship between the high-level semantic information and the attacking effects in HIN recommend system. This work proves that attack results are proportional to the high-level semantic information. Therefore, we propose a heuristic attack method based on high-level semantic information, named Semantic Shilling Attack (SSA) on a HIN recommend system (HERec). This method injects a specific score into each selected item related to the target in semantics. It ensures transmitting the misleading information towards target items and normal users, and attempts to interfere with the effect of the recommend system. The experiment is dependent on two real-world datasets, and proves that the attacking effect is positively correlate with the number of meta-paths. The result shows that our method is more effective when compared with existing baseline algorithms.
Due to its on-demand and pay-as-you-go properties, cloud computing has become an attractive alternative for HPC applications. However, communication-intensive applications with complex communication patterns still cannot be performed efficiently on cloud platforms, which are equipped with MapReduce technologies, such as Hadoop and Spark. In particular, one major obstacle is that MapReduce's simple programming model cannot explicitly manipulate data transfers between compute nodes. Another obstacle is cloud's relatively poor network performance compared with traditional HPC platforms. The traditional Strassen's algorithm of square matrix multiplication has a recursive and complex pattern on the HPC platform. Therefore, it cannot be directly applied to the cloud platform. In this paper, we demonstrate how to make Strassen's algorithm with complex communication patterns “cloud-friendly”. By reorganizing Strassen's algorithm in an iterative pattern, we completely separate its computations and communications, making it fit to MapReduce programming model. By adopting a novel data/task parallel strategy, we solve Strassen's data dependency problems, making it well balanced. This is the first instance of Strassen's algorithm in MapReduce-style systems, which also matches Strassen's communication lower bound. Further experimental results show that it achieves a speedup ranging from 1.03× to 2.50× over the classical Θ(n3) algorithm. We believe the principle can be applied to many other complex scientific applications.
Bofeng YUAN Xuewen LIAO Xinmin LUO
The multiple-input-multiple-output (MIMO) Gaussian wireless network with K users and an intermediate relay is investigated. In this network, each user with available local channel state information (CSI) intends to convey a multicast message to all other users while receiving all messages from other users via the relay. This model is termed the MIMO K-way relay channel with distributed CSI. For this channel, the sum capacity is shown as MK/(K-1)log(SNR)+o(SNR) where each user and the relay is equipped with M antennas. Achievability is based on the signal space alignment strategy with a K-1 time slot extension. A most general case is then considered, in which each user intends to recover all messages required within T time slots. We provide an improved scheme called fractional signal space alignment which achieves MK/(K-1) degrees of freedom in the general case and the feasibility condition is also explored.
The sparse Fourier transform (SFT) seeks to recover k non-negligible Fourier coefficients from a k-sparse signal of length N (k«N). A single frequency signal can be recovered via the Chinese remainder theorem (CRT) with sub-sampled discrete Fourier transforms (DFTs). However, when there are multiple non-negligible coefficients, more of them may collide, and multiple stages of sub-sampled DFTs are needed to deal with such collisions. In this paper, we propose a combinatorial aliasing-based SFT (CASFT) algorithm that is robust to noise and greatly reduces the number of stages by iteratively recovering coefficients. First, CASFT detects collisions and recovers coefficients via the CRT in a single stage. These coefficients are then subtracted from each stage, and the process iterates through the other stages. With a computational complexity of O(klog klog 2N) and sample complexity of O(klog 2N), CASFT is a novel and efficient SFT algorithm.
Bei ZHAO Chen CHENG Zhenguo MA Feng YU
Cross correlation is a general way to estimate time delay of arrival (TDOA), with a computational complexity of O(n log n) using fast Fourier transform. However, since only one spike is required for time delay estimation, complexity can be further reduced. Guided by Chinese Remainder Theorem (CRT), this paper presents a new approach called Co-prime Aliased Sparse FFT (CASFFT) in O(n1-1/d log n) multiplications and O(mn) additions, where m is smooth factor and d is stage number. By adjusting these parameters, it can achieve a balance between runtime and noise robustness. Furthermore, it has clear advantage in parallelism and runtime for a large range of signal-to-noise ratio (SNR) conditions. The accuracy and feasibility of this algorithm is analyzed in theory and verified by experiment.