Xiao Hua CHEN Hak-Keong SIM Pang Shyan KOOI
A novel CDMA multi-user detection scheme, the orthogonal decision-feedback detector (ODFD), is proposed for a synchronous CDMA system in this paper. The scheme is near-far resistant and achieves high multi-user detection efficiency, with performance similar to that of the decorrelating decision-feedback detector (DDFD) but at reduced complexity. The ODFD employs a matched-filter bank matched to a set of orthonormal sequences, which are generated from the spreading codes by the Gram-Schmidt orthogonalisation procedure. The ODFD algorithm involves only the orthonormal coefficient matrix, which need not be recalculated frequently when system parameters change. Successive decision-feedback detection is carried out directly at the output of the ODFD matched-filter bank without matrix inversion, resulting in a much simplified structure.
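The Gram-Schmidt step the abstract refers to can be sketched in a few lines. This is a minimal, illustrative implementation of classical Gram-Schmidt orthonormalisation over a set of spreading codes; the function names and the tolerance for dropping linearly dependent codes are assumptions, not taken from the paper.

```python
# Illustrative sketch: Gram-Schmidt orthonormalisation of spreading codes,
# as used to build an orthonormal basis for the ODFD matched-filter bank.
# Names and the dependence tolerance are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(codes):
    """Return an orthonormal basis spanning the given spreading codes."""
    basis = []
    for c in codes:
        # Subtract projections onto the basis vectors found so far.
        w = list(c)
        for q in basis:
            p = dot(w, q)
            w = [wi - p * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip linearly dependent codes
            basis.append([wi / norm for wi in w])
    return basis

# Example: two non-orthogonal bipolar spreading codes of length 4.
orthonormal = gram_schmidt([[1, 1, 1, 1], [1, 1, -1, 1]])
```

Because the basis spans the same subspace as the original codes, matched filtering against it loses no signal information, while the orthonormality is what removes the need for matrix inversion in the detector.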
Xiao Hua CHEN Tao LANG Juhani OKSMAN
A new scheme for studying the performance of a DS/CDMA indoor wireless system, correlation statistics distribution convolution (CSDC) modeling, is introduced in this paper. With the aid of CSDC modeling, the bit error rate versus the number of simultaneous interfering transmitters can be evaluated directly, taking into account the effects of Rayleigh fading, power control, multipath and co-channel interference. The performance of two CDMA receiver structures, the conventional correlator and the RAKE receiver, is compared. It is shown that the RAKE receiver is effective in improving system performance under indoor multipath fading. However, its effectiveness under transmitter power control is sensitive to the severity of multipath interference in the indoor channel. When the multipath fading is severe, tight power control over the main paths may not improve the performance of the RAKE receiver.
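The general convolution idea behind this kind of modeling can be illustrated with a toy discrete example: if the correlation statistics of independent interferers are identically distributed, the distribution of the total interference from K transmitters is the K-fold convolution of a single interferer's distribution. The probability-mass function below and the error criterion are invented for illustration only, not the paper's actual channel model.

```python
# Toy illustration of the convolution idea behind CSDC-style modelling.
# The single-interferer PMF and the decision rule are hypothetical.

def convolve(p, q):
    """Convolve two PMFs given as {value: probability} dicts."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def error_rate(single_pmf, k, signal=1.0):
    """P(total interference drives the decision statistic below zero),
    i.e. P(sum of k interference terms <= -signal)."""
    total = {0.0: 1.0}
    for _ in range(k):
        total = convolve(total, single_pmf)
    return sum(p for v, p in total.items() if v <= -signal)
```

With this structure, the error rate versus the number of interferers is obtained directly from repeated convolution, which mirrors how the abstract describes evaluating BER against the number of simultaneous transmitters.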
Xiao Hua CHEN Tao LANG Juhani OKSMAN
Both GMW sequences and m-sequences possess a two-valued auto-correlation function, which helps to improve the performance of a RAKE receiver. However, their cross-correlation functions are less well controlled. Before they can be applied to a CDMA system, it is necessary to construct sub-families (taking advantage of their large family sizes) that offer satisfactory cross-correlation functions. This paper studies several algorithms for constructing such quasi-optimum sub-families in terms of minimized bit error rate under co-channel interference. The study shows that the performance of the resulting sub-families is sensitive to sub-family size and to the construction algorithm. A new code-selection criterion based on the combined (even and odd) maximum cross-correlation is introduced, and highest-peak-deleting and most-peak-deleting algorithms are suggested for constructing quasi-optimum sub-families of GMW and m-sequences.
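A greedy deletion strategy of the kind the abstract names can be sketched as follows. This is a simplified guess at the flavour of a "highest-peak-deleting" algorithm: it uses only the even periodic cross-correlation peak (the paper's criterion combines even and odd correlations), and all function names are illustrative.

```python
# Simplified sketch of a highest-peak-deleting style selection: repeatedly
# drop one member of the pair with the largest cross-correlation peak until
# the target sub-family size is reached. The peak metric here is the even
# periodic cross-correlation only; the paper's criterion is richer.

def peak_crosscorr(a, b):
    """Maximum absolute periodic cross-correlation over all cyclic shifts."""
    n = len(a)
    return max(abs(sum(a[i] * b[(i + s) % n] for i in range(n)))
               for s in range(n))

def highest_peak_deleting(family, target_size):
    family = list(family)
    while len(family) > target_size:
        # Find the pair with the worst (largest) cross-correlation peak.
        worst = max(
            ((i, j) for i in range(len(family))
                    for j in range(i + 1, len(family))),
            key=lambda ij: peak_crosscorr(family[ij[0]], family[ij[1]]))
        # Delete one member of the offending pair.
        del family[worst[1]]
    return family
```

The greedy structure shows why the resulting sub-family depends on both the target size and the deletion rule, which matches the sensitivity the study reports.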
This paper presents a new technique for implementing a convolutional codec in VLSI. The code is used in trellis-coded modulation (TCM). The technique aims to reduce hardware complexity and increase throughput when decoding the convolutional code with the Viterbi algorithm. To simplify the decoding algorithm and its calculations, branch cost distances are pre-calculated and stored in a Distance Look-Up Table (DLUT). By reading each branch cost from the DLUT, the hardware implementation of the algorithm requires no calculation circuits. Furthermore, based on the trellis diagram, an Output Look-Up Table (OLUT) is also constructed for generating the decoded output; this table reduces the amount of storage the algorithm needs. The use of look-up tables reduces hardware complexity and increases the throughput of the decoder. Using this technique, a 16-state, radix-4 TCM codec with 2-D and 4-D constellations was designed, mathematically simulated, and implemented in both FPGA and ASIC. The tested ASIC has a core area of 1.1 mm2 in 0.18 µm CMOS technology and achieves a decoding speed of over 500 Mbps. Implementation results show that look-up tables can be used to decrease hardware requirements and increase decoding speed. The designed codec can be used as an IP core for integration into system-on-chip applications, and the technique can also be explored for decoding turbo codes.
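The DLUT idea can be made concrete with a toy example: quantise the received sample, then read the branch metric from a table indexed by quantisation level and branch label, so no distance arithmetic happens at decode time. The 2-PAM symbol set, quantiser levels and names below are assumptions for illustration, not the paper's 2-D/4-D TCM constellations.

```python
# Toy sketch of a Distance Look-Up Table (DLUT): all branch costs are
# precomputed per (quantised received sample, branch label), so the
# add-compare-select loop needs no run-time distance calculation.
# Symbol set, quantiser and names are hypothetical.

# Branch output symbols for a toy 2-PAM trellis (labels 0 and 1).
BRANCH_SYMBOLS = {0: -1.0, 1: +1.0}

# Quantise the received value to a small number of levels.
LEVELS = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]

def quantise(r):
    return min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - r))

# Precompute the DLUT: squared distance for each (level, label).
DLUT = [[(LEVELS[q] - s) ** 2 for s in BRANCH_SYMBOLS.values()]
        for q in range(len(LEVELS))]

def branch_cost(received, label):
    """Table lookup replaces the run-time distance calculation."""
    return DLUT[quantise(received)][label]
```

In hardware the same trade applies: the table costs ROM/RAM area, but removes multipliers and adders from the critical path, which is where the throughput gain comes from.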
Junyang ZHANG Yang GUO Xiao HU Rongzhen LI
In recent years, deep-learning-based image recognition, speech recognition, text translation and related applications have brought great convenience to people's lives. With the advent of the era of the internet of everything, running a computationally intensive deep learning algorithm on a resource-limited edge device is a major challenge. For an edge-oriented vector processor, combined with a specific neural network model, a new data layout method is proposed that places the input feature maps in DDR and rearranges the convolutional kernel parameters in the core memory bank. To address the difficulty of parallelizing two-dimensional matrix convolution, a method is proposed that parallelizes the convolution calculation along the third dimension; by setting a vector register to zero as the initial value of the max pooling, the rectified linear unit (ReLU) activation function and the pooling operation are fused, reducing repeated accesses to intermediate data. On the basis of the single-core implementation, a multi-core implementation scheme for the Inception structure is proposed. Finally, based on the proposed vectorization method, five neural network models (AlexNet, VGG16, VGG19, GoogLeNet and ResNet18) are implemented, and performance statistics and analysis against a CPU, a GTX 1080 Ti GPU and the FT2000 are presented. Experimental results show that the vector processor has computational advantages over the CPU and GPU, and can run large-scale neural network models in real time.
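The ReLU/pooling fusion trick is easy to see in scalar form: because max pooling with an initial value of zero can never return a negative number, max(0, x) comes for free and the negative pre-activations never need a separate ReLU pass over memory. The sketch below shows the arithmetic only; on the vector processor this initial zero lives in a vector register, and the loop nest and shapes here are illustrative.

```python
# Sketch of fusing ReLU into max pooling: initialising the pooling
# accumulator to zero makes max(0, x) implicit, so no separate ReLU pass
# (and no intermediate feature map) is needed. Shapes are illustrative.

def fused_relu_maxpool(feature_map, pool=2):
    """Non-overlapping pool x pool max pooling with ReLU folded in."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for i in range(0, h - pool + 1, pool):
        row = []
        for j in range(0, w - pool + 1, pool):
            m = 0.0  # zero initial value acts as the ReLU clamp
            for di in range(pool):
                for dj in range(pool):
                    m = max(m, feature_map[i + di][j + dj])
            row.append(m)
        out.append(row)
    return out
```

The benefit on a vector processor is bandwidth: the pre-activation feature map is read once, and the clamped, pooled result is the only intermediate ever written back.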
Yuanqi SU Yuehu LIU Xiao HUANG
We present a fast voting scheme for localizing circular objects amid clutter and occlusion. Typical solutions to the problem are based on the Hough transform, which evaluates a circle instance by counting the number of edge points along its boundary. The count is proportional to the radius, making normalization with respect to this factor necessary for detecting circles with different radii. By representing a circle with a fixed number of sampled points, we eliminate this step. Evaluating an instance then amounts to finding the same number of edge points, each close to a sampled point in both spatial position and orientation. The closeness is measured by a compatibility function, in which a truncation operation suppresses noise and handles occlusion. All circle instances are evaluated by letting each edge point vote in a maximizing way, so that every instance receives a set of maximally compatible edge points. The voting process is further separated into radius-independent and radius-dependent parts. The time-consuming independent part can be shared across different radii and outputs sparse matrices; the radius-dependent part shifts these sparse matrices according to the radius. We present precision-recall curves showing that the proposed approach outperforms Hough-transform-based solutions while achieving time complexity comparable to that of the Hough transform with a 2D accumulator array.
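The scoring of one circle instance from sampled points can be sketched directly from the description above: each sampled boundary point contributes its maximally compatible edge point, and compatibility is truncated in both distance and orientation. The particular compatibility function, thresholds, and the brute-force maximisation below are assumptions for illustration; the paper's actual method factors this computation into shared radius-independent sparse matrices.

```python
# Illustrative scoring of one circle instance (cx, cy, r) against edge
# points (x, y, gradient angle). Compatibility form and thresholds are
# hypothetical; the paper evaluates all instances far more efficiently.

import math

def compatibility(edge, sample, d_max=2.0, a_max=0.3):
    """Truncated closeness in both spatial position and orientation."""
    ex, ey, etheta = edge
    sx, sy, stheta = sample
    d = math.hypot(ex - sx, ey - sy)
    a = abs((etheta - stheta + math.pi) % (2 * math.pi) - math.pi)
    if d > d_max or a > a_max:
        return 0.0  # truncation suppresses noise and handles occlusion
    return (1 - d / d_max) * (1 - a / a_max)

def score_circle(cx, cy, r, edges, n_samples=16):
    score = 0.0
    for k in range(n_samples):
        phi = 2 * math.pi * k / n_samples
        # Sampled boundary point with outward normal orientation phi.
        sample = (cx + r * math.cos(phi), cy + r * math.sin(phi), phi)
        # Each sample takes its maximally compatible edge point.
        score += max((compatibility(e, sample) for e in edges), default=0.0)
    return score
```

Because the score is bounded by the fixed number of samples rather than the circumference, no radius normalization is needed, which is the point the abstract makes against boundary-counting Hough variants.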