IEICE global.ieice.org Site

Author Search Result

[Author] Rongchun LI(4hit)

Parallel Sparse Cholesky Factorization on a Heterogeneous Platform
Dan ZOU, Yong DOU, Rongchun LI

LETTER-Algorithms and Data Structures

Vol: E96-A No: 4 Page(s): 833-834
We present a new approach to sparse Cholesky factorization on a heterogeneous platform with a graphics processing unit (GPU). Sparse Cholesky factorization is a core algorithm in numerous computing applications. We tuned the supernode data structure and parallelized the GPU tasks to increase GPU utilization. Results show that our approach substantially reduces computation time.
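For readers unfamiliar with the underlying operation, the following is a minimal sketch of a plain dense Cholesky factorization (A = L L^T) in Python. It is not the supernodal, GPU-accelerated method of the letter, whose details are not given in the abstract; the function name `cholesky` and the example matrix are illustrative assumptions.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T.

    `a` must be a symmetric positive-definite matrix given as a list of rows.
    Dense textbook version; a sparse solver would skip structural zeros and
    group columns with identical sparsity patterns into supernodes.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the contribution of columns already computed.
        d = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        low[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s / low[j][j]
    return low


A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L = cholesky(A)
```

Column j depends only on the columns to its left, which is the dependency structure that supernodal methods exploit when they batch columns into dense blocks.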
Design and Implementation of the Parameterized Multi-Standard High-Throughput Radix-4 Viterbi Decoder on FPGA
Rongchun LI, Yong DOU, Yuanwu LEI, Shice NI, Song GUO

PAPER-Fundamental Theories for Communications

Vol: E95-B No: 5 Page(s): 1602-1611
This paper presents a parameterized multi-standard adaptive radix-4 Viterbi decoder with high throughput and low complexity. The proposed decoder supports constraint lengths from 3 to 9, code rates from 1/3 to 1/2, and arbitrary truncation lengths. We present a novel Add-Compare-Select Unit (ACSU) fabric, together with unsigned quantization and efficient normalization methods that shorten the critical path. The decoder achieves a low bit error ratio under multiple standards, such as GPRS, WiMAX, LTE, CDMA, and 3G. It is implemented on a Xilinx XC5VLX330 device and achieves a clock frequency of 181.7 MHz. Its throughput reaches 363 Mbps, which is superior to that of other current multi-standard or radix-4 Viterbi decoders on the FPGA platform.
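To illustrate the add-compare-select step and metric normalization mentioned above, here is a minimal software sketch of a textbook radix-2, hard-decision Viterbi decoder. It is not the paper's ACSU design, which the abstract does not describe. The constraint length K = 3, the generator polynomials (7, 5 octal), subtracting the minimum metric as the normalization, and all function names are assumptions chosen for brevity.

```python
K = 3                      # constraint length (assumed for this sketch)
G = (0b111, 0b101)         # generator polynomials 7 and 5 (octal), rate 1/2
N_STATES = 1 << (K - 1)


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def step(state, bit):
    """Return (next_state, output_bits) for one encoder transition."""
    reg = (bit << (K - 1)) | state          # newest bit sits in the MSB
    return reg >> 1, [parity(reg & g) for g in G]


def encode(bits):
    """Convolutionally encode `bits`, starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        state, symbols = step(state, b)
        out.extend(symbols)
    return out


def viterbi_decode(received, n_bits):
    """Hard-decision Viterbi decoding with full traceback."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)  # encoder starts in state 0
    history = []
    for t in range(n_bits):
        r = received[len(G) * t: len(G) * (t + 1)]
        new_metrics = [inf] * N_STATES
        decisions = [(0, 0)] * N_STATES     # (previous state, input bit)
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue                    # state not reachable yet
            for bit in (0, 1):
                nxt, expected = step(state, bit)
                branch = sum(e != x for e, x in zip(expected, r))
                candidate = metrics[state] + branch       # add
                if candidate < new_metrics[nxt]:          # compare
                    new_metrics[nxt] = candidate          # select
                    decisions[nxt] = (state, bit)
        # Normalization: subtract the minimum so metrics stay bounded.
        low = min(new_metrics)
        metrics = [m - low for m in new_metrics]
        history.append(decisions)
    # Trace back from the best final state.
    state = metrics.index(min(metrics))
    bits = []
    for decisions in reversed(history):
        state, bit = decisions[state]
        bits.append(bit)
    return bits[::-1]


message = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = encode(message)
noisy = codeword[:]
noisy[3] ^= 1                               # inject a single channel error
decoded = viterbi_decode(noisy, len(message))
```

A radix-4 ACSU collapses two such radix-2 trellis stages into one clock cycle, so at 181.7 MHz it would decode 2 × 181.7 ≈ 363.4 Mbit/s, which is consistent with the 363 Mbps quoted above.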
Efficient Parallel Interference Cancellation MIMO Detector for Software Defined Radio on GPUs
Rongchun LI, Yong DOU, Jie ZHOU, Chen CHEN

PAPER-Digital Signal Processing

Vol: E97-A No: 6 Page(s): 1388-1395
The parallel interference cancellation (PIC) multiple-input multiple-output (MIMO) detection algorithm offers bit error ratio (BER) performance comparable to that of the maximum likelihood (ML) algorithm, with complexity close to that of simple linear detection algorithms such as zero forcing (ZF), minimum mean squared error (MMSE), and successive interference cancellation (SIC). However, the throughput of a PIC MIMO detector on a central processing unit (CPU) cannot meet the requirements of wireless protocols. To reach the throughput required by the standards, this paper uses the graphics processing unit (GPU) as the modem processor to accelerate PIC MIMO detection. The parallelism of the PIC algorithm is analyzed, and a two-stage PIC detection is developed to match the multi-core architecture efficiently. Several optimization methods, such as memory optimization and asynchronous data transfer, are employed to enhance throughput. Experiments show that our MIMO detector has excellent BER performance and a peak throughput of 337.84 megabits per second (Mbps), about 7x to 16x that of a CPU implementation with SSE2 optimization. The implemented MIMO detector achieves higher computing throughput than recent GPU-based implementations.
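The idea of parallel interference cancellation can be sketched in a few lines. The example below is a simplified illustration, not the paper's detector: it assumes a real-valued, noiseless channel with BPSK symbols (+1/-1), uses a matched filter for the initial stage (the abstract does not say how its first stage is computed), and the function name `pic_detect` is hypothetical.

```python
def sign(x):
    """Hard BPSK decision."""
    return 1 if x >= 0 else -1


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pic_detect(H, y, iterations=1):
    """Detect BPSK symbols s from y = H s using PIC.

    H is a list of rows (n_rx x n_tx); y is the received vector.
    """
    n_rx, n_tx = len(H), len(H[0])
    cols = [[H[i][k] for i in range(n_rx)] for k in range(n_tx)]

    # Stage 1: initial estimate of every stream (matched filter, assumed).
    s = [sign(dot(cols[k], y)) for k in range(n_tx)]

    # Stage 2: for each stream, subtract the estimated contribution of all
    # other streams, then re-detect. Every stream uses the estimates of the
    # previous stage only, so the loop over k can run fully in parallel.
    for _ in range(iterations):
        prev = s[:]
        s = []
        for k in range(n_tx):
            residual = [
                y[i] - sum(H[i][j] * prev[j] for j in range(n_tx) if j != k)
                for i in range(n_rx)
            ]
            s.append(sign(dot(cols[k], residual)))
    return s


H = [[0.5, 1.0],
     [0.0, 2.0]]
sent = [1, -1]
y = [sum(H[i][j] * sent[j] for j in range(2)) for i in range(2)]

initial_only = pic_detect(H, y, iterations=0)
after_pic = pic_detect(H, y, iterations=1)
```

In this small example the matched filter alone misdetects the weak first stream, and one cancellation stage recovers the transmitted vector. Because no stream waits on another within a stage, the per-stream work maps naturally onto independent GPU threads, unlike SIC, which detects streams one after another.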
An Efficient Parallel SOVA-Based Turbo Decoder for Software Defined Radio on GPU
Rongchun LI, Yong DOU, Jiaqing XU, Xin NIU, Shice NI

PAPER-Digital Signal Processing

Vol: E97-A No: 5 Page(s): 1027-1036
In this paper, we propose a fully parallel Turbo decoder for Software-Defined Radio (SDR) on the Graphics Processing Unit (GPU) platform. The Soft Output Viterbi Algorithm (SOVA) is chosen for its low complexity and high throughput. The parallelism of SOVA is fully analyzed, and the whole codeword is divided into multiple sub-codewords, whose turbo-pass decoding procedures are performed in parallel by independent sub-decoders. In each sub-decoder, an efficient initialization method is used to preserve the bit error ratio (BER) performance. The sub-decoders are mapped to numerous blocks on the GPU. Several optimization methods, such as memory optimization, a codeword packing scheme, and asynchronous data transfer, are employed to enhance throughput. Experiments show that our decoder has BER performance close to that of Max-Log-MAP and a peak throughput of 127.84 Mbps, which is about two orders of magnitude faster than a central processing unit (CPU) implementation and comparable to application-specific integrated circuit (ASIC) solutions. The presented decoder achieves higher throughput than the fastest existing GPU-based implementation.
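The sub-codeword partitioning can be pictured with a short sketch. The abstract does not specify its initialization method, so the example below substitutes a commonly used alternative, overlapping guard windows, in which each sub-decoder also processes a few extra trellis steps on either side to warm up its state metrics. The function name `partition`, the `guard` parameter, and the sizes used are assumptions.

```python
def partition(length, n_sub, guard=0):
    """Split a codeword of `length` trellis steps into `n_sub` windows.

    Returns a list of (start, core_start, core_end, end) tuples. A sub-decoder
    processes steps [start, end) but only outputs decisions for
    [core_start, core_end); the extra `guard` steps on each side are used to
    initialize its state metrics (assumed scheme, not the paper's).
    """
    if length % n_sub != 0:
        raise ValueError("length must be a multiple of n_sub")
    size = length // n_sub
    windows = []
    for i in range(n_sub):
        core_start = i * size
        core_end = core_start + size
        start = max(0, core_start - guard)       # clip at the codeword start
        end = min(length, core_end + guard)      # clip at the codeword end
        windows.append((start, core_start, core_end, end))
    return windows


windows = partition(length=12, n_sub=3, guard=2)
```

Each tuple describes work that is independent of the others, so one window could be assigned to one GPU block; a larger guard would cost redundant computation but bring the boundary metrics closer to those of an unpartitioned decoder.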
