In this paper, we propose a fully parallel Turbo decoder for Software-Defined Radio (SDR) on the Graphics Processing Unit (GPU) platform. Soft Output Viterbi algorithm (SOVA) is chosen for its low complexity and high throughput. The parallelism of SOVA is fully analyzed and the whole codeword is divided into multiple sub-codewords, where the turbo-pass decoding procedures are performed in parallel by independent sub-decoders. In each sub-decoder, an efficient initialization method is exploited to assure the bit error ratio (BER) performance. The sub-decoders are mapped to numerous blocks on the GPU. Several optimization methods are employed to enhance the throughput, such as the memory optimization, codeword packing scheme, and asynchronous data transfer. The experiment shows that our decoder has BER performance close to Max-Log-MAP and the peak throughput is 127.84Mbps, which is about two orders of magnitude faster than that of central processing unit (CPU) implementation, which is comparable to application-specific integrated circuit (ASIC) solutions. The presented decoder can achieve higher throughput than that of the existing fastest GPU-based implementation.
Rongchun LI
National University of Defense Technology
Yong DOU
National University of Defense Technology
Jiaqing XU
National University of Defense Technology
Xin NIU
National University of Defense Technology
Shice NI
National University of Defense Technology
GPU, CUDA, SDR, Turbo decoder, SOVA
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Rongchun LI, Yong DOU, Jiaqing XU, Xin NIU, Shice NI, "An Efficient Parallel SOVA-Based Turbo Decoder for Software Defined Radio on GPU" in IEICE TRANSACTIONS on Fundamentals,
vol. E97-A, no. 5, pp. 1027-1036, May 2014, doi: 10.1587/transfun.E97.A.1027.
Abstract: In this paper, we propose a fully parallel Turbo decoder for Software-Defined Radio (SDR) on the Graphics Processing Unit (GPU) platform. Soft Output Viterbi algorithm (SOVA) is chosen for its low complexity and high throughput. The parallelism of SOVA is fully analyzed and the whole codeword is divided into multiple sub-codewords, where the turbo-pass decoding procedures are performed in parallel by independent sub-decoders. In each sub-decoder, an efficient initialization method is exploited to assure the bit error ratio (BER) performance. The sub-decoders are mapped to numerous blocks on the GPU. Several optimization methods are employed to enhance the throughput, such as the memory optimization, codeword packing scheme, and asynchronous data transfer. The experiment shows that our decoder has BER performance close to Max-Log-MAP and the peak throughput is 127.84Mbps, which is about two orders of magnitude faster than that of central processing unit (CPU) implementation, which is comparable to application-specific integrated circuit (ASIC) solutions. The presented decoder can achieve higher throughput than that of the existing fastest GPU-based implementation.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E97.A.1027/_p
Copy
@ARTICLE{e97-a_5_1027,
author={Rongchun LI, Yong DOU, Jiaqing XU, Xin NIU, Shice NI, },
journal={IEICE TRANSACTIONS on Fundamentals},
title={An Efficient Parallel SOVA-Based Turbo Decoder for Software Defined Radio on GPU},
year={2014},
volume={E97-A},
number={5},
pages={1027-1036},
abstract={In this paper, we propose a fully parallel Turbo decoder for Software-Defined Radio (SDR) on the Graphics Processing Unit (GPU) platform. Soft Output Viterbi algorithm (SOVA) is chosen for its low complexity and high throughput. The parallelism of SOVA is fully analyzed and the whole codeword is divided into multiple sub-codewords, where the turbo-pass decoding procedures are performed in parallel by independent sub-decoders. In each sub-decoder, an efficient initialization method is exploited to assure the bit error ratio (BER) performance. The sub-decoders are mapped to numerous blocks on the GPU. Several optimization methods are employed to enhance the throughput, such as the memory optimization, codeword packing scheme, and asynchronous data transfer. The experiment shows that our decoder has BER performance close to Max-Log-MAP and the peak throughput is 127.84Mbps, which is about two orders of magnitude faster than that of central processing unit (CPU) implementation, which is comparable to application-specific integrated circuit (ASIC) solutions. The presented decoder can achieve higher throughput than that of the existing fastest GPU-based implementation.},
keywords={},
doi={10.1587/transfun.E97.A.1027},
ISSN={1745-1337},
month={May},}
Copy
TY - JOUR
TI - An Efficient Parallel SOVA-Based Turbo Decoder for Software Defined Radio on GPU
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1027
EP - 1036
AU - Rongchun LI
AU - Yong DOU
AU - Jiaqing XU
AU - Xin NIU
AU - Shice NI
PY - 2014
DO - 10.1587/transfun.E97.A.1027
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E97-A
IS - 5
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - May 2014
AB - In this paper, we propose a fully parallel Turbo decoder for Software-Defined Radio (SDR) on the Graphics Processing Unit (GPU) platform. Soft Output Viterbi algorithm (SOVA) is chosen for its low complexity and high throughput. The parallelism of SOVA is fully analyzed and the whole codeword is divided into multiple sub-codewords, where the turbo-pass decoding procedures are performed in parallel by independent sub-decoders. In each sub-decoder, an efficient initialization method is exploited to assure the bit error ratio (BER) performance. The sub-decoders are mapped to numerous blocks on the GPU. Several optimization methods are employed to enhance the throughput, such as the memory optimization, codeword packing scheme, and asynchronous data transfer. The experiment shows that our decoder has BER performance close to Max-Log-MAP and the peak throughput is 127.84Mbps, which is about two orders of magnitude faster than that of central processing unit (CPU) implementation, which is comparable to application-specific integrated circuit (ASIC) solutions. The presented decoder can achieve higher throughput than that of the existing fastest GPU-based implementation.
ER -