An Efficient Parallel SOVA-Based Turbo Decoder for Software Defined Radio on GPU

Rongchun LI; Yong DOU; Jiaqing XU; Xin NIU; Shice NI

doi:10.1587/transfun.E97.A.1027

IEICE TRANSACTIONS on Fundamentals

Summary:

In this paper, we propose a fully parallel Turbo decoder for Software-Defined Radio (SDR) on the Graphics Processing Unit (GPU) platform. The Soft Output Viterbi Algorithm (SOVA) is chosen for its low complexity and high throughput. The parallelism of SOVA is fully analyzed, and the whole codeword is divided into multiple sub-codewords whose turbo-pass decoding procedures are performed in parallel by independent sub-decoders. In each sub-decoder, an efficient initialization method is used to maintain the bit error ratio (BER) performance. The sub-decoders are mapped to numerous blocks on the GPU. Several optimization methods are employed to enhance the throughput, such as memory optimization, a codeword packing scheme, and asynchronous data transfer. The experimental results show that our decoder has BER performance close to that of Max-Log-MAP and a peak throughput of 127.84 Mbps, which is about two orders of magnitude faster than a central processing unit (CPU) implementation and comparable to application-specific integrated circuit (ASIC) solutions. The presented decoder achieves higher throughput than the fastest existing GPU-based implementation.
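The abstract does not detail how the codeword is split or how each sub-decoder is initialized. As a rough illustration only, the sketch below shows one common way to divide a codeword among independent sub-decoders: each sub-decoder owns a contiguous segment and additionally processes a short overlap on each side (a "guard" or warm-up window, which is an assumption here and not necessarily the authors' initialization method), so that its path metrics have settled before it reaches the bits it owns; only the owned bits are written back. The function names `partition_codeword` and `stitch` and the `overlap` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

# (dec_start, dec_end, keep_start, keep_end) for one sub-decoder
Window = Tuple[int, int, int, int]


def partition_codeword(length: int, num_sub: int, overlap: int) -> List[Window]:
    """Split symbol indices [0, length) into num_sub contiguous segments.

    Each sub-decoder decodes [dec_start, dec_end), i.e. its owned segment
    [keep_start, keep_end) extended by up to `overlap` symbols on each side
    (clipped at the codeword boundaries). The extra symbols only serve to
    warm up the path metrics; their outputs are discarded.
    """
    if length <= 0 or num_sub <= 0 or overlap < 0:
        raise ValueError("length and num_sub must be positive, overlap >= 0")
    base, extra = divmod(length, num_sub)
    windows: List[Window] = []
    start = 0
    for i in range(num_sub):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        end = start + size
        dec_start = max(0, start - overlap)
        dec_end = min(length, end + overlap)
        windows.append((dec_start, dec_end, start, end))
        start = end
    return windows


def stitch(length: int, windows: Sequence[Window],
           sub_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Reassemble per-window soft outputs, keeping only each owned range."""
    out = [0.0] * length
    for (dec_start, _dec_end, keep_start, keep_end), soft in zip(windows, sub_outputs):
        # soft[0] corresponds to global index dec_start
        out[keep_start:keep_end] = soft[keep_start - dec_start:keep_end - dec_start]
    return out


if __name__ == "__main__":
    llr = [float(i) for i in range(10)]  # stand-in channel LLRs
    wins = partition_codeword(10, 3, 2)
    print(wins)  # [(0, 6, 0, 4), (2, 9, 4, 7), (5, 10, 7, 10)]
    # Identity stand-in for a sub-decoder: returns its input window unchanged.
    subs = [llr[s:e] for (s, e, _, _) in wins]
    assert stitch(10, wins, subs) == llr
```

In a GPU mapping like the one the abstract describes, each window tuple would correspond to one thread block. Here the sub-decoder is replaced by an identity stand-in so that only the index bookkeeping is exercised; a longer overlap generally trades extra computation for more reliable metrics at segment edges.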

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E97-A No.5 pp.1027-1036

Publication Date: 2014/05/01

Online ISSN: 1745-1337

DOI: 10.1587/transfun.E97.A.1027

Type of Manuscript: PAPER

Category: Digital Signal Processing

Authors

Rongchun LI
  National University of Defense Technology
Yong DOU
  National University of Defense Technology
Jiaqing XU
  National University of Defense Technology
Xin NIU
  National University of Defense Technology
Shice NI
  National University of Defense Technology

Keyword

GPU, CUDA, SDR, Turbo decoder, SOVA

Cite this

Rongchun LI, Yong DOU, Jiaqing XU, Xin NIU, Shice NI, "An Efficient Parallel SOVA-Based Turbo Decoder for Software Defined Radio on GPU," IEICE TRANSACTIONS on Fundamentals, vol. E97-A, no. 5, pp. 1027-1036, May 2014, doi: 10.1587/transfun.E97.A.1027.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E97.A.1027/_p

@ARTICLE{e97-a_5_1027,
author={Rongchun LI and Yong DOU and Jiaqing XU and Xin NIU and Shice NI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={An Efficient Parallel {SOVA}-Based Turbo Decoder for Software Defined Radio on {GPU}},
year={2014},
volume={E97-A},
number={5},
pages={1027-1036},
keywords={GPU, CUDA, SDR, Turbo decoder, SOVA},
doi={10.1587/transfun.E97.A.1027},
ISSN={1745-1337},
month={May},
}

TY  - JOUR
TI  - An Efficient Parallel SOVA-Based Turbo Decoder for Software Defined Radio on GPU
T2  - IEICE TRANSACTIONS on Fundamentals
SP  - 1027
EP  - 1036
AU  - Rongchun LI
AU  - Yong DOU
AU  - Jiaqing XU
AU  - Xin NIU
AU  - Shice NI
PY  - 2014
DO  - 10.1587/transfun.E97.A.1027
JO  - IEICE TRANSACTIONS on Fundamentals
SN  - 1745-1337
VL  - E97-A
IS  - 5
JA  - IEICE TRANSACTIONS on Fundamentals
Y1  - 2014/05/01
ER  - 
