Viterbi decoding is commonly used in several protocols, but its computational cost is high, so an efficient implementation is necessary. This paper describes a GPU implementation of a Viterbi decoder based on the three-point Viterbi decoding algorithm (TVDA), in which the received bits are divided into multiple chunks and several chunks are decoded simultaneously. Coalesced memory access and Warp Shuffle, a newly introduced instruction, are also used to improve decoder performance. In addition, iterative execution of parallel chunk decoding reduces the latency of the proposed Viterbi decoder so that it can be used as part of a GPU-based SDR transceiver. As a result, the throughput of the proposed Viterbi decoder is improved by 23.1%.
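The chunk-splitting idea can be illustrated with a minimal Python sketch. This is not TVDA itself and not the authors' GPU code: it swaps in a simpler overlapped-window scheme, in which each chunk is decoded together with some neighbouring symbols that are then discarded. The K=3, rate-1/2 code with generators (7, 5), the function names, and the `chunk` and `overlap` parameters are all assumptions chosen for illustration.

```python
G = (0b111, 0b101)   # assumed generator polynomials (K=3, rate 1/2)
N_STATES = 4         # 2^(K-1) trellis states
INF = 10**9          # metric for states that cannot be reached


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def encode(bits):
    """Convolutionally encode bits, starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state          # newest bit in the top position
        out += [parity(reg & g) for g in G]
        state = reg >> 1
    return out


def viterbi(rx, known_start):
    """Hard-decision Viterbi decoding of the coded bit list rx.

    If known_start is False, every start state is treated as equally likely.
    """
    metric = [0 if (s == 0 or not known_start) else INF
              for s in range(N_STATES)]
    history = []
    for t in range(0, len(rx), 2):
        new_metric = [INF] * N_STATES
        prev = [0] * N_STATES
        for s in range(N_STATES):
            for b in (0, 1):
                reg = (b << 2) | s
                ns = reg >> 1
                # Branch metric: Hamming distance to the received pair.
                m = (metric[s]
                     + (parity(reg & G[0]) != rx[t])
                     + (parity(reg & G[1]) != rx[t + 1]))
                if m < new_metric[ns]:
                    new_metric[ns], prev[ns] = m, s
        history.append(prev)
        metric = new_metric
    # Trace back from the best final state; the input bit of each step
    # is the top bit of the state that step arrived in.
    state = min(range(N_STATES), key=lambda s: metric[s])
    bits = []
    for prev in reversed(history):
        bits.append(state >> 1)
        state = prev[state]
    return bits[::-1]


def decode_chunked(rx, chunk, overlap):
    """Decode rx chunk by chunk, each with `overlap` extra steps per side.

    Every loop iteration is independent of the others, so on a GPU the
    chunks could be assigned to different threads or blocks.
    """
    n = len(rx) // 2
    out = []
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        lo = max(0, start - overlap)
        hi = min(n, end + overlap)
        window = viterbi(rx[2 * lo:2 * hi], known_start=(lo == 0))
        out += window[start - lo:end - lo]   # keep only this chunk's bits
    return out
```

Calling `decode_chunked(encode(msg), chunk=32, overlap=16)` on a noise-free message decodes each 32-bit chunk independently. The overlap length trades redundant computation against decoding reliability near chunk boundaries; the value used here is arbitrary.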
Kosuke TOMITA
Osaka University
Masahide HATANAKA
Osaka University
Takao ONOYE
Osaka University
Viterbi decoder, TVDA, GPU, CUDA, SDR
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Kosuke TOMITA, Masahide HATANAKA, Takao ONOYE, "Implementation of Viterbi Decoder toward GPU-Based SDR Receiver" in IEICE TRANSACTIONS on Fundamentals,
vol. E98-A, no. 11, pp. 2246-2253, November 2015, doi: 10.1587/transfun.E98.A.2246.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E98.A.2246/_p
@ARTICLE{e98-a_11_2246,
author={Kosuke TOMITA and Masahide HATANAKA and Takao ONOYE},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Implementation of Viterbi Decoder toward GPU-Based SDR Receiver},
year={2015},
volume={E98-A},
number={11},
pages={2246-2253},
abstract={Viterbi decoding is commonly used in several protocols, but its computational cost is high, so an efficient implementation is necessary. This paper describes a GPU implementation of a Viterbi decoder based on the three-point Viterbi decoding algorithm (TVDA), in which the received bits are divided into multiple chunks and several chunks are decoded simultaneously. Coalesced memory access and Warp Shuffle, a newly introduced instruction, are also used to improve decoder performance. In addition, iterative execution of parallel chunk decoding reduces the latency of the proposed Viterbi decoder so that it can be used as part of a GPU-based SDR transceiver. As a result, the throughput of the proposed Viterbi decoder is improved by 23.1%.},
keywords={Viterbi decoder, TVDA, GPU, CUDA, SDR},
doi={10.1587/transfun.E98.A.2246},
ISSN={1745-1337},
month={November},}
TY - JOUR
TI - Implementation of Viterbi Decoder toward GPU-Based SDR Receiver
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2246
EP - 2253
AU - Kosuke TOMITA
AU - Masahide HATANAKA
AU - Takao ONOYE
PY - 2015
DO - 10.1587/transfun.E98.A.2246
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E98-A
IS - 11
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - November 2015
AB - Viterbi decoding is commonly used in several protocols, but its computational cost is high, so an efficient implementation is necessary. This paper describes a GPU implementation of a Viterbi decoder based on the three-point Viterbi decoding algorithm (TVDA), in which the received bits are divided into multiple chunks and several chunks are decoded simultaneously. Coalesced memory access and Warp Shuffle, a newly introduced instruction, are also used to improve decoder performance. In addition, iterative execution of parallel chunk decoding reduces the latency of the proposed Viterbi decoder so that it can be used as part of a GPU-based SDR transceiver. As a result, the throughput of the proposed Viterbi decoder is improved by 23.1%.
ER -