Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs

Yuma MUNEKAWA; Fumihiko INO; Kenichi HAGIHARA

doi:10.1587/transinf.E93.D.1479

Summary:

This paper presents a fast method for accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using the compute unified device architecture (CUDA), which is available on NVIDIA GPUs. Compared with previous methods, our method makes four major contributions. (1) It uses on-chip shared memory efficiently to reduce the amount of data transferred between off-chip video memory and the processing elements in the GPU. (2) It reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) It implements a pipelined method that overlaps GPU execution with database access. (4) It employs a master/worker paradigm to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the number of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single-GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.
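For readers unfamiliar with the GCUPS metric used in the summary, the following is a minimal CPU-side sketch, not the authors' CUDA implementation. It computes a Smith-Waterman local alignment score and counts the dynamic-programming cells it updates. The function names `smith_waterman_score` and `gcups`, the linear gap penalty, and the match/mismatch scores are assumptions chosen for illustration; the paper's actual scoring scheme is not stated on this page.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cells updated).

    Illustrative only: uses a linear gap penalty and a simple
    match/mismatch score (both assumed, not taken from the paper).
    """
    prev = [0] * (len(subject) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always zero for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # scores never drop below zero
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    # One cell update per (query residue, subject residue) pair.
    return best, len(query) * len(subject)


def gcups(cells, seconds):
    """Giga cell updates per second for a run that took `seconds`."""
    return cells / seconds / 1e9


if __name__ == "__main__":
    score, cells = smith_waterman_score("GATTACA", "TTAC")
    print(score, cells)
```

Under this definition, a run that updates 8.32e9 cells in one second corresponds to the 8.32 GCUPS peak quoted above. As a separate back-of-the-envelope reading of the summary, a 28-fold speedup on 32 nodes corresponds to a parallel efficiency of about 28/32 = 87.5%.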

Publication: IEICE TRANSACTIONS on Information, Vol. E93-D, No. 6, pp. 1479-1488

Publication Date: 2010/06/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E93.D.1479

Type of Manuscript: Special Section PAPER (Special Section on Info-Plosion)

Category: Parallel and Distributed Architecture

Yuma MUNEKAWA, Fumihiko INO, Kenichi HAGIHARA, "Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs" in IEICE TRANSACTIONS on Information, vol. E93-D, no. 6, pp. 1479-1488, June 2010, doi: 10.1587/transinf.E93.D.1479.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.1479/_p

@ARTICLE{e93-d_6_1479,
author={Yuma MUNEKAWA and Fumihiko INO and Kenichi HAGIHARA},
journal={IEICE TRANSACTIONS on Information},
title={Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs},
year={2010},
volume={E93-D},
number={6},
pages={1479--1488},
abstract={This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.},
keywords={},
doi={10.1587/transinf.E93.D.1479},
ISSN={1745-1361},
month={June}
}

TY - JOUR
TI - Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs
T2 - IEICE TRANSACTIONS on Information
SP - 1479
EP - 1488
AU - Yuma MUNEKAWA
AU - Fumihiko INO
AU - Kenichi HAGIHARA
PY - 2010
DO - 10.1587/transinf.E93.D.1479
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E93-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2010
AB - This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.
ER -