A Fast and Accurate FPGA System for Short Read Mapping Based on Parallel Comparison on Hash Table

Yoko SOGABE; Tsutomu MARUYAMA

doi:10.1587/transinf.2016EDP7262

A Fast and Accurate FPGA System for Short Read Mapping Based on Parallel Comparison on Hash Table

Yoko SOGABE, Tsutomu MARUYAMA

Full Text Views

0

Cite this

Summary :

The purpose of DNA sequencing is to determine the order of nucleotides within a DNA molecule of target. The target DNA molecules are fragmented into short reads, which are short fixed-length subsequences composed of ‘A’, ‘C’, ‘G’ ‘T’, by next generation sequencing (NGS) machine. To reconstruct the target DNA from the short reads using a reference genome, which is a representative example of a species that was constructed in advance, it is necessary to determine their locations in the target DNA from where they have been extracted by aligning them onto the reference genome. This process is called short read mapping, and it is important to improve the performance of the short read mapping to realize fast DNA sequencing. We propose three types of FPGA acceleration methods based on hash table; (1) sorting and parallel comparison, (2) matching that allows one mutation to reduce the number of the candidates, (3) optimized hash function using variable masks. The first one reduces the number of accesses to off-chip memory to avoid the bottleneck by access latency. The second one enables to reduce the number of the candidates without degrading mapping sensitivity by allowing one mutation in the comparison. The last one reduces hash collisions using a table that was calculated from the reference genome in advance. We implemented the three methods on Xilinx Virtex-7 and evaluated them to show their effectiveness of them. In our experiments, our system achieves 20 fold of processing speed compared with BWA, which is one of the most popular mapping tools. Furthermore, we shows that the our system outperforms one of the fastest FPGA short read mapping systems.

Publication: IEICE TRANSACTIONS on Information Vol.E100-D No.5 pp.1016-1025

Publication Date: 2017/05/01

Publicized: 2017/01/30

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016EDP7262

Type of Manuscript: PAPER

Category: Computer System

Authors

Yoko SOGABE
University of Tsukuba
Tsutomu MARUYAMA
University of Tsukuba

Keyword

field programmable gate array, genome sequence alignment, accelerated implementation, bioinformatics

Cite this

Copy

Yoko SOGABE, Tsutomu MARUYAMA, "A Fast and Accurate FPGA System for Short Read Mapping Based on Parallel Comparison on Hash Table" in IEICE TRANSACTIONS on Information, vol. E100-D, no. 5, pp. 1016-1025, May 2017, doi: 10.1587/transinf.2016EDP7262.
Abstract: The purpose of DNA sequencing is to determine the order of nucleotides within a DNA molecule of target. The target DNA molecules are fragmented into short reads, which are short fixed-length subsequences composed of ‘A’, ‘C’, ‘G’ ‘T’, by next generation sequencing (NGS) machine. To reconstruct the target DNA from the short reads using a reference genome, which is a representative example of a species that was constructed in advance, it is necessary to determine their locations in the target DNA from where they have been extracted by aligning them onto the reference genome. This process is called short read mapping, and it is important to improve the performance of the short read mapping to realize fast DNA sequencing. We propose three types of FPGA acceleration methods based on hash table; (1) sorting and parallel comparison, (2) matching that allows one mutation to reduce the number of the candidates, (3) optimized hash function using variable masks. The first one reduces the number of accesses to off-chip memory to avoid the bottleneck by access latency. The second one enables to reduce the number of the candidates without degrading mapping sensitivity by allowing one mutation in the comparison. The last one reduces hash collisions using a table that was calculated from the reference genome in advance. We implemented the three methods on Xilinx Virtex-7 and evaluated them to show their effectiveness of them. In our experiments, our system achieves 20 fold of processing speed compared with BWA, which is one of the most popular mapping tools. Furthermore, we shows that the our system outperforms one of the fastest FPGA short read mapping systems.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDP7262/_p

Copy

@ARTICLE{e100-d_5_1016,
author={Yoko SOGABE, Tsutomu MARUYAMA, },
journal={IEICE TRANSACTIONS on Information},
title={A Fast and Accurate FPGA System for Short Read Mapping Based on Parallel Comparison on Hash Table},
year={2017},
volume={E100-D},
number={5},
pages={1016-1025},
abstract={The purpose of DNA sequencing is to determine the order of nucleotides within a DNA molecule of target. The target DNA molecules are fragmented into short reads, which are short fixed-length subsequences composed of ‘A’, ‘C’, ‘G’ ‘T’, by next generation sequencing (NGS) machine. To reconstruct the target DNA from the short reads using a reference genome, which is a representative example of a species that was constructed in advance, it is necessary to determine their locations in the target DNA from where they have been extracted by aligning them onto the reference genome. This process is called short read mapping, and it is important to improve the performance of the short read mapping to realize fast DNA sequencing. We propose three types of FPGA acceleration methods based on hash table; (1) sorting and parallel comparison, (2) matching that allows one mutation to reduce the number of the candidates, (3) optimized hash function using variable masks. The first one reduces the number of accesses to off-chip memory to avoid the bottleneck by access latency. The second one enables to reduce the number of the candidates without degrading mapping sensitivity by allowing one mutation in the comparison. The last one reduces hash collisions using a table that was calculated from the reference genome in advance. We implemented the three methods on Xilinx Virtex-7 and evaluated them to show their effectiveness of them. In our experiments, our system achieves 20 fold of processing speed compared with BWA, which is one of the most popular mapping tools. Furthermore, we shows that the our system outperforms one of the fastest FPGA short read mapping systems.},
keywords={},
doi={10.1587/transinf.2016EDP7262},
ISSN={1745-1361},
month={May},}

Copy

TY - JOUR
TI - A Fast and Accurate FPGA System for Short Read Mapping Based on Parallel Comparison on Hash Table
T2 - IEICE TRANSACTIONS on Information
SP - 1016
EP - 1025
AU - Yoko SOGABE
AU - Tsutomu MARUYAMA
PY - 2017
DO - 10.1587/transinf.2016EDP7262
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E100-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2017
AB - The purpose of DNA sequencing is to determine the order of nucleotides within a DNA molecule of target. The target DNA molecules are fragmented into short reads, which are short fixed-length subsequences composed of ‘A’, ‘C’, ‘G’ ‘T’, by next generation sequencing (NGS) machine. To reconstruct the target DNA from the short reads using a reference genome, which is a representative example of a species that was constructed in advance, it is necessary to determine their locations in the target DNA from where they have been extracted by aligning them onto the reference genome. This process is called short read mapping, and it is important to improve the performance of the short read mapping to realize fast DNA sequencing. We propose three types of FPGA acceleration methods based on hash table; (1) sorting and parallel comparison, (2) matching that allows one mutation to reduce the number of the candidates, (3) optimized hash function using variable masks. The first one reduces the number of accesses to off-chip memory to avoid the bottleneck by access latency. The second one enables to reduce the number of the candidates without degrading mapping sensitivity by allowing one mutation in the comparison. The last one reduces hash collisions using a table that was calculated from the reference genome in advance. We implemented the three methods on Xilinx Virtex-7 and evaluated them to show their effectiveness of them. In our experiments, our system achieves 20 fold of processing speed compared with BWA, which is one of the most popular mapping tools. Furthermore, we shows that the our system outperforms one of the fastest FPGA short read mapping systems.
ER -