A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism

Ryohei KOBAYASHI; Kenji KISE

doi:10.1587/transinf.2016EDP7383

IEICE TRANSACTIONS on Information

Summary:

Sorting is an extremely important computation kernel that has been accelerated in many fields, such as databases, image processing, and genome analysis. With the advent of the Internet of Things (IoT) era, driven by progress in mobile technology, there is a need for a sorting method that can be used in any environment, from high-performance systems such as servers to machines with low computational performance such as embedded systems. In this paper, we present an FPGA-based sorting accelerator that combines a Sorting Network and a Merge Sorter Tree and can be customized by tuning design parameters. The proposed FPGA accelerator sorts data sent from a host PC via the PCIe bus and sends the fully sorted data sequence back to the host. We also present a detailed analytical model that accurately estimates the sorting performance. With this customizability and the model, designers can know in advance how fast a given sorting hardware configuration will be and can implement the one that best meets their cost and performance constraints. Our experiments show that the proposed hardware achieves up to 19.5x the sorting performance of an Intel Core i7-3770K operating at 3.50 GHz when sorting 256M 32-bit integer elements. However, this result is limited by insufficient memory bandwidth. To overcome this problem, we propose a data compression mechanism; the experimental results show that the sorting hardware with this mechanism achieves almost 90% of the estimated performance, whereas the hardware without it achieves about 60%. To allow every designer to use this accelerator easily and freely, the RTL source code is released as open-source hardware.
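
The combination named in the summary, a sorting network that produces short sorted runs followed by a merge tree that merges them, can be illustrated in software. The sketch below is not the authors' RTL design. The odd-even transposition network, the use of `heapq.merge` as a stand-in for one merge tree, and the names `sorting_network`, `merge_tree_pass`, `hybrid_sort`, `block_size`, and `ways` are all assumptions made for illustration only.

```python
import heapq
import random


def sorting_network(block):
    """Sort a small block with a fixed, data-independent comparator schedule.

    Assumption: an odd-even transposition network; the paper's actual
    network type is not stated in the summary.
    """
    block = list(block)
    n = len(block)
    for rnd in range(n):
        # Even rounds compare pairs (0,1),(2,3),...; odd rounds (1,2),(3,4),...
        for i in range(rnd % 2, n - 1, 2):
            if block[i] > block[i + 1]:
                block[i], block[i + 1] = block[i + 1], block[i]
    return block


def merge_tree_pass(runs, ways):
    """Merge each group of `ways` sorted runs into one longer sorted run."""
    return [
        list(heapq.merge(*runs[g:g + ways]))
        for g in range(0, len(runs), ways)
    ]


def hybrid_sort(data, block_size=4, ways=8):
    """Return (sorted_data, number_of_merge_passes)."""
    if block_size < 1 or ways < 2:
        raise ValueError("block_size must be >= 1 and ways must be >= 2")
    # Stage 1: the sorting network turns the input into sorted runs.
    runs = [
        sorting_network(data[s:s + block_size])
        for s in range(0, len(data), block_size)
    ]
    # Stage 2: repeated merge passes until a single run remains.
    passes = 0
    while len(runs) > 1:
        runs = merge_tree_pass(runs, ways)
        passes += 1
    return (runs[0] if runs else []), passes


if __name__ == "__main__":
    random.seed(0)
    values = [random.randrange(1000) for _ in range(1000)]
    result, passes = hybrid_sort(values)
    print(result == sorted(values), passes)
```

In this sketch, `block_size` and `ways` play the role of tunable design parameters: a wider merge needs fewer passes over the data, and in a sketch like this each pass reads and writes every element once, which is one plausible reason memory bandwidth would limit throughput.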

Publication: IEICE TRANSACTIONS on Information Vol.E100-D No.5 pp.1003-1015

Publication Date: 2017/05/01

Publicized: 2017/01/30

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016EDP7383

Type of Manuscript: PAPER

Category: Computer System

Authors

Ryohei KOBAYASHI
University of Tsukuba
Kenji KISE
Tokyo Institute of Technology

Keyword

sorting, hardware accelerator, data compression, open source

Cite this

Ryohei KOBAYASHI, Kenji KISE, "A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism" in IEICE TRANSACTIONS on Information, vol. E100-D, no. 5, pp. 1003-1015, May 2017, doi: 10.1587/transinf.2016EDP7383.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDP7383/_p

@ARTICLE{e100-d_5_1003,
author={Ryohei KOBAYASHI and Kenji KISE},
journal={IEICE TRANSACTIONS on Information},
title={A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism},
year={2017},
volume={E100-D},
number={5},
pages={1003-1015},
keywords={sorting, hardware accelerator, data compression, open source},
doi={10.1587/transinf.2016EDP7383},
ISSN={1745-1361},
month={May},}

TY - JOUR
TI - A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism
T2 - IEICE TRANSACTIONS on Information
SP - 1003
EP - 1015
AU - Ryohei KOBAYASHI
AU - Kenji KISE
PY - 2017
DO - 10.1587/transinf.2016EDP7383
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E100-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2017
ER -
