FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks

Yuxi SUN; Hideharu AMANO

doi:10.1587/transinf.2020PAP0003

Summary:

Recurrent neural networks (RNNs) have proven effective for sequence-based tasks thanks to their ability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complex tasks such as large-scale speech recognition and machine translation. However, deep RNNs run inefficiently on traditional hardware platforms because of the long-range temporal dependence and irregular computation patterns within RNNs. On CPUs and GPUs, this inefficiency shows up as an RNN inference latency that grows in proportion to the number of layers. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism of the multi-FPGA system can be exploited to scale up deep RNN inference by partitioning a large model across several FPGAs, so that latency stays close to constant as the number of RNN layers increases. For single-layer and four-layer RNNs, our implementation achieves speedups of 31x and 61x, respectively, over an Intel CPU.
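The summary's central claim, that partitioning a deep RNN across FPGAs keeps latency nearly flat as layers are added, can be illustrated with a simple dependency model. The sketch below is not the authors' implementation: it assumes one RNN layer per FPGA, a uniform unit cost per layer per time step, and free inter-FPGA communication, and all function names are hypothetical. Cell (l, t) depends on the layer below at the same step and on its own hidden state from the previous step, so once layers sit on separate devices the cells along each anti-diagonal can run concurrently.

```python
"""Toy latency model for a stacked (deep) RNN, measured in abstract time units.

Hypothetical assumptions (not taken from the paper): each layer costs one
unit of time per time step, there is one FPGA per layer, and communication
between FPGAs is free.
"""


def sequential_latency(num_layers: int, seq_len: int) -> int:
    """A single device evaluates every (layer, time step) cell one after another."""
    return num_layers * seq_len


def pipelined_latency(num_layers: int, seq_len: int) -> int:
    """One layer per device, cells on the same anti-diagonal l + t run in parallel.

    The last cell (L-1, T-1) finishes at (L-1) + (T-1) + 1 = T + L - 1.
    """
    return seq_len + num_layers - 1


def simulate_pipelined(num_layers: int, seq_len: int) -> int:
    """Check the closed form by scheduling each cell as soon as its inputs are ready."""
    finish = [[0] * seq_len for _ in range(num_layers)]
    for l in range(num_layers):
        for t in range(seq_len):
            below = finish[l - 1][t] if l > 0 else 0  # output of the layer below at step t
            prev = finish[l][t - 1] if t > 0 else 0   # own hidden state from step t-1
            finish[l][t] = max(below, prev) + 1       # one unit of work per cell
    return finish[-1][-1]


if __name__ == "__main__":
    seq_len = 100
    for layers in (1, 2, 4):
        print(layers, sequential_latency(layers, seq_len), pipelined_latency(layers, seq_len))
```

For a 100-step sequence the script prints `1 100 100`, `2 200 101` and `4 400 103`: in this model, sequential latency grows with the product of layers and steps, while the pipelined latency grows only by L - 1. Measured speedups such as the reported 31x and 61x also depend on per-cell throughput and communication overheads, which this toy model ignores.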

Publication: IEICE TRANSACTIONS on Information, Vol. E103-D, No. 12, pp. 2457-2462

Publication Date: 2020/12/01

Publicized: 2020/09/24

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2020PAP0003

Type of Manuscript: Special Section PAPER (Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking)

Category: Computer System

Authors

Yuxi SUN
Keio University
Hideharu AMANO
Keio University

Keyword

multi-FPGA, recurrent neural networks, LSTM

Cite this

Yuxi SUN, Hideharu AMANO, "FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks" in IEICE TRANSACTIONS on Information, vol. E103-D, no. 12, pp. 2457-2462, December 2020, doi: 10.1587/transinf.2020PAP0003.
Abstract: Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020PAP0003/_p

@ARTICLE{e103-d_12_2457,
author={Yuxi SUN and Hideharu AMANO},
journal={IEICE TRANSACTIONS on Information},
title={FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks},
year={2020},
volume={E103-D},
number={12},
pages={2457--2462},
abstract={Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.},
keywords={multi-FPGA, recurrent neural networks, LSTM},
doi={10.1587/transinf.2020PAP0003},
ISSN={1745-1361},
month={December},
}

TY  - JOUR
TI  - FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks
T2  - IEICE TRANSACTIONS on Information
SP  - 2457
EP  - 2462
AU  - Yuxi SUN
AU  - Hideharu AMANO
PY  - 2020
DO  - 10.1587/transinf.2020PAP0003
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E103-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - 2020/12/01
AB  - Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.
ER  -