Multilayer Perceptron Training Accelerator Using Systolic Array

Takeshi SENOO; Akira JINGUJI; Ryosuke KURAMOCHI; Hiroki NAKAHARA

doi:10.1587/transinf.2022PAP0003

IEICE TRANSACTIONS on Information

Summary:

The multilayer perceptron (MLP) is a basic neural network model used in practical industrial applications such as network intrusion detection (NID) systems, and as a building block in newer models such as gMLP. NID and other areas currently demand fast training. However, training on numerous GPUs incurs high power consumption and long training times. Many of the latest deep neural network (DNN) models and MLPs are trained with the backpropagation algorithm, which propagates an error gradient from the output layer back to the input layer. In sequential computation, the next input therefore cannot be processed until the weights of all layers have been updated, starting from the last layer. This is known as backward locking. This study proposes a time-delayed weight parameter update mechanism that accommodates the weight update delay and so allows forward and backward computation to proceed simultaneously. To this end, a one-dimensional systolic array structure was designed on a Xilinx Alveo U50 FPGA card, with each layer of the MLP assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel and transfers data between layers in a pipeline. The proposed accelerator is 3 times faster than an Intel Core i9 CPU and 2.5 times faster than an NVIDIA RTX 3090 GPU. Its processing speed per unit of power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. These results show that a training accelerator on an FPGA can achieve both high speed and high energy efficiency.
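
To make backward locking and layer-wise pipelining concrete, the following is a minimal sketch of an idealized schedule, not the authors' implementation. It assumes one PE per layer, that every forward or backward pass through a layer takes exactly one time step, and that a PE can run one forward and one backward pass in the same step. The function names (`pipeline_schedule`, `weight_delay`, `sequential_steps`, `pipelined_steps`) are hypothetical and chosen only for illustration.

```python
def pipeline_schedule(num_layers, num_samples):
    """Idealized 1-D pipeline: map (sample, layer) to a time step.

    Assumption: each layer pass (forward or backward) costs one step.
    """
    last = num_layers - 1
    forward, backward = {}, {}
    for s in range(num_samples):
        for layer in range(num_layers):
            # Sample s enters layer 0 at step s and moves one layer per step.
            forward[(s, layer)] = s + layer
            # The gradient starts at the last layer one step after the
            # forward pass there, then moves back one layer per step.
            backward[(s, layer)] = s + last + 1 + (last - layer)
    return forward, backward


def weight_delay(num_layers, layer):
    """Steps between the forward and backward pass of one sample at `layer`.

    During this gap the layer keeps processing other samples, so the
    gradient arrives late relative to the weights used in the forward pass.
    """
    return 2 * (num_layers - 1 - layer) + 1


def sequential_steps(num_layers, num_samples):
    """Backward-locked baseline: a sample finishes its full forward and
    backward pass before the next sample starts."""
    return 2 * num_layers * num_samples


def pipelined_steps(num_layers, num_samples):
    """Total steps when all layers run in parallel as a pipeline."""
    _, backward = pipeline_schedule(num_layers, num_samples)
    return max(backward.values()) + 1


if __name__ == "__main__":
    layers, samples = 4, 100
    print([weight_delay(layers, l) for l in range(layers)])
    print(sequential_steps(layers, samples), pipelined_steps(layers, samples))
```

In this toy model, 4 layers and 100 samples take 800 steps when backward-locked and 107 steps when pipelined, and the delay between forward and backward passes grows toward the input layer (7, 5, 3, 1 steps). These are step counts under the stated assumptions only and are unrelated to the measured speedups reported above.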

Publication: IEICE TRANSACTIONS on Information Vol.E105-D No.12 pp.2048-2056

Publication Date: 2022/12/01

Publicized: 2022/07/21

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2022PAP0003

Type of Manuscript: Special Section PAPER (Special Section on Forefront Computing)

Authors

Takeshi SENOO
  Tokyo Institute of Technology
Akira JINGUJI
  Tokyo Institute of Technology
Ryosuke KURAMOCHI
  Tokyo Institute of Technology
Hiroki NAKAHARA
  Tokyo Institute of Technology

Keyword

neural network, training accelerator, multilayer perceptron, machine learning, intrusion detection system

Cite this

Takeshi SENOO, Akira JINGUJI, Ryosuke KURAMOCHI, Hiroki NAKAHARA, "Multilayer Perceptron Training Accelerator Using Systolic Array" in IEICE TRANSACTIONS on Information, vol. E105-D, no. 12, pp. 2048-2056, December 2022, doi: 10.1587/transinf.2022PAP0003.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022PAP0003/_p

@ARTICLE{e105-d_12_2048,
author={Takeshi SENOO and Akira JINGUJI and Ryosuke KURAMOCHI and Hiroki NAKAHARA},
journal={IEICE TRANSACTIONS on Information},
title={Multilayer Perceptron Training Accelerator Using Systolic Array},
year={2022},
volume={E105-D},
number={12},
pages={2048-2056},
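keywords={neural network, training accelerator, multilayer perceptron, machine learning, intrusion detection system},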
doi={10.1587/transinf.2022PAP0003},
ISSN={1745-1361},
month={December},}

TY - JOUR
TI - Multilayer Perceptron Training Accelerator Using Systolic Array
T2 - IEICE TRANSACTIONS on Information
SP - 2048
EP - 2056
AU - Takeshi SENOO
AU - Akira JINGUJI
AU - Ryosuke KURAMOCHI
AU - Hiroki NAKAHARA
PY - 2022
DO - 10.1587/transinf.2022PAP0003
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2022
ER -
