Convolutional neural networks (CNNs) achieve high recognition rates in image recognition and are used in embedded systems such as smartphones, robots, and self-driving cars. Low-end FPGAs are candidates for embedded image-recognition platforms because they achieve real-time performance at low cost. However, a CNN has a large number of parameters, called weights, and internal data, called feature maps, which pose a challenge for FPGAs in terms of performance and memory capacity. To solve these problems, we exploit a split-CNN and weight sparseness. The split-CNN reduces the memory footprint by splitting the feature map into smaller patches, which allows the feature map to be stored in the FPGA's high-throughput on-chip memory. Weight sparseness reduces the computational cost and achieves even higher performance. We designed a dedicated architecture for a sparse CNN and a memory-buffering schedule for the split-CNN, and implemented them on the PYNQ-Z1 board with a low-end FPGA. A classification experiment using VGG16 shows that our implementation is 3.1 times faster than a GPU and 5.4 times faster than an existing FPGA implementation.
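The feature-map-split idea in the abstract can be sketched in a few lines: instead of buffering an entire C × H × W feature map off-chip, the map is processed as smaller spatial patches so that each buffer fits in on-chip memory. The layer shape and patch size below are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def split_feature_map(fmap, patch_h, patch_w):
    """Split a (C, H, W) feature map into (C, patch_h, patch_w) patches."""
    c, h, w = fmap.shape
    patches = []
    for i in range(0, h, patch_h):
        for j in range(0, w, patch_w):
            patches.append(fmap[:, i:i + patch_h, j:j + patch_w])
    return patches

# Hypothetical conv-layer output (int8, as on a small FPGA accelerator).
fmap = np.zeros((64, 56, 56), dtype=np.int8)
patches = split_feature_map(fmap, 14, 14)

print(len(patches))         # 16 patches cover the 56x56 spatial extent
print(fmap.nbytes)          # 200704 bytes to buffer the whole map
print(patches[0].nbytes)    # 12544 bytes to buffer one patch
```

Each patch buffer is 1/16 the size of the full map here, which is the kind of reduction that lets the working set move from external DRAM into the FPGA's block RAM.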
Akira JINGUJI
Tokyo Institute of Technology
Shimpei SATO
Tokyo Institute of Technology
Hiroki NAKAHARA
Tokyo Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Akira JINGUJI, Shimpei SATO, Hiroki NAKAHARA, "Weight Sparseness for a Feature-Map-Split-CNN Toward Low-Cost Embedded FPGAs" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 12, pp. 2040-2047, December 2021, doi: 10.1587/transinf.2021PAP0011.
Abstract: Convolutional neural networks (CNNs) achieve high recognition rates in image recognition and are used in embedded systems such as smartphones, robots, and self-driving cars. Low-end FPGAs are candidates for embedded image-recognition platforms because they achieve real-time performance at low cost. However, a CNN has a large number of parameters, called weights, and internal data, called feature maps, which pose a challenge for FPGAs in terms of performance and memory capacity. To solve these problems, we exploit a split-CNN and weight sparseness. The split-CNN reduces the memory footprint by splitting the feature map into smaller patches, which allows the feature map to be stored in the FPGA's high-throughput on-chip memory. Weight sparseness reduces the computational cost and achieves even higher performance. We designed a dedicated architecture for a sparse CNN and a memory-buffering schedule for the split-CNN, and implemented them on the PYNQ-Z1 board with a low-end FPGA. A classification experiment using VGG16 shows that our implementation is 3.1 times faster than a GPU and 5.4 times faster than an existing FPGA implementation.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021PAP0011/_p
@ARTICLE{e104-d_12_2040,
author={Akira JINGUJI and Shimpei SATO and Hiroki NAKAHARA},
journal={IEICE TRANSACTIONS on Information},
title={Weight Sparseness for a Feature-Map-Split-CNN Toward Low-Cost Embedded FPGAs},
year={2021},
volume={E104-D},
number={12},
pages={2040-2047},
abstract={Convolutional neural networks (CNNs) achieve high recognition rates in image recognition and are used in embedded systems such as smartphones, robots, and self-driving cars. Low-end FPGAs are candidates for embedded image-recognition platforms because they achieve real-time performance at low cost. However, a CNN has a large number of parameters, called weights, and internal data, called feature maps, which pose a challenge for FPGAs in terms of performance and memory capacity. To solve these problems, we exploit a split-CNN and weight sparseness. The split-CNN reduces the memory footprint by splitting the feature map into smaller patches, which allows the feature map to be stored in the FPGA's high-throughput on-chip memory. Weight sparseness reduces the computational cost and achieves even higher performance. We designed a dedicated architecture for a sparse CNN and a memory-buffering schedule for the split-CNN, and implemented them on the PYNQ-Z1 board with a low-end FPGA. A classification experiment using VGG16 shows that our implementation is 3.1 times faster than a GPU and 5.4 times faster than an existing FPGA implementation.},
keywords={},
doi={10.1587/transinf.2021PAP0011},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - Weight Sparseness for a Feature-Map-Split-CNN Toward Low-Cost Embedded FPGAs
T2 - IEICE TRANSACTIONS on Information
SP - 2040
EP - 2047
AU - Akira JINGUJI
AU - Shimpei SATO
AU - Hiroki NAKAHARA
PY - 2021
DO - 10.1587/transinf.2021PAP0011
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2021
AB - Convolutional neural networks (CNNs) achieve high recognition rates in image recognition and are used in embedded systems such as smartphones, robots, and self-driving cars. Low-end FPGAs are candidates for embedded image-recognition platforms because they achieve real-time performance at low cost. However, a CNN has a large number of parameters, called weights, and internal data, called feature maps, which pose a challenge for FPGAs in terms of performance and memory capacity. To solve these problems, we exploit a split-CNN and weight sparseness. The split-CNN reduces the memory footprint by splitting the feature map into smaller patches, which allows the feature map to be stored in the FPGA's high-throughput on-chip memory. Weight sparseness reduces the computational cost and achieves even higher performance. We designed a dedicated architecture for a sparse CNN and a memory-buffering schedule for the split-CNN, and implemented them on the PYNQ-Z1 board with a low-end FPGA. A classification experiment using VGG16 shows that our implementation is 3.1 times faster than a GPU and 5.4 times faster than an existing FPGA implementation.
ER -