When convolutional neural networks (CNNs) are deployed on resource-constrained embedded hardware, the memory footprint of the weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights after pruning is highly skewed, which makes it more difficult to utilize the underlying parallelism. To address this problem, we present SENTEI, filter-wise pruning with distillation, to realize a hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight skipping inter-layer pipelined accelerator on an FPGA. The equalization enables *inter-filter parallelism*, where a processing block for a layer executes filters concurrently with a straightforward architecture. Our evaluation on semantic-segmentation tasks indicates that the resulting mIoU decreased by only 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of the mobile GPU. Therefore, our technique realizes a hardware-aware network with comparable accuracy.
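The filter-wise pruning step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the abstract only says that every filter ends up with the same number of nonzero weights, so the magnitude-based selection rule, the function name `filterwise_prune`, and the `keep` parameter are hypothetical. Retraining with distillation and the FPGA accelerator are not shown.

```python
import numpy as np


def filterwise_prune(weights: np.ndarray, keep: int):
    """Keep the `keep` largest-magnitude weights in every filter.

    `weights` has shape (out_channels, in_channels, kh, kw); each slice
    along axis 0 is one filter. Magnitude-based selection is an assumption
    made for this sketch.

    Returns the pruned dense tensor plus a packed form: `values` and
    `indices`, both of shape (out_channels, keep).
    """
    flat = weights.reshape(weights.shape[0], -1)
    if not 0 < keep <= flat.shape[1]:
        raise ValueError("keep must be between 1 and the weights per filter")

    # Positions of the `keep` largest |w| in each filter, sorted by position.
    top = np.argpartition(np.abs(flat), -keep, axis=1)[:, -keep:]
    indices = np.sort(top, axis=1)
    values = np.take_along_axis(flat, indices, axis=1)

    # Scatter the surviving weights back into an all-zero tensor.
    pruned = np.zeros_like(flat)
    np.put_along_axis(pruned, indices, values, axis=1)
    return pruned.reshape(weights.shape), values, indices


# Example: 8 filters of shape 4x3x3, keeping 9 of 36 weights per filter.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3))
pruned, values, indices = filterwise_prune(w, keep=9)

# Every filter now holds the same number of nonzero weights.
counts = np.count_nonzero(pruned.reshape(8, -1), axis=1)
print(counts)
```

Because every row of `values` and `indices` has the same length, each filter needs the same number of multiply-accumulate steps when zero weights are skipped, which is the property the abstract credits with enabling inter-filter parallelism.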

- Publication: IEICE TRANSACTIONS on Information Vol.E103-D No.12 pp.2463-2470
- Publication Date: 2020/12/01
- Publicized: 2020/08/03
- Online ISSN: 1745-1361
- DOI: 10.1587/transinf.2020PAP0013
- Type of Manuscript: Special Section PAPER (Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking)
- Category: Computer System

- Masayuki SHIMODA (Tokyo Institute of Technology)
- Youki SADA (Tokyo Institute of Technology)
- Ryosuke KURAMOCHI (Tokyo Institute of Technology)
- Shimpei SATO (Tokyo Institute of Technology)
- Hiroki NAKAHARA (Tokyo Institute of Technology)

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.

Masayuki SHIMODA, Youki SADA, Ryosuke KURAMOCHI, Shimpei SATO, Hiroki NAKAHARA, "SENTEI: Filter-Wise Pruning with Distillation towards Efficient Sparse Convolutional Neural Network Accelerators," in IEICE TRANSACTIONS on Information, vol. E103-D, no. 12, pp. 2463-2470, December 2020, doi: 10.1587/transinf.2020PAP0013.

URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020PAP0013/_p

@ARTICLE{e103-d_12_2463,
  author={Masayuki SHIMODA and Youki SADA and Ryosuke KURAMOCHI and Shimpei SATO and Hiroki NAKAHARA},
  journal={IEICE TRANSACTIONS on Information},
  title={SENTEI: Filter-Wise Pruning with Distillation towards Efficient Sparse Convolutional Neural Network Accelerators},
  year={2020},
  volume={E103-D},
  number={12},
  pages={2463--2470},
  abstract={When convolutional neural networks (CNNs) are deployed on resource-constrained embedded hardware, the memory footprint of the weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights after pruning is highly skewed, which makes it more difficult to utilize the underlying parallelism. To address this problem, we present SENTEI, filter-wise pruning with distillation, to realize a hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight skipping inter-layer pipelined accelerator on an FPGA. The equalization enables *inter-filter parallelism*, where a processing block for a layer executes filters concurrently with a straightforward architecture. Our evaluation on semantic-segmentation tasks indicates that the resulting mIoU decreased by only 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of the mobile GPU. Therefore, our technique realizes a hardware-aware network with comparable accuracy.},
  keywords={},
  doi={10.1587/transinf.2020PAP0013},
  ISSN={1745-1361},
  month={December}
}

TY  - JOUR
TI  - SENTEI: Filter-Wise Pruning with Distillation towards Efficient Sparse Convolutional Neural Network Accelerators
T2  - IEICE TRANSACTIONS on Information
SP  - 2463
EP  - 2470
AU  - Masayuki SHIMODA
AU  - Youki SADA
AU  - Ryosuke KURAMOCHI
AU  - Shimpei SATO
AU  - Hiroki NAKAHARA
PY  - 2020
DO  - 10.1587/transinf.2020PAP0013
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E103-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2020
AB  - When convolutional neural networks (CNNs) are deployed on resource-constrained embedded hardware, the memory footprint of the weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights after pruning is highly skewed, which makes it more difficult to utilize the underlying parallelism. To address this problem, we present SENTEI, filter-wise pruning with distillation, to realize a hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight skipping inter-layer pipelined accelerator on an FPGA. The equalization enables *inter-filter parallelism*, where a processing block for a layer executes filters concurrently with a straightforward architecture. Our evaluation on semantic-segmentation tasks indicates that the resulting mIoU decreased by only 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of the mobile GPU. Therefore, our technique realizes a hardware-aware network with comparable accuracy.
ER  - 