Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
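The abstract's scaling claim (execution time and energy proportional to the total number of computations multiplied by the average weight bit precision) can be illustrated with a minimal back-of-the-envelope sketch. This is not code from the paper; the filter sizes, bit widths, and 8-bit baseline below are hypothetical, chosen only to show how filter-wise precision assignment lowers the average bit width and hence the cycle estimate.

```python
# Illustrative sketch (not from the paper): assume execution cycles scale with
# (number of MAC operations) x (weight bit precision). Quantizing each filter
# to its own bit width then reduces cycles in proportion to the *average*
# precision across filters, relative to a fixed-precision baseline.

def estimate_cycles(filters, baseline_bits=8):
    """filters: list of (num_macs, bits) per filter.
    Returns (baseline_cycles, quantized_cycles, speedup)."""
    baseline = sum(macs * baseline_bits for macs, _ in filters)
    quantized = sum(macs * bits for macs, bits in filters)
    return baseline, quantized, baseline / quantized

# Hypothetical layer: four equal-size filters at different bit widths.
filters = [(1000, 2), (1000, 1), (1000, 3), (1000, 2)]
base, quant, speedup = estimate_cycles(filters)
# Average precision here is 2 bits, so the estimate drops to 2/8 of baseline.
```

Under this simple model, halving the average bit precision halves the cycle estimate, which is the lever the co-designed bit-parallel hardware is built to exploit.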
Asuka MAKI
Kioxia Corporation
Daisuke MIYASHITA
Kioxia Corporation
Shinichi SASAKI
Kioxia Corporation
Kengo NAKATA
Kioxia Corporation
Fumihiko TACHIBANA
Kioxia Corporation
Tomoya SUZUKI
Kioxia Corporation
Jun DEGUCHI
Kioxia Corporation
Ryuichi FUJIMOTO
Kioxia Corporation
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Asuka MAKI, Daisuke MIYASHITA, Shinichi SASAKI, Kengo NAKATA, Fumihiko TACHIBANA, Tomoya SUZUKI, Jun DEGUCHI, Ryuichi FUJIMOTO, "Weight Compression MAC Accelerator for Effective Inference of Deep Learning" in IEICE TRANSACTIONS on Electronics,
vol. E103-C, no. 10, pp. 514-523, October 2020, doi: 10.1587/transele.2019CTP0007.
Abstract: Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/transele.2019CTP0007/_p
@ARTICLE{e103-c_10_514,
author={Asuka MAKI and Daisuke MIYASHITA and Shinichi SASAKI and Kengo NAKATA and Fumihiko TACHIBANA and Tomoya SUZUKI and Jun DEGUCHI and Ryuichi FUJIMOTO},
journal={IEICE TRANSACTIONS on Electronics},
title={Weight Compression MAC Accelerator for Effective Inference of Deep Learning},
year={2020},
volume={E103-C},
number={10},
pages={514-523},
abstract={Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.},
keywords={},
doi={10.1587/transele.2019CTP0007},
ISSN={1745-1353},
month={October},}
TY - JOUR
TI - Weight Compression MAC Accelerator for Effective Inference of Deep Learning
T2 - IEICE TRANSACTIONS on Electronics
SP - 514
EP - 523
AU - Asuka MAKI
AU - Daisuke MIYASHITA
AU - Shinichi SASAKI
AU - Kengo NAKATA
AU - Fumihiko TACHIBANA
AU - Tomoya SUZUKI
AU - Jun DEGUCHI
AU - Ryuichi FUJIMOTO
PY - 2020
DO - 10.1587/transele.2019CTP0007
JO - IEICE TRANSACTIONS on Electronics
SN - 1745-1353
VL - E103-C
IS - 10
JA - IEICE TRANSACTIONS on Electronics
Y1 - 2020/10//
AB - Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
ER -