Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
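The abstract's scaling claim (execution time and energy proportional to the total number of computations multiplied by the average weight bit precision) can be illustrated with a minimal back-of-the-envelope sketch. This is not code from the paper; the filter sizes, bit widths, and 8-bit baseline below are hypothetical, chosen only to show how filter-wise precision assignment lowers the average bit width and hence the cycle estimate.

```python
# Illustrative sketch (not from the paper): assume execution cycles scale with
# (number of MAC operations) x (weight bit precision). Quantizing each filter
# to its own bit width then reduces cycles in proportion to the *average*
# precision across filters, relative to a fixed-precision baseline.

def estimate_cycles(filters, baseline_bits=8):
    """filters: list of (num_macs, bits) per filter.
    Returns (baseline_cycles, quantized_cycles, speedup)."""
    baseline = sum(macs * baseline_bits for macs, _ in filters)
    quantized = sum(macs * bits for macs, bits in filters)
    return baseline, quantized, baseline / quantized

# Hypothetical layer: four equal-size filters at different bit widths.
filters = [(1000, 2), (1000, 1), (1000, 3), (1000, 2)]
base, quant, speedup = estimate_cycles(filters)
# Average precision here is 2 bits, so the estimate drops to 2/8 of baseline.
```

Under this simple model, halving the average bit precision halves the cycle estimate, which is the lever the co-designed bit-parallel hardware is built to exploit.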
Asuka MAKI
Kioxia Corporation
Daisuke MIYASHITA
Kioxia Corporation
Shinichi SASAKI
Kioxia Corporation
Kengo NAKATA
Kioxia Corporation
Fumihiko TACHIBANA
Kioxia Corporation
Tomoya SUZUKI
Kioxia Corporation
Jun DEGUCHI
Kioxia Corporation
Ryuichi FUJIMOTO
Kioxia Corporation
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Asuka MAKI, Daisuke MIYASHITA, Shinichi SASAKI, Kengo NAKATA, Fumihiko TACHIBANA, Tomoya SUZUKI, Jun DEGUCHI, Ryuichi FUJIMOTO, "Weight Compression MAC Accelerator for Effective Inference of Deep Learning" in IEICE TRANSACTIONS on Electronics,
vol. E103-C, no. 10, pp. 514-523, October 2020, doi: 10.1587/transele.2019CTP0007.
Abstract: Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/transele.2019CTP0007/_p
@ARTICLE{e103-c_10_514,
author={Asuka MAKI and Daisuke MIYASHITA and Shinichi SASAKI and Kengo NAKATA and Fumihiko TACHIBANA and Tomoya SUZUKI and Jun DEGUCHI and Ryuichi FUJIMOTO},
journal={IEICE TRANSACTIONS on Electronics},
title={Weight Compression MAC Accelerator for Effective Inference of Deep Learning},
year={2020},
volume={E103-C},
number={10},
pages={514-523},
abstract={Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.},
keywords={},
doi={10.1587/transele.2019CTP0007},
ISSN={1745-1353},
month={October},}
TY - JOUR
TI - Weight Compression MAC Accelerator for Effective Inference of Deep Learning
T2 - IEICE TRANSACTIONS on Electronics
SP - 514
EP - 523
AU - Asuka MAKI
AU - Daisuke MIYASHITA
AU - Shinichi SASAKI
AU - Kengo NAKATA
AU - Fumihiko TACHIBANA
AU - Tomoya SUZUKI
AU - Jun DEGUCHI
AU - Ryuichi FUJIMOTO
PY - 2020
DO - 10.1587/transele.2019CTP0007
JO - IEICE TRANSACTIONS on Electronics
SN - 1745-1353
VL - E103-C
IS - 10
JA - IEICE TRANSACTIONS on Electronics
Y1 - 2020/10//
AB - Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
ER -