In this paper, starting from the algorithm, a performance- and energy-efficient 3D structure, or shape, of the Tensor Processing Engine (TPE) for CNN acceleration is systematically searched for and evaluated. An optimal accelerator shape maximizes the number of concurrent MAC operations per clock cycle while minimizing the number of redundant operations. The proposed 3D vector-parallel TPE architecture with an optimal shape can be used very efficiently for considerable CNN acceleration. Because inter-block image data independence is supported, multiple such TPEs can be used for additional CNN acceleration. Moreover, it is shown that the proposed TPE can be used uniformly to accelerate different CNN models such as VGG, ResNet, YOLO, and SSD. We also demonstrate that our theoretical efficiency analysis matches the results of a real implementation of an SSD model to which a state-of-the-art channel pruning technique is applied.
Stanislav SEDUKHIN
University of Aizu
Yoichi TOMIOKA
University of Aizu
Kohei YAMAMOTO
Oki Electric Industry Co., Ltd.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Stanislav SEDUKHIN, Yoichi TOMIOKA, Kohei YAMAMOTO, "In Search of the Performance- and Energy-Efficient CNN Accelerators" in IEICE TRANSACTIONS on Electronics,
vol. E105-C, no. 6, pp. 209-221, June 2022, doi: 10.1587/transele.2021LHP0003.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/transele.2021LHP0003/_p
@ARTICLE{e105-c_6_209,
author={Stanislav SEDUKHIN and Yoichi TOMIOKA and Kohei YAMAMOTO},
journal={IEICE TRANSACTIONS on Electronics},
title={In Search of the Performance- and Energy-Efficient CNN Accelerators},
year={2022},
volume={E105-C},
number={6},
pages={209--221},
abstract={In this paper, starting from the algorithm, a performance- and energy-efficient 3D structure, or shape, of the Tensor Processing Engine (TPE) for CNN acceleration is systematically searched for and evaluated. An optimal accelerator shape maximizes the number of concurrent MAC operations per clock cycle while minimizing the number of redundant operations. The proposed 3D vector-parallel TPE architecture with an optimal shape can be used very efficiently for considerable CNN acceleration. Because inter-block image data independence is supported, multiple such TPEs can be used for additional CNN acceleration. Moreover, it is shown that the proposed TPE can be used uniformly to accelerate different CNN models such as VGG, ResNet, YOLO, and SSD. We also demonstrate that our theoretical efficiency analysis matches the results of a real implementation of an SSD model to which a state-of-the-art channel pruning technique is applied.},
keywords={},
doi={10.1587/transele.2021LHP0003},
ISSN={1745-1353},
month={June},}
TY - JOUR
TI - In Search of the Performance- and Energy-Efficient CNN Accelerators
T2 - IEICE TRANSACTIONS on Electronics
SP - 209
EP - 221
AU - Stanislav SEDUKHIN
AU - Yoichi TOMIOKA
AU - Kohei YAMAMOTO
PY - 2022
DO - 10.1587/transele.2021LHP0003
JO - IEICE TRANSACTIONS on Electronics
SN - 1745-1353
VL - E105-C
IS - 6
JA - IEICE TRANSACTIONS on Electronics
Y1 - June 2022
AB - In this paper, starting from the algorithm, a performance- and energy-efficient 3D structure, or shape, of the Tensor Processing Engine (TPE) for CNN acceleration is systematically searched for and evaluated. An optimal accelerator shape maximizes the number of concurrent MAC operations per clock cycle while minimizing the number of redundant operations. The proposed 3D vector-parallel TPE architecture with an optimal shape can be used very efficiently for considerable CNN acceleration. Because inter-block image data independence is supported, multiple such TPEs can be used for additional CNN acceleration. Moreover, it is shown that the proposed TPE can be used uniformly to accelerate different CNN models such as VGG, ResNet, YOLO, and SSD. We also demonstrate that our theoretical efficiency analysis matches the results of a real implementation of an SSD model to which a state-of-the-art channel pruning technique is applied.
ER -