This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed and sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed to handle the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99× and 1.95× faster, and 20.38× and 3.04× more energy efficient, than CPU and mGPU platforms, respectively.
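The compressed-domain processing the abstract describes — operating on encoded nonzero weights and skipping zero activations rather than decompressing the model first — can be illustrated with a plain CSC-style sketch. This is not the paper's actual encoding or hardware dataflow; `csc_encode` and `sparse_fc` are hypothetical names for illustration only.

```python
import numpy as np

def csc_encode(W):
    # Compress W column-wise: keep only nonzero values and their row indices,
    # with colptr marking where each column's entries begin and end.
    vals, rows, colptr = [], [], [0]
    for j in range(W.shape[1]):
        nz = np.nonzero(W[:, j])[0]
        vals.extend(W[nz, j])
        rows.extend(nz)
        colptr.append(len(vals))
    return np.array(vals), np.array(rows), np.array(colptr)

def sparse_fc(vals, rows, colptr, x, out_dim):
    # Compute y = W @ x directly on the compressed representation:
    # zero activations are skipped entirely, and only the stored
    # (nonzero) weights are ever touched -- W is never rebuilt.
    y = np.zeros(out_dim)
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue
        for k in range(colptr[j], colptr[j + 1]):
            y[rows[k]] += vals[k] * xj
    return y
```

The work saved scales with the product of the two sparsities: each multiply-accumulate executes only when both the activation and the weight are nonzero.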
Hao XIAO
HeFei University of Technology
Kaikai ZHAO
HeFei University of Technology
Guangzhu LIU
HeFei University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hao XIAO, Kaikai ZHAO, Guangzhu LIU, "Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network" in IEICE Transactions on Information and Systems,
vol. E104-D, no. 5, pp. 772-775, May 2021, doi: 10.1587/transinf.2020EDL8153.
Abstract: This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed and sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed to handle the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99× and 1.95× faster, and 20.38× and 3.04× more energy efficient, than CPU and mGPU platforms, respectively.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDL8153/_p
@ARTICLE{e104-d_5_772,
author={Hao XIAO and Kaikai ZHAO and Guangzhu LIU},
journal={IEICE Transactions on Information and Systems},
title={Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network},
year={2021},
volume={E104-D},
number={5},
pages={772-775},
abstract={This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed and sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed to handle the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99× and 1.95× faster, and 20.38× and 3.04× more energy efficient, than CPU and mGPU platforms, respectively.},
keywords={},
doi={10.1587/transinf.2020EDL8153},
ISSN={1745-1361},
month={May}
}
TY - JOUR
TI - Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network
T2 - IEICE Transactions on Information and Systems
SP - 772
EP - 775
AU - Hao XIAO
AU - Kaikai ZHAO
AU - Guangzhu LIU
PY - 2021
DO - 10.1587/transinf.2020EDL8153
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E104-D
IS - 5
JA - IEICE Transactions on Information and Systems
Y1 - May 2021
AB - This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed and sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed to handle the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99× and 1.95× faster, and 20.38× and 3.04× more energy efficient, than CPU and mGPU platforms, respectively.
ER -