This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed and sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed to handle the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99× and 1.95× faster, and 20.38× and 3.04× more energy efficient, than CPU and mGPU platforms, respectively.
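The compressed-domain processing the abstract describes — operating on encoded nonzero weights and skipping zero activations rather than decompressing the model first — can be illustrated with a plain CSC-style sketch. This is not the paper's actual encoding or hardware dataflow; `csc_encode` and `sparse_fc` are hypothetical names for illustration only.

```python
import numpy as np

def csc_encode(W):
    # Compress W column-wise: keep only nonzero values and their row indices,
    # with colptr marking where each column's entries begin and end.
    vals, rows, colptr = [], [], [0]
    for j in range(W.shape[1]):
        nz = np.nonzero(W[:, j])[0]
        vals.extend(W[nz, j])
        rows.extend(nz)
        colptr.append(len(vals))
    return np.array(vals), np.array(rows), np.array(colptr)

def sparse_fc(vals, rows, colptr, x, out_dim):
    # Compute y = W @ x directly on the compressed representation:
    # zero activations are skipped entirely, and only the stored
    # (nonzero) weights are ever touched -- W is never rebuilt.
    y = np.zeros(out_dim)
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue
        for k in range(colptr[j], colptr[j + 1]):
            y[rows[k]] += vals[k] * xj
    return y
```

The work saved scales with the product of the two sparsities: each multiply-accumulate executes only when both the activation and the weight are nonzero.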
Hao XIAO
HeFei University of Technology
Kaikai ZHAO
HeFei University of Technology
Guangzhu LIU
HeFei University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hao XIAO, Kaikai ZHAO, Guangzhu LIU, "Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network" in IEICE Transactions on Information and Systems,
vol. E104-D, no. 5, pp. 772-775, May 2021, doi: 10.1587/transinf.2020EDL8153.
Abstract: This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed and sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed to handle the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99× and 1.95× faster, and 20.38× and 3.04× more energy efficient, than CPU and mGPU platforms, respectively.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDL8153/_p
@ARTICLE{e104-d_5_772,
author={Hao XIAO and Kaikai ZHAO and Guangzhu LIU},
journal={IEICE Transactions on Information and Systems},
title={Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network},
year={2021},
volume={E104-D},
number={5},
pages={772-775},
abstract={This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed and sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed to handle the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99× and 1.95× faster, and 20.38× and 3.04× more energy efficient, than CPU and mGPU platforms, respectively.},
keywords={},
doi={10.1587/transinf.2020EDL8153},
ISSN={1745-1361},
month={May}
}
TY - JOUR
TI - Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network
T2 - IEICE Transactions on Information and Systems
SP - 772
EP - 775
AU - Hao XIAO
AU - Kaikai ZHAO
AU - Guangzhu LIU
PY - 2021
DO - 10.1587/transinf.2020EDL8153
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E104-D
IS - 5
JA - IEICE Transactions on Information and Systems
Y1 - May 2021
AB - This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed and sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed to handle the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99× and 1.95× faster, and 20.38× and 3.04× more energy efficient, than CPU and mGPU platforms, respectively.
ER -