Binary neural networks (BNNs), in which both activations and weights are quantized to {-1, +1}, can greatly accelerate convolutional neural networks (CNNs) on edge devices by reducing computational complexity and memory footprint. However, the non-differentiable binarizing function used in BNNs makes binarized models hard to optimize and causes significant performance degradation compared with full-precision models. Many previous works have corrected the backward gradient of the binarizing function with improved variants of the straight-through estimator (STE) or with gradual approximation schemes, but the gradient suppression problem was neither analyzed nor handled. We therefore propose a novel gradient corrected approximation (GCA) method that narrows the discrepancy between the binarizing function and its backward gradient in a gradual and stable way. Our work makes two primary contributions. The first is to approximate the backward gradient of the binarizing function with a simple leaky-steep function of variable window size. The second is to correct the gradient approximation by standardizing the backward gradient propagated through the binarizing function. Experimental results show that the proposed method outperforms the baseline by 1.5% Top-1 accuracy on the ImageNet dataset without introducing extra computational cost.
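To make the two contributions concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' released code: it assumes the "leaky-steep" function is a linear ramp of slope 1/w inside a window |x| < w with a small leak slope outside, and that "standardizing" means normalizing the incoming gradient to zero mean and unit variance before it propagates further. All names (LeakySteepBinarize, window, leak) are illustrative.

import torch

class LeakySteepBinarize(torch.autograd.Function):
    """Binarize to {-1, +1} with a leaky-steep gradient approximation."""

    @staticmethod
    def forward(ctx, x, window, leak):
        ctx.save_for_backward(x)
        ctx.window, ctx.leak = window, leak
        # Map x >= 0 to +1 and x < 0 to -1 (plain torch.sign would emit 0 at x == 0).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        inside = (x.abs() < ctx.window).to(grad_out.dtype)
        # Steep slope 1/window inside the window, small leak slope outside,
        # so no input region has its gradient fully suppressed.
        grad_in = grad_out * (inside / ctx.window + (1.0 - inside) * ctx.leak)
        # Gradient correction: standardize the gradient passing through the
        # binarizer so its scale stays stable as the window shrinks.
        grad_in = (grad_in - grad_in.mean()) / (grad_in.std() + 1e-8)
        return grad_in, None, None

# Usage: shrink `window` over training to sharpen the approximation.
x = torch.randn(4, 8, requires_grad=True)
y = LeakySteepBinarize.apply(x, 0.5, 0.01)
y.sum().backward()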
Song CHENG
University of Chinese Academy of Sciences, Institute of Microelectronics of Chinese Academy of Sciences
Zixuan LI
University of Chinese Academy of Sciences, Institute of Microelectronics of Chinese Academy of Sciences
Yongsen WANG
University of Chinese Academy of Sciences, Institute of Microelectronics of Chinese Academy of Sciences
Wanbing ZOU
University of Chinese Academy of Sciences, Institute of Microelectronics of Chinese Academy of Sciences
Yumei ZHOU
University of Chinese Academy of Sciences, Institute of Microelectronics of Chinese Academy of Sciences
Delong SHANG
Institute of Microelectronics of Chinese Academy of Sciences, IMECAS
Shushan QIAO
University of Chinese Academy of Sciences, Institute of Microelectronics of Chinese Academy of Sciences
Song CHENG, Zixuan LI, Yongsen WANG, Wanbing ZOU, Yumei ZHOU, Delong SHANG, Shushan QIAO, "Gradient Corrected Approximation for Binary Neural Networks" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 10, pp. 1784-1788, October 2021, doi: 10.1587/transinf.2021EDL8026.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDL8026/_p
@ARTICLE{e104-d_10_1784,
author={Song CHENG and Zixuan LI and Yongsen WANG and Wanbing ZOU and Yumei ZHOU and Delong SHANG and Shushan QIAO},
journal={IEICE TRANSACTIONS on Information},
title={Gradient Corrected Approximation for Binary Neural Networks},
year={2021},
volume={E104-D},
number={10},
pages={1784-1788},
doi={10.1587/transinf.2021EDL8026},
ISSN={1745-1361},
month={October},
}
TY - JOUR
TI - Gradient Corrected Approximation for Binary Neural Networks
T2 - IEICE TRANSACTIONS on Information
SP - 1784
EP - 1788
AU - Song CHENG
AU - Zixuan LI
AU - Yongsen WANG
AU - Wanbing ZOU
AU - Yumei ZHOU
AU - Delong SHANG
AU - Shushan QIAO
PY - 2021
DO - 10.1587/transinf.2021EDL8026
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2021
ER -