BFF R-CNN: Balanced Feature Fusion for Object Detection

Hongzhe LIU; Ningwei WANG; Xuewei LI; Cheng XU; Yaze LI

doi:10.1587/transinf.2021EDP7261

IEICE TRANSACTIONS on Information

Summary:

In the neck of a two-stage object detection network, feature fusion is generally carried out in either a top-down or a bottom-up manner. However, two types of imbalance may exist: feature imbalance in the neck of the model, and gradient imbalance in the region of interest extraction layer caused by changes in object scale. The deeper the network, the more abstract the learned features, that is, the more semantic information can be extracted; however, less image background, spatial location, and other resolution information is retained. In contrast, the shallow layers learn little semantic information but a lot of spatial location information. We propose the Both Ends to Centre to Multiple Layers (BEtM) feature fusion method to solve the feature imbalance problem in the neck and a Multi-level Region of Interest Feature Extraction (MRoIE) layer to solve the gradient imbalance problem. Combined with the Region-based Convolutional Neural Network (R-CNN) framework, our Balanced Feature Fusion (BFF) method offers significantly improved network performance compared with the Faster R-CNN architecture. On the MS COCO 2017 dataset, it achieves an average precision (AP) that is 1.9 points and 3.2 points higher than those of the Feature Pyramid Network (FPN) Faster R-CNN framework and the Generic Region of Interest Extractor (GRoIE) framework, respectively.
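
As background for the top-down fusion mentioned in the summary, the following is a minimal sketch of the generic idea, not the BEtM method proposed in the paper. It assumes toy 1-D feature maps stored as Python lists, where each deeper level is half as long as the one above it; the function names `upsample_nearest_2x` and `top_down_fuse` are hypothetical, and the learned lateral convolutions that a real FPN-style neck would use are left out.

```python
# Illustrative top-down fusion over 1-D feature maps (pure Python).
# Assumption: each pyramid level is a list of floats, and each deeper
# level has half the length of the level above it.


def upsample_nearest_2x(feature):
    """Repeat every element twice (nearest-neighbour upsampling by 2)."""
    return [value for value in feature for _ in range(2)]


def top_down_fuse(levels):
    """Fuse pyramid levels from deepest to shallowest.

    levels[0] is the shallowest (highest-resolution) map and levels[-1]
    the deepest. Each output level is the input level plus the upsampled
    output of the level below it.
    """
    fused = [None] * len(levels)
    fused[-1] = list(levels[-1])  # the deepest level is passed through
    for i in range(len(levels) - 2, -1, -1):
        upsampled = upsample_nearest_2x(fused[i + 1])
        if len(upsampled) != len(levels[i]):
            raise ValueError("each level must be twice as long as the next")
        fused[i] = [a + b for a, b in zip(levels[i], upsampled)]
    return fused


pyramid = [
    [1.0, 1.0, 1.0, 1.0],  # shallow: high resolution
    [2.0, 2.0],
    [4.0],                 # deep: low resolution
]
fused = top_down_fuse(pyramid)
print(fused)
```

In this sketch the shallowest level accumulates contributions from every deeper level, while the deepest level receives nothing from the shallow ones. That one-directional flow is one way to picture the kind of feature imbalance the summary refers to.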

Publication: IEICE TRANSACTIONS on Information Vol.E105-D No.8 pp.1472-1480

Publication Date: 2022/08/01

Publicized: 2022/05/17

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2021EDP7261

Type of Manuscript: PAPER

Category: Image Recognition, Computer Vision

Authors

Hongzhe LIU
  Beijing Union University
Ningwei WANG
  Beijing Union University
Xuewei LI
  Beijing Union University
Cheng XU
  Beijing Union University
Yaze LI
  Beijing Union University

Keywords

deep learning, neural network, object detection, feature fusion

Cite this

Hongzhe LIU, Ningwei WANG, Xuewei LI, Cheng XU, Yaze LI, "BFF R-CNN: Balanced Feature Fusion for Object Detection" in IEICE TRANSACTIONS on Information, vol. E105-D, no. 8, pp. 1472-1480, August 2022, doi: 10.1587/transinf.2021EDP7261.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDP7261/_p

@ARTICLE{e105-d_8_1472,
author={Hongzhe LIU and Ningwei WANG and Xuewei LI and Cheng XU and Yaze LI},
journal={IEICE TRANSACTIONS on Information},
title={BFF R-CNN: Balanced Feature Fusion for Object Detection},
year={2022},
volume={E105-D},
number={8},
pages={1472-1480},
abstract={In the neck part of a two-stage object detection network, feature fusion is generally carried out in either a top-down or bottom-up manner. However, two types of imbalance may exist: feature imbalance in the neck of the model and gradient imbalance in the region of interest extraction layer due to the scale changes of objects. The deeper the network is, the more abstract the learned features are, that is to say, more semantic information can be extracted. However, the extracted image background, spatial location, and other resolution information are less. In contrast, the shallow part can learn little semantic information, but a lot of spatial location information. We propose the Both Ends to Centre to Multiple Layers (BEtM) feature fusion method to solve the feature imbalance problem in the neck and a Multi-level Region of Interest Feature Extraction (MRoIE) layer to solve the gradient imbalance problem. In combination with the Region-based Convolutional Neural Network (R-CNN) framework, our Balanced Feature Fusion (BFF) method offers significantly improved network performance compared with the Faster R-CNN architecture. On the MS COCO 2017 dataset, it achieves an average precision (AP) that is 1.9 points and 3.2 points higher than those of the Feature Pyramid Network (FPN) Faster R-CNN framework and the Generic Region of Interest Extractor (GRoIE) framework, respectively.},
keywords={deep learning, neural network, object detection, feature fusion},
doi={10.1587/transinf.2021EDP7261},
ISSN={1745-1361},
month={August}
}

TY - JOUR
TI - BFF R-CNN: Balanced Feature Fusion for Object Detection
T2 - IEICE TRANSACTIONS on Information
SP - 1472
EP - 1480
AU - Hongzhe LIU
AU - Ningwei WANG
AU - Xuewei LI
AU - Cheng XU
AU - Yaze LI
PY - 2022
DO - 10.1587/transinf.2021EDP7261
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2022
AB - In the neck part of a two-stage object detection network, feature fusion is generally carried out in either a top-down or bottom-up manner. However, two types of imbalance may exist: feature imbalance in the neck of the model and gradient imbalance in the region of interest extraction layer due to the scale changes of objects. The deeper the network is, the more abstract the learned features are, that is to say, more semantic information can be extracted. However, the extracted image background, spatial location, and other resolution information are less. In contrast, the shallow part can learn little semantic information, but a lot of spatial location information. We propose the Both Ends to Centre to Multiple Layers (BEtM) feature fusion method to solve the feature imbalance problem in the neck and a Multi-level Region of Interest Feature Extraction (MRoIE) layer to solve the gradient imbalance problem. In combination with the Region-based Convolutional Neural Network (R-CNN) framework, our Balanced Feature Fusion (BFF) method offers significantly improved network performance compared with the Faster R-CNN architecture. On the MS COCO 2017 dataset, it achieves an average precision (AP) that is 1.9 points and 3.2 points higher than those of the Feature Pyramid Network (FPN) Faster R-CNN framework and the Generic Region of Interest Extractor (GRoIE) framework, respectively.
ER -
