6DoF pose estimation from a monocular RGB image is a challenging but fundamental task. Methods based on a unit direction vector-field representation and a Hough voting strategy have achieved state-of-the-art performance. Nevertheless, they apply the smooth L1 loss to the two elements of each unit vector separately, which fails to take into account the prior distance between the pixel and the keypoint, even though the positioning error is significantly affected by this distance. In this work, we propose a Prior Distance Augmented Loss (PDAL) that exploits the prior distance for a more accurate vector-field representation. Furthermore, we propose a lightweight channel-level attention module for adaptive feature fusion. Embedding this Adaptive Fusion Attention Module (AFAM) into a U-Net, we build an Attention Voting Network that further improves the performance of our method. Extensive experiments on the LINEMOD, OCCLUSION, and YCB-Video datasets demonstrate the effectiveness of our methods: they bring significant performance gains and outperform state-of-the-art RGB-based methods without any post-refinement.
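The exact formulations of PDAL and AFAM are not given in this abstract, but two rough sketches may help fix the ideas. First, PDAL: since each pixel casts its vote along the predicted unit direction vector, a fixed directional error displaces the voted keypoint roughly in proportion to the pixel-to-keypoint distance, so one plausible reading is to weight the per-pixel smooth L1 vector loss by that prior distance. The function below is a hypothetical PyTorch sketch under that assumption; the tensor layout and the linear distance weighting are illustrative choices, not the paper's definition.

import torch
import torch.nn.functional as F

def prior_distance_augmented_loss(pred_vec, gt_vec, gt_dist, mask, eps=1e-6):
    # Hypothetical sketch of a distance-weighted smooth-L1 vector-field loss.
    # pred_vec: (B, 2K, H, W) predicted unit direction vectors toward K keypoints
    # gt_vec:   (B, 2K, H, W) ground-truth unit direction vectors
    # gt_dist:  (B, K, H, W)  prior pixel-to-keypoint distances (ground truth)
    # mask:     (B, 1, H, W)  object segmentation mask
    B, C, H, W = pred_vec.shape
    K = C // 2
    # Element-wise smooth L1 on the two components of each unit vector.
    loss = F.smooth_l1_loss(pred_vec, gt_vec, reduction='none')
    loss = loss.view(B, K, 2, H, W).sum(dim=2)  # (B, K, H, W)
    # Weight by the prior distance: the same directional error at a more
    # distant pixel moves the voted keypoint further, so it costs more.
    weighted = loss * gt_dist * mask
    return weighted.sum() / (mask.sum() * K + eps)

Second, AFAM: the abstract only states that a lightweight channel-level attention module adaptively re-weights fused features inside a U-Net. A generic squeeze-and-excitation-style gate over the concatenated skip and decoder features is one minimal way to realize that idea; the module below is such a sketch, not the paper's actual design.

import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    # Hypothetical SE-style channel gate for fusing U-Net skip connections.
    # `channels` must equal the channel count of the concatenated input.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial context
        self.gate = nn.Sequential(           # excite: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, skip, up):
        # Fuse encoder skip features with up-sampled decoder features,
        # then re-weight the channels with the learned attention vector.
        x = torch.cat([skip, up], dim=1)
        b, c, _, _ = x.shape
        w = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w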
Yong HE, Ji LI, Xuanhong ZHOU, Zewei CHEN, Xin LIU (all with Chongqing University)
Yong HE, Ji LI, Xuanhong ZHOU, Zewei CHEN, Xin LIU, "Attention Voting Network with Prior Distance Augmented Loss for 6DoF Pose Estimation" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 7, pp. 1039-1048, July 2021, doi: 10.1587/transinf.2020EDP7235.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDP7235/_p
@ARTICLE{e104-d_7_1039,
author={Yong HE and Ji LI and Xuanhong ZHOU and Zewei CHEN and Xin LIU},
journal={IEICE TRANSACTIONS on Information},
title={Attention Voting Network with Prior Distance Augmented Loss for 6DoF Pose Estimation},
year={2021},
volume={E104-D},
number={7},
pages={1039-1048},
keywords={},
doi={10.1587/transinf.2020EDP7235},
ISSN={1745-1361},
month={July},}
TY - JOUR
TI - Attention Voting Network with Prior Distance Augmented Loss for 6DoF Pose Estimation
T2 - IEICE TRANSACTIONS on Information
SP - 1039
EP - 1048
AU - Yong HE
AU - Ji LI
AU - Xuanhong ZHOU
AU - Zewei CHEN
AU - Xin LIU
PY - 2021
DO - 10.1587/transinf.2020EDP7235
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2021
ER -