Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network

Wenkai LIU; Cuizhu QIN; Menglong WU; Wenle BAI; Hongxia DONG

doi:10.1587/transinf.2022EDL8093

IEICE TRANSACTIONS on Information

Summary:

Pose estimation is an active research topic in computer vision and is key to machine perception of human activity. Human pose estimation describes the motion of the human body through its major joint points. Large receptive fields and rich spatial information help keypoint localization, but capturing features at larger scales and reintegrating them into the feature space remains a challenge. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. MSCNet is built on an hourglass network that captures information at different scales to form a consistent understanding of the whole body. Its multi-scale receptive field (MSRF) units provide a large receptive field for gathering rich contextual information, which a Squeeze-Excitation (SE) attention mechanism then selectively enhances or suppresses so that the network can perform pose estimation flexibly. Experimental results show that MSCNet achieves 73.1% AP on the COCO dataset, an 8.8% improvement over the mainstream CMUPose method. Compared with the advanced CPN, MSCNet requires 68.2% of the computational complexity and only 55.4% of the parameters.
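The summary names a Squeeze-Excitation (SE) attention mechanism but does not give its configuration. As a rough illustration only, the following is a minimal sketch of a generic SE block in plain Python: global average pooling per channel (squeeze), a two-layer bottleneck with ReLU and sigmoid (excitation), and channel-wise rescaling. The function name `se_block`, the reduction ratio, the tensor sizes, and the random weights are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def se_block(features, w_reduce, w_expand):
    """Generic Squeeze-and-Excitation reweighting of one feature map.

    features: C channels, each an H x W list of lists of floats
    w_reduce: (C // r) x C weight matrix of the first fully connected layer
    w_expand: C x (C // r) weight matrix of the second fully connected layer
    Biases are omitted; all weights are hypothetical.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation, step 1: bottleneck projection followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce
    ]
    # Excitation, step 2: expand back to C values and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate,
    # so gates near 0 suppress a channel and gates near 1 pass it through.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


random.seed(0)
C, R, H, W = 8, 4, 4, 4  # illustrative sizes, not taken from the paper
x = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.gauss(0, 0.1) for _ in range(C // R)] for _ in range(C)]
y, gates = se_block(x, w1, w2)
```

In a trained network the two weight matrices would be learned, so the gates would come to reflect which channels matter for keypoint localization; here they are random and only demonstrate the data flow.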

Publication: IEICE TRANSACTIONS on Information Vol.E106-D No.5 pp.1081-1084

Publication Date: 2023/05/01

Publicized: 2023/02/15

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2022EDL8093

Type of Manuscript: LETTER

Category: Human-computer Interaction

Authors

Wenkai LIU
  North China University of Technology
Cuizhu QIN
  North China University of Technology
Menglong WU
  North China University of Technology
Wenle BAI
  North China University of Technology
Hongxia DONG
  North China University of Technology

Keywords

pose estimation, multi-scale convergence network, receptive field, attention mechanism

Cite this

Wenkai LIU, Cuizhu QIN, Menglong WU, Wenle BAI, Hongxia DONG, "Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network" in IEICE TRANSACTIONS on Information, vol. E106-D, no. 5, pp. 1081-1084, May 2023, doi: 10.1587/transinf.2022EDL8093.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDL8093/_p

@ARTICLE{e106-d_5_1081,
author={Wenkai LIU and Cuizhu QIN and Menglong WU and Wenle BAI and Hongxia DONG},
journal={IEICE TRANSACTIONS on Information},
title={Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network},
year={2023},
volume={E106-D},
number={5},
pages={1081-1084},
keywords={pose estimation, multi-scale convergence network, receptive field, attention mechanism},
doi={10.1587/transinf.2022EDL8093},
ISSN={1745-1361},
month={May},
}

TY - JOUR
TI - Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network
T2 - IEICE TRANSACTIONS on Information
SP - 1081
EP - 1084
AU - Wenkai LIU
AU - Cuizhu QIN
AU - Menglong WU
AU - Wenle BAI
AU - Hongxia DONG
PY - 2023
DO - 10.1587/transinf.2022EDL8093
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2023
ER -
