FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos

Jianfeng XU; Satoshi KOMORITA; Kei KAWAMURA

doi:10.1587/transinf.2022EDP7182

IEICE TRANSACTIONS on Information

Summary:

We propose a framework that integrates heterogeneous networks for human pose estimation (HPE), with the aim of balancing accuracy and computational complexity. Although many existing methods improve the accuracy of HPE by using multiple frames in videos, they also increase the computational complexity. The key difference is that the proposed heterogeneous framework uses different networks for different types of frames, whereas existing methods use the same networks for all frames. Specifically, we divide the video frames into two types, key frames and non-key frames, and adopt three kinds of networks in our heterogeneous framework: slow networks, fast networks, and transfer networks. For key frames, we use a slow network, which has high accuracy but high computational complexity. For non-key frames that follow a key frame, we warp the heatmap of the slow network from the key frame via a transfer network and fuse it with a fast network, which has low accuracy but low computational complexity. Furthermore, when the framework is extended to long-term use, where a large number of non-key frames follow a key frame, the temporal correlation with the key frame decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. Experimental results on the PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.
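
The frame scheduling described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked above for that). Everything here is an assumption made for illustration: the function names, the fixed `key_interval` rule for choosing key frames, the `long_term_gap` threshold for deciding when the neighboring non-key frame is also used, and fusion by element-wise averaging. The networks are passed in as plain callables so the sketch stays self-contained.

```python
from typing import Callable, List, Sequence

Frame = List[float]    # toy stand-in for an image
Heatmap = List[float]  # toy stand-in for a joint heatmap

PoseNet = Callable[[Frame], Heatmap]
# A transfer network warps a source heatmap toward the target frame.
TransferNet = Callable[[Heatmap, Frame, Frame], Heatmap]


def fuse_heatmaps(heatmaps: Sequence[Heatmap]) -> Heatmap:
    """Assumed fusion rule: element-wise mean of the candidate heatmaps."""
    n = len(heatmaps)
    return [sum(values) / n for values in zip(*heatmaps)]


def run_heterogeneous_schedule(
    frames: Sequence[Frame],
    slow_net: PoseNet,
    fast_net: PoseNet,
    key_transfer_net: TransferNet,
    neighbor_transfer_net: TransferNet,
    key_interval: int = 4,   # hypothetical: every key_interval-th frame is a key frame
    long_term_gap: int = 2,  # hypothetical: beyond this gap, also warp the neighbor's heatmap
) -> List[Heatmap]:
    if key_interval < 1:
        raise ValueError("key_interval must be at least 1")

    outputs: List[Heatmap] = []
    key_index = 0
    for t, frame in enumerate(frames):
        if t % key_interval == 0:
            # Key frame: accurate but expensive slow network.
            key_index = t
            outputs.append(slow_net(frame))
            continue

        # Non-key frame: cheap fast network plus the warped key-frame heatmap.
        candidates = [
            fast_net(frame),
            key_transfer_net(outputs[key_index], frames[key_index], frame),
        ]
        if t - key_index > long_term_gap:
            # Far from the key frame: also warp the previous non-key frame's heatmap.
            candidates.append(
                neighbor_transfer_net(outputs[t - 1], frames[t - 1], frame)
            )
        outputs.append(fuse_heatmaps(candidates))
    return outputs
```

In this sketch the slow network runs only once per `key_interval` frames, which is where the computational saving would come from. The additional warp from the neighboring frame is applied only when the gap to the key frame exceeds `long_term_gap`.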

Publication: IEICE TRANSACTIONS on Information Vol.E106-D No.6 pp.1165-1174

Publication Date: 2023/06/01

Publicized: 2023/03/20

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2022EDP7182

Type of Manuscript: PAPER

Category: Image Recognition, Computer Vision

Authors

Jianfeng XU
  KDDI Research, Inc.
Satoshi KOMORITA
  KDDI Research, Inc.
Kei KAWAMURA
  KDDI Research, Inc.

Keywords

human pose estimation, heterogeneous networks, temporal correlation, fast networks, slow networks

Cite this

Jianfeng XU, Satoshi KOMORITA, Kei KAWAMURA, "FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos" in IEICE TRANSACTIONS on Information, vol. E106-D, no. 6, pp. 1165-1174, June 2023, doi: 10.1587/transinf.2022EDP7182.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7182/_p

@ARTICLE{e106-d_6_1165,
author={Jianfeng XU and Satoshi KOMORITA and Kei KAWAMURA},
journal={IEICE TRANSACTIONS on Information},
title={FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos},
year={2023},
volume={E106-D},
number={6},
pages={1165-1174},
keywords={human pose estimation, heterogeneous networks, temporal correlation, fast networks, slow networks},
doi={10.1587/transinf.2022EDP7182},
ISSN={1745-1361},
month={June}
}

TY - JOUR
TI - FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos
T2 - IEICE TRANSACTIONS on Information
SP - 1165
EP - 1174
AU - Jianfeng XU
AU - Satoshi KOMORITA
AU - Kei KAWAMURA
PY - 2023
DO - 10.1587/transinf.2022EDP7182
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2023
ER -