Accurately labeling salient regions in video with cluttered backgrounds and complex motion is still a challenging task. Most existing video salient region detection models mainly extract stimulus-driven saliency features to detect salient regions, so they are easily influenced by cluttered backgrounds and complex motion, which may lead to incomplete or incorrect detection results. In this paper, we propose a video salient region detection framework that fuses stimulus-driven saliency features with a spatiotemporal consistency cue to improve detection performance under these complex conditions. On one hand, stimulus-driven spatial and temporal saliency features are extracted to derive the initial spatial and temporal salient region maps. On the other hand, to make use of the spatiotemporal consistency cue, an effective spatiotemporal consistency optimization model is presented, and we use this model to optimize the initial spatial and temporal salient region maps. The superpixel-level spatiotemporal salient region map is then derived by optimizing the initial spatiotemporal salient region map. Finally, the pixel-level spatiotemporal salient region map is derived by solving a self-defined energy model. Experimental results on challenging video datasets demonstrate that the proposed framework outperforms state-of-the-art methods.
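The abstract's fusion step can be illustrated with a minimal NumPy sketch. This is not the paper's actual optimization model: the function name `fuse_saliency` and the weights `alpha` and `beta` are hypothetical, and the "consistency" term here is reduced to a simple blend with the previous frame's fused map rather than the paper's superpixel-level optimization and energy model.

```python
import numpy as np

def fuse_saliency(spatial, temporal, prev_fused=None, alpha=0.5, beta=0.3):
    """Fuse spatial and temporal saliency maps for one frame (toy sketch).

    spatial, temporal: 2-D arrays with values in [0, 1].
    prev_fused: fused map from the previous frame, used as a crude
        temporal-consistency cue (None for the first frame).
    alpha: weight of the spatial map in the stimulus-driven fusion.
    beta: strength of the pull toward the previous frame's result.
    """
    # Stimulus-driven fusion of the two initial maps.
    fused = alpha * spatial + (1.0 - alpha) * temporal
    # Temporal consistency: blend with the previous frame's fused map.
    if prev_fused is not None:
        fused = (1.0 - beta) * fused + beta * prev_fused
    # Normalize to [0, 1] so maps are comparable across frames.
    span = fused.max() - fused.min()
    return (fused - fused.min()) / span if span > 0 else fused

# Example: two consecutive frames of 4x4 saliency maps.
gen = np.random.default_rng(0)
s0, t0 = gen.random((4, 4)), gen.random((4, 4))
f0 = fuse_saliency(s0, t0)
s1, t1 = gen.random((4, 4)), gen.random((4, 4))
f1 = fuse_saliency(s1, t1, prev_fused=f0)
```

In the paper itself the consistency cue is applied through a dedicated optimization model at the superpixel level, followed by a pixel-level energy model; the blend above only conveys the general idea of propagating saliency across frames.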
Yunfei ZHENG
PLA University of Science and Technology, Army Officer Academy of PLA, the Key Laboratory of Polarization Imaging Detection Technology
Xiongwei ZHANG
PLA University of Science and Technology
Lei BAO
Army Officer Academy of PLA
Tieyong CAO
PLA University of Science and Technology
Yonggang HU
PLA University of Science and Technology
Meng SUN
PLA University of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yunfei ZHENG, Xiongwei ZHANG, Lei BAO, Tieyong CAO, Yonggang HU, Meng SUN, "A Video Salient Region Detection Framework Using Spatiotemporal Consistency Optimization" in IEICE TRANSACTIONS on Fundamentals,
vol. E100-A, no. 2, pp. 688-701, February 2017, doi: 10.1587/transfun.E100.A.688.
Abstract: Accurately labeling salient regions in video with cluttered backgrounds and complex motion is still a challenging task. Most existing video salient region detection models mainly extract stimulus-driven saliency features to detect salient regions, so they are easily influenced by cluttered backgrounds and complex motion, which may lead to incomplete or incorrect detection results. In this paper, we propose a video salient region detection framework that fuses stimulus-driven saliency features with a spatiotemporal consistency cue to improve detection performance under these complex conditions. On one hand, stimulus-driven spatial and temporal saliency features are extracted to derive the initial spatial and temporal salient region maps. On the other hand, to make use of the spatiotemporal consistency cue, an effective spatiotemporal consistency optimization model is presented, and we use this model to optimize the initial spatial and temporal salient region maps. The superpixel-level spatiotemporal salient region map is then derived by optimizing the initial spatiotemporal salient region map. Finally, the pixel-level spatiotemporal salient region map is derived by solving a self-defined energy model. Experimental results on challenging video datasets demonstrate that the proposed framework outperforms state-of-the-art methods.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E100.A.688/_p
@ARTICLE{e100-a_2_688,
author={Yunfei ZHENG and Xiongwei ZHANG and Lei BAO and Tieyong CAO and Yonggang HU and Meng SUN},
journal={IEICE TRANSACTIONS on Fundamentals},
title={A Video Salient Region Detection Framework Using Spatiotemporal Consistency Optimization},
year={2017},
volume={E100-A},
number={2},
pages={688-701},
abstract={Accurately labeling salient regions in video with cluttered backgrounds and complex motion is still a challenging task. Most existing video salient region detection models mainly extract stimulus-driven saliency features to detect salient regions, so they are easily influenced by cluttered backgrounds and complex motion, which may lead to incomplete or incorrect detection results. In this paper, we propose a video salient region detection framework that fuses stimulus-driven saliency features with a spatiotemporal consistency cue to improve detection performance under these complex conditions. On one hand, stimulus-driven spatial and temporal saliency features are extracted to derive the initial spatial and temporal salient region maps. On the other hand, to make use of the spatiotemporal consistency cue, an effective spatiotemporal consistency optimization model is presented, and we use this model to optimize the initial spatial and temporal salient region maps. The superpixel-level spatiotemporal salient region map is then derived by optimizing the initial spatiotemporal salient region map. Finally, the pixel-level spatiotemporal salient region map is derived by solving a self-defined energy model. Experimental results on challenging video datasets demonstrate that the proposed framework outperforms state-of-the-art methods.},
doi={10.1587/transfun.E100.A.688},
ISSN={1745-1337},
month={February},
}
TY  - JOUR
TI  - A Video Salient Region Detection Framework Using Spatiotemporal Consistency Optimization
T2  - IEICE TRANSACTIONS on Fundamentals
SP  - 688
EP  - 701
AU  - Yunfei ZHENG
AU  - Xiongwei ZHANG
AU  - Lei BAO
AU  - Tieyong CAO
AU  - Yonggang HU
AU  - Meng SUN
PY  - 2017
DO  - 10.1587/transfun.E100.A.688
JO  - IEICE TRANSACTIONS on Fundamentals
SN  - 1745-1337
VL  - E100-A
IS  - 2
JA  - IEICE TRANSACTIONS on Fundamentals
Y1  - 2017/02
AB  - Accurately labeling salient regions in video with cluttered backgrounds and complex motion is still a challenging task. Most existing video salient region detection models mainly extract stimulus-driven saliency features to detect salient regions, so they are easily influenced by cluttered backgrounds and complex motion, which may lead to incomplete or incorrect detection results. In this paper, we propose a video salient region detection framework that fuses stimulus-driven saliency features with a spatiotemporal consistency cue to improve detection performance under these complex conditions. On one hand, stimulus-driven spatial and temporal saliency features are extracted to derive the initial spatial and temporal salient region maps. On the other hand, to make use of the spatiotemporal consistency cue, an effective spatiotemporal consistency optimization model is presented, and we use this model to optimize the initial spatial and temporal salient region maps. The superpixel-level spatiotemporal salient region map is then derived by optimizing the initial spatiotemporal salient region map. Finally, the pixel-level spatiotemporal salient region map is derived by solving a self-defined energy model. Experimental results on challenging video datasets demonstrate that the proposed framework outperforms state-of-the-art methods.
ER  - 