Most unsupervised video segmentation algorithms have difficulty extracting objects from dynamic real-world scenes with large displacements, because the foreground hypothesis is often initialized without any explicit mutual constraint on top-down spatio-temporal coherency, even though such a constraint may be imposed on the segmentation objective. To handle such situations, we propose a multiscale saliency flow (MSF) model that jointly learns foreground and background features of multiscale salient evidence, thereby allowing temporally coherent top-down information in one frame to be propagated throughout the remaining frames. In particular, top-down evidence is detected by combining saliency signatures over a range of higher-scale approximation coefficients in the wavelet domain. Saliency flow is then estimated by Gaussian kernel correlation of non-maximum-suppressed multiscale evidence, characterized by HOG descriptors in a high-dimensional feature space. We build the proposed MSF model on the primary object hypothesis, jointly integrating temporal consistency constraints on the saliency maps estimated at multiple scales into the segmentation objective. We demonstrate the effectiveness of the proposed multiscale saliency flow for segmenting dynamic real-world scenes whose large displacements are caused by uniform sampling of video sequences.
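The abstract names two computational steps that a short sketch can make concrete: a saliency signature combined over higher-scale wavelet approximation coefficients, and a Gaussian kernel correlation of HOG-described evidence between frames. The Python sketch below is a minimal illustration of those two ideas, not the authors' implementation: the Haar wavelet, the DCT-sign image-signature form of the saliency signature, the HOG parameters, and the kernel bandwidth are all assumptions made here for illustration.

```python
# Hypothetical sketch of the two steps named in the abstract:
# (1) saliency signatures combined over higher-scale wavelet
#     approximation coefficients, and (2) Gaussian kernel correlation
#     of HOG descriptors between evidence patches in adjacent frames.
# All parameter values and the DCT-sign signature are assumptions.
import numpy as np
import pywt
from scipy.fft import dctn, idctn
from scipy.ndimage import gaussian_filter
from skimage.feature import hog
from skimage.transform import resize

def signature_saliency(img):
    """Image-signature saliency: reconstruct from the sign of the DCT,
    square, and smooth (an assumed form of the 'saliency signature')."""
    recon = idctn(np.sign(dctn(img, norm='ortho')), norm='ortho')
    return gaussian_filter(recon ** 2, sigma=3)

def multiscale_saliency(frame, levels=(2, 3, 4), wavelet='haar'):
    """Average saliency signatures of the approximation coefficients
    over a range of higher wavelet scales, upsampled to frame size."""
    out = np.zeros(frame.shape, dtype=float)
    for lev in levels:
        cA = pywt.wavedec2(frame, wavelet, level=lev)[0]  # approximation
        out += resize(signature_saliency(cA), frame.shape,
                      anti_aliasing=True)
    out /= len(levels)
    return (out - out.min()) / (out.max() - out.min() + 1e-12)

def gaussian_kernel_correlation(h1, h2, sigma=0.5):
    """Gaussian (RBF) kernel correlation between two feature vectors."""
    return np.exp(-np.sum((h1 - h2) ** 2) / (2 * sigma ** 2))

def saliency_flow_score(patch_t, patch_t1):
    """Correlate HOG descriptors of two candidate evidence patches in
    consecutive frames; patches must share the same size so the
    descriptors align."""
    h1 = hog(patch_t, orientations=9, pixels_per_cell=(8, 8),
             cells_per_block=(2, 2))
    h2 = hog(patch_t1, orientations=9, pixels_per_cell=(8, 8),
             cells_per_block=(2, 2))
    return gaussian_kernel_correlation(h1, h2)
```

In the full model, pairwise scores of this kind would feed the temporal consistency term of the MSF objective across uniformly sampled frames; the non-maximum suppression that selects the candidate evidence patches is omitted here.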
Yinhui ZHANG
Kunming University of Science and Technology
Zifen HE
Kunming University of Science and Technology
Yinhui ZHANG, Zifen HE, "Large Displacement Dynamic Scene Segmentation through Multiscale Saliency Flow" in IEICE TRANSACTIONS on Information and Systems,
vol. E99-D, no. 7, pp. 1871-1876, July 2016, doi: 10.1587/transinf.2015EDP7482.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7482/_p
@ARTICLE{e99-d_7_1871,
author={Yinhui ZHANG and Zifen HE},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Large Displacement Dynamic Scene Segmentation through Multiscale Saliency Flow},
year={2016},
volume={E99-D},
number={7},
pages={1871-1876},
doi={10.1587/transinf.2015EDP7482},
ISSN={1745-1361},
month={July},}
TY - JOUR
TI - Large Displacement Dynamic Scene Segmentation through Multiscale Saliency Flow
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1871
EP - 1876
AU - Yinhui ZHANG
AU - Zifen HE
PY - 2016
DO - 10.1587/transinf.2015EDP7482
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E99-D
IS - 7
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - July 2016
ER -