The ability to find the speaker's face region in a video is useful for various applications. In this work, we develop a novel technique to find this region within different time windows that is robust against changes in view, scale, and background. The main thrust of our technique is to integrate audiovisual correlation analysis into a video segmentation framework. We analyze the audiovisual correlation locally by computing the quadratic mutual information between our audiovisual features. The computation of quadratic mutual information is based on probability density functions estimated by kernel density estimation with adaptive kernel bandwidths. The results of this correlation analysis are incorporated into graph cut-based video segmentation to obtain a globally optimal extraction of the speaker's face region. We avoid setting any heuristic threshold in this segmentation by learning the correlation distributions of the speaker and the background via expectation maximization. Experimental results demonstrate that our method detects the speaker's face region accurately and robustly across different views, scales, and backgrounds.
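The paper itself includes no code; as a rough illustration of the correlation step, the sketch below estimates quadratic mutual information between two scalar feature sequences via Gaussian kernel density estimation. A single fixed bandwidth `sigma` stands in for the paper's adaptive kernel bandwidths, and the audiovisual feature extraction is omitted; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def gauss(d2, sigma):
    """1-D Gaussian kernel evaluated at squared distances d2."""
    return np.exp(-d2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def quadratic_mi(x, y, sigma=0.5):
    """Quadratic mutual information between sample sequences x and y.

    Uses the Euclidean-distance form QMI = V_J + V_M - 2 * V_C, whose
    terms have closed forms under Gaussian KDE because the integral of
    a product of two Gaussians is again a Gaussian. Assumes scalar
    features and one fixed bandwidth (the paper uses adaptive
    bandwidths).
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    # Pairwise squared distances; convolving two Gaussian kernels of
    # bandwidth sigma yields a Gaussian of bandwidth sqrt(2) * sigma.
    dx2 = (x[:, None] - x[None, :]) ** 2
    dy2 = (y[:, None] - y[None, :]) ** 2
    kx = gauss(dx2, np.sqrt(2.0) * sigma)
    ky = gauss(dy2, np.sqrt(2.0) * sigma)

    v_joint = np.mean(kx * ky)                             # from p(x, y)^2
    v_marg = np.mean(kx) * np.mean(ky)                     # from (p(x) p(y))^2
    v_cross = np.mean(kx.mean(axis=1) * ky.mean(axis=1))   # cross term
    return v_joint + v_marg - 2.0 * v_cross
```

Evaluating this between an audio feature sequence and each pixel's visual feature sequence over a time window would yield a per-pixel correlation map of the kind consumed by the segmentation sketch below.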
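Likewise, a hedged sketch of the threshold-free segmentation step: a two-component Gaussian mixture fitted by expectation maximization models the speaker and background correlation distributions, and the resulting negative log-likelihoods serve as data terms in a binary graph cut. The `GaussianMixture` (scikit-learn) and `maxflow` (PyMaxflow) packages, the 4-connected grid, and the constant smoothness weight are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np
import maxflow                      # PyMaxflow (pip install PyMaxflow); assumed here
from sklearn.mixture import GaussianMixture

def segment_speaker(corr, smoothness=2.0):
    """Binary speaker/background labeling from per-pixel correlation.

    corr: 2-D array of audiovisual correlation scores (e.g. QMI values).
    A 2-component 1-D Gaussian mixture is fitted by EM, so no hand-set
    threshold is needed; negative log-likelihoods under each component
    become the graph-cut data terms.
    """
    scores = corr.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(scores)
    # Assumption: the component with the larger mean models the speaker.
    spk = int(np.argmax(gmm.means_.ravel()))

    # Per-component log-likelihoods from the public API:
    # log p(x|k) = log p(k|x) + log p(x) - log pi_k.
    resp = gmm.predict_proba(scores)
    logpx = gmm.score_samples(scores)
    log_lik = np.log(resp + 1e-12) + logpx[:, None] - np.log(gmm.weights_)
    d_spk = (-log_lik[:, spk]).reshape(corr.shape)
    d_bg = (-log_lik[:, 1 - spk]).reshape(corr.shape)

    # Graph cut: 4-connected Potts smoothness plus the data terms.
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(corr.shape)
    g.add_grid_edges(nodes, smoothness)
    # A pixel ending on the sink side pays its source capacity, so
    # wiring source=d_spk makes the sink segment the speaker label.
    g.add_grid_tedges(nodes, d_spk, d_bg)
    g.maxflow()
    return g.get_grid_segments(nodes)   # True where labeled speaker
```

With a QMI map from the first sketch as `corr`, this returns a boolean mask of the speaker's face region; the EM fit replaces any hand-tuned decision threshold on the correlation values, mirroring the idea described in the abstract.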
Yuyu LIU, Yoichi SATO, "Segmentation of the Speaker's Face Region with Audiovisual Correlation" in IEICE Transactions on Information and Systems, vol. E93-D, no. 7, pp. 1965-1975, July 2010, doi: 10.1587/transinf.E93.D.1965.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.1965/_p
@ARTICLE{e93-d_7_1965,
author={Yuyu LIU and Yoichi SATO},
journal={IEICE Transactions on Information and Systems},
title={Segmentation of the Speaker's Face Region with Audiovisual Correlation},
year={2010},
volume={E93-D},
number={7},
pages={1965-1975},
doi={10.1587/transinf.E93.D.1965},
ISSN={1745-1361},
month={July},
}
TY - JOUR
TI - Segmentation of the Speaker's Face Region with Audiovisual Correlation
T2 - IEICE Transactions on Information and Systems
AU - Yuyu LIU
AU - Yoichi SATO
PY - 2010
Y1 - July 2010
VL - E93-D
IS - 7
SP - 1965
EP - 1975
DO - 10.1587/transinf.E93.D.1965
SN - 1745-1361
ER -