An accurate audio-visual speech corpus is indispensable for talking-head research. This paper presents our audio-visual speech corpus collection and proposes a head-movement normalization method and a facial motion generation method. The corpus contains speech data, video data of the face, and the positions and movements of the facial organs, for Japanese phoneme-balanced sentences uttered by a female native speaker. Accurate facial capture is achieved with an optical motion-capture system: we captured high-resolution 3D data by placing a large number of markers on the speaker's face. In addition, we propose a method that removes head movements by means of an affine transformation, so that the pure displacements of the facial organs can be computed. Finally, to make it easy to create facial animation from this motion data, we propose a technique for assigning the captured data to a facial polygon model. Evaluation results demonstrate the effectiveness of the proposed facial motion generation method and show the relationship between the number of markers and the resulting error.
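The head-movement normalization described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes NumPy, a hypothetical set of marker indices treated as rigid with the skull, and a plain least-squares affine fit; the paper's marker layout and estimation details may differ.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares affine transform (A, t) mapping src -> dst.

    src, dst: (N, 3) arrays of corresponding 3D marker positions.
    """
    n = src.shape[0]
    # Homogeneous coordinates [x y z 1], solve X @ M ~= dst in least squares
    X = np.hstack([src, np.ones((n, 1))])
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M[:3].T, M[3]          # A is 3x3, t is a 3-vector

def remove_head_motion(frame, ref_frame, head_idx):
    """Map a captured frame back into the reference head pose.

    frame, ref_frame: (N, 3) marker positions for one frame and the
    neutral reference. head_idx: indices of markers assumed rigid with
    the head (an illustrative choice here, e.g. forehead/nose bridge).
    Returns the pure facial displacements relative to the reference.
    """
    # Estimate the head pose change from the head-fixed markers only
    A, t = estimate_affine(ref_frame[head_idx], frame[head_idx])
    # Undo the head transform for every marker on the face
    normalized = (frame - t) @ np.linalg.inv(A).T
    # What remains is facial deformation in reference coordinates
    return normalized - ref_frame
```

With this normalization, a frame that differs from the reference only by a rigid head motion yields (near-)zero displacements, while genuine lip or cheek motion survives as a displacement in the reference coordinate system.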
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Tatsuo YOTSUKURA, Shigeo MORISHIMA, Satoshi NAKAMURA, "Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation," in IEICE Transactions on Information and Systems,
vol. E88-D, no. 11, pp. 2477-2483, November 2005, doi: 10.1093/ietisy/e88-d.11.2477.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.11.2477/_p
@ARTICLE{e88-d_11_2477,
author={Yotsukura, Tatsuo and Morishima, Shigeo and Nakamura, Satoshi},
journal={IEICE Transactions on Information and Systems},
title={Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation},
year={2005},
volume={E88-D},
number={11},
pages={2477-2483},
abstract={An accurate audio-visual speech corpus is inevitable for talking-heads research. This paper presents our audio-visual speech corpus collection and proposes a head-movement normalization method and a facial motion generation method. The audio-visual corpus contains speech data, movie data on faces, and positions and movements of facial organs. The corpus consists of Japanese phoneme-balanced sentences uttered by a female native speaker. An accurate facial capture is realized by using an optical motion-capture system. We captured high-resolution 3D data by arranging many markers on the speaker's face. In addition, we propose a method of acquiring the facial movements and removing head movements by using affine transformation for computing displacements of pure facial organs. Finally, in order to easily create facial animation from this motion data, we propose a technique assigning the captured data to the facial polygon model. Evaluation results demonstrate the effectiveness of the proposed facial motion generation method and show the relationship between the number of markers and errors.},
doi={10.1093/ietisy/e88-d.11.2477},
month={November}
}
TY  - JOUR
TI  - Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation
T2  - IEICE Transactions on Information and Systems
SP  - 2477
EP  - 2483
AU  - Yotsukura, Tatsuo
AU  - Morishima, Shigeo
AU  - Nakamura, Satoshi
PY  - 2005
DO  - 10.1093/ietisy/e88-d.11.2477
JO  - IEICE Transactions on Information and Systems
VL  - E88-D
IS  - 11
Y1  - 2005/11//
AB  - An accurate audio-visual speech corpus is inevitable for talking-heads research. This paper presents our audio-visual speech corpus collection and proposes a head-movement normalization method and a facial motion generation method. The audio-visual corpus contains speech data, movie data on faces, and positions and movements of facial organs. The corpus consists of Japanese phoneme-balanced sentences uttered by a female native speaker. An accurate facial capture is realized by using an optical motion-capture system. We captured high-resolution 3D data by arranging many markers on the speaker's face. In addition, we propose a method of acquiring the facial movements and removing head movements by using affine transformation for computing displacements of pure facial organs. Finally, in order to easily create facial animation from this motion data, we propose a technique assigning the captured data to the facial polygon model. Evaluation results demonstrate the effectiveness of the proposed facial motion generation method and show the relationship between the number of markers and errors.
ER  -