
IEICE TRANSACTIONS on Information and Systems

Face-to-Talk: Audio-Visual Speech Detection for Robust Speech Recognition in Noisy Environment

Kazumasa MURAI, Satoshi NAKAMURA


Summary:

This paper discusses "face-to-talk" audio-visual speech detection for robust speech recognition in noisy environments, which consists of a facial-orientation-based switch and audio-visual speech section detection. Most of today's speech recognition systems must be turned on and off by a switch, e.g. "push-to-talk", to indicate which utterance should be recognized, and a specific speech section must be detected prior to any further analysis. To improve usability and performance, we have investigated how to extract useful information from the visual modality. We implemented a facial-orientation-based switch, which activates speech recognition while the speaker is facing the camera. The speech section is then detected by analyzing the image of the face. Visual speech detection is robust against audio noise, but because articulation starts before the speech and lasts longer than the speech itself, the detected section tends to be longer and leads to insertion errors. Therefore, we fuse the sections detected from the audio and visual modalities. Our experiments confirm that the proposed audio-visual speech detection method improves recognition performance in noisy environments.
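The abstract does not specify the exact fusion rule, but the stated idea (visual sections are reliable under noise yet over-long, so audio evidence should tighten their boundaries) can be illustrated with a minimal sketch. Here, hypothetically, each visually detected section is kept only if it overlaps an audio-detected section, and its boundaries are trimmed to the overlap plus a small padding; the interval representation and the `pad` parameter are illustrative assumptions, not the authors' method.

```python
def fuse_sections(audio_sections, visual_sections, pad=0.1):
    """Fuse speech sections from the two modalities (illustrative sketch).

    audio_sections, visual_sections: lists of (start, end) times in seconds.
    A visual section with no overlapping audio section is discarded
    (treated as silent articulation); otherwise the fused section is the
    intersection of the two intervals, widened by `pad` seconds.
    """
    fused = []
    for v_start, v_end in visual_sections:
        for a_start, a_end in audio_sections:
            if v_start < a_end and a_start < v_end:  # intervals overlap
                start = max(v_start, a_start) - pad
                end = min(v_end, a_end) + pad
                fused.append((max(0.0, start), end))
    return fused
```

For example, a visual section (0.5, 2.5) s overlapping an audio section (1.0, 2.0) s is tightened to roughly (0.9, 2.1) s, while a visual-only detection is dropped, which is the behavior the abstract credits with reducing insertion errors.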

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E86-D No.3 pp.505-513
Publication Date
2003/03/01
Type of Manuscript
Special Section PAPER (Special Issue on Speech Information Processing)
Category
Robust Speech Recognition and Enhancement
