We propose a method to fuse auditory information and visual information for accurate speech recognition. This method fuses the two kinds of information by linear combination after calculating the two kinds of probabilities by HMM for each word. In addition, we use full-frame color images as visual information in order to improve the accuracy of the proposed speech recognition system. We have performed experiments comparing the proposed method with methods using either auditory information or visual information alone, and confirmed the validity of the proposed method.
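The abstract describes a decision-level fusion scheme: for each candidate word, an audio HMM and a visual HMM each produce a score, and the two scores are combined linearly before the best-scoring word is chosen. The sketch below illustrates that idea only; it assumes the scores are per-word log-likelihoods and the fusion weight is a free parameter, and all names and values are illustrative, not taken from the paper.

```python
# Minimal sketch of audio-visual fusion by linear combination of per-word HMM
# scores (late fusion). Assumes precomputed log-likelihoods; all identifiers
# and numbers are hypothetical examples, not from the paper.

def fuse_and_recognize(audio_loglik, visual_loglik, alpha=0.7):
    """Return the word whose weighted sum of audio and visual scores is largest.

    audio_loglik, visual_loglik: dicts mapping each candidate word to the
        log-likelihood assigned by its audio HMM / visual (full-frame image) HMM.
    alpha: fusion weight in [0, 1]; alpha=1 uses audio only, alpha=0 visual only.
    """
    scores = {
        word: alpha * audio_loglik[word] + (1.0 - alpha) * visual_loglik[word]
        for word in audio_loglik
    }
    return max(scores, key=scores.get)

# Example usage with made-up scores for three candidate words.
audio = {"word_a": -120.5, "word_b": -131.2, "word_c": -128.9}
visual = {"word_a": -88.3, "word_b": -80.1, "word_c": -95.6}
print(fuse_and_recognize(audio, visual, alpha=0.6))  # -> "word_a"
```

In practice the fusion weight would be tuned on held-out data, since the best balance between the acoustic and visual streams depends on conditions such as noise level.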
Satoru IGAWA, Akio OGIHARA, Akira SHINTANI, Shinobu TAKAMATSU, "Speech Recognition Based on Fusion of Visual and Auditory Information Using Full-Frame Color Image," in IEICE TRANSACTIONS on Fundamentals, vol. E79-A, no. 11, pp. 1836-1840, November 1996.
Abstract: We propose a method to fuse auditory information and visual information for accurate speech recognition. This method fuses the two kinds of information by linear combination after calculating the two kinds of probabilities by HMM for each word. In addition, we use full-frame color images as visual information in order to improve the accuracy of the proposed speech recognition system. We have performed experiments comparing the proposed method with methods using either auditory information or visual information alone, and confirmed the validity of the proposed method.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e79-a_11_1836/_p
@ARTICLE{e79-a_11_1836,
author={Satoru IGAWA and Akio OGIHARA and Akira SHINTANI and Shinobu TAKAMATSU},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Speech Recognition Based on Fusion of Visual and Auditory Information Using Full-Frame Color Image},
year={1996},
volume={E79-A},
number={11},
pages={1836-1840},
abstract={We propose a method to fuse auditory information and visual information for accurate speech recognition. This method fuses the two kinds of information by linear combination after calculating the two kinds of probabilities by HMM for each word. In addition, we use full-frame color images as visual information in order to improve the accuracy of the proposed speech recognition system. We have performed experiments comparing the proposed method with methods using either auditory information or visual information alone, and confirmed the validity of the proposed method.},
keywords={},
doi={},
ISSN={},
month={November},}
TY - JOUR
TI - Speech Recognition Based on Fusion of Visual and Auditory Information Using Full-Frame Color Image
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1836
EP - 1840
AU - Satoru IGAWA
AU - Akio OGIHARA
AU - Akira SHINTANI
AU - Shinobu TAKAMATSU
PY - 1996
DO -
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E79-A
IS - 11
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - November 1996
AB - We propose a method to fuse auditory information and visual information for accurate speech recognition. This method fuses the two kinds of information by linear combination after calculating the two kinds of probabilities by HMM for each word. In addition, we use full-frame color images as visual information in order to improve the accuracy of the proposed speech recognition system. We have performed experiments comparing the proposed method with methods using either auditory information or visual information alone, and confirmed the validity of the proposed method.
ER -