IEICE TRANSACTIONS on Information

Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System

Tobias CINCAREK, Hiromichi KAWANAMI, Ryuichi NISIMURA, Akinobu LEE, Hiroshi SARUWATARI, Kiyohiro SHIKANO

Summary:

In this paper, the development, long-term operation and portability of a practical ASR application in a real environment are investigated. The target application is a speech-oriented guidance system installed at a local community center. The system has been exposed to ordinary people since November 2002. More than 300 hours of data, or more than 700,000 inputs, were collected over four years. The outcome is a rare example of a large-scale real-environment speech database. A simulation experiment is carried out with this database to investigate how the system's performance improves during the first two years of operation. The purpose is to determine empirically the amount of real-environment data that has to be prepared to build a system with reasonable speech recognition performance and response accuracy. Furthermore, the relative importance of developing the main system components, i.e. the speech recognizer and the response generation module, is assessed. Although the results depend on the system's modeling capacities and domain complexity, they show that overall performance stagnates after employing about 10-15 k utterances for training the acoustic model, 40-50 k utterances for training the language model and 40-50 k utterances for compiling the question and answer (Q&A) database. The Q&A database was the most important component for improving the system's response accuracy. Finally, the portability of the well-trained first system prototype to a different environment, a local subway station, is investigated. Since collecting and preparing large amounts of real data is impractical in general, only one month of data from the new environment is employed for system adaptation. While the speech recognition component of the first prototype has a high degree of portability, the response accuracy is lower than in the first environment. The main reason is a domain difference between the two systems, since they are installed in different environments. This implies that it is imperative to take the behavior of users under real conditions into account to build a system with high user satisfaction.

Publication
IEICE TRANSACTIONS on Information Vol.E91-D No.3 pp.576-587
Publication Date
2008/03/01
Online ISSN
1745-1361
DOI
10.1093/ietisy/e91-d.3.576
Type of Manuscript
Special Section PAPER (Special Section on Robust Speech Processing in Realistic Environments)
Category
Applications
