Progress in speech recognition based on the hidden Markov model has made it possible to realize man-machine dialogue systems that operate in real time. In spite of considerable effort, however, few systems have been developed successfully because of the lack of appropriate dialogue models. This paper reports on some of the technology necessary to develop a dialogue system with which one can converse comfortably. The emphasis is placed on three points: how a human converses with a machine; how speech recognition errors can be recovered from through conversation; and what it means for a machine to be cooperative. We examine the first problem by investigating dialogues between human speakers and dialogues between a human speaker and a simulated machine. As considerations in the design of dialogue control, we discuss the relation between the efficiency and cooperativeness of a dialogue, methods for confirming what the machine has recognized, and dynamic adaptation by the machine. Thirdly, we review research on the friendliness of natural language interfaces, mainly concerning the exchange of initiative, corrective and suggestive answers, and indirect questions. Lastly, we briefly describe the current state of the art in speech recognition and synthesis, and suggest what should be done to accept spontaneous speech and to produce a voice suitable for the output of a dialogue system.
Yutaka KOBAYASHI Masanori OMOTE Hidenori ENDO Yasuhisa NIIMI
This paper gives an overview of our speech understanding system and reports recent results of sentence recognition experiments. The system, which we call SUSKIT-, recognizes database queries spoken as natural Japanese sentences; the user is expected to speak sentence by sentence. Among the difficult problems to overcome, this study paid prime attention to coping with contextual variations in pronunciation and to verifying partial sentence hypotheses in a hierarchical system. SUSKIT- predicts word strings in a top-down manner, but hypotheses are verified against the input speech using a unit independent of word boundaries. Words are not suitable units of verification, because the smoothing effect due to phonetic context makes short words difficult to recognize. To avoid misrecognition caused by the smoothing effect across word boundaries, SUSKIT- dynamically extracts, from the predicted word string, phoneme strings bounded by easily detectable phonemes and uses them as verification templates. A left-to-right, time-synchronous beam-search strategy was adopted to search for likely sentences. We carried out sentence recognition experiments using a speech corpus consisting of 159 sentences read by three Japanese male speakers. The task perplexity was 8.3. Using speaker-dependent HMM parameters, we obtained sentence recognition rates of 83.0-92.5%.
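The left-to-right time-synchronous beam search mentioned above can be sketched roughly as follows. This is a minimal illustrative sketch, not the SUSKIT system's actual implementation: the vocabulary, the bigram successor table, and the toy `frame_score` function (which stands in for the HMM acoustic likelihood) are all hypothetical.

```python
# Sketch of a left-to-right, time-synchronous beam search.
# All names below (SUCCESSORS, frame_score, the toy vocabulary) are
# assumptions for illustration, not part of the original system.

# Hypothetical bigram successor table defining which words may follow which.
SUCCESSORS = {
    "<s>": ["show", "list"],
    "show": ["flights", "</s>"],
    "list": ["flights", "</s>"],
    "flights": ["</s>"],
}

def frame_score(word, frame):
    """Toy deterministic stand-in for an HMM acoustic log-likelihood."""
    return -((len(word) + frame) % 5) / 10.0

def beam_search(num_frames, beam_width=3):
    # Each hypothesis is (cumulative log score, word sequence so far).
    beam = [(0.0, ["<s>"])]
    for t in range(num_frames):
        candidates = []
        for score, words in beam:
            # Extend the hypothesis with each permitted successor word.
            for nxt in SUCCESSORS.get(words[-1], []):
                candidates.append((score + frame_score(nxt, t), words + [nxt]))
            # Or let the hypothesis remain in its current word for this frame.
            candidates.append((score + frame_score(words[-1], t), words))
        # Time-synchronous pruning: after every frame, keep only the
        # beam_width best-scoring hypotheses.
        beam = sorted(candidates, key=lambda h: h[0], reverse=True)[:beam_width]
    # Prefer hypotheses that reached the end-of-sentence marker.
    finished = [h for h in beam if h[1][-1] == "</s>"]
    return max(finished or beam, key=lambda h: h[0])
```

The key property is that all hypotheses are advanced frame by frame in lockstep, so their scores are directly comparable at every pruning step; this is what makes the aggressive beam pruning safe in practice.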