Takeshi HOMMA Yasunari OBUCHI Kazuaki SHIMA Rintaro IKESHITA Hiroaki KOKUBO Takuya MATSUMOTO
For voice-enabled car navigation systems that use a multi-purpose cloud speech recognition service (cloud ASR), utterance classification that is robust against speech recognition errors is needed to realize a user-friendly voice interface. The purpose of this study is to improve the accuracy of utterance classification for such systems when the inputs to the classifier are error-prone speech recognition results obtained from a cloud ASR. The role of utterance classification is to predict, from a spontaneous utterance, which car navigation function the user wants to execute. A cloud ASR makes speech recognition errors because of the noise that occurs when traveling in a car, and these errors degrade the accuracy of utterance classification. Many methods reduce the number of speech recognition errors by modifying the internals of a speech recognizer. However, application developers cannot apply these methods to cloud ASRs because they cannot customize them. In this paper, we propose a system that improves the accuracy of utterance classification by modifying both the speech-signal inputs to a cloud ASR and the recognized-sentence outputs from it. First, our system performs speech enhancement on a user's utterance and then sends both the enhanced and the non-enhanced speech signals to a cloud ASR. The speech recognition results from both signals are merged to reduce the number of recognition errors. Second, to reduce the number of utterance classification errors, we propose a data augmentation method, which we call “optimal doping,” in which not only accurate transcriptions but also error-prone recognized sentences are added to the training data. An evaluation with real user utterances spoken to car navigation products showed that our system reduces the number of utterance classification errors by 54% relative to a baseline condition. Finally, we propose a semi-automatic approach for upgrading classifiers so that they benefit from the improved performance of cloud ASRs.
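The data augmentation step described above can be illustrated with a minimal sketch. The abstract does not say how the mix of transcriptions and recognized sentences is chosen, so the `dope_ratio` parameter, the `Utterance` type, and the function name below are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Utterance:
    transcript: str   # accurate human transcription
    recognized: str   # error-prone ASR output for the same audio
    label: str        # car navigation function the user intended


def dope_training_data(
    utterances: List[Utterance], dope_ratio: float
) -> List[Tuple[str, str]]:
    """Return (text, label) training pairs.

    Every accurate transcript is kept, and the recognized sentence of the
    first `dope_ratio` fraction of utterances is added with the same label.
    The ratio is an assumed knob, not a value taken from the paper.
    """
    if not 0.0 <= dope_ratio <= 1.0:
        raise ValueError("dope_ratio must be between 0 and 1")
    pairs = [(u.transcript, u.label) for u in utterances]
    n_doped = round(dope_ratio * len(utterances))
    pairs += [(u.recognized, u.label) for u in utterances[:n_doped]]
    return pairs
```

A classifier trained on the returned pairs sees the same label attached to both clean and error-prone text, which is the idea the abstract describes; how to choose the amount of error-prone data is left open here.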
Hiromi BABA Tsukasa NOMA Naoyuki OKADA
This paper discusses visualization of temporal and spatial information in natural language descriptions (NLDs), focusing on the process of translating intermediate representations of NLDs into proper "scenarios" and "environments" for animations. First, the intermediate representations are shown according to the idea of actors. Actors and non-actors are represented as primitives of objects, whereas actions are represented as primitives of events. Temporal and spatial constraints given by an NLD text are imposed upon the primitives. Then, the representations containing unknown temporal or spatial parameters (time and coordinates) are translated into evaluation functions, in which the unlikelihood of deviations from the predicted temporal or spatial relations is estimated. In particular, the functions concerning actors' movements contain both temporal and spatial parameters. Next, the sum of all the evaluation functions is minimized by a nonlinear optimization method. Thus, the most proper time-table for actors (the scenario) and location-table for non-actors (the environment) are obtained for visualization. Implementation and experiments show that temporal and spatial information in NLDs are well connected through actors' movements for visualization.
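As a rough illustration of minimizing a sum of evaluation functions, the sketch below uses quadratic penalties and plain gradient descent. Both are stand-ins: the abstract does not give the form of the evaluation functions or name the optimizer. The scenario (one actor walking to a door), the target values, and all names are hypothetical.

```python
from typing import Callable, List, Sequence


def total_cost(params: Sequence[float]) -> float:
    """Sum of assumed penalty terms over (t_dep, t_arr, x_door)."""
    t_dep, t_arr, x_door = params
    speed = 1.0  # assumed walking speed of the actor
    return (
        (t_dep - 0.0) ** 2        # temporal: departure expected near time 0
        + (t_arr - 8.0) ** 2      # temporal: arrival expected near time 8
        + (x_door - 10.0) ** 2    # spatial: door expected near coordinate 10
        # movement term: travel time should match distance / speed,
        # so it couples temporal and spatial parameters
        + (t_arr - t_dep - x_door / speed) ** 2
    )


def minimize(
    cost: Callable[[Sequence[float]], float],
    start: Sequence[float],
    step: float = 0.1,
    iters: int = 500,
    eps: float = 1e-6,
) -> List[float]:
    """Gradient descent with a central-difference numerical gradient."""
    point = list(start)
    for _ in range(iters):
        grad = []
        for i in range(len(point)):
            hi = point[:]
            lo = point[:]
            hi[i] += eps
            lo[i] -= eps
            grad.append((cost(hi) - cost(lo)) / (2 * eps))
        point = [p - step * g for p, g in zip(point, grad)]
    return point
```

Calling `minimize(total_cost, [0.0, 0.0, 0.0])` converges toward departure time -0.5, arrival time 8.5, and door coordinate 9.5, which is the analytic minimum of these particular penalties. The three predictions cannot all hold at once, so the optimizer spreads the deviation across the timetable and the location.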
Zero-pronouns and overt pronouns occur frequently in Japanese text. These must be interpreted by recognizing their antecedents to properly 'understand' a piece of discourse. The notion of "centering" has been used to help in the interpretation process for intersentential anaphors. This is based on the premise that in a piece of discourse, some members have a greater amount of attention put on them than other members. In Japanese, the zero-pronoun is said to have the greatest amount of attention put on it. However, when there is more than one zero-pronoun in a sentence, only one of them can be accounted for using centering. Overt pronouns and any other zero-pronouns may as well have appeared as 'ordinary' noun phrases. In this paper, the notion of centering has been extended so that these can also be interpreted. Basically, zero-pronouns and overt pronouns are treated as being more "centered" in the discourse than other 'ordinary' noun phrases. They are put in an ordered list called the Center List. Any other noun phrases appearing in a sentence are put in another list called the Possible Center List. Noun phrases within both lists are ordered according to their degrees of salience. To see the effect of our approach, it was implemented in a simple system with minimal constraints and evaluated. The result showed that when the antecedent is in either the Center List or the Possible Center List, 80% of all zero-pronouns and overt pronouns were properly interpreted.
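The two lists described above can be sketched as follows. The abstract states only that pronouns go to the Center List, other noun phrases go to the Possible Center List, and both are ordered by salience. The numeric salience scores, the `NounPhrase` type, and the rule of searching the Center List before the Possible Center List are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NounPhrase:
    referent: str    # entity the phrase refers to
    kind: str        # "zero", "overt", or "ordinary"
    salience: float  # assumed numeric degree of salience


def build_lists(
    phrases: List[NounPhrase],
) -> Tuple[List[NounPhrase], List[NounPhrase]]:
    """Split one sentence's noun phrases into the two salience-ordered lists."""
    ordered = sorted(phrases, key=lambda p: p.salience, reverse=True)
    center_list = [p for p in ordered if p.kind in ("zero", "overt")]
    possible_center_list = [p for p in ordered if p.kind == "ordinary"]
    return center_list, possible_center_list


def candidate_antecedents(
    center_list: List[NounPhrase], possible_center_list: List[NounPhrase]
) -> List[str]:
    """Referents to try for a pronoun in the next sentence, Center List first."""
    return [p.referent for p in center_list + possible_center_list]
```

In a fuller system each candidate would also be checked against constraints before being accepted; the abstract describes its own implementation as using minimal constraints without listing them, so none are modeled here.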
Teruhiko UKITA Satoshi KINOSHITA Kazuo SUMITA Hiroshi SANO Shin'ya AMANO
Resolving ambiguities in interpreting the user's utterances is one of the most fundamental problems in the development of a question-answering system. The process of disambiguating interpretations requires knowledge and inference functions on an objective task field. This paper describes a framework for understanding conversational language, using a multi-paradigm knowledge representation ("frames" and "rules") that represents the concept hierarchy and causal relationships for an objective field. Knowledge of the objective field is used in the process to interpret input sentences as a model of the objective world. In interpreting sentences, a procedure judges preferences among interpretation candidates by identifying causal relationships with messages in the preceding context, where a causal relationship is used to supply missing information and to give either an affirmative or a negative explanation for the interpretation. The procedure has been implemented in an experimental question-answering system, whose current task is consultation on operating an electronic device. Experimental results are shown for a concrete problem involving the resolution of anaphoric references, and characteristics of the knowledge processing system are discussed.
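A toy sketch of preferring interpretation candidates through causal links to the preceding context is given below. It is not the paper's procedure: the `IS_A` table stands in for a frame-like concept hierarchy, `CAUSES` for rule-like causal knowledge, and every concept, rule, and function name is hypothetical. Only affirmative explanations are modeled.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (action, object concept)

# Frame-like concept hierarchy (child -> parent); contents are invented.
IS_A: Dict[str, str] = {
    "play_button": "button",
    "vcr": "device",
    "manual": "document",
}

# Rule-like causal knowledge (cause -> effect); contents are invented.
CAUSES: Dict[Event, Event] = {("press", "button"): ("respond", "device")}


def ancestors(concept: str) -> List[str]:
    """The concept itself followed by its parents up the hierarchy."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def matches(general: Event, specific: Event) -> bool:
    """True if `specific` is an instance of the more general event."""
    return general[0] == specific[0] and general[1] in ancestors(specific[1])


def is_explained(candidate: Event, context: List[Event]) -> bool:
    """True if some rule links a preceding message to the candidate."""
    return any(
        matches(effect, candidate) and any(matches(cause, m) for m in context)
        for cause, effect in CAUSES.items()
    )


def rank(candidates: List[Event], context: List[Event]) -> List[Event]:
    """Causally explained candidates first; original order otherwise."""
    return sorted(candidates, key=lambda c: not is_explained(c, context))
```

For example, after the context message `("press", "play_button")`, the reading of "it does not respond" with "it" as the `vcr` is ranked above the reading with "it" as the `manual`, because only the former is connected to the context by a causal rule.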