
Keyword Search Results

[Keyword] dialogue (46 hits)

Showing 21-40 of 46 hits

  • An Integrated Dialogue Analysis Model for Determining Speech Acts and Discourse Structures

    Won Seug CHOI  Harksoo KIM  Jungyun SEO  

     
    PAPER-Natural Language Processing
    Vol: E88-D No:1  Page(s): 150-157

    Analysis of speech acts and discourse structures is essential to a dialogue understanding system because speech acts and discourse structures are closely tied to the speaker's intention. However, it is difficult to infer a speech act and a discourse structure from a surface utterance because both depend heavily on the context of the utterance. We propose a statistical dialogue analysis model that determines discourse structures as well as speech acts using a maximum entropy model. The model can automatically acquire probabilistic discourse knowledge from an annotated dialogue corpus, and it can analyze speech acts and discourse structures in a single framework. In experiments, the model outperformed previous approaches.
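The maximum-entropy model this abstract refers to has the familiar log-linear form p(act | context) proportional to exp(sum_i w_i * f_i(context, act)). A minimal sketch of that form follows; the feature functions, weights, and speech-act labels are invented for illustration and are not taken from the paper.

```python
import math

# Hypothetical speech-act tag set and binary feature functions with weights.
ACTS = ["request", "inform", "confirm"]

def features(context, act):
    """Return the weights of the feature functions that fire for (context, act)."""
    cues = {
        ("has_question_mark", "request"): 1.0,
        ("prev_act_request", "inform"): 0.8,
        ("starts_with_yes", "confirm"): 1.2,
    }
    return [w for (cue, a), w in cues.items() if cue in context and a == act]

def maxent_distribution(context):
    """p(act | context) = exp(sum of fired weights) / Z  (softmax over acts)."""
    scores = {a: math.exp(sum(features(context, a))) for a in ACTS}
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}

dist = maxent_distribution({"has_question_mark"})
best = max(dist, key=dist.get)
```

In a real system the weights would be trained from the annotated corpus (e.g. by iterative scaling or gradient methods) rather than set by hand.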

  • A Spoken Dialogue Interface for TV Operations Based on Data Collected by Using WOZ Method

    Jun GOTO  Kazuteru KOMINE  Masaru MIYAZAKI  Yeun-Bae KIM  Noriyoshi URATANI  

     
    PAPER
    Vol: E87-D No:6  Page(s): 1397-1404

    The development of multi-channel digital broadcasting has generated a demand not only for new services but also for smart and highly functional capabilities in all broadcast-related devices, especially TV receivers on the viewer's side. With the aim of achieving a friendly interface that anybody can use with ease, we built a prototype spoken dialogue interface for TV operation based on data collected using the Wizard of Oz (WOZ) method. At the current stage of our research, we are using this system to investigate the usefulness and problem areas of an interactive voice interface for TV operation.

  • Dialogue Languages and Persons with Disabilities

    Akira ICHIKAWA  

     
    INVITED PAPER
    Vol: E87-D No:6  Page(s): 1312-1319

    Utterances in dialogue, whether in spoken language or sign language, have functions that enable recipients to understand them easily in real time and to control the conversation smoothly in spite of its volatile character. In this paper, we present experimental evidence of these functions. Prosody plays a very important role not only in spoken language (aural language) but also in sign language (visual language) and finger braille (tactile language). Skilled users of a language may detect word boundaries in utterances and estimate sentence structure immediately using prosody. The gestures and glances of a recipient may influence the utterances of the sender, leading to amendments of the contents of utterances and smooth turn-taking. Individuality and emotion in utterances are also very important aspects of effective communication support systems for persons with disabilities, even more so than for non-disabled persons. The trials described herein are universal in design; some trials carried out to develop these systems are also reported.

  • Conversation Robot Participating in Group Conversation

    Yosuke MATSUSAKA  Tsuyoshi TOJO  Tetsunori KOBAYASHI  

     
    INVITED PAPER
    Vol: E86-D No:1  Page(s): 26-36

    We developed a conversation system that can participate in a group conversation, a form of conversation in which three or more participants talk to each other about a topic on an equal footing. Conventional conversation systems have been designed under the assumption that the system talks with only one person. Group conversation differs in the following points. The system must understand the conversational situation: who is speaking, to whom they are speaking, and to whom the other participants are paying attention. The system must also try to affect the situation appropriately itself. In this study, we realized the function of recognizing the conversational situation by combining image processing and acoustic processing, and the function of working on the conversational situation using the facial and body actions of the robot. Thus, a robot that can join in group conversation was realized.
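The situation-recognition step, deciding who is speaking and to whom, can be sketched as a simple fusion of acoustic and visual cues. The inputs, the argmax rule, and the participant names below are our own illustrative assumptions, not the paper's actual image/acoustic processing.

```python
def recognize_situation(audio_energy, gaze):
    """Fuse cues: the loudest participant is the speaker (acoustic cue),
    and the person the speaker looks at is the addressee (visual cue).

    audio_energy: {participant: speech energy level}
    gaze:         {participant: participant they are looking at}
    """
    speaker = max(audio_energy, key=audio_energy.get)
    addressee = gaze.get(speaker)   # None if the speaker's gaze is unknown
    return speaker, addressee

# Toy reading: A is speaking loudly and looking at B.
speaker, addressee = recognize_situation(
    {"A": 0.9, "B": 0.1, "robot": 0.0},
    {"A": "B", "B": "A"},
)
```

A robot using this output could then turn its face toward the current speaker, or toward the addressee when it wants to take the turn itself.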

  • A Distributed Agent Architecture for Intelligent Multi-Domain Spoken Dialogue Systems

    Bor-Shen LIN  Hsin-Min WANG  Lin-Shan LEE  

     
    PAPER-Speech and Hearing
    Vol: E84-D No:9  Page(s): 1217-1230

    Multi-domain spoken dialogue systems with a high degree of intelligence and domain extensibility have long been desired but are difficult to achieve. When the user freely moves among different topics during the dialogue, it is very difficult for the system to control the switching of topics and domains while keeping the dialogue consistent, and to decide when and how to take the initiative. This paper presents a distributed agent architecture for multi-domain spoken dialogue systems with high domain extensibility and intelligence. Under this architecture, different spoken dialogue agents (SDAs) handling different domains can be developed independently and then cooperate smoothly with one another to achieve the user's multiple goals, while a user interface agent (UIA) accesses the correct spoken dialogue agent through a domain switching protocol and carries over the dialogue state and history so that knowledge is processed coherently across different domains.
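The routing idea, a UIA selecting an SDA and carrying the dialogue history across the switch, might be sketched as follows. The class names, keyword-overlap scoring, and hand-off details are hypothetical, not the paper's actual domain switching protocol.

```python
class SDA:
    """A spoken dialogue agent for one domain (illustrative)."""

    def __init__(self, domain, keywords):
        self.domain = domain
        self.keywords = set(keywords)

    def score(self, utterance):
        # Toy domain detector: count keyword overlaps with the utterance.
        return len(self.keywords & set(utterance.lower().split()))

    def handle(self, utterance, history):
        history.append((self.domain, utterance))   # shared dialogue history
        return f"[{self.domain}] handling: {utterance}"

class UIA:
    """User interface agent: routes each utterance to the best-scoring SDA."""

    def __init__(self, agents):
        self.agents = agents
        self.history = []   # carried over across domain switches

    def route(self, utterance):
        best = max(self.agents, key=lambda a: a.score(utterance))
        return best.handle(utterance, self.history)

uia = UIA([SDA("weather", ["rain", "sunny", "forecast"]),
           SDA("travel", ["train", "ticket", "station"])])
reply = uia.route("what is the forecast for tomorrow")
```

Because each SDA only implements `score` and `handle`, new domains can be plugged in without touching the UIA, which is the extensibility property the abstract emphasizes.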

  • Evaluating Dialogue Strategies under Communication Errors Using Computer-to-Computer Simulation

    Taro WATANABE  Masahiro ARAKI  Shuji DOSHITA  

     
    PAPER-Artificial Intelligence and Cognitive Science
    Vol: E81-D No:9  Page(s): 1025-1033

    In this paper, experimental results of evaluating dialogue strategies of confirmation over a noisy channel are presented. First, the types of errors in task-oriented dialogues are investigated and classified as communication, dialogue, knowledge, problem solving, or objective errors. Since these errors occur at different levels, the methods for recovering from them must be examined separately. We found that the dialogue and knowledge errors generated by communication errors can be recovered through system confirmation with the user. In addition, we examined how the manner in which a system initiates dialogue, namely its dialogue strategy, influences the cooperativity of the interaction depending on the frequency of confirmations and the amount of information conveyed. Furthermore, the choice of dialogue strategy is influenced by the rate of communication errors in the channel and is related to the properties of the task, for example, the difficulty of achieving a goal or the frequency of initiative shifts. To verify these hypotheses, we prepared a testbed task, the Group Scheduling Task, and examined it through a computer-to-computer dialogue simulation in which one system took the part of a scheduling system and the other acted as a user. In this simulation, erroneous input to the scheduling system was also generated. The user system was designed to act randomly so that it could simulate a real human user, while the scheduling system strictly followed a particular confirmation strategy. The experimental results showed that a certain amount of confirmation was required to overcome errors when the communication error rate was high, but that excessive confirmation did not serve to resolve errors, depending on the task involved.
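A toy computer-to-computer simulation in this spirit: a "user" conveys slot values over a channel that corrupts them with probability p_err, and the "system" either always confirms or never confirms. The protocol and numbers are illustrative assumptions; this is not the Group Scheduling Task itself.

```python
import random

def run_dialogue(p_err, confirm, slots=5, rng=None):
    """Simulate conveying `slots` values over a noisy channel.

    Returns (turns_taken, values_understood). With confirm=True the system
    confirms every value and the user repeats it until it gets through;
    with confirm=False channel errors go undetected.
    """
    rng = rng or random.Random(0)
    turns, understood = 0, 0
    for true_value in range(slots):
        while True:
            # The channel corrupts the value with probability p_err.
            received = true_value if rng.random() > p_err else -1
            turns += 1
            if not confirm:
                understood += received == true_value
                break
            turns += 1                     # confirmation sub-dialogue
            if received == true_value:     # user accepts; otherwise repeats
                understood += 1
                break
    return turns, understood

turns_c, ok_c = run_dialogue(0.3, confirm=True, rng=random.Random(1))
turns_n, ok_n = run_dialogue(0.3, confirm=False, rng=random.Random(1))
```

Even this toy model reproduces the qualitative trade-off in the abstract: confirmation recovers every corrupted value but costs extra turns.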

  • Man-Machine Interaction Using a Vision System with Dual Viewing Angles

    Ying-Jieh HUANG  Hiroshi DOHI  Mitsuru ISHIZUKA  

     
    PAPER-Image Processing, Computer Graphics and Pattern Recognition
    Vol: E80-D No:11  Page(s): 1074-1083

    This paper describes a vision system with dual viewing angles, i.e., wide and narrow viewing angles, and a scheme for a user-friendly speech dialogue environment based on the vision system. The wide viewing angle provides a wide field of view for wide-range motion tracking, and the narrow viewing angle can follow a target within the wide field to capture its image at sufficient resolution. For fast and robust motion tracking, modified motion energy (MME) and existence energy (EE) are defined to detect the motion of the target and extract the motion region at the same time. Instead of using a physical device such as the foot switch commonly used in speech dialogue systems, the beginning and end of an utterance are detected from the movement of the user's mouth. Rather than recognizing the movement of the lips directly, the shape variation of the region between the lips is tracked for more stable recognition of the span of an utterance. The tracking speed is about 10 frames/sec when no recognition is performed and about 5 frames/sec when both tracking and recognition are performed, without any special hardware.
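The underlying idea of frame-difference motion energy can be sketched as below. The paper's MME and EE definitions differ from this, so the sketch shows only the basic accumulate-and-threshold scheme, on invented 2x2 grayscale frames.

```python
def motion_energy(prev, curr, threshold=10):
    """Return a per-pixel motion mask and total energy for two grayscale frames.

    prev, curr: 2-D lists of pixel intensities of the same shape.
    The mask marks pixels whose inter-frame difference exceeds the threshold;
    the energy is the sum of absolute differences over the whole frame.
    """
    mask = [[abs(c - p) > threshold for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]
    energy = sum(abs(c - p)
                 for row_p, row_c in zip(prev, curr)
                 for p, c in zip(row_p, row_c))
    return mask, energy

# Toy frames: one pixel changed between frames.
prev = [[0, 0], [0, 0]]
curr = [[0, 50], [0, 0]]
mask, energy = motion_energy(prev, curr)
```

A tracker would then fit a window around the masked region and steer the narrow-angle camera toward it.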

  • Hybrid Method of Data Collection for Evaluating Speech Dialogue System

    Shu NAKAZATO  Ikuo KUDO  Katsuhiko SHIRAI  

     
    PAPER-Speech Processing and Acoustics
    Vol: E79-D No:1  Page(s): 41-46

    In this paper, we propose a new method of dialogue data collection that can be used to evaluate the modules of a spoken dialogue system. To evaluate a module, it is necessary to use suitable data. Human-human dialogue data are not appropriate for module evaluation, because spontaneous data usually include too many speech-specific phenomena such as fillers, restarts, pauses, and hesitations. Human-machine dialogue data are not appropriate either, because the dialogue is unnatural and the available vocabularies are limited. Here, we propose a 'hybrid method' for the collection of spoken dialogue data. Its merit is that the collected data can be used as test data for the evaluation of a spoken dialogue system without any modification. In our method, a human takes the role of some modules of the system while the machine performs the rest. For example, a human works as the speech recognition and dialogue management modules while the machine provides the remaining response generation module. The collected data are well suited to evaluating the speech recognition and dialogue management modules, for the following reasons. (1) Lexicon: the lexicon was composed of a limited set of task-dependent words. (2) Grammar: the intentions expressed by the subjects were concise and clear. (3) Topics: there were few utterances outside the task domain.

  • A Speech Dialogue System with Multimodal Interface for Telephone Directory Assistance

    Osamu YOSHIOKA  Yasuhiro MINAMI  Kiyohiro SHIKANO  

     
    PAPER
    Vol: E78-D No:6  Page(s): 616-621

    This paper describes a multimodal dialogue system employing speech input. The system uses three input methods (a speech recognizer, a mouse, and a keyboard) and two output methods (a display and sound). For the speech recognizer, an algorithm is employed for large-vocabulary speaker-independent continuous speech recognition based on the HMM-LR technique. The system is implemented for telephone directory assistance to evaluate the speech recognition algorithm and to investigate the variations in speech structure that users produce when talking to computers. Speech input is used in a multimodal environment, and dialogue data between computers and users are also collected. Twenty telephone-number retrieval tasks are used to evaluate the system. In the experiments, all users are equally trained in using the dialogue system with an interactive guidance system implemented on a workstation. Simplified city maps that indicate subscriber names and addresses are used to reduce the implicit restrictions imposed by written sentences, thus allowing each user to develop his own forms of expression. The task completion rate is 99.0%, and approximately 75% of the users say that they prefer this system to using a telephone book. Moreover, there is a significant decrease in nonkeyword usage, i.e., the usage of words other than names and addresses, for users who receive more utterance practice.

  • Cooperative Spoken Dialogue Model Using Bayesian Network and Event Hierarchy

    Masahiro ARAKI  Shuji DOSHITA  

     
    PAPER
    Vol: E78-D No:6  Page(s): 629-635

    In this paper, we propose a dialogue model that reflects two important aspects of a spoken dialogue system: being 'robust' and being 'cooperative'. For this purpose, our model has two main inference spaces: the Conversational Space (CS) and the Problem Solving Space (PSS). The CS is a kind of dynamic Bayesian network that represents the meaning of an utterance and general dialogue rules; the 'robust' aspect is treated in the CS. The PSS is a network called an Event Hierarchy that represents the structure of task domain problems; the 'cooperative' aspect is mainly treated in the PSS. In constructing the CS and making inferences on the PSS, the system's processing, from meaning understanding through response generation, is modeled in five steps: (1) meaning understanding, (2) intention understanding, (3) communicative effect, (4) reaction generation, and (5) response generation. The meaning understanding step constructs the CS, and the response generation step composes a surface expression of the system's response from part of the CS. The intention understanding step maps utterance types in the CS to actions in the PSS. The reaction generation step selects a cooperative reaction in the PSS and expands it to an utterance type in the CS. The status of problem solving and the user's declared preferences are recorded in the mental state by the communicative effect step. From this point of view, cooperative problem-solving dialogue is regarded as a process of constructing the CS and achieving goals in the PSS through these five steps.
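The five steps can be sketched as a pipeline of stage functions. Only the step names come from the abstract; the data passed between stages and the mental-state bookkeeping are invented for illustration.

```python
def meaning_understanding(utterance):
    return {"meaning": utterance}               # builds (part of) the CS

def intention_understanding(state):
    return {**state, "intention": "request"}    # CS utterance type -> PSS action

def communicative_effect(state, mental_state):
    mental_state.append(state["intention"])     # record in the mental state
    return state

def reaction_generation(state):
    return {**state, "reaction": "answer"}      # cooperative reaction in the PSS

def response_generation(state):
    return "Reaction: " + state["reaction"]     # surface expression from the CS

mental_state = []
state = meaning_understanding("when is the next train?")
state = intention_understanding(state)
state = communicative_effect(state, mental_state)
state = reaction_generation(state)
response = response_generation(state)
```

The point of the pipeline shape is that each step reads and extends a shared state, mirroring how the model threads information from the CS into the PSS and back.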

  • Error Analysis of Field Trial Results of a Spoken Dialogue System for Telecommunications Applications

    Shingo KUROIWA  Kazuya TAKEDA  Masaki NAITO  Naomi INOUE  Seiichi YAMAMOTO  

     
    PAPER
    Vol: E78-D No:6  Page(s): 636-641

    We carried out a one-year field trial of a voice-activated automatic telephone exchange service at KDD Laboratories, which has about 200 branch phones. The system has DSP-based continuous speech recognition hardware that can process incoming calls in real time using a vocabulary of 300 words. The recognition accuracy was 92.5%, independent of the speaker, for speech read from a written text under laboratory conditions. In this paper, we describe the performance of the system obtained in the field trial. Apart from recognition errors, about 20% of errors were due to out-of-vocabulary input and incorrect detection of speech endpoints, which had not been allowed for in the laboratory experiments. We also found that the recognition accuracy for actual speech was about 18% lower than for speech read from text, even when there were no out-of-vocabulary words. We examine the error variations for individual data to pinpoint the causes of incorrect recognition. Experiments on the collected data showed that the pause model, the filled-pause grammar, and differences in channel frequency response seriously affected recognition accuracy. With the help of simple techniques to overcome these problems, we finally obtained a recognition accuracy of 88.7% on real data.

  • Design and Construction of an Advisory Dialogue Database

    Tadahiko KUMAMOTO  Akira ITO  Tsuyoshi EBINA  

     
    PAPER-Databases
    Vol: E78-D No:4  Page(s): 420-427

    We are aiming to develop a computer-based consultant system that helps novice computer users achieve their task goals on computers through natural language dialogues. Our target is spoken Japanese. To develop effective methods for processing spoken Japanese, it is essential to analyze real dialogues and find the characteristics of spoken Japanese. In this paper, we discuss the design problems associated with constructing a spoken dialogue database from the viewpoint of advisory dialogue collection, describe XMH (X-window-based electronic mail handling program) usage experiments conducted to collect advisory dialogues between novice XMH users and an expert consultant, and present the dialogue database we constructed from these dialogues. The main features of our database are as follows: (1) our target dialogues were advisory ones; (2) the advisory dialogues were all related to the use of XMH, which has a visual interface operated by a keyboard and a mouse; (3) the primary objective of the users was not to engage in dialogues but to achieve specific task goals using XMH; (4) not only what the users said but also the XMH operations they performed are included as dialogue elements. This kind of dialogue database is a very effective resource for developing new methods for processing spoken language in multimodal consultant systems, and we have therefore made it available to the public. Based on our analysis of the database, we have already developed several effective methods, such as a method for recognizing a user's communicative intention from a transcript of spoken Japanese and a method for controlling dialogues between a novice XMH user and the computer-based consultant system we are developing. We have also proposed several response generation rules as the response strategy for the consultant system, and have built an experimental consultant system implementing the above methods and strategy.

  • Throughput Analysis of ARQ Schemes in Dialogue Communication over Half-Duplex Line

    Chun-Xiang CHEN  Masaharu KOMATSU  Kozo KINOSHITA  

     
    PAPER-Communication Theory
    Vol: E77-B No:4  Page(s): 485-493

    This paper studies the performance of a dialogue communication system consisting of two stations over a half-duplex line. When a station seizes the right to send, it can transmit k packets consecutively. We analyze the transmission time of a message and the throughput of the Stop-and-Wait, Go-back-N, and Selective-Repeat protocols for the half-duplex transmission system. Based on the analytical and numerical results, we clarify the influence of the switching and thinking times, which exist in half-duplex systems, on throughput, and give the optimal k that maximizes the throughput. We observe that throughput is greatly influenced not only by the switching and thinking times but also by the average message length.
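For Stop-and-Wait, the qualitative effect of k can be illustrated with a simplified timing model: each packet needs on average 1/(1 - p) transmissions under packet error rate p, while the line-turnaround (switching) and thinking times are paid once per exchange, so larger k amortizes the fixed overhead. The model and parameter values below are our assumptions, not the paper's analysis.

```python
def stop_and_wait_throughput(k, t_packet, t_ack, p, t_switch, t_think):
    """Packets delivered per unit time for one k-packet exchange.

    Simplified model: every packet plus its acknowledgement is sent an
    expected 1/(1 - p) times; line turnaround is paid in both directions
    and the thinking time once per exchange.
    """
    expected_tx = 1.0 / (1.0 - p)             # transmissions per packet
    busy = k * (t_packet + t_ack) * expected_tx
    total = busy + 2 * t_switch + t_think     # fixed per-exchange overhead
    return k / total

# Larger k amortizes the fixed switching/thinking overhead:
s1 = stop_and_wait_throughput(k=1, t_packet=1.0, t_ack=0.1, p=0.1,
                              t_switch=0.5, t_think=2.0)
s8 = stop_and_wait_throughput(k=8, t_packet=1.0, t_ack=0.1, p=0.1,
                              t_switch=0.5, t_think=2.0)
```

In the paper's fuller analysis an optimal k exists because further increasing k eventually delays the other station's turn; this sketch captures only the amortization side of that trade-off.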

  • A Logical Model for Plan Recognition and Belief Revision

    Katashi NAGAO  

     
    PAPER
    Vol: E77-D No:2  Page(s): 209-217

    In this paper, we present a unified model for dialogue understanding involving various sorts of ambiguities, such as lexical, syntactic, semantic, and plan ambiguities. This model is able to estimate and revise the most preferable interpretation of utterances as a dialogue progresses. The model's features successfully capture the dynamic nature of dialogue management. The model consists of two main portions: (1) an extension of first-order logic for maintaining multiple interpretations of ambiguous utterances in a dialogue; (2) a device which estimates and revises the most preferable interpretation from among these multiple interpretations. Since the model is logic-based, it provides a good basis for formulating a rational justification of its current interpretation, which is one of the most desirable aspects in generating helpful responses. These features (contained in our model) are extremely useful for interactive dialogue management.

  • Multiple World Representation of Mental States for Dialogue Processing

    Toru SUGIMOTO  Akinori YONEZAWA  

     
    PAPER
    Vol: E77-D No:2  Page(s): 192-208

    As a general basis for constructing a cooperative and flexible dialogue system, we are interested in modelling the inference process of an agent who participates in a dialogue. For this purpose, it is natural and powerful to model it within the agent's general cognitive framework for problem solving. This paper presents such a framework. In this framework, we represent an agent's mental states in a form called the Mental World Structure, which consists of multiple mental worlds. Each mental world is a set of mental propositions and corresponds to one modal context, that is, a specific point of view. Modalities in an agent's mental states are represented by path expressions, which are first-class citizens of the system and can be composed with one another to make up composite modalities. With the Mental World Structure, we can handle modalities more flexibly than with ordinary modal logics, situation theory, and other representation systems. We incorporate smoothly into the structure three basic inference procedures: deduction, abduction, and truth maintenance. Precise definitions of the structure and the inference procedures are given. Furthermore, we work through several cooperative dialogues as examples in our framework.

  • Development of an Environmental ICAI System for English Conversation Learning

    Ryo OKAMOTO  Yoneo YANO  

     
    PAPER
    Vol: E77-D No:1  Page(s): 118-128

    This paper describes the development of an environmental ICAI system for English conversation learning, which is equipped with a simulation-based learning environment and an advisor function. Recently there have been various educational applications or tools for adult second language education, where the learning target is the acquisition of formal knowledge of a language. When considering the implementation of a practical CAI system, methods for developing communicative competence in learners are required. Although there are a number of ICAI systems for conversation learning, often the methodologies which they apply are not completely suitable for the acquisition of the required fundamental knowledge. Our system, based on the architecture of environmental CAI, enhances communication skill acquisition. The system has a learning environment with the following features: (1) A simulation of language activities, implemented in the role-playing game style, which helps to promote a learner's motivation. (2) Educational behavior of the system is varied through the modification of the learning environment and changes in the simulation progress and control commands. (3) An induction strategy, which can cause learners to fail to achieve a learning target, is executed by an advisor mechanism. The system is a prototype architecture for application in environmental ICAI systems for simulation based learning. We believe that the architecture of this system is an efficient framework for linguistic education.

  • An Implementation of a Dialogue Processing System COKIS Using a Corpus Extracted Knowledge

    Kotaro MATSUSAKA  Akira KUMAMOTO  

     
    LETTER
    Vol: E76-A No:7  Page(s): 1174-1176

    This system, called COKIS, automatically extracts knowledge about C functions from the description paragraphs of the UNIX on-line manual, and the user can interactively query the system to learn about UNIX C functions. The idea is motivated on the one hand by the wish to free users from the exhaustive knowledge acquisition required in the past, and on the other by the wish to examine problems in understanding the knowledge itself. We propose a Memory Processor, implemented so that extracting knowledge from the corpus and processing dialogues in the inquiry system are realized in the same modules.

  • A Linguistic Procedure for an Extension Number Guidance System

    Naomi INOUE  Izuru NOGAITO  Masahiko TAKAHASHI  

     
    PAPER
    Vol: E76-D No:1  Page(s): 106-111

    This paper describes the linguistic procedure of our speech dialogue system. The procedure is composed of two processes: syntactic analysis using a finite state network, and discourse analysis using a plan recognition model. The finite state network is compiled from a regular grammar. The regular grammar is written to accept sentences in various styles, for example with ellipsis and inversion, and is generated automatically from a skeleton of the grammar. The discourse analysis module understands each utterance, generates the next question for the user, and also predicts words that are likely to appear in the next utterance. For an extension number guidance task, we obtained correct recognition results for 93% of input sentences without word prediction, and for 98% when the prediction results included the proper words.
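The two components, acceptance by a finite state network and next-word prediction read off the outgoing arcs of the current state, can be sketched as follows. The toy grammar fragment is our own invention, not the paper's extension-number-guidance grammar.

```python
# Finite state network: state -> {word: next_state}. "accept" is final.
FSN = {
    "start":  {"please": "polite", "extension": "ext"},
    "polite": {"tell": "tell"},
    "tell":   {"extension": "ext"},
    "ext":    {"number": "accept"},
    "accept": {},
}

def accepts(words):
    """Walk the network; the sentence is grammatical iff we end in 'accept'."""
    state = "start"
    for w in words:
        if w not in FSN[state]:
            return False
        state = FSN[state][w]
    return state == "accept"

def predict_next(words):
    """Words that may follow the given prefix (for recognizer rescoring)."""
    state = "start"
    for w in words:
        state = FSN[state].get(w)
        if state is None:
            return []
    return sorted(FSN[state])

ok = accepts(["please", "tell", "extension", "number"])
nxt = predict_next(["please", "tell"])
```

Feeding `predict_next` back to the recognizer as a constrained vocabulary is the mechanism by which such systems lift recognition accuracy, as in the 93% to 98% improvement reported above.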

  • Predicting the Next Utterance Linguistic Expressions Using Contextual Information

    Hitoshi IIDA  Takayuhi YAMAOKA  Hidekazu ARITA  

     
    PAPER
    Vol: E76-D No:1  Page(s): 62-73

    A context-sensitive method for predicting linguistic expressions in the next utterance of an inquiry dialogue is proposed. First, information about the next utterance, namely the utterance type, the main action, and the discourse entities, is obtained using a dialogue interpretation model. Secondly, focusing in particular on dialogue situations in context, a domain-dependent knowledge base for the literal usage of both noun phrases and verb phrases is developed. Finally, a strategy that builds a set of appropriate linguistic expressions derived from these semantic concepts can be used to select the correct candidate from the speech recognition output. In this paper, we examine in particular the processes by which sets of polite expressions, vocatives, compound nominal phrases, verbal phrases, and intention expressions, which are common in telephone inquiry dialogues, are created.

  • Prospects for Advanced Spoken Dialogue Processing

    Hitoshi IIDA  

     
    INVITED PAPER
    Vol: E76-D No:1  Page(s): 2-8

    This paper discusses the problems facing spoken dialogue processing and the prospects for future improvements. Research on elemental topics such as speech recognition, speech synthesis, and language understanding has improved the accuracy and sophistication of each area. First, through an analysis of spoken dialogue characteristics, we show that handling a spoken dialogue requires information exchange between each area of processing. Second, we discuss how to integrate those processes and show that a memory-based approach to spontaneous speech interpretation offers a solution to the problem of process integration. The key to this is setting up a mental state affected by both speech and linguistic information. Finally, we discuss how those mental states are structured and a method for constructing them.
