Utilizing Human-to-Human Conversation Examples for a Multi Domain Chat-Oriented Dialog System

Lasguido NIO; Sakriani SAKTI; Graham NEUBIG; Tomoki TODA; Satoshi NAKAMURA

doi:10.1587/transinf.E97.D.1497

IEICE TRANSACTIONS on Information


Summary:

This paper describes the design and evaluation of a method for developing a chat-oriented dialog system by utilizing real human-to-human conversation examples from movie scripts and Twitter conversations. The aim of the proposed method is to build a conversational agent that can interact with users in as natural a fashion as possible, while reducing the time required for database design and collection. A number of the challenging design issues we faced are described, including (1) constructing appropriate dialog corpora from raw movie scripts and Twitter data, and (2) developing a multi-domain chat-oriented dialog management system that can retrieve a proper system response based on the current user query. To build a dialog corpus, we propose a unit of conversation called a tri-turn (a trigram conversation turn), as well as extraction and semantic similarity analysis techniques to help ensure that the content extracted from raw movie/drama script files forms appropriate dialog-pair (query-response) examples. The constructed dialog corpora are then utilized in a data-driven dialog management system. Here, various approaches are investigated, including example-based dialog modeling (EBDM) and response generation using phrase-based statistical machine translation (SMT). In particular, we use two EBDM techniques: syntactic-semantic similarity retrieval and TF-IDF based cosine similarity retrieval. Experiments are conducted to compare and contrast the EBDM and SMT approaches in building a chat-oriented dialog system, and we investigate a combined method that addresses the advantages and disadvantages of both approaches. System performance was evaluated based on objective metrics (semantic similarity and cosine similarity) and human subjective evaluation from a small user study. Experimental results show that the proposed filtering approach effectively improves performance. Furthermore, the results also show that by combining the EBDM and SMT approaches, we could overcome the shortcomings of each.
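
As a rough illustration of the TF-IDF based cosine similarity retrieval named in the summary, the sketch below picks the stored response whose example query is closest to the user query. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy corpus, the whitespace tokenizer, the smoothed IDF formula, and the names `EXAMPLES`, `build_idf`, `tfidf`, `cosine`, and `retrieve` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (query, response) dialog pairs; not data from the paper.
EXAMPLES = [
    ("how are you today", "i am fine thank you"),
    ("what is your name", "my name is bot"),
    ("do you like movies", "yes i love movies"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def build_idf(docs):
    """Compute a smoothed inverse document frequency for every corpus term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    """Build a sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, examples=EXAMPLES):
    """Return (score, response) for the example query most similar to `query`."""
    docs = [tokenize(q) for q, _ in examples]
    idf = build_idf(docs)
    query_vec = tfidf(tokenize(query), idf)
    scored = [
        (cosine(query_vec, tfidf(doc, idf)), response)
        for doc, (_, response) in zip(docs, examples)
    ]
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    score, response = retrieve("do you enjoy movies")
    print(round(score, 3), response)
```

A retrieval-only system of this kind can only return responses that already exist in the example database, which is one reason the summary contrasts it with SMT-based response generation.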

Publication: IEICE TRANSACTIONS on Information Vol.E97-D No.6 pp.1497-1505

Publication Date: 2014/06/01


Online ISSN: 1745-1361

DOI: 10.1587/transinf.E97.D.1497

Type of Manuscript: Special Section PAPER (Special Section on Advances in Modeling for Real-world Speech Information Processing and its Application)

Category: Dialog System

Authors

Lasguido NIO
  Nara Institute of Science and Technology
Sakriani SAKTI
  Nara Institute of Science and Technology
Graham NEUBIG
  Nara Institute of Science and Technology
Tomoki TODA
  Nara Institute of Science and Technology
Satoshi NAKAMURA
  Nara Institute of Science and Technology

Keyword

dialog corpora, response generation, example-based dialog modeling, semantic similarity, cosine similarity, machine translation


Lasguido NIO, Sakriani SAKTI, Graham NEUBIG, Tomoki TODA, Satoshi NAKAMURA, "Utilizing Human-to-Human Conversation Examples for a Multi Domain Chat-Oriented Dialog System" in IEICE TRANSACTIONS on Information, vol. E97-D, no. 6, pp. 1497-1505, June 2014, doi: 10.1587/transinf.E97.D.1497.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E97.D.1497/_p


@ARTICLE{e97-d_6_1497,
author={Lasguido NIO and Sakriani SAKTI and Graham NEUBIG and Tomoki TODA and Satoshi NAKAMURA},
journal={IEICE TRANSACTIONS on Information},
title={Utilizing Human-to-Human Conversation Examples for a Multi Domain Chat-Oriented Dialog System},
year={2014},
volume={E97-D},
number={6},
pages={1497-1505},
keywords={},
doi={10.1587/transinf.E97.D.1497},
ISSN={1745-1361},
month={June},}


TY - JOUR
TI - Utilizing Human-to-Human Conversation Examples for a Multi Domain Chat-Oriented Dialog System
T2 - IEICE TRANSACTIONS on Information
SP - 1497
EP - 1505
AU - Lasguido NIO
AU - Sakriani SAKTI
AU - Graham NEUBIG
AU - Tomoki TODA
AU - Satoshi NAKAMURA
PY - 2014
DO - 10.1587/transinf.E97.D.1497
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E97-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2014
ER -
