Harksoo KIM Choong-Nyoung SEON Jungyun SEO
Most commercial websites provide customers with menu-driven navigation and keyword search. However, these inconvenient interfaces increase the number of mouse clicks and decrease customers' interest in surfing the websites. To resolve the problem, we propose an information retrieval assistant using a natural language interface in online sales domains. The information retrieval assistant has a client-server structure consisting of a system connector and an NLP (natural language processing) server. The NLP server performs a linguistic analysis of users' queries with the help of coordinated NLP agents that are based on shallow NLP techniques. After receiving the results of the linguistic analysis from the NLP server, the system connector interacts with external information provision systems, such as conventional information retrieval systems and relational database management systems, according to the analysis results. Owing to the client-server structure, we can easily add other information provision systems to the information retrieval assistant with trivial modifications of the NLP server. In addition, the information retrieval assistant guarantees fast responses because it uses shallow NLP techniques. In a preliminary experiment, we found that, compared to the menu-driven system, the information retrieval assistant reduced bothersome tasks such as menu selection and mouse clicking because it provides a convenient natural language interface.
Donghyun YOO Youngjoong KO Jungyun SEO
In this paper, we propose a deep learning-based model for classifying speech acts using a convolutional neural network (CNN). The model uses bigram features, including part-of-speech (POS) tags and dependency-relation bigrams, which represent syntactic structural information in utterances. Previous classification approaches using CNNs have commonly exploited word embeddings of morpheme unigrams. In contrast, the proposed model first extracts two different bigram features that reflect the syntactic structure of utterances well and then converts them into vector representations using a word embedding technique. As a result, the proposed model using bigram embeddings achieves an accuracy of 89.05%. Furthermore, the accuracy of this model is 2.8% higher in relative terms than that of competitive models in previous studies.
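As a rough illustration of the bigram features described in the abstract above, the following minimal sketch builds bigram tokens from a tagged and parsed utterance. It is not the authors' implementation. The function names, the English toy utterance, the `_` and `->` separators, and the reading of a dependency-relation bigram as a (dependent, head) pair are all assumptions made for this example. The resulting strings could then be treated as tokens for any word embedding technique.

```python
from typing import List

def pos_bigrams(pos_tags: List[str]) -> List[str]:
    """Join each pair of adjacent POS tags into one bigram token (assumed format)."""
    return [f"{a}_{b}" for a, b in zip(pos_tags, pos_tags[1:])]

def dependency_bigrams(tokens: List[str], heads: List[int]) -> List[str]:
    """Pair each token with its syntactic head; a head index of -1 marks the root."""
    return [f"{tokens[i]}->{tokens[h]}" for i, h in enumerate(heads) if h >= 0]

# Hypothetical toy utterance with hand-written tags and head indices.
tokens = ["can", "you", "book", "a", "room"]
pos_tags = ["MD", "PRP", "VB", "DT", "NN"]
heads = [2, 2, -1, 4, 2]

pos_features = pos_bigrams(pos_tags)
dep_features = dependency_bigrams(tokens, heads)
print(pos_features)
print(dep_features)
```

Both feature lists are plain strings, so they can share one vocabulary and one embedding table, or be kept in two separate ones; the abstract does not say which the authors chose.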
Hanmin JUNG Gary Geunbae LEE Won Seug CHOI KyungKoo MIN Jungyun SEO
This paper describes a highly portable multilingual question answering system for multiple relational databases. We apply techniques that were verified in open-domain text-based question answering, such as semantic categories and pattern-based grammars, to natural language interfaces to relational databases. Lexico-semantic patterns (LSP) and multi-level grammars achieve portability across languages, domains, and DB management systems. The LSP-based linguistic processing does not require deep analysis, which sacrifices robustness and flexibility, but can still handle delicate natural language questions. To maximize portability, we divide the three dependency factors into two parts: the language-dependent part goes into the front-end linguistic analysis, and the domain-dependent and database-dependent parts go into the back-end SQL query generation. We also support session-based dialog by preserving the SQL queries created from the user's previous questions and then regenerating a new SQL query for each successive question. In experiments with 779 queries, the system produced only constraint-missing errors, at rates of 2.25% for English and 5.67% for Korean; these errors can be easily corrected by adding new terms.
Youngjoong KO Kono KIM Jungyun SEO
Automatic text summarization aims to reduce the size of a document while preserving its content. Generally, an extractive summary is produced by including only the sentences that are most closely related to the topic. DOCUSUM is our summarization system based on a new topic keyword identification method. The process of DOCUSUM is as follows. First, DOCUSUM converts the content words of a document into elements of a context vector space. It then constructs lexical clusters from the context vector space and identifies core clusters. Next, it selects topic keywords from the core clusters. Finally, it generates a summary of the document using the topic keywords. In experiments at various compression ratios (30% compression, 10% compression, and extraction of a fixed number of sentences, 4 or 8), DOCUSUM showed better performance than other methods.
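The last step of the process described above, generating a summary from topic keywords, can be sketched roughly as follows. This is a simplified illustration, not DOCUSUM itself: the keyword set is supplied by hand instead of being derived from lexical clusters, and the function name `summarize`, the overlap-count scoring, and the sample sentences are assumptions made for this example.

```python
from typing import List, Set

def summarize(sentences: List[str], topic_keywords: Set[str], k: int) -> List[str]:
    """Return the k sentences with the most topic-keyword hits, in document order."""
    def score(sentence: str) -> int:
        # Lowercase each word and strip simple punctuation before matching.
        words = [w.strip(".,;:!?").lower() for w in sentence.split()]
        return sum(1 for w in words if w in topic_keywords)

    # Rank by descending score; ties keep the earlier sentence first.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(ranked[:k])  # restore original order for readability
    return [sentences[i] for i in chosen]

# Hypothetical input: a four-sentence document and hand-picked topic keywords.
sentences = [
    "Summarization shortens a document.",
    "The weather was nice.",
    "Topic keywords guide sentence extraction.",
    "We had lunch.",
]
keywords = {"summarization", "document", "topic", "keywords", "extraction"}
summary = summarize(sentences, keywords, k=2)
print(summary)
```

Fixing `k` corresponds to the fixed-number-of-sentences setting mentioned in the abstract; a compression ratio could be supported by deriving `k` from the document length.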
Sanghwa YUH Kongjoo LEE Jungyun SEO
In this paper, we present a Korean to Chinese/English/Japanese multilingual Machine Translation (MT) system for closed captions in Digital Television (DTV). Preliminary experiments on closed caption translation with existing base MT systems showed unsatisfactory results. To achieve more accurate translation with the base MT systems, we adopted live resources of multilingual Named Entities and their translingual equivalents from the Web. We also utilize the program information that terrestrial broadcasters offer through the DTV transport stream in order to use program-specific dictionaries, including the names of characters, locations, and organizations. Two more components are adopted to reduce ambiguity in parsing and word sense disambiguation: sentence simplification for long sentence segmentation and dynamic domain identification for automatic domain dictionary stacking. With these integrated approaches, we raised the Mean Opinion Score (MOS) of translation accuracy by 0.40 over the base MT systems.
Bojun SHIM Youngjoong KO Jungyun SEO
This paper describes a flexible strategy for generating candidate answers to factoid questions in Question Answering (QA) systems. Most QA systems rely on predefined conceptual categories for candidate answers. However, if the conceptual category of the answer to a question has not been prepared in the QA system, it is hard to extract correct answers to that question. Therefore, we propose an extraction method for IS-A relation patterns, which describe relations between the nominal target concepts of questions and candidate answers. The extracted IS-A relation patterns can be used for questions with an unexpected target concept.
Won Seug CHOI Harksoo KIM Jungyun SEO
Analysis of speech acts and discourse structures is essential to a dialogue understanding system because speech acts and discourse structures are closely tied to the speaker's intention. However, it has been difficult to infer a speech act and a discourse structure from a surface utterance because they depend heavily on the context of the utterance. We propose a statistical dialogue analysis model that determines discourse structures as well as speech acts using a maximum entropy model. The model can automatically acquire probabilistic discourse knowledge from an annotated dialogue corpus. Moreover, the model can analyze speech acts and discourse structures in one framework. In the experiment, the model showed better performance than previous approaches.
To implement a fast and reliable question-answering system in Korean, we propose a two-pass answer indexer that uses co-occurrence information between answer candidates and adjacent content words. The two-pass indexer scans documents twice to obtain local scores and global scores. Then, it calculates the degrees of association between answer candidates and co-occurring content words. Using this technique, the proposed QA system shortens the response time and enhances the precision.
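The two-pass idea above can be sketched loosely as follows. The abstract does not give the scoring formulas, so everything specific here is an assumption made for illustration: the function name `two_pass_index`, the distance-weighted local score, the document-frequency global score, and the way the two are multiplied. Documents are assumed to be pre-tokenized lists of content words, and the second pass reuses the local table instead of re-reading the text.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[int, str, str]  # (document id, answer candidate, content word)

def two_pass_index(docs: List[List[str]], candidates: Set[str],
                   window: int = 2) -> Dict[Key, float]:
    """Score candidate/word pairs with a local pass and a global pass (assumed formulas)."""
    # Pass 1: local score, where closer words contribute more (1 / distance).
    local: Dict[Key, float] = defaultdict(float)
    for doc_id, tokens in enumerate(docs):
        for i, tok in enumerate(tokens):
            if tok not in candidates:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] not in candidates:
                    local[(doc_id, tok, tokens[j])] += 1.0 / abs(i - j)

    # Pass 2: global score, the number of documents in which the pair co-occurs.
    doc_freq: Dict[Tuple[str, str], int] = defaultdict(int)
    for (_doc_id, cand, word) in local:
        doc_freq[(cand, word)] += 1

    # Degree of association: local score scaled by the pair's document coverage.
    n_docs = len(docs)
    return {key: score * doc_freq[(key[1], key[2])] / n_docs
            for key, score in local.items()}

# Hypothetical two-document collection with one answer candidate.
docs = [["seoul", "capital", "korea"], ["seoul", "city", "korea"]]
index = two_pass_index(docs, candidates={"seoul"}, window=2)
print(index)
```

Because the association scores are computed at indexing time, a query only needs a dictionary lookup per (candidate, word) pair, which is one plausible reading of how such an indexer could shorten response time.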