The search functionality is under construction.

Author Search Result

[Author] Satoshi TSUCHIYA(3hit)

1-3hit
  • A Machine Learning Approach for an Indonesian-English Cross Language Question Answering System

    Ayu PURWARIANTI  Masatoshi TSUCHIYA  Seiichi NAKAGAWA  

     
    PAPER-Natural Language Processing

      Vol:
    E90-D No:11
      Page(s):
    1841-1852

    We have built a CLQA (Cross Language Question Answering) system for a source language with limited data resources (e.g. Indonesian) using a machine learning approach. The CLQA system consists of four modules: question analyzer, keyword translator, passage retriever and answer finder. We used machine learning in two modules, the question classifier (part of the question analyzer) and the answer finder. In the question classifier, we classify the EAT (Expected Answer Type) of a question by using SVM (Support Vector Machine) method. Features for the classification module are basically the output of our shallow question parsing module. To improve the classification score, we use statistical information extracted from our Indonesian corpus. In the answer finder module, using an approach different from the common approach in which answer is located by matching the named entity of the word corpus with the EAT of question, we locate the answer by text chunking the word corpus. The features for the SVM based text chunking process consist of question features, word corpus features and similarity scores between the word corpus and the question keyword. In this way, we eliminate the named entity tagging process for the target document. As for the keyword translator module, we use an Indonesian-English dictionary to translate Indonesian keywords into English. We also use some simple patterns to transform some borrowed English words. The keywords are then combined in boolean queries in order to retrieve relevant passages using IDF scores. We first conducted an experiment using 2,837 questions (about 10% are used as the test data) obtained from 18 Indonesian college students. We next conducted a similar experiment using the NTCIR (NII Test Collection for IR Systems) 2005 CLQA task by translating the English questions into Indonesian. Compared to the Japanese-English and Chinese-English CLQA results in the NTCIR 2005, we found that our system is superior to others except for one system that uses a high data resource employing 3 dictionaries. Further, a rough comparison with two other Indonesian-English CLQA systems revealed that our system achieved higher accuracy score.

  • Synthesis of Configuration Change Procedure Using Model Finder

    Shinji KIKUCHI  Satoshi TSUCHIYA  Kunihiko HIRAISHI  

     
    PAPER-Software System

      Vol:
    E96-D No:8
      Page(s):
    1696-1706

    Managing the configurations of complex systems consisting of various components requires the combined efforts by multiple domain experts. These experts have extensive knowledge about different components in the system they need to manage but little understanding of the issues outside their individual areas of expertise. As a result, the configuration constraints, changes, and procedures specified by those involved in the management of a complex system are often interrelated with one another without being noticed, and their integration into a coherent procedure for configuration represents a major challenge. The method of synthesizing the configuration procedure introduced in this paper addresses this challenge using a combination of formal specification and model finding techniques. We express the knowledge on system management with this method, which is provided by domain experts as first-order logic formulas in the Alloy specification language, and combine it with system-configuration information and the resulting specification. We then employ the Alloy Analyzer to find a system model that satisfies all the formulas in this specification. The model obtained corresponds to a procedure for system configurations that satisfies all expert-specified constraints. In order to reduce the resources needed in the procedure synthesis, we reduce the length of procedures to be synthesized by defining and using intermediate goal states to divide operation procedures into shorter steps. Finally, we evaluate our method through a case study on a procedure to consolidate virtual machines.

  • Class-Based N-Gram Language Model for New Words Using Out-of-Vocabulary to In-Vocabulary Similarity

    Welly NAPTALI  Masatoshi TSUCHIYA  Seiichi NAKAGAWA  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:9
      Page(s):
    2308-2317

    Out-of-vocabulary (OOV) words create serious problems for automatic speech recognition (ASR) systems. Not only are they miss-recognized as in-vocabulary (IV) words with similar phonetics, but the error also causes further errors in nearby words. Language models (LMs) for most open vocabulary ASR systems treat OOV words as a single entity, ignoring the linguistic information. In this paper we present a class-based n-gram LM that is able to deal with OOV words by treating each of them individually without retraining all the LM parameters. OOV words are assigned to IV classes consisting of similar semantic meanings for IV words. The World Wide Web is used to acquire additional data for finding the relation between the OOV and IV words. An evaluation based on adjusted perplexity and word-error-rate was carried out on the Wall Street Journal corpus. The result suggests the preference of the use of multiple classes for OOV words, instead of one unknown class.