Author Search Result

[Author] Young-In SONG (6 hits)

  • A Definitional Question Answering System Based on Phrase Extraction Using Syntactic Patterns

    Kyoung-Soo HAN, Young-In SONG, Sang-Bum KIM, Hae-Chang RIM

    LETTER-Natural Language Processing

    Vol: E89-D, No: 4, Page(s): 1601-1605

    We propose a definitional question answering system that extracts phrases using syntactic patterns, which are easy to construct manually and can alleviate the coverage problem. Experimental results show that our phrase extraction system outperforms a sentence extraction system in terms of recall and precision, especially in selecting concise answers. They also indicate that the choice of text unit for the answer candidates and for the final answer has a significant effect on system performance.

  • Utilizing the Web for Automatic Word Spacing

    Gumwon HONG, Jeong-Hoon LEE, Young-In SONG, Do-Gil LEE, Hae-Chang RIM

    LETTER-Natural Language Processing

    Vol: E92-D, No: 12, Page(s): 2553-2556

    This paper presents a new approach to the word spacing problem that mines reliable words from the Web and uses them as additional resources. Conventional approaches to automatic word spacing train the parameters of word spacing models on noise-free data. However, insufficient and irrelevant training examples remain the main bottleneck for automatic word spacing. To mitigate this data-sparseness problem, this paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, and a model that uses these words as additional resources. The proposed approach is simple and can be easily adapted to new domains. Experimental results show that it achieves better performance than conventional word spacing approaches.

  • Incorporating Frame Information to Semantic Role Labeling

    Joo-Young LEE, Young-In SONG, Hae-Chang RIM, Kyoung-Soo HAN

    LETTER-Natural Language Processing

    Vol: E93-D, No: 1, Page(s): 201-204

    In this paper, we propose a new probabilistic model of semantic role labeling that uses the frameset of the predicate as explicit linguistic knowledge, providing global information on the predicate-argument structure that a local classifier is unable to capture. The proposed model consists of three sub-models: a role sequence generation model, a frameset generation model, and a matching model. The role sequence generation model generates semantic role sequence candidates for a given predicate using local classification, an approach widely used in previous research. The frameset generation model estimates the probability of each frameset that the predicate can take. The matching model measures the degree of match between a generated role sequence and a frameset using several features, which are designed to represent the predicate-argument structure information described in the frameset. Experimental results show that using knowledge about the predicate-argument structure is effective for selecting a more appropriate semantic role sequence.

  • Automatic Acronym Dictionary Construction Based on Acronym Generation Types

    Yeo-Chan YOON, So-Young PARK, Young-In SONG, Hae-Chang RIM, Dae-Woong RHEE

    LETTER-Natural Language Processing

    Vol: E91-D, No: 5, Page(s): 1584-1587

    In this paper, we propose a new model for automatically constructing an acronym dictionary. The proposed model generates possible acronym candidates from a definition and then verifies each acronym-definition pair with a Naive Bayes classifier based on web documents. To achieve high dictionary quality, the model exploits the characteristics of three acronym generation types: syllable-based, word-based, and mixed. Compared with a previous model that recognizes acronym-definition pairs within a document, the proposed model, which verifies pairs against web documents, improves recall by approximately 50% in obtaining acronym-definition pairs from 314 Korean definitions. In addition, by using classifiers specialized for the characteristics of each acronym generation type, the proposed model improves the F-measure by 7.25% in verifying acronym-definition candidate pairs.

  • Computing Word Semantic Relatedness for Question Retrieval in Community Question Answering

    Jung-Tae LEE, Young-In SONG, Hae-Chang RIM

    LETTER-Contents Technology and Web Information Systems

    Vol: E92-D, No: 4, Page(s): 736-739

    Previous approaches to question retrieval in community-based question answering rely on statistical translation techniques to match users' questions (queries) against collections of previously asked questions. This paper presents a simple but effective method that computes word relatedness from word co-occurrence information extracted directly from question and answer archives, in order to improve question retrieval. Experimental results show that the proposed approach significantly outperforms translation-based approaches.

  • Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval

    Young-In SONG, Kyoung-Soo HAN, So-Young PARK, Sang-Bum KIM, Hae-Chang RIM

    LETTER-Contents Technology and Web Information Systems

    Vol: E90-D, No: 11, Page(s): 1873-1876

    In this paper, we propose two weighting techniques to improve the performance of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model ranks documents containing the longer term higher, because a longer term has more chances to match the query. Such a preference is clearly inappropriate and often yields unsatisfactory results. To alleviate this weighting bias, we devise a method that normalizes the weights of the query terms within a long multi-word biomedical term, and a method that discriminates terms using inverse terminology frequency, a novel statistic estimated in a query domain. Experimental results on the MEDLINE corpus show that these two simple techniques improve retrieval performance by correcting the inappropriate preference for long multi-word terms in an expanded query.
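
A minimal sketch of phrase-level answer extraction in the spirit of "A Definitional Question Answering System Based on Phrase Extraction Using Syntactic Patterns". The letter uses syntactic patterns; here, as a plainly swapped-in stand-in, surface regular expressions over raw sentences play that role. The pattern list, the function name `extract_definition_phrases`, and the example sentences are hypothetical; the sketch only shows the idea of returning short phrases rather than whole sentences.

```python
import re

# Hypothetical surface patterns standing in for the letter's syntactic patterns.
# "{target}" is filled with the (escaped) question target; group 1 is the phrase.
PATTERNS = [
    r"{target}\s*,\s*(?:a|an|the)\s+([^,.]+?)\s*,",      # appositive: "X, a Y,"
    r"{target}\s+(?:is|was)\s+(?:a|an|the)\s+([^,.]+)",  # copula: "X is a Y"
    r"{target}\s*\(([^)]+)\)",                            # parenthetical: "X (Y)"
]

def extract_definition_phrases(target, sentences):
    """Return distinct phrases (not whole sentences) matched by any pattern."""
    escaped = re.escape(target)
    phrases = []
    for sentence in sentences:
        for template in PATTERNS:
            for match in re.finditer(template.format(target=escaped), sentence):
                phrase = match.group(1).strip()
                if phrase and phrase not in phrases:
                    phrases.append(phrase)
    return phrases

sentences = [
    "Aspirin, a common pain reliever, is sold worldwide.",
    "Aspirin is a nonsteroidal anti-inflammatory drug.",
]
print(extract_definition_phrases("Aspirin", sentences))
# ['common pain reliever', 'nonsteroidal anti-inflammatory drug']
```

Returning the matched group instead of the sentence is what keeps candidates concise; a parse-based pattern set would be far less brittle than these regexes.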
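
The abstract of "Utilizing the Web for Automatic Word Spacing" does not spell out its spacing model, so this sketch swaps in a standard technique: dynamic-programming segmentation of unspaced text that prefers words from a vocabulary, with the vocabulary standing in for reliable words mined from the Web. The function name `space_words`, its cost function, and the toy English example are assumptions for illustration only.

```python
def space_words(text, vocabulary, max_word_len=10):
    """Insert spaces into unspaced `text`, preferring words in `vocabulary`.

    A segmentation's cost is (characters left as unknown single-character tokens,
    number of tokens); tuples compare lexicographically, so covering the text with
    known words is optimized first and fewer tokens breaks ties.
    """
    n = len(text)
    best = [None] * (n + 1)  # best[i] = (cost, tokens) for the prefix text[:i]
    best[0] = ((0, 0), [])
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            piece = text[j:i]
            if piece in vocabulary:
                step = (0, 1)
            elif len(piece) == 1:
                step = (1, 1)  # fallback: keep an unknown character as its own token
            else:
                continue
            (unknown, count), tokens = best[j]
            cost = (unknown + step[0], count + step[1])
            if best[i] is None or cost < best[i][0]:
                best[i] = (cost, tokens + [piece])
    return " ".join(best[n][1])

text = "thecatsat"
print(space_words(text, {"the", "cat", "cats"}))         # the cats a t
print(space_words(text, {"the", "cat", "cats", "sat"}))  # the cat sat
```

The second call illustrates the motivation in the abstract: adding one word to the vocabulary (as mining the Web might) changes the best segmentation.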
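
A minimal sketch of how the three sub-models described in "Incorporating Frame Information to Semantic Role Labeling" might be combined to choose a role sequence. Everything concrete here is an assumption: the PropBank-style labels, the illustrative frameset id `give.01`, the three matching features (coverage of the frameset's roles, core roles the frameset does not license, repeated core roles), their hand-set weights, and the log-linear combination of the two model probabilities with the match score. The letter's own features and combination may differ.

```python
import math
import re

# Hypothetical hand-set weights for the hypothetical matching features below.
WEIGHTS = {"coverage": 2.0, "extraneous": -1.5, "duplicates": -1.0}

def is_core(role):
    # PropBank-style numbered arguments (A0, A1, ...); modifiers such as AM-TMP are not core
    return re.fullmatch(r"A\d", role) is not None

def match_features(roles, frameset_roles):
    """Compare a candidate role sequence with the core roles a frameset licenses."""
    core = [r for r in roles if is_core(r)]
    licensed = set(frameset_roles)
    return {
        "coverage": len(licensed & set(core)) / len(licensed) if licensed else 1.0,
        "extraneous": sum(1 for r in core if r not in licensed),
        "duplicates": len(core) - len(set(core)),
    }

def best_analysis(role_candidates, framesets):
    """Return (score, roles, frameset_id) maximizing
    log P(roles) + log P(frameset) + weighted match score.

    role_candidates: list of (roles, probability) from a local classifier.
    framesets: dict mapping frameset id -> (licensed core roles, probability).
    """
    best = None
    for roles, p_roles in role_candidates:
        for frameset_id, (licensed, p_frameset) in framesets.items():
            features = match_features(roles, licensed)
            match = sum(WEIGHTS[name] * value for name, value in features.items())
            score = math.log(p_roles) + math.log(p_frameset) + match
            if best is None or score > best[0]:
                best = (score, roles, frameset_id)
    return best

framesets = {"give.01": (["A0", "A1", "A2"], 0.9)}
candidates = [(["A0", "A1", "AM-TMP"], 0.5), (["A0", "A1", "A2"], 0.4)]
score, roles, frameset_id = best_analysis(candidates, framesets)
print(roles, frameset_id)  # ['A0', 'A1', 'A2'] give.01
```

The local classifier alone would prefer the first sequence (0.5 > 0.4), but full coverage of the frameset's roles tips the choice to the second, which is the kind of global correction the abstract describes.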
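
For the verification step in "Automatic Acronym Dictionary Construction Based on Acronym Generation Types", here is a minimal Bernoulli Naive Bayes classifier with add-one smoothing. The abstract does not list the classifier's features, so the binary features used here (`parenthetical`: the pair was seen as "definition (acronym)" in some web document; `nearby`: both strings were seen close together) and the toy training data are hypothetical.

```python
import math

class BernoulliNaiveBayes:
    """Naive Bayes over binary features with add-one (Laplace) smoothing."""

    def fit(self, samples, labels):
        # samples: list of sets of active feature names; labels: class per sample
        self.classes = sorted(set(labels))
        self.features = sorted(set().union(*samples))
        self.log_prior, self.p_active = {}, {}
        for c in self.classes:
            members = [s for s, y in zip(samples, labels) if y == c]
            self.log_prior[c] = math.log(len(members) / len(samples))
            # P(feature active | class), smoothed so it is never 0 or 1
            self.p_active[c] = {
                f: (sum(f in s for s in members) + 1) / (len(members) + 2)
                for f in self.features
            }
        return self

    def log_posterior(self, sample, c):
        # Unnormalized log P(c | sample); absent features contribute log(1 - p)
        total = self.log_prior[c]
        for f in self.features:
            p = self.p_active[c][f]
            total += math.log(p if f in sample else 1.0 - p)
        return total

    def predict(self, sample):
        return max(self.classes, key=lambda c: self.log_posterior(sample, c))

# Hypothetical training pairs: web-derived features for each candidate pair
samples = [{"parenthetical", "nearby"}, {"parenthetical"}, {"nearby", "parenthetical"}, {"nearby"}, set()]
labels = [True, True, True, False, False]
model = BernoulliNaiveBayes().fit(samples, labels)
print(model.predict({"parenthetical"}))  # True
```

The letter's specialized classifiers could be imitated by training one such model per generation type (syllable-based, word-based, mixed) and routing each candidate pair to the model for its type; that routing is likewise an assumption.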
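
A minimal sketch of mining word relatedness from a question-answer archive, in the spirit of "Computing Word Semantic Relatedness for Question Retrieval in Community Question Answering". The abstract does not give the relatedness measure, so pointwise mutual information (PMI) between a word in a question and a word in that question's answer is an assumption, as are the function name `qa_pmi` and the toy archive.

```python
import math
from collections import Counter
from itertools import product

def qa_pmi(archive):
    """Pointwise mutual information between question words and answer words.

    archive: list of (question_words, answer_words) pairs. A co-occurrence is
    counted when word q is in a question and word a is in that question's answer.
    """
    n = len(archive)
    q_count, a_count, pair_count = Counter(), Counter(), Counter()
    for question, answer in archive:
        q_set, a_set = set(question), set(answer)
        q_count.update(q_set)
        a_count.update(a_set)
        pair_count.update(product(q_set, a_set))
    # PMI = log P(q, a) / (P(q) P(a)), each probability estimated over the n pairs
    return {
        (q, a): math.log(c * n / (q_count[q] * a_count[a]))
        for (q, a), c in pair_count.items()
    }

archive = [
    (["laptop", "battery"], ["charge", "battery", "adapter"]),
    (["laptop", "screen"], ["driver", "display"]),
    (["phone", "battery"], ["charge", "replace"]),
    (["recipe", "pasta"], ["boil", "water"]),
]
pmi = qa_pmi(archive)
print(round(pmi[("battery", "charge")], 3))  # 0.693
```

Pairs that never co-occur get no entry. Raw PMI tends to overrate rare pairs, so a practical system would likely add smoothing or frequency cutoffs before using such scores to match query words against archived questions.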
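
A minimal sketch of the two weighting ideas in "Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval". The abstract names the ideas but not their formulas, so both forms below are assumptions: length normalization spreads a unit weight evenly over the words of each synonym term, and inverse terminology frequency is taken as ITF(w) = log(T / t_w), where T is the number of terminologies in a hypothetical term list and t_w is the number of those containing w. The synonym set, term list, and toy `score` function are illustrative only.

```python
import math
from collections import Counter, defaultdict

def inverse_terminology_frequency(terminologies):
    """Assumed form: ITF(w) = log(T / t_w) over a list of multi-word terminologies."""
    T = len(terminologies)
    containing = Counter(w for term in terminologies for w in set(term.split()))
    return {w: math.log(T / c) for w, c in containing.items()}

def expanded_query_weights(synonyms, itf=None, normalize=True):
    """Weight each word of an expanded query made of synonymous terms."""
    weights = defaultdict(float)
    for term in synonyms:
        words = term.split()
        # Length normalization: an n-word term contributes 1/n per word, 1 in total
        share = 1.0 / len(words) if normalize else 1.0
        for w in words:
            weights[w] += share * (itf[w] if itf is not None else 1.0)
    return dict(weights)

def score(document_words, weights):
    """Toy retrieval score: sum of weights of query words present in the document."""
    return sum(wt for w, wt in weights.items() if w in document_words)

synonyms = ["MI", "myocardial infarction", "heart attack"]
doc_long, doc_short = {"myocardial", "infarction"}, {"MI"}

raw = expanded_query_weights(synonyms, normalize=False)
print(score(doc_long, raw), score(doc_short, raw))    # 2.0 1.0 (bias toward the longer term)
norm = expanded_query_weights(synonyms)
print(score(doc_long, norm), score(doc_short, norm))  # 1.0 1.0

terms = ["MI", "myocardial infarction", "heart attack", "heart failure",
         "cerebral infarction", "heart disease"]
itf = inverse_terminology_frequency(terms)
weighted = expanded_query_weights(synonyms, itf)
```

In this toy list, "heart" occurs in three terminologies and so gets a lower ITF than "myocardial", making it less discriminative; multiplying the normalized share by ITF, as done here, is itself an assumed way of combining the two techniques.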