Author Search Result

[Author] So-Young PARK (9 hits)

Hits 1-9
  • Graph-Based Knowledge Consolidation in Ontology Population

    Pum Mo RYU, Myung-Gil JANG, Hyun-Ki KIM, So-Young PARK

    LETTER-Artificial Intelligence, Data Mining
    Vol: E96-D, No: 9, Page(s): 2139-2142

    We propose a novel method for knowledge consolidation based on a knowledge graph, as the step that follows relation extraction from text. The method consists of entity consolidation and relation consolidation. During entity consolidation, identical entities are found and merged using both name similarity and relation similarity measures. During relation consolidation, incorrect relations are removed using cardinality properties, temporal information, and relation weights in the given graph structure. In our experiment, the method generated compact and clean knowledge graphs in which the numbers of entities and relations were reduced by 6.1% and 17.4%, respectively, while relation accuracy increased from 77.0% to 85.5%.
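    The entity-consolidation step can be pictured with a small sketch. It makes several assumptions. The graph is a dict that maps each entity name to a set of (relation, object) edges. A pair is scored as a weighted mix of character-level name similarity and Jaccard overlap of relation sets; the name measure is Python's difflib ratio, substituted here for the paper's own measure. Pairs above a threshold are merged greedily. The weight `alpha`, the threshold, and all function names are hypothetical, and relation consolidation is not shown.

```python
from difflib import SequenceMatcher

# Hypothetical representation: entity name -> set of (relation, object) edges.
Graph = dict[str, set[tuple[str, str]]]


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib ratio, an assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relation_similarity(ra: set, rb: set) -> float:
    """Jaccard overlap of two entities' relation sets."""
    union = ra | rb
    return len(ra & rb) / len(union) if union else 0.0


def consolidate(graph: Graph, alpha: float = 0.5, threshold: float = 0.6) -> dict[str, str]:
    """Map every entity to a canonical representative by greedy pairwise merging."""
    canon = {e: e for e in graph}
    entities = sorted(graph)
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            if canon[b] != b:
                continue  # b was already merged into another entity
            score = (alpha * name_similarity(a, b)
                     + (1 - alpha) * relation_similarity(graph[a], graph[b]))
            if score >= threshold:
                canon[b] = canon[a]
    return canon


def merge(graph: Graph, canon: dict[str, str]) -> Graph:
    """Collapse merged entities, taking the union of their relations."""
    merged: Graph = {}
    for entity, relations in graph.items():
        merged.setdefault(canon[entity], set()).update(relations)
    return merged
```

    Combining the two similarities lets two differently spelled names merge when they share most of their relations. An entity with a similar name but unrelated edges stays separate.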

  • Automatic Acronym Dictionary Construction Based on Acronym Generation Types

    Yeo-Chan YOON, So-Young PARK, Young-In SONG, Hae-Chang RIM, Dae-Woong RHEE

    LETTER-Natural Language Processing
    Vol: E91-D, No: 5, Page(s): 1584-1587

    In this paper, we propose a new model for automatically constructing an acronym dictionary. The proposed model generates possible acronym candidates from a definition and then verifies each acronym-definition pair with a Naive Bayes classifier based on web documents. To achieve high dictionary quality, the model exploits three acronym generation types: a syllable-based type, a word-based type, and a mixed type. A previous model recognizes acronym-definition pairs within a document. The proposed model instead verifies pairs against web documents, and compared with the previous model it improves recall by approximately 50% when obtaining acronym-definition pairs from 314 Korean definitions. In addition, by using classifiers specialized for each acronym generation type, the model improves the F-measure for verifying acronym-definition candidate pairs by 7.25%.
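    Candidate generation can be sketched as an enumeration over the words of a definition. For each word, the sketch keeps its first syllable, keeps the whole word, or drops it. Two parts of this are our own assumptions: how these choices map onto the syllable-based, word-based, and mixed types, and the rule that a candidate keeps at least two words. The Naive Bayes verification against web documents is not shown. In precomposed Hangul each syllable is a single code point, so `word[0]` is the first syllable.

```python
from itertools import product


def acronym_candidates(definition: str) -> dict[str, str]:
    """Return hypothetical acronym candidates mapped to an assumed generation type."""
    words = definition.split()
    full = "".join(words)
    candidates: dict[str, str] = {}
    for choice in product(("syl", "word", "drop"), repeat=len(words)):
        kept = [(c, w) for c, w in zip(choice, words) if c != "drop"]
        if len(kept) < 2:
            continue  # assume an acronym keeps material from at least two words
        cand = "".join(w[0] if c == "syl" else w for c, w in kept)
        if cand == full:
            continue  # the unabbreviated definition is not an acronym
        kinds = {c for c, _ in kept}
        if kinds == {"syl"}:
            gen_type = "syllable"
        elif kinds == {"word"}:
            gen_type = "word"
        else:
            gen_type = "mixed"
        candidates.setdefault(cand, gen_type)
    return candidates
```

    The enumeration grows as 3^n in the number of words. That cost is acceptable for short definitions, and it is the reason a separate verification step is needed to prune the candidates.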

  • Detecting Partial and Near Duplication in the Blogosphere

    Yeo-Chan YOON, Myung-Gil JANG, Hyun-Ki KIM, So-Young PARK

    LETTER-Data Engineering, Web Information Systems
    Vol: E95-D, No: 2, Page(s): 681-685

    In this paper, we propose a duplicate document detection model that recognizes both partial duplicates and near duplicates. By splitting a large document into many small sentence fingerprints, the model can detect partial duplicates as well as exact duplicates. Furthermore, by filtering common words and reordering the word sequence, it can detect even near duplicates that result from trivial revisions.
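    The fingerprinting idea can be sketched in a few lines. Each sentence is lowercased and stripped of common words. Its remaining words are sorted so that reordering does not matter, and the result is hashed. Document overlap is then the fraction of one document's fingerprints found in the other. The stopword list, the MD5 hash, the sentence splitter, and the containment score are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical list of common words to filter out.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "and", "in", "on", "it", "this", "very"}


def sentence_fingerprints(text: str) -> set[str]:
    """Hash each sentence after removing common words and sorting the rest."""
    prints = set()
    for sentence in re.split(r"[.!?]+", text):
        words = [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]
        if not words:
            continue
        key = " ".join(sorted(words))  # word order no longer affects the fingerprint
        prints.add(hashlib.md5(key.encode("utf-8")).hexdigest())
    return prints


def containment(a: str, b: str) -> float:
    """Fraction of a's sentence fingerprints that also occur in b."""
    fa, fb = sentence_fingerprints(a), sentence_fingerprints(b)
    return len(fa & fb) / len(fa) if fa else 0.0
```

    Because the score is computed per sentence rather than per document, a blog post that copies two of three sentences from another post still scores 2/3. This holds even when the copied sentences are reordered or lightly edited.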

  • Descriptive Question Answering with Answer Type Independent Features

    Yeo-Chan YOON, Chang-Ki LEE, Hyun-Ki KIM, Myung-Gil JANG, Pum Mo RYU, So-Young PARK

    LETTER-Data Engineering, Web Information Systems
    Vol: E95-D, No: 7, Page(s): 2009-2012

    In this paper, we present a supervised learning method for finding answers to the most frequently asked descriptive questions: reason, method, and definition questions. Most previous question answering systems focus on factoid, list, or definitional questions. However, users also frequently ask descriptive questions such as reason and method questions, and we propose a system for these question types. The system searches for answers as follows. First, it analyzes the user's question and extracts search keywords and the expected answer type. Second, it obtains information retrieval results from an existing search engine such as Yahoo or Google. Finally, it ranks the results with a ranking SVM algorithm to find snippets containing answers to the question. We also propose features for identifying snippets that contain answers to descriptive questions. These features are adaptable and thus independent of answer type. Experimental results show that the proposed method and features are clearly effective for the task.
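    The final ranking step can be illustrated with a linear Ranking SVM. It is trained by stochastic subgradient descent on the pairwise hinge loss. The two snippet features used in the test are a keyword-overlap score and an answer-cue indicator. Both are hypothetical stand-ins for the paper's answer-type-independent features, and the hyperparameters are arbitrary.

```python
import random


def pairwise_diffs(snippets):
    """For one question, build preference vectors x_i - x_j for every pair
    where snippet i is labelled more relevant than snippet j."""
    return [
        [a - b for a, b in zip(xi, xj)]
        for xi, yi in snippets
        for xj, yj in snippets
        if yi > yj
    ]


def train_ranking_svm(questions, dim, c=1.0, epochs=200, lr=0.01, seed=0):
    """Minimise 0.5*||w||^2 + c * sum_d max(0, 1 - w.d) with stochastic subgradient steps."""
    diffs = [d for snippets in questions for d in pairwise_diffs(snippets)]
    w = [0.0] * dim
    if not diffs:
        return w
    rng = random.Random(seed)
    n = len(diffs)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            for k in range(dim):
                # per-pair share of the regulariser, plus the hinge subgradient if the margin is violated
                grad = w[k] / n - (c * d[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w


def rank(w, candidates):
    """Sort candidate feature vectors by descending linear score w.x."""
    return sorted(candidates, key=lambda x: sum(wk * xk for wk, xk in zip(w, x)), reverse=True)
```

    Training on pairwise differences, rather than on absolute labels, means that only the relative order of snippets within the same question matters. This suits a setting where relevance is judged per question.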

  • Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval

    Young-In SONG, Kyoung-Soo HAN, So-Young PARK, Sang-Bum KIM, Hae-Chang RIM

    LETTER-Contents Technology and Web Information Systems
    Vol: E90-D, No: 11, Page(s): 1873-1876

    In this paper, we propose two weighting techniques to improve the performance of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model tends to rank documents containing the longer term higher, because a longer term has more chances to match the query. Such a preference is clearly inappropriate and often yields unsatisfactory results. To alleviate this biased weighting, we devise two methods. The first normalizes the weights of the query terms within a long multi-word biomedical term. The second discriminates terms using inverse terminology frequency, a novel statistic estimated in the query domain. Experimental results on the MEDLINE corpus show that these two simple techniques improve retrieval performance by correcting the inappropriate preference for long multi-word terms in an expanded query.
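    The length normalization can be sketched by dividing each word's weight by the number of words in its synonym. With this weighting, a one-word term and a three-word synonym contribute the same total weight. The inverse-terminology-frequency factor is an assumption: a smoothed, IDF-style formula computed over a hypothetical list of domain terms, which may differ from the paper's definition.

```python
import math
from collections import Counter, defaultdict


def expanded_query_weights(synonyms: list[str], terminology: list[str]) -> dict[str, float]:
    """Weight the words of synonymous query terms.

    synonyms: the original term plus its expansions (one concept).
    terminology: hypothetical list of domain terms used to estimate an ITF-like factor.
    """
    term_freq = Counter()
    for term in terminology:
        for word in set(term.lower().split()):
            term_freq[word] += 1  # number of domain terms containing the word
    weights: dict[str, float] = defaultdict(float)
    for term in synonyms:
        words = term.lower().split()
        for word in words:
            # assumed smoothed inverse terminology frequency (1.0 when terminology is empty)
            itf = math.log((1 + len(terminology)) / (1 + term_freq[word])) + 1.0
            weights[word] += itf / len(words)  # length normalization per synonym
    return dict(weights)
```

    Without the division by `len(words)`, a three-word expansion would carry three times the weight of the original acronym. A document mentioning the long form would then be favoured for no reason other than term length.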

  • Estimating Translation Probabilities Considering Semantic Recoverability of Phrase Retranslation

    Hyoung-Gyu LEE, Min-Jeong KIM, YingXiu QUAN, Hae-Chang RIM, So-Young PARK

    LETTER-Natural Language Processing
    Vol: E95-D, No: 3, Page(s): 897-901

    The general method for estimating phrase translation probabilities consists of sequential processes: word alignment, phrase pair extraction, and phrase translation probability calculation. However, in this sequential process, errors may propagate from the word alignment step to the translation probability calculation step. In this paper, we propose a new method for estimating phrase translation probabilities that reduces the effects of error propagation. By considering the semantic recoverability of phrase retranslation, our method identifies incorrect phrase pairs that result from alignment errors. Furthermore, we define retranslation similarity, which represents the semantic recoverability of phrase retranslation, and use it when computing translation probabilities. Experimental results show that the proposed estimation method effectively prevents a phrase-based statistical machine translation (PBSMT) system from selecting incorrect phrase pairs, and that it consistently improves translation quality across various language pairs.
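    One hypothetical way to fold a retranslation score into relative-frequency estimation is sketched below. Several parts are assumptions. `inverse_table` stands in for a target-to-source phrase table, and word-set Jaccard overlap stands in for the paper's retranslation similarity. Scaling each pair's count by that score is our guess at how the similarity enters the estimate.

```python
from collections import defaultdict


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two phrases."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retranslation_similarity(f: str, e: str, inverse_table: dict[str, list[str]]) -> float:
    """How well source phrase f is recovered when target phrase e is translated back."""
    return max((jaccard(f, back) for back in inverse_table.get(e, [])), default=0.0)


def weighted_translation_probs(pair_counts: dict[tuple[str, str], int],
                               inverse_table: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Relative-frequency p(e | f) with each pair's count scaled by its retranslation
    similarity, so pairs whose meaning is not recovered lose probability mass."""
    weighted = {
        (f, e): count * retranslation_similarity(f, e, inverse_table)
        for (f, e), count in pair_counts.items()
    }
    totals: dict[str, float] = defaultdict(float)
    for (f, _), w in weighted.items():
        totals[f] += w
    return {(f, e): (w / totals[f] if totals[f] else 0.0) for (f, e), w in weighted.items()}
```

    Consider the test data. A truncated pair such as ("das haus", "the") is a typical product of an alignment error. Its probability drops from 2/10 under plain relative frequency to 1/9, because translating "the" back recovers only half of the source phrase.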

  • A Probabilistic Feature-Based Parsing Model for Head-Final Languages

    So-Young PARK, Yong-Jae KWAK, Joon-Ho LIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E87-D, No: 12, Page(s): 2893-2897

    In this paper, we propose a probabilistic feature-based parsing model for head-final languages that can improve syntactic disambiguation while reducing the parsing cost associated with lexical information. For effective syntactic disambiguation, the proposed model uses several features: a syntactic label feature, a content feature, a functional feature, and a size feature. Moreover, it is designed to represent the word order variation of non-head words in head-final languages. Experimental results show that the proposed model outperforms previous lexicalized parsing models while depending much less on lexical information.

  • Naïve Probabilistic Shift-Reduce Parsing Model Using Functional Word Based Context for Agglutinative Languages

    Yong-Jae KWAK, So-Young PARK, Joon-Ho LIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E87-D, No: 9, Page(s): 2286-2289

    In this paper, we propose a naïve probabilistic shift-reduce parsing model that can use contextual information more flexibly than previous probabilistic GLR parsing models. The model also exploits a characteristic of agglutinative languages, in which functional words are highly developed. Experimental results on Korean show that our model, using the proposed contextual information, improves parsing accuracy more effectively than the previous models. Moreover, the model is compact in size and robust with a small training set.

  • Three-Phase Text Error Correction Model for Korean SMS Messages

    Jeunghyun BYUN, So-Young PARK, Seung-Wook LEE, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E92-D, No: 5, Page(s): 1213-1217

    In this paper, we propose a three-phase text error correction model consisting of a word spacing error correction phase, a syllable-based spelling error correction phase, and a word-based spelling error correction phase. To reduce the complexity of text error correction, the model corrects errors step by step. To correct word spacing errors, spelling errors, and mixed errors in SMS messages, the model handles word spacing correction and spelling correction in separate phases. The syllable-based approach covers a wide variety of errors, while the word-based approach corrects certain specific errors accurately. To use both, the model subdivides spelling error correction into a syllable-based phase and a word-based phase. Experimental results show that the model improves performance by solving the text error correction problem with this divide-and-conquer strategy.
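    The divide-and-conquer ordering can be sketched as three functions applied in sequence. Everything below is hypothetical and shown on ASCII text for readability. Spacing uses greedy longest-match segmentation against a small lexicon. The syllable-based phase applies substring rewrite rules, and the word-based phase applies whole-word rules. These lookup tables are placeholders for whatever models the paper uses in each phase.

```python
def space_words(text: str, lexicon: set[str]) -> str:
    """Phase 1: split run-together tokens by greedy longest match against a lexicon."""
    out = []
    for token in text.split():
        i = 0
        while i < len(token):
            for j in range(len(token), i, -1):
                if token[i:j] in lexicon:
                    out.append(token[i:j])
                    i = j
                    break
            else:  # no lexicon word starts at i: keep the remainder for later phases
                out.append(token[i:])
                break
    return " ".join(out)


def correct_syllables(text: str, rules: dict[str, str]) -> str:
    """Phase 2: rewrite erroneous character sequences anywhere in the text."""
    for wrong, right in rules.items():
        text = text.replace(wrong, right)
    return text


def correct_words(text: str, rules: dict[str, str]) -> str:
    """Phase 3: replace specific whole words that the earlier phases left unchanged."""
    return " ".join(rules.get(word, word) for word in text.split())


def three_phase_correct(text: str, lexicon: set[str],
                        syllable_rules: dict[str, str], word_rules: dict[str, str]) -> str:
    """Run spacing, then syllable-level, then word-level correction."""
    text = space_words(text, lexicon)
    text = correct_syllables(text, syllable_rules)
    return correct_words(text, word_rules)
```

    Each phase sees output that the previous phase has already simplified. Spelling rules therefore never have to account for missing spaces, and word-level rules only handle the specific forms the syllable phase leaves behind.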