Author Search Result

[Author] Hae-Chang RIM (16 hits)

Results 1-16 of 16
  • A Probabilistic Feature-Based Parsing Model for Head-Final Languages

    So-Young PARK, Yong-Jae KWAK, Joon-Ho LIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E87-D No:12  Page(s): 2893-2897

    In this paper, we propose a probabilistic feature-based parsing model for head-final languages, which can lead to an improvement of syntactic disambiguation while reducing the parsing cost related to lexical information. For effective syntactic disambiguation, the proposed parsing model utilizes several useful features such as a syntactic label feature, a content feature, a functional feature, and a size feature. Moreover, it is designed to be suitable for representing word order variation of non-head words in head-final languages. Experimental results show that the proposed parsing model performs better than previous lexicalized parsing models, although it has much less dependence on lexical information.

  • Improving Parsing Performance Using Corpus-Based Temporal Expression Analysis

    Juntae YOON, Seonho KIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E87-D No:12  Page(s): 2898-2902

    This paper presents a method for improving the performance of syntactic analysis by using accurate temporal expression processing. Temporal expressions cause parsing errors due to their syntactic duality, and resolving them is not trivial because the syntactic role of a temporal expression can only be determined from its context. In our work, the syntactic functions of temporal words are identified decisively from the local contexts of individual temporal words, which are acquired from a large corpus and represented with a finite-state method. Experimental results show how the proposed method, incorporated into parsing, improves the accuracy and efficiency of syntactic analysis.

  • Minimizing Human Intervention for Constructing Korean Part-of-Speech Tagged Corpus

    Do-Gil LEE, Gumwon HONG, Seok Kee LEE, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E93-D No:8  Page(s): 2336-2338

    The construction of annotated corpora requires considerable manual effort. This paper presents a pragmatic method for minimizing human intervention in the construction of a Korean part-of-speech (POS) tagged corpus. Instead of focusing on improving the performance of conventional automatic POS taggers, we devise a discriminative POS tagger that can selectively produce either a single analysis or multiple analyses, depending on the tagging reliability. The proposed approach uses two decision rules to judge the tagging reliability. Experimental results show that the proposed approach can effectively control both the quality of the corpus and the amount of manual annotation through the threshold values of the rules.
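    A minimal sketch of the selective-output idea described above, assuming a tagger that returns a probability for each candidate tag; the two decision rules and the threshold values here are hypothetical, not the paper's actual rules:

      # Emit a single analysis only when the tagger is confident enough; otherwise
      # return several candidates and leave the decision to a human annotator.
      def select_analyses(tag_probs, prob_threshold=0.95, margin_threshold=0.5):
          """tag_probs: dict mapping candidate POS tags to tagger probabilities."""
          ranked = sorted(tag_probs.items(), key=lambda kv: kv[1], reverse=True)
          best_tag, best_p = ranked[0]
          second_p = ranked[1][1] if len(ranked) > 1 else 0.0
          # Rule 1 (hypothetical): the best tag is very probable on its own.
          # Rule 2 (hypothetical): the best tag clearly beats the runner-up.
          if best_p >= prob_threshold or best_p - second_p >= margin_threshold:
              return [best_tag]                     # auto-annotated
          return [tag for tag, _ in ranked[:3]]     # passed on for manual annotation

      print(select_analyses({"NNG": 0.97, "VV": 0.02, "MAG": 0.01}))  # ['NNG']
      print(select_analyses({"NNG": 0.45, "VV": 0.40, "MAG": 0.15}))  # three candidates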

  • Naïve Probabilistic Shift-Reduce Parsing Model Using Functional Word Based Context for Agglutinative Languages

    Yong-Jae KWAK, So-Young PARK, Joon-Ho LIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E87-D No:9  Page(s): 2286-2289

    In this paper, we propose a naïve probabilistic shift-reduce parsing model which can use contextual information more flexibly than previous probabilistic GLR parsing models, and which exploits a characteristic of agglutinative languages, namely that functional words are highly developed. Experimental results on Korean show that our model, using the proposed contextual information, improves parsing accuracy more effectively than previous models. Moreover, it is compact in model size and robust with a small training set.

  • A Definitional Question Answering System Based on Phrase Extraction Using Syntactic Patterns

    Kyoung-Soo HAN, Young-In SONG, Sang-Bum KIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E89-D No:4  Page(s): 1601-1605

    We propose a definitional question answering system that extracts phrases using syntactic patterns, which are easy to construct manually and can reduce the coverage problem. Experimental results show that our phrase extraction system outperforms a sentence extraction system in terms of recall and precision, especially for selecting concise answers, and indicate that the text unit chosen for answer candidates and for the final answer has a significant effect on system performance.
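    A rough sketch of pattern-based phrase extraction in this spirit; the paper's patterns operate on syntactic structure, so the surface regular expressions below are only stand-ins:

      import re

      # Surface approximations of two common definitional patterns (appositive, copula).
      PATTERNS = [
          re.compile(r"(?P<target>[\w .-]+), (?:a|an|the) (?P<phrase>[^,.]+)"),
          re.compile(r"(?P<target>[\w .-]+) is (?:a|an|the) (?P<phrase>[^,.]+)"),
      ]

      def extract_definition_phrases(target, sentences):
          phrases = []
          for sentence in sentences:
              for pattern in PATTERNS:
                  m = pattern.search(sentence)
                  if m and m.group("target").strip().lower() == target.lower():
                      phrases.append(m.group("phrase").strip())
          return phrases

      print(extract_definition_phrases(
          "TREC", ["TREC, an annual information retrieval evaluation, began in 1992."]))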

  • Three-Phase Text Error Correction Model for Korean SMS Messages

    Jeunghyun BYUN, So-Young PARK, Seung-Wook LEE, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E92-D No:5  Page(s): 1213-1217

    In this paper, we propose a three-phase text error correction model consisting of a word spacing error correction phase, a syllable-based spelling error correction phase, and a word-based spelling error correction phase. In order to reduce the text error correction complexity, the proposed model corrects text errors step by step. With the aim of correcting word spacing errors, spelling errors, and mixed errors in SMS messages, the proposed model tries to separately manage the word spacing error correction phase and the spelling error correction phase. For the purpose of utilizing both the syllable-based approach covering various errors and the word-based approach correcting some specific errors accurately, the proposed model subdivides the spelling error correction phase into the syllable-based phase and the word-based phase. Experimental results show that the proposed model can improve the performance by solving the text error correction problem based on the divide-and-conquer strategy.
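    A toy sketch of the divide-and-conquer pipeline described above; the three correctors below are trivial placeholders for the paper's statistical models:

      def correct_word_spacing(text):
          return text                       # phase 1: re-segment words (placeholder)

      def correct_spelling_by_syllable(text):
          return text.replace("0", "o")     # phase 2: syllable-level substitution (toy rule)

      def correct_spelling_by_word(text):
          lexicon = {"pls": "please", "thx": "thanks"}   # phase 3: word-level fixes
          return " ".join(lexicon.get(w, w) for w in text.split())

      def correct_sms(text):
          for phase in (correct_word_spacing,
                        correct_spelling_by_syllable,
                        correct_spelling_by_word):
              text = phase(text)            # each phase handles one error type in turn
          return text

      print(correct_sms("call me s00n pls"))   # -> "call me soon please"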

  • Utilizing the Web for Automatic Word Spacing

    Gumwon HONG, Jeong-Hoon LEE, Young-In SONG, Do-Gil LEE, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E92-D No:12  Page(s): 2553-2556

    This paper presents a new approach to the word spacing problem that mines reliable words from the Web and uses them as additional resources. Conventional approaches to automatic word spacing use noise-free data to train the parameters of word spacing models. However, the insufficiency and irrelevancy of training examples is always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, this paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, and a model that utilizes these words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that the proposed approach achieves better performance than conventional word spacing approaches.
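    A minimal sketch of how a vocabulary of reliable words mined from the Web could drive word spacing; the mining step is omitted and the cost function is an assumption, not the paper's model:

      # Toy dynamic-programming word spacer that prefers segments found in a vocabulary
      # mined from the Web (here just a hand-made set standing in for the mined words).
      def space_words(text, vocab, max_len=8):
          n = len(text)
          best = [None] * (n + 1)           # best[i] = (cost, split position) for text[:i]
          best[0] = (0, 0)
          for i in range(1, n + 1):
              for j in range(max(0, i - max_len), i):
                  if best[j] is None:
                      continue
                  piece = text[j:i]
                  cost = best[j][0] + (0 if piece in vocab else len(piece))
                  if best[i] is None or cost < best[i][0]:
                      best[i] = (cost, j)
          words, i = [], n
          while i > 0:
              j = best[i][1]
              words.append(text[j:i])
              i = j
          return " ".join(reversed(words))

      web_vocab = {"machine", "translation", "word", "spacing"}
      print(space_words("wordspacing", web_vocab))   # -> "word spacing"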

  • Topic Document Model Approach for Naive Bayes Text Classification

    Sang-Bum KIM, Hae-Chang RIM, Jin-Dong KIM

    LETTER-Natural Language Processing
    Vol: E88-D No:5  Page(s): 1091-1094

    The multinomial naive Bayes model has been widely used for probabilistic text classification. However, the parameter estimation for this model sometimes generates inappropriate probabilities. In this paper, we propose a topic document model for the multinomial naive Bayes text classification, where the parameters are estimated from normalized term frequencies of each training document. Experiments are conducted on Reuters 21578 and 20 Newsgroup collections, and our proposed approach obtained a significant improvement in performance compared to the traditional multinomial naive Bayes.
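    A small sketch in the spirit of the described estimation: class-conditional parameters are computed from per-document normalized term frequencies rather than pooled raw counts. The smoothing and the absence of class priors are simplifications, not the paper's exact formulation:

      import math
      from collections import defaultdict

      def train(docs_by_class, alpha=0.01):
          vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
          params = {}
          for c, docs in docs_by_class.items():
              avg_tf = defaultdict(float)
              for d in docs:
                  for w in d:
                      avg_tf[w] += 1.0 / (len(d) * len(docs))   # normalized TF, averaged
              denom = sum(avg_tf.values()) + alpha * len(vocab)
              params[c] = {w: (avg_tf[w] + alpha) / denom for w in vocab}
          return params

      def classify(doc, params):
          return max(params, key=lambda c: sum(math.log(params[c][w])
                                               for w in doc if w in params[c]))

      model = train({"sports": [["goal", "match"], ["team", "goal"]],
                     "tech":   [["cpu", "chip"], ["chip", "code"]]})
      print(classify(["goal", "team"], model))   # -> "sports"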

  • Semantic Classification of Bio-Entities Incorporating Predicate-Argument Features

    Kyung-Mi PARK, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E91-D No:4  Page(s): 1211-1214

    In this paper, we propose new external context features for the semantic classification of bio-entities. In previous approaches, the words in the left or right context of a bio-entity are frequently used as external context features. However, in our prior experiments, external contexts in such a flat representation did not improve performance. In this study, we incorporate predicate-argument features into the training of a maximum entropy (ME) based classifier. Through parsing and argument identification, we recognize biomedical verbs that have argument relations with the constituents containing a bio-entity, and then use the predicate-argument structures as external context features. Extracting predicate-argument features involves two identification tasks: biomedically salient word identification, which determines whether a word is biomedically salient, and target verb identification, which finds the biomedical verbs that have argument relations with the constituents containing a bio-entity. Experiments show that the performance of semantic classification in the biomedical domain can be improved by utilizing such predicate-argument features.

  • Incorporating Frame Information to Semantic Role Labeling

    Joo-Young LEE, Young-In SONG, Hae-Chang RIM, Kyoung-Soo HAN

    LETTER-Natural Language Processing
    Vol: E93-D No:1  Page(s): 201-204

    In this paper, we suggest a new probabilistic model for semantic role labeling that uses the frameset of the predicate as explicit linguistic knowledge, providing global information on the predicate-argument structure that a local classifier is unable to capture. The proposed model consists of three sub-models: a role sequence generation model, a frameset generation model, and a matching model. The role sequence generation model generates semantic role sequence candidates for a given predicate using the local classification approach widely adopted in previous research. The frameset generation model estimates the probability of each frameset that the predicate can take. The matching model measures the degree of matching between a generated role sequence and a frameset using several features developed to represent the predicate-argument structure information described in the frameset. In the experiments, our model shows that using knowledge about the predicate-argument structure is effective for selecting a more appropriate semantic role sequence.
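    A toy sketch of how the three sub-models could be combined; the scoring functions are placeholders, and even the multiplicative combination is an assumption rather than the paper's formulation:

      def best_role_sequence(candidates, framesets, seq_prob, frameset_prob, match_score):
          """Pick the role sequence whose combined score over the framesets is highest."""
          best, best_score = None, float("-inf")
          for seq in candidates:                 # from the role sequence generation model
              for fs in framesets:               # from the frameset generation model
                  score = seq_prob(seq) * frameset_prob(fs) * match_score(seq, fs)
                  if score > best_score:
                      best, best_score = seq, score
          return best

      print(best_role_sequence(
          candidates=[("ARG0", "ARG1"), ("ARG0", "ARGM-TMP")],
          framesets=["give.01"],
          seq_prob=lambda s: 0.6 if s == ("ARG0", "ARG1") else 0.4,
          frameset_prob=lambda f: 1.0,
          match_score=lambda s, f: 1.0 if "ARG1" in s else 0.2))   # -> ('ARG0', 'ARG1')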

  • Utilizing Global Syntactic Tree Features for Phrase Reordering

    Yeon-Soo LEE, Hyoung-Gyu LEE, Hae-Chang RIM, Young-Sook HWANG

    LETTER-Natural Language Processing
    Vol: E97-D No:6  Page(s): 1694-1698

    In phrase-based statistical machine translation, the long-distance reordering problem is one of the most challenging issues when translating syntactically distant language pairs. In this paper, we propose a novel reordering model to address this problem. In our model, reordering is governed by the overall structure of a sentence, such as listings, reduplications, and modifications, as well as by the relationships between adjacent phrases. To this end, we incorporate global syntactic contexts, including the parts that have not yet been translated, during the decoding process.

  • Automatic Acronym Dictionary Construction Based on Acronym Generation Types

    Yeo-Chan YOON, So-Young PARK, Young-In SONG, Hae-Chang RIM, Dae-Woong RHEE

    LETTER-Natural Language Processing
    Vol: E91-D No:5  Page(s): 1584-1587

    In this paper, we propose a new model for automatically constructing an acronym dictionary. The proposed model generates possible acronym candidates from a definition and then verifies each acronym-definition pair with a Naive Bayes classifier based on web documents. In order to achieve high dictionary quality, the proposed model utilizes the characteristics of three acronym generation types: syllable-based generation, word-based generation, and mixed generation. Compared with a previous model that recognizes an acronym-definition pair in a single document, the proposed model, which verifies a pair against web documents, improves recall by approximately 50% when obtaining acronym-definition pairs from 314 Korean definitions. The proposed model also improves F-measure by 7.25% when verifying acronym-definition candidate pairs, by using classifiers specialized to the characteristics of each acronym generation type.
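    A toy generator for word-based acronym candidates from an English-like definition; the paper also handles Korean syllable-based and mixed generation types, which are omitted here, and the stopword list is hypothetical:

      # Each generated candidate would then be paired with the definition and verified
      # with a classifier over Web documents (the verification step is not shown).
      def word_based_candidates(definition):
          words = definition.split()
          candidates = {"".join(w[0].upper() for w in words)}
          # optionally drop function words (hypothetical stopword list)
          content = [w for w in words if w.lower() not in {"of", "the", "and", "for"}]
          candidates.add("".join(w[0].upper() for w in content))
          return candidates

      print(word_based_candidates("Association for Computational Linguistics"))
      # -> {'AFCL', 'ACL'}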

  • A New Probabilistic Dependency Parsing Model for Head-Final, Free Word Order Languages

    Hoojung CHUNG, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E86-D No:11  Page(s): 2490-2493

    We propose a dependency parsing model for head-final, variable word order languages. Based on the observation that each word has its own preference for its modifying distance, and that the preferred distance varies according to the surrounding context of the word, we define a parsing model that reflects this preference. Experimental results show that a parser based on our model outperforms other parsers in terms of precision and recall.
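    A rough sketch of estimating a word's modifying-distance preference from corpus counts, conditioned on a coarse context feature; the event definition and smoothing are assumptions, not the paper's model:

      from collections import Counter, defaultdict

      counts = defaultdict(Counter)          # counts[(word, context)][distance]

      def observe(word, context, distance):
          counts[(word, context)][distance] += 1

      def distance_prob(word, context, distance, alpha=0.5, max_dist=10):
          c = counts[(word, context)]
          # additive smoothing over a fixed range of candidate distances
          return (c[distance] + alpha) / (sum(c.values()) + alpha * max_dist)

      observe("quickly", "before_verb", 1)
      observe("quickly", "before_verb", 1)
      observe("quickly", "before_verb", 3)
      print(distance_prob("quickly", "before_verb", 1))   # preferred short distance
      print(distance_prob("quickly", "before_verb", 3))   # less preferred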

  • Computing Word Semantic Relatedness for Question Retrieval in Community Question Answering

    Jung-Tae LEE, Young-In SONG, Hae-Chang RIM

    LETTER-Contents Technology and Web Information Systems
    Vol: E92-D No:4  Page(s): 736-739

    Previous approaches to question retrieval in community-based question answering rely on statistical translation techniques to match users' questions (queries) against collections of previously asked questions. This paper presents a simple but effective method for computing word relatedness to improve question retrieval based on word co-occurrence information directly extracted from question and answer archives. Experimental results show that the proposed approach significantly outperforms translation-based approaches.
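    A small sketch of word relatedness computed from question-answer co-occurrence, scored here with pointwise mutual information; PMI is used only for illustration, since the paper's exact relatedness measure is not reproduced here:

      import math
      from collections import Counter

      def build_relatedness(qa_pairs):
          q_counts, a_counts, pair_counts, n = Counter(), Counter(), Counter(), 0
          for question, answer in qa_pairs:
              n += 1
              q_words, a_words = set(question.split()), set(answer.split())
              q_counts.update(q_words)
              a_counts.update(a_words)
              pair_counts.update((qw, aw) for qw in q_words for aw in a_words)

          def relatedness(qw, aw):            # PMI between a question word and an answer word
              if pair_counts[(qw, aw)] == 0:
                  return 0.0
              return math.log(pair_counts[(qw, aw)] * n / (q_counts[qw] * a_counts[aw]))

          return relatedness

      rel = build_relatedness([("best python ide", "try pycharm"),
                               ("python debugger setup", "pycharm has one built in"),
                               ("how to bake bread", "use a dutch oven")])
      print(rel("python", "pycharm"))   # positive: the two words co-occur consistently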

  • Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval

    Young-In SONG, Kyoung-Soo HAN, So-Young PARK, Sang-Bum KIM, Hae-Chang RIM

    LETTER-Contents Technology and Web Information Systems
    Vol: E90-D No:11  Page(s): 1873-1876

    In this paper, we propose two weighting techniques to improve the performance of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model ranks documents containing a longer terminology higher, because a longer terminology has more chances to be matched with the query. However, this preference is clearly inappropriate and often yields unsatisfactory results. To alleviate the weighting bias, we devise a method for normalizing the weights of query terms within a long multi-word biomedical term, and a method for discriminating terms by using inverse terminology frequency, a novel statistic estimated in the query domain. Experimental results on the MEDLINE corpus show that these two simple techniques improve retrieval performance by adjusting the inadequate preference for long multi-word terminologies in an expanded query.
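    A minimal sketch of the two weighting ideas: splitting an expansion term's weight across the words of a long synonymous multi-word term, and down-weighting terms via an inverse terminology frequency factor. Both formulas are assumptions for illustration; the paper's definitions may differ:

      import math

      def expand_query(term, synonyms, terminology_freq, num_terminologies):
          weights = {term: 1.0}
          for synonym in synonyms:
              words = synonym.split()
              # inverse terminology frequency: rarer terminology entries get more weight
              itf = math.log(num_terminologies / (1 + terminology_freq.get(synonym, 0)))
              for w in words:
                  # length normalization: a long synonym does not dominate the match score
                  weights[w] = weights.get(w, 0.0) + itf / len(words)
          return weights

      print(expand_query("PKD", ["polycystic kidney disease", "APKD"],
                         terminology_freq={"polycystic kidney disease": 3, "APKD": 1},
                         num_terminologies=100))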

  • Estimating Translation Probabilities Considering Semantic Recoverability of Phrase Retranslation

    Hyoung-Gyu LEE, Min-Jeong KIM, YingXiu QUAN, Hae-Chang RIM, So-Young PARK

    LETTER-Natural Language Processing
    Vol: E95-D No:3  Page(s): 897-901

    The general method for estimating phrase translation probabilities consists of sequential processes: word alignment, phrase pair extraction, and phrase translation probability calculation. During this sequence, however, errors may propagate from the word alignment step to the translation probability calculation step. In this paper, we propose a new method for estimating phrase translation probabilities that reduces the effects of this error propagation. By considering the semantic recoverability of phrase retranslation, our method identifies incorrect phrase pairs caused by alignment errors. Furthermore, we define retranslation similarity, which represents the semantic recoverability of phrase retranslation, and use it when computing translation probabilities. Experimental results show that the proposed estimation method effectively prevents a PBSMT system from selecting incorrect phrase pairs and consistently improves translation quality across various language pairs.
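    A sketch of the retranslation check under strong simplifications: the target phrase is translated back into the source language and compared with the original source phrase, and the resulting similarity scales the translation probability. The token-overlap similarity and the weighting are assumptions, not the paper's definitions:

      def retranslation_similarity(src_phrase, tgt_phrase, back_translate):
          """Jaccard overlap between the source phrase and the retranslated target phrase."""
          back = back_translate(tgt_phrase)
          src_tokens, back_tokens = set(src_phrase.split()), set(back.split())
          if not src_tokens or not back_tokens:
              return 0.0
          return len(src_tokens & back_tokens) / len(src_tokens | back_tokens)

      def weighted_translation_prob(count, total, similarity):
          return (count / total) * similarity   # low-recoverability pairs are penalized

      back = lambda phrase: {"maison bleue": "blue house"}.get(phrase, "")
      print(retranslation_similarity("blue house", "maison bleue", back))   # 1.0
      print(weighted_translation_prob(3, 10, similarity=1.0))               # 0.3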