
Author Search Result

[Author] Key-Sun CHOI (6 hits)

  • Normalizing Syntactic Structures Using Part-of-Speech Tags and Binary Rules

    Seongyong KIM  Kong-Joo LEE  Key-Sun CHOI  

     
    PAPER

    Vol: E86-D No:10  Page(s): 2049-2056

    We propose a normalization scheme for syntactic structures that uses a binary phrase structure grammar with composite labels. The normalization adopts binary rules so that the dependency between two sub-trees can be represented in the label of the tree. The label of a tree is composed of two attributes, each extracted from one of its sub-trees, so that it represents the compositional information of the tree. The composite label is generated from part-of-speech tags using an automatic labelling algorithm. Since the proposed normalization scheme is binary and uses only part-of-speech information, it can readily be used to compare the results of different syntactic analyses independently of their syntactic descriptions, and it can be applied to other languages as well. It can also be used for syntactic analysis, where it performs better than the previous syntactic description for a Korean corpus. We implement a tool that transforms a syntactic description into a normalized one based on the proposed scheme. It can help construct a unified syntactic corpus and extract syntactic information from various types of syntactic corpora in a uniform way.
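
    As a rough illustration of the binarization idea described above, the sketch below folds an n-ary constituent into a right-branching binary tree and builds each new node's label from one attribute of each sub-tree. The tree representation, the composite_label() heuristic, and the example tags are assumptions made for illustration, not the paper's exact labelling algorithm.

```python
# A minimal sketch of binarizing an n-ary syntactic tree with composite labels.
# The composite_label() heuristic and the data model are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # POS tag for leaves, composite label for internal nodes
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None      # surface form, leaves only


def composite_label(left: Node, right: Node) -> str:
    """Build a composite label from one attribute of each sub-tree.

    Here the 'attribute' is simply one element of each child's label,
    which for a leaf is its part-of-speech tag."""
    left_attr = left.label.split("+")[0]
    right_attr = right.label.split("+")[-1]
    return f"{left_attr}+{right_attr}"


def binarize(node: Node) -> Node:
    """Normalize an n-ary tree into a right-branching binary tree."""
    if not node.children:                      # leaf: keep the POS tag as-is
        return node
    kids = [binarize(c) for c in node.children]
    # Fold children pairwise from the right so every internal node is binary.
    right = kids[-1]
    for left in reversed(kids[:-1]):
        right = Node(label=composite_label(left, right), children=[left, right])
    return right


if __name__ == "__main__":
    # "a small dog": det + adj + noun, originally a flat 3-ary NP
    tree = Node("NP", [Node("DT", word="a"), Node("JJ", word="small"), Node("NN", word="dog")])
    print(binarize(tree).label)   # e.g. "DT+NN"
```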

  • Entity Summarization Based on Entity Grouping in Multilingual Projected Entity Space

    Eun-kyung KIM  Key-Sun CHOI  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2017/06/02  Vol: E100-D No:9  Page(s): 2138-2146

    Entity descriptions have been growing exponentially in community-generated knowledge bases such as DBpedia. However, many of those descriptions are not useful for identifying the underlying characteristics of their corresponding entities, because the descriptions include semantically redundant facts or triples that represent connections between entities without any semantic properties. Entity summarization is applied to filter out such non-informative and semantically redundant triples and to rank the remaining informative facts within the number of triples allowed for the summary. This study proposes an entity summarization approach based on pre-grouping the entities that share a set of attributes that can be used to characterize the entities we want to summarize. Entities are first grouped according to projected multilingual categories, which bring the multi-angled semantics of each entity into a single entity space. Key facts about the entity are then determined through in-group-based rankings. As a result, our proposed approach produced summary information of significantly better quality (p-value = 1.52×10⁻³ and 2.01×10⁻³ for the top-10 and top-5 summaries, respectively) than the state-of-the-art method that requires additional external resources.
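
    The group-then-rank idea can be sketched roughly as follows: entities that share categories with the target are pooled, and the target's own triples are ranked by how characteristic they are within that group. The grouping key, the frequency-based score, and the toy data are simplified assumptions, not the paper's projected multilingual entity space.

```python
# A minimal sketch of group-then-rank entity summarization. The grouping key
# (shared categories) and the ranking score (in-group frequency of a
# property-value pair) are simplified assumptions, not the paper's model.

from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, property, value)


def summarize(entity: str,
              triples: Dict[str, List[Triple]],
              categories: Dict[str, Set[str]],
              top_k: int = 5) -> List[Triple]:
    # 1. Group: entities that share at least one category with the target.
    group = [e for e, cats in categories.items() if cats & categories[entity]]

    # 2. Count how often each (property, value) pair occurs inside the group.
    in_group = Counter((p, v) for e in group for _, p, v in triples.get(e, []))

    # 3. Rank the target's own triples: facts common within the group are
    #    treated as characteristic of this kind of entity.
    own = triples.get(entity, [])
    return sorted(own, key=lambda t: in_group[(t[1], t[2])], reverse=True)[:top_k]


if __name__ == "__main__":
    categories = {"Seoul": {"CapitalCity"}, "Tokyo": {"CapitalCity"}}
    triples = {
        "Seoul": [("Seoul", "type", "City"), ("Seoul", "country", "South Korea")],
        "Tokyo": [("Tokyo", "type", "City"), ("Tokyo", "country", "Japan")],
    }
    print(summarize("Seoul", triples, categories, top_k=1))
```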

  • Extracting Partial Parsing Rules from Tree-Annotated Corpus: Toward Deterministic Global Parsing

    Myung-Seok CHOI  Kong-Joo LEE  Key-Sun CHOI  Gil Chang KIM  

     
    PAPER-Natural Language Processing

    Vol: E88-D No:6  Page(s): 1248-1255

    It is not always possible to find a global parse for an input sentence, owing to problems such as errors in the sentence and incompleteness of the lexicon and grammar. Partial parsing is an alternative approach for coping with these problems. Partial parsing techniques try to recover syntactic information efficiently and reliably by sacrificing completeness and depth of analysis. One of the difficulties in partial parsing is how to extract the grammar automatically. In this paper we present a method for automatically extracting partial parsing rules from a tree-annotated corpus using the decision tree method. Our goal is deterministic global parsing using partial parsing rules, in other words, to extract partial parsing rules with higher accuracy and broader coverage. First, we define a rule template that enables learning a subtree for a given substring, so that the resultant rules are more specific and stricter to apply. Second, rule candidates extracted from a training corpus are enriched with contextual and lexical information using the decision tree method and verified through cross-validation. Last, we underspecify non-deterministic rules by merging substructures with ambiguity in those rules. The learned grammar is similar to a phrase structure grammar with contextual and lexical information, but it allows building structures of depth one or more. Thanks to automatic learning, the partial parsing rules can be consistent and domain-independent. Partial parsing with this grammar processes an input sentence deterministically using longest-match heuristics, applying the rules recursively. The experiments showed that the partial parser using automatically extracted rules is not only accurate and efficient but also achieves reasonable coverage for Korean.
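
    A rough sketch of the deterministic, longest-match application of partial parsing rules is shown below: rules are tried left to right, the longest matching tag sequence is reduced first, and the pass is repeated so that structures of depth greater than one can emerge. The toy rule set and tag names are invented for illustration and stand in for the automatically learned rules.

```python
# A minimal sketch of deterministic partial parsing with longest-match
# heuristics. Rules map a contiguous tag sequence to a single non-terminal;
# the rule set and tag names are made up for illustration.

from typing import Dict, List, Tuple

RULES: Dict[Tuple[str, ...], str] = {
    ("DT", "NN"): "NP",
    ("JJ", "NN"): "NP",
    ("IN", "NP"): "PP",
    ("VB", "NP"): "VP",
}
MAX_LEN = max(len(lhs) for lhs in RULES)


def parse(tags: List[str]) -> List[str]:
    """Apply rules left-to-right, preferring the longest match, and repeat
    until no rule fires (each pass builds structures of depth one)."""
    changed = True
    while changed:
        changed = False
        i = 0
        out: List[str] = []
        while i < len(tags):
            for span in range(min(MAX_LEN, len(tags) - i), 0, -1):
                label = RULES.get(tuple(tags[i:i + span]))
                if label is not None:
                    out.append(label)          # reduce the matched span
                    i += span
                    changed = True
                    break
            else:
                out.append(tags[i])            # no rule applies: shift the tag as-is
                i += 1
        tags = out
    return tags


if __name__ == "__main__":
    # "eats a sandwich in the park" as POS tags
    print(parse(["VB", "DT", "NN", "IN", "DT", "NN"]))   # ['VP', 'PP'] after two passes
```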

  • Machine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information

    Jong-Hoon OH  Key-Sun CHOI  

     
    PAPER-Natural Language Processing

    Vol: E88-D No:7  Page(s): 1737-1748

    Machine transliteration is an automatic method for generating characters or words in one alphabetical system for the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Previous works focus on either a grapheme-based or a phoneme-based method. However, transliteration is both an orthographic and a phonetic conversion process. Therefore, both grapheme and phoneme information should be considered in machine transliteration. In this paper, we propose a grapheme- and phoneme-based transliteration model and compare it with previous grapheme-based and phoneme-based models using several machine learning techniques. Our method shows about 13-78% performance improvement.
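
    One way to picture the combined use of grapheme and phoneme information is a per-character feature template that exposes both the surrounding graphemes and the aligned phoneme to a classifier, as in the sketch below. The feature names, window size, and toy grapheme-to-phoneme table are assumptions for illustration, not the paper's actual model.

```python
# A minimal sketch of building joint grapheme+phoneme features for a
# per-character transliteration classifier. The feature template, context
# window, and toy pronunciation lookup are illustrative assumptions.

from typing import Dict, List

# Toy grapheme-to-phoneme lookup standing in for a real pronunciation dictionary.
G2P: Dict[str, str] = {"d": "D", "a": "AE", "t": "T"}


def features(word: str, phonemes: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Features for the i-th character: surrounding graphemes plus the
    aligned phoneme, so the classifier sees both information sources."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"g[{offset}]"] = word[j] if 0 <= j < len(word) else "<pad>"
    feats["p[0]"] = phonemes[i] if i < len(phonemes) else "<pad>"
    return feats


if __name__ == "__main__":
    word = "data"
    phonemes = [G2P.get(ch, "<unk>") for ch in word]   # naive one-to-one alignment
    # These feature dicts would be fed to any classifier (e.g. MaxEnt, SVM)
    # that predicts the Korean output unit for each English character.
    for i in range(len(word)):
        print(features(word, phonemes, i))
```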

  • Improving Distantly Supervised Relation Extraction by Knowledge Base-Driven Zero Subject Resolution

    Eun-kyung KIM  Key-Sun CHOI  

     
    LETTER-Natural Language Processing

    Publicized: 2018/07/11  Vol: E101-D No:10  Page(s): 2551-2558

    This paper introduces a technique for automatically generating potential training data for relation extraction from sentences in which the entity pair does not appear explicitly. Most previous works on relation extraction by distant supervision ignore cases in which a relationship is expressed via null subjects or anaphora. However, natural language text is essentially a network of inter-related sentences; even when sentences are closely related, the relation is often not expressed explicitly in the text, which can make relation extraction difficult. This paper describes a new model that augments a paragraph with a “salient entity” that is determined without parsing. The salient entity can create additional tuple-extraction environments by acting as a potential subject within the paragraph. Including the salient entity as part of the sentential input may allow the proposed method to identify relationships that conventional methods cannot. This method also has promising applicability to languages for which advanced natural language processing tools are lacking.
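
    The salient-entity augmentation can be sketched as below: a paragraph's most frequently mentioned entity is taken as the salient entity (no parsing required) and paired with the entities of each sentence as a potential null subject, yielding extra candidate tuples for distant supervision. The frequency heuristic and the toy data are assumptions for illustration, not necessarily the paper's exact salience criterion.

```python
# A minimal sketch of picking a paragraph's "salient entity" without parsing
# and pairing it with in-sentence entities as potential (subject, object)
# tuples for distant supervision. The salience heuristic is an assumption.

from collections import Counter
from typing import List, Tuple


def salient_entity(entity_mentions: List[List[str]]) -> str:
    """Most frequently mentioned entity across the paragraph's sentences."""
    counts = Counter(e for sentence in entity_mentions for e in sentence)
    return counts.most_common(1)[0][0]


def candidate_pairs(entity_mentions: List[List[str]]) -> List[Tuple[str, str]]:
    """Pairs that distant supervision can try to label against a knowledge base.
    The salient entity is added as a potential null subject of every sentence."""
    subject = salient_entity(entity_mentions)
    pairs = []
    for sentence in entity_mentions:
        for obj in sentence:
            if obj != subject:
                pairs.append((subject, obj))
    return pairs


if __name__ == "__main__":
    # Entities recognized per sentence in one paragraph; the second sentence
    # has a dropped (null) subject, which is common in Korean text.
    paragraph = [["Nam June Paik", "Seoul"], ["Tokyo"], ["Nam June Paik", "video art"]]
    print(candidate_pairs(paragraph))
```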

  • An Alignment Model for Extracting English-Korean Translations of Term Constituents

    Jong-Hoon OH  Key-Sun CHOI  Hitoshi ISAHARA  

     
    PAPER-Natural Language Processing

    Vol: E89-D No:12  Page(s): 2972-2980

    Technical terms are linguistic representations of a domain concept, and their constituents are the components used to represent that concept. Technical terms are usually multi-word terms, and their meanings can be inferred from their constituents. Therefore, term constituents are essential for understanding the designated meaning of technical terms. However, there are several problems in finding the correct meanings of technical terms from their term constituents. First, because a term constituent is usually a morphological unit rather than a conceptual unit in the case of Korean technical terms, we first need to identify conceptual units by chunking term constituents. Second, conceptual units are sometimes homonyms or synonyms. Moreover, their meanings show domain dependency. It is therefore necessary to provide information about conceptual units and their possible meanings, including homonyms, synonyms, and domain dependency, so that natural language applications can properly handle technical terms. In this paper, we propose a term constituent alignment algorithm that extracts such information from bilingual technical term pairs. Our algorithm recognizes conceptual units and their meanings by finding English term constituents and their corresponding Korean term constituents for given English-Korean term pairs. Our experimental results indicate that this method can effectively find conceptual units and their meanings, with an alignment error rate (AER) of about 6% on manually analyzed experimental data and about 14% on automatically analyzed experimental data.
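
    A toy version of constituent alignment might look like the sketch below: given an English-Korean term pair whose Korean side has already been chunked into conceptual units, constituent pairs are aligned greedily by a translation score. The score table, the greedy strategy, and the example term are simplified assumptions, not the paper's alignment model.

```python
# A minimal sketch of aligning English term constituents to Korean conceptual
# units with a greedy best-match over a toy translation score table. The
# scores, the greedy strategy, and the example are illustrative assumptions.

from typing import Dict, List, Tuple

# score[(english_constituent, korean_unit)] -> translation likelihood
SCORES: Dict[Tuple[str, str], float] = {
    ("information", "정보"): 0.9,
    ("retrieval", "검색"): 0.8,
    ("information", "검색"): 0.1,
}


def align(en_constituents: List[str], ko_units: List[str]) -> List[Tuple[str, str]]:
    """Greedily pick the highest-scoring unused (English, Korean) pair."""
    pairs = sorted(
        ((SCORES.get((e, k), 0.0), e, k) for e in en_constituents for k in ko_units),
        reverse=True,
    )
    used_en, used_ko, alignment = set(), set(), []
    for score, e, k in pairs:
        if score > 0 and e not in used_en and k not in used_ko:
            alignment.append((e, k))
            used_en.add(e)
            used_ko.add(k)
    return alignment


if __name__ == "__main__":
    # "information retrieval" vs. its Korean term "정보 검색" (already chunked
    # into conceptual units; real input would need morpheme chunking first).
    print(align(["information", "retrieval"], ["정보", "검색"]))
```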