Author Search Result

[Author] Natthawut KERTKEIDKACHORN (4 hits)

Hits 1-4
  • Competent Triple Identification for Knowledge Graph Completion under the Open-World Assumption

    Esrat FARJANA  Natthawut KERTKEIDKACHORN  Ryutaro ICHISE  

     
    PAPER-Data Engineering, Web Information Systems

    Publicized: 2021/12/02
    Vol: E105-D No:3
    Page(s): 646-655

    The usefulness and usability of existing knowledge graphs (KGs) are limited mainly because their knowledge is incomplete relative to the growing number of facts about the real world. Most existing ontology-based KG completion methods rely on the closed-world assumption, where KGs are fixed. In these methods, entities and relations are predefined, and new entity information cannot be easily added. In contrast, under the open-world assumption, entities and relations are not defined in advance. Thus, there is vast scope for finding new entity information. Despite this, knowledge acquisition under the open-world assumption is challenging because most available knowledge is in a noisy, unstructured text format. Nevertheless, Open Information Extraction (OpenIE) systems can extract triples, namely (head text; relation text; tail text), from raw text without any prespecified vocabulary. Such triples contain noisy information that is not essential for KGs. Therefore, to use such triples for the KG completion task, it is necessary to identify competent triples for KGs from the extracted triple set. Here, competent triples are the triples that can contribute new information to the existing KGs. In this paper, we propose the Competent Triple Identification (CTID) model for KGs. We also propose two types of features, namely syntax- and semantic-based features, to identify competent triples from a triple set extracted by a state-of-the-art OpenIE system. We investigate both types of features and test their effectiveness. It is found that the performance of the proposed features is about 20% better than that of the ReVerb system in identifying competent triples.

  • An Automatic Knowledge Graph Creation Framework from Natural Language Text

    Natthawut KERTKEIDKACHORN  Ryutaro ICHISE  

     
    PAPER

    Publicized: 2017/09/15
    Vol: E101-D No:1
    Page(s): 90-98

    Knowledge graphs (KGs) play a crucial role in many modern applications. However, constructing a KG from natural language text is challenging due to the complex structure of the text. Recently, many approaches have been proposed to transform natural language text into triples to obtain KGs. Such approaches have not yet provided efficient results for mapping extracted elements of triples, especially the predicate, to their equivalent elements in a KG. Predicate mapping is essential because it can reduce the heterogeneity of the data and increase the searchability of a KG. In this article, we propose T2KG, an automatic KG creation framework for natural language text, to more effectively map natural language text to predicates. In our framework, a hybrid combination of a rule-based approach and a similarity-based approach is presented for mapping a predicate to its corresponding predicate in a KG. Based on experimental results, the hybrid approach can identify more similar predicate pairs than a baseline method in the predicate mapping task. An experiment on KG creation is also conducted to investigate the performance of T2KG. The experimental results show that T2KG also outperforms the baseline in KG creation. Although KG creation is conducted in open domains, in which prior knowledge is not provided, T2KG still achieves an F1 score of approximately 50% when generating triples in the KG creation task. In addition, an empirical study on knowledge population using various text sources is conducted, and the results indicate that T2KG could be used to obtain knowledge that is not currently available from DBpedia.

  • A Heuristic Expansion Framework for Mapping Instances to Linked Open Data

    Natthawut KERTKEIDKACHORN  Ryutaro ICHISE  

     
    PAPER-Data Engineering, Web Information Systems

    Publicized: 2016/04/05
    Vol: E99-D No:7
    Page(s): 1786-1795

    Mapping instances to the Linked Open Data (LOD) cloud plays an important role in enriching information about instances, since the LOD cloud contains abundant interlinked instances that describe them. Consequently, many techniques have been introduced for mapping instances to a LOD data set; however, most of them focus merely on tackling the problem of heterogeneity. Unfortunately, the problem of the large number of LOD data sets has yet to be addressed. Owing to the number of LOD data sets, mapping an instance to a single LOD data set is not sufficient because an identical instance might not exist in that data set. In this article, we therefore introduce a heuristic-expansion-based framework for mapping instances to LOD data sets. The key idea of the framework is to gradually expand the search space from one data set to another in order to discover identical instances. In experiments, the framework could successfully map instances to the LOD data sets by increasing the coverage to 90.36%. Experimental results also indicate that the heuristic function in the framework could efficiently limit the expansion space to a reasonable size. Based upon the limited expansion space, the framework could effectively reduce the number of candidate pairs to 9.73% of the baseline without affecting performance.

  • Improving Thai Word and Sentence Segmentation Using Linguistic Knowledge

    Rungsiman NARARATWONG  Natthawut KERTKEIDKACHORN  Nagul COOHAROJANANONE  Hitoshi OKADA  

     
    PAPER-Natural Language Processing

    Publicized: 2018/09/07
    Vol: E101-D No:12
    Page(s): 3218-3225

    Word boundary ambiguity in word segmentation has long been a fundamental challenge within Thai language processing. The Conditional Random Fields (CRF) model is among the best-known methods to have achieved remarkably accurate segmentation. Nevertheless, current advancements appear to have left the problem of compound words unaccounted for. Compound words lose their meaning or context once segmented. Hence, we introduce a dictionary-based word-merging algorithm, which merges all kinds of compound words. Our evaluation shows that the algorithm can achieve high word segmentation accuracy while preserving compound words. Moreover, it can also restore some incorrectly segmented words. Another problem involving a different word-chunking approach is sentence boundary ambiguity. In tackling the problem, utilizing the part of speech (POS) of a segmented word has previously been found to help boost the accuracy of CRF-based sentence segmentation. However, not all segmented words can be tagged. Thus, we propose a POS-based word-splitting algorithm, which splits words in order to increase the number of POS tags. We found that with more identifiable POS tags, the CRF model performs better in segmenting sentences. To demonstrate the contributions of both methods, we experimented with three of their applications. With the word-merging algorithm, we found that intact compound words in the product of topic extraction can help to preserve their intended meanings, offering more precise information for human interpretation. The algorithm, together with the POS-based word-splitting algorithm, can also be used to amend word-level Thai-English translations. In addition, the word-splitting algorithm improves sentence segmentation, thus enhancing text summarization.
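
    The idea of dictionary-based word merging mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes a simple greedy longest-match strategy, and the function name `merge_compounds`, the `max_span` parameter, and the example dictionary are all hypothetical.

```python
def merge_compounds(tokens, dictionary, max_span=4):
    """Greedily merge adjacent tokens whose concatenation is a dictionary entry.

    Illustrative assumption: tokens are joined without a separator, as in
    Thai text, and the longest matching span (up to max_span tokens) wins.
    """
    merged = []
    i = 0
    while i < len(tokens):
        longest = min(max_span, len(tokens) - i)
        # Try the longest candidate span first, down to a span of 2 tokens.
        for span in range(longest, 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in dictionary:
                merged.append(candidate)
                i += span
                break
        else:
            # No compound starts at this position; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical example: an over-segmented token list and a one-entry dictionary.
segmented = ["ice", "cream", "is", "cold"]
compounds = {"icecream"}
result = merge_compounds(segmented, compounds)
```

    Trying longer spans first means a compound of three tokens is preferred over a two-token compound that starts at the same position. A real system would need a curated compound dictionary and a policy for ambiguous overlaps.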