
Keyword Search Result

[Keyword] knowledge extraction (8 hits)

1-8 of 8 hits
  • Competent Triple Identification for Knowledge Graph Completion under the Open-World Assumption

    Esrat FARJANA  Natthawut KERTKEIDKACHORN  Ryutaro ICHISE  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2021/12/02
      Vol:
    E105-D No:3
      Page(s):
    646-655

    The usefulness and usability of existing knowledge graphs (KGs) are limited mostly because their knowledge is incomplete compared to the growing number of facts about the real world. Most existing ontology-based KG completion methods are based on the closed-world assumption, where KGs are fixed. In these methods, entities and relations are predefined, and new entity information cannot be easily added. In contrast, under the open-world assumption, entities and relations are not defined in advance. Thus, there is vast scope for finding new entity information. Despite this, knowledge acquisition under the open-world assumption is challenging because most available knowledge is in a noisy, unstructured text format. Nevertheless, Open Information Extraction (OpenIE) systems can extract triples, namely (head text; relation text; tail text), from raw text without any prespecified vocabulary. Such triples contain noisy information that is not essential for KGs. Therefore, to use such triples for the KG completion task, it is necessary to identify competent triples for KGs from the extracted triple set. Here, competent triples are the triples that can contribute new information to the existing KGs. In this paper, we propose the Competent Triple Identification (CTID) model for KGs. We also propose two types of features, namely syntax- and semantic-based features, to identify competent triples from a triple set extracted by a state-of-the-art OpenIE system. We investigate both types of features and test their effectiveness. The proposed features are found to perform about 20% better than the ReVerb system in identifying competent triples.

  • Character Feature Learning for Named Entity Recognition

    Ping ZENG  Qingping TAN  Haoyu ZHANG  Xiankai MENG  Zhuo ZHANG  Jianjun XU  Yan LEI  

     
    LETTER

      Publicized:
    2018/04/20
      Vol:
    E101-D No:7
      Page(s):
    1811-1815

    The deep neural named entity recognition model automatically learns and extracts the features of entities, which solves the problem of traditional models relying heavily on complex feature engineering and obscure professional knowledge. This issue has become a hot topic in recent years. Existing deep neural models involve only simple character learning and extraction methods, which limits their capability. To further explore the performance of deep neural models, we propose two character feature learning models based on a convolutional neural network and a long short-term memory network. These two models consider the local semantic and position features of word characters. Experiments conducted on the CoNLL-2003 dataset show that the proposed models outperform traditional ones and demonstrate excellent performance.

  • A Survey of Thai Knowledge Extraction for the Semantic Web Research and Tools (Open Access)

    Ponrudee NETISOPAKUL  Gerhard WOHLGENANNT  

     
    SURVEY PAPER

      Publicized:
    2018/01/18
      Vol:
    E101-D No:4
      Page(s):
    986-1002

    As the manual creation of domain models and of linked data is very costly, the extraction of knowledge from structured and unstructured data has been one of the central research areas in the Semantic Web field in the last two decades. Here, we look specifically at the extraction of formalized knowledge from natural language text, which is the most abundant source of human knowledge available. Many tools are on hand for information and knowledge extraction from English text; for written Thai, the situation is different. The goal of this work is to assess the state of the art of research on formal knowledge extraction specifically from Thai language text, and then to give suggestions and practical research ideas on how to improve the state of the art. To address this goal, we first distinguish nine tasks of knowledge extraction for the Semantic Web that are defined in the literature on knowledge extraction from English text, for example taxonomy extraction, relation extraction, or named entity recognition. For each of the nine tasks, we analyze the publications and tools available for Thai text in the form of a comprehensive literature survey. In addition to our assessment, we measure the self-assessment by the Thai research community with the help of a questionnaire-based survey on each of the tasks. Furthermore, the structure and size of the Thai community are analyzed using complex literature database queries. Combining all the collected information, we finally identify research gaps in knowledge extraction from Thai language text. An extensive list of practical research ideas is presented, focusing on concrete suggestions for every knowledge extraction task that can be implemented and evaluated with reasonable effort. Besides the task-specific hints for improving the state of the art, we also include general recommendations on how to raise the efficiency of the respective research community.

  • An Automatic Knowledge Graph Creation Framework from Natural Language Text

    Natthawut KERTKEIDKACHORN  Ryutaro ICHISE  

     
    PAPER

      Publicized:
    2017/09/15
      Vol:
    E101-D No:1
      Page(s):
    90-98

    Knowledge graphs (KGs) play a crucial role in many modern applications. However, constructing a KG from natural language text is challenging due to the complex structure of the text. Recently, many approaches have been proposed to transform natural language text into triples to obtain KGs. Such approaches have not yet provided efficient results for mapping extracted elements of triples, especially the predicate, to their equivalent elements in a KG. Predicate mapping is essential because it can reduce the heterogeneity of the data and increase the searchability over a KG. In this article, we propose T2KG, an automatic KG creation framework for natural language text, to more effectively map natural language text to predicates. In our framework, a hybrid combination of a rule-based approach and a similarity-based approach is presented for mapping a predicate to its corresponding predicate in a KG. Based on experimental results, the hybrid approach can identify more similar predicate pairs than a baseline method in the predicate mapping task. An experiment on KG creation is also conducted to investigate the performance of T2KG. The experimental results show that T2KG also outperforms the baseline in KG creation. Although KG creation is conducted in open domains, in which prior knowledge is not provided, T2KG still achieves an F1 score of approximately 50% when generating triples in the KG creation task. In addition, an empirical study on knowledge population using various text sources is conducted, and the results indicate that T2KG could be used to obtain knowledge that is not currently available from DBpedia.

  • Triple Prediction from Texts by Using Distributed Representations of Words

    Takuma EBISU  Ryutaro ICHISE  

     
    PAPER-Natural Language Processing

      Publicized:
    2017/09/12
      Vol:
    E100-D No:12
      Page(s):
    3001-3009

    Knowledge graphs have been shown to be useful for many tasks in artificial intelligence. Triples of knowledge graphs are traditionally structured by human editors or extracted from semi-structured information; however, editing is expensive, and semi-structured information is not common. On the other hand, most such information is stored as text. Hence, it is necessary to develop a method that can extract knowledge from texts and then construct or populate a knowledge graph; this has been attempted in various ways. Currently, there are two approaches to constructing a knowledge graph. One is open information extraction (Open IE), and the other is knowledge graph embedding; however, neither is without problems. Stanford Open IE, the current best such system, requires labeled sentences as training data, and knowledge graph embedding systems require numerous triples. Recently, distributed representations of words have become a hot topic in the field of natural language processing, since this approach does not require labeled data for training. It requires only plain text, yet Mikolov showed that it can perform well on the word analogy task, answering questions such as "a is to b as c is to __?" This can be considered a knowledge extraction task from text for finding the missing entity of a triple. However, the accuracy is not sufficiently high when applied in a straightforward manner to relations in knowledge graphs, since the method uses only one triple as a positive example. In this paper, we analyze why distributed representations perform such tasks well; we also propose a new method for extracting knowledge from texts that requires much less annotated data. Experiments show that the proposed method achieves considerable improvement compared with the baseline; in particular, the improvement in HITS@10 was more than doubled for some relations.
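
    The word analogy task mentioned in this abstract can be illustrated with a minimal sketch of the commonly used vector-offset approach. This is not the method proposed in the paper: the two-dimensional embeddings, the word list, and the function names (`cosine`, `solve_analogy`) are all hypothetical and invented for illustration, and a real system would use vectors learned from plain text.

```python
import math

# Toy 2-D embeddings, invented purely for illustration (not from the paper).
EMBEDDINGS = {
    "man": (1.0, 0.0),
    "woman": (1.0, 1.0),
    "king": (3.0, 0.0),
    "queen": (3.0, 1.0),
    "apple": (0.0, 3.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, embeddings=EMBEDDINGS):
    """Answer "a is to b as c is to __?" with the vector-offset heuristic.

    The target vector is b - a + c; the answer is the remaining word whose
    embedding has the highest cosine similarity to that target.
    """
    va, vb, vc = embeddings[a], embeddings[b], embeddings[c]
    target = tuple(y - x + z for x, y, z in zip(va, vb, vc))
    # Exclude the three query words, as is usual for this task.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


if __name__ == "__main__":
    print(solve_analogy("man", "woman", "king"))
```

    Read as a triple, the query corresponds to finding the missing tail entity given one example pair (a, b) and a head c, which is why a single positive example limits accuracy, as the abstract notes.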

  • Assigning Polarity to Causal Information in Financial Articles on Business Performance of Companies

    Hiroyuki SAKAI  Shigeru MASUYAMA  

     
    PAPER-Document Analysis

      Vol:
    E92-D No:12
      Page(s):
    2341-2350

    We propose a method of assigning polarity to causal information extracted from Japanese financial articles concerning the business performance of companies. Our method assigns polarity (positive or negative) to causal information in accordance with business performance, e.g., "zidousya no uriage ga koutyou" (Sales of cars are good), to which positive polarity is assigned. Causal expressions that have been assigned polarity by our method can be used, for example, to analyze the content of articles concerning business performance in detail. First, our method classifies articles concerning business performance into positive articles and negative articles. Using them, our method assigns polarity (positive or negative) to causal information extracted from the set of articles concerning business performance. Although our method needs a training dataset for classifying articles concerning business performance into positive and negative ones, it does not need a training dataset for assigning polarity to causal information. Hence, even if there is causal information that does not appear in the training dataset used for classifying articles, our method is able to assign polarity to it by using statistical information from the classified sets of articles. We evaluated our method and confirmed that it attained 74.4% precision and 50.4% recall in assigning positive polarity, and 76.8% precision and 61.5% recall in assigning negative polarity.

  • Cause Information Extraction from Financial Articles Concerning Business Performance

    Hiroyuki SAKAI  Shigeru MASUYAMA  

     
    PAPER-Knowledge Engineering

      Vol:
    E91-D No:4
      Page(s):
    959-968

    We propose a method of extracting cause information from Japanese financial articles concerning business performance. Our method acquires cause information, e.g., "zidousya no uriage ga koutyou" (Sales of cars were good). Cause information is useful for investors in selecting companies to invest in. Our method automatically extracts cause information in the form of causal expressions by using statistical information and initial clue expressions. Our method can extract causal expressions without predetermined patterns or complex hand-written rules, and is expected to be applicable to other tasks that acquire phrases with a particular meaning, not limited to cause information. We compared our method with our previous one, originally proposed for extracting phrases concerning traffic accident causes, and experimental results showed that our new method outperforms our previous one.

  • Related Word Lists Effective in Creativity Support

    Eiko YAMAMOTO  Hitoshi ISAHARA  

     
    PAPER

      Vol:
    E90-D No:10
      Page(s):
    1509-1515

    Expansion of imagination is crucial for lively creativity. However, such expansion is sometimes rather difficult, and an environment that supports creativity is required. Because people can attain higher creativity by using words with a thematic relation rather than words with a taxonomical relation, we tried to extract word lists having thematic relations among words. We first extracted word lists from domain-specific documents by utilizing inclusive relations between words based on a modifiee/modifier relationship in documents. Next, from the extracted word lists, we removed the word lists having taxonomical relations so as to obtain only word lists having thematic relations. Finally, based on the assumption that the kind of knowledge a person can associate when looking at a set of words correlates with how effective the word set is in creativity support, we examined whether the word lists direct us to informative pages on the Web, in order to verify the availability of our extracted word lists.