Keyword Search Result

[Keyword] knowledge discovery (10 hits)

1-10 of 10 hits
  • An Automatic Knowledge Graph Creation Framework from Natural Language Text

    Natthawut KERTKEIDKACHORN  Ryutaro ICHISE  

     
    PAPER
    Publicized: 2017/09/15
    Vol: E101-D, No: 1, Page(s): 90-98

    Knowledge graphs (KGs) play a crucial role in many modern applications. However, constructing a KG from natural language text is challenging due to the complex structure of the text. Recently, many approaches have been proposed to transform natural language text into triples in order to obtain KGs, but such approaches have not yet provided efficient results for mapping the extracted elements of triples, especially predicates, to their equivalent elements in a KG. Predicate mapping is essential because it reduces the heterogeneity of the data and increases searchability over a KG. In this article, we propose T2KG, an automatic KG creation framework for natural language text that maps extracted predicates more effectively. In our framework, a hybrid combination of a rule-based approach and a similarity-based approach is used to map a predicate to its corresponding predicate in a KG. Experimental results show that the hybrid approach identifies more similar predicate pairs than a baseline method in the predicate mapping task. An experiment on KG creation is also conducted to investigate the performance of T2KG; the results show that T2KG outperforms the baseline in KG creation as well. Although KG creation is conducted in open domains, in which prior knowledge is not provided, T2KG still achieves an F1 score of approximately 50% when generating triples. In addition, an empirical study on knowledge population using various text sources indicates that T2KG can be used to obtain knowledge that is not currently available from DBpedia.
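
    As a rough, hedged illustration of the hybrid mapping idea in this abstract, the Python sketch below tries an exact rule lookup first and falls back to string similarity over candidate KG predicate labels. The rule table, candidate list, threshold, and the use of difflib.SequenceMatcher are all illustrative stand-ins, not the components T2KG actually uses.

        # Hybrid predicate mapping sketch: rules first, similarity fallback.
        from difflib import SequenceMatcher

        # Hypothetical rule table and candidate KG predicates.
        RULES = {"was born in": "dbo:birthPlace", "is married to": "dbo:spouse"}
        KG_PREDICATES = ["dbo:birthPlace", "dbo:spouse", "dbo:deathPlace"]

        def map_predicate(extracted, threshold=0.6):
            """Map an extracted predicate phrase to a KG predicate, or None."""
            if extracted in RULES:                  # rule-based stage
                return RULES[extracted]
            best, best_score = None, 0.0            # similarity-based stage
            for candidate in KG_PREDICATES:
                label = candidate.split(":", 1)[1]  # e.g. "birthPlace"
                score = SequenceMatcher(None, extracted.lower(), label.lower()).ratio()
                if score > best_score:
                    best, best_score = candidate, score
            return best if best_score >= threshold else None

        print(map_predicate("was born in"))  # dbo:birthPlace (rule hit)
        print(map_predicate("birth place"))  # dbo:birthPlace (similarity hit)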

  • Handling Dynamic Weights in Weighted Frequent Pattern Mining

    Chowdhury Farhan AHMED  Syed Khairuzzaman TANBEER  Byeong-Soo JEONG  Young-Koo LEE  

     
    PAPER-Knowledge Discovery and Data Mining
    Vol: E91-D, No: 11, Page(s): 2578-2588

    Although weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider the different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. In real-world scenarios, however, the weight (price or significance) of an item can vary with time. Reflecting such changes in item weight is necessary in several mining applications, such as retail market data analysis and web click-stream analysis. In this paper, we introduce the concept of a dynamic weight for each item and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can handle situations where the weight (price or significance) of an item varies dynamically. It exploits a pattern-growth mining technique to avoid the level-wise candidate set generation-and-test methodology. Furthermore, it requires only one database scan, which makes it suitable for stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining with dynamic weights.
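
    The sketch below illustrates one plausible reading of weighted support under time-varying weights: each transaction is tagged with a time period, and a pattern's support accumulates its average item weight at that period. The weight schedule and data are invented, and DWFPM itself uses a pattern-growth structure over a single database scan rather than this brute-force enumeration.

        # Weighted support with per-period (dynamic) item weights: a toy reading.
        WEIGHTS = {"bread": {1: 0.3, 2: 0.5}, "milk": {1: 0.7, 2: 0.4}}
        TRANSACTIONS = [  # (time period, items bought)
            (1, {"bread", "milk"}),
            (2, {"bread"}),
            (2, {"bread", "milk"}),
        ]

        def weighted_support(pattern):
            """Sum, over transactions containing the pattern, of the
            pattern's average item weight at that transaction's period."""
            total = 0.0
            for period, items in TRANSACTIONS:
                if pattern <= items:
                    total += sum(WEIGHTS[i][period] for i in pattern) / len(pattern)
            return total

        print(weighted_support({"bread"}))          # 0.3 + 0.5 + 0.5 = 1.3
        print(weighted_support({"bread", "milk"}))  # 0.5 + 0.45 = 0.95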

  • A Randomness Based Analysis on the Data Size Needed for Removing Deceptive Patterns

    Kazuya HARAGUCHI  Mutsunori YAGIURA  Endre BOROS  Toshihide IBARAKI  

     
    PAPER-Algorithm Theory
    Vol: E91-D, No: 3, Page(s): 781-788

    We consider a data set in which each example is an n-dimensional Boolean vector labeled as true or false. A pattern is a co-occurrence of a particular value combination of a given subset of the variables. If a pattern appears frequently in the true examples and infrequently in the false examples, we consider it a good pattern. In this paper, we discuss the problem of determining the data size needed for removing "deceptive" good patterns; in a small data set, many good patterns may appear superficially, simply by chance, independently of the underlying structure. Our hypothesis is that, in order to remove such deceptive good patterns, the data set should contain more examples than the size at which a random data set contains almost no good patterns. We justify this hypothesis by computational studies. We also derive a theoretical upper bound on the needed data size in view of our hypothesis.
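
    A minimal sketch of the "good pattern" test follows, assuming illustrative frequency thresholds; the paper's concern is how large the data set must be before such tests stop flagging chance patterns, not the test itself.

        # Good-pattern test on labeled Boolean vectors; thresholds are invented.
        def covers(pattern, example):
            """pattern: {variable index: required value}."""
            return all(example[i] == v for i, v in pattern.items())

        def is_good(pattern, true_ex, false_ex, t_min=0.5, f_max=0.1):
            t_rate = sum(covers(pattern, e) for e in true_ex) / len(true_ex)
            f_rate = sum(covers(pattern, e) for e in false_ex) / len(false_ex)
            return t_rate >= t_min and f_rate <= f_max

        true_ex = [(1, 0, 1), (1, 1, 1), (1, 0, 0)]
        false_ex = [(0, 0, 1), (0, 1, 0)]
        print(is_good({0: 1}, true_ex, false_ex))  # True: covers all true, no false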

  • Acquisition and Maintenance of Knowledge for Online Navigation Suggestions

    Juan D. VELASQUEZ  Richard WEBER  Hiroshi YASUDA  Terumasa AOKI  

     
    PAPER-Artificial Intelligence and Cognitive Science
    Vol: E88-D, No: 5, Page(s): 993-1003

    The Internet has become an important medium for effective marketing and efficient operations for many institutions. Visitors to a particular web site leave behind valuable information on their preferences, requirements, and demands regarding the offered products and/or services. Understanding these requirements online, i.e., during a particular visit, is both a difficult technical challenge and a tremendous business opportunity. Web sites that can provide effective online navigation suggestions to their visitors can exploit the potential inherent in the data such visits generate every day. However, identifying, collecting, and maintaining the knowledge that navigation suggestions are based on is far from trivial. We propose a methodology for acquiring and maintaining this knowledge efficiently using data mart and web mining technology. Its effectiveness has been shown in an application for a bank's web site.

  • Fast Algorithms for Mining Generalized Frequent Patterns of Generalized Association Rules

    Kritsada SRIPHAEW  Thanaruk THEERAMUNKONG  

     
    PAPER-Databases
    Vol: E87-D, No: 3, Page(s): 761-770

    Mining generalized frequent patterns of generalized association rules is an important process in knowledge discovery systems. In this paper, we propose a new approach for efficiently mining all frequent patterns using a novel set enumeration algorithm with two types of constraints on two generalized itemset relationships, called subset-superset and ancestor-descendant constraints. We also show a method to mine a smaller set of generalized closed frequent itemsets instead of a large set of conventional generalized frequent itemsets. To this end, we develop two algorithms called SET and cSET for mining generalized frequent itemsets and generalized closed frequent itemsets, respectively. In a number of experiments, the proposed algorithms outperform the previous well-known algorithms in both computational time and memory utilization. Furthermore, experiments with real datasets indicate that mining generalized closed frequent itemsets yields a further gain in computational cost, since the number of generalized closed frequent itemsets is much smaller than the number of generalized frequent itemsets.
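
    For a concrete picture of why the closed variant pays off, the brute-force sketch below extends each transaction with taxonomy ancestors and keeps only frequent itemsets that have no equally frequent proper superset. The toy taxonomy and minimum support are invented, and SET/cSET use constrained set enumeration rather than this exhaustive search.

        # Generalized closed frequent itemsets by brute force (toy data).
        from itertools import combinations

        TAXONOMY = {"cola": "drink", "beer": "drink"}  # child -> parent
        TRANSACTIONS = [{"cola", "chips"}, {"beer"}, {"cola"}]

        def extend(t):
            """Add every taxonomy ancestor of each item in a transaction."""
            out = set(t)
            for item in t:
                while item in TAXONOMY:
                    item = TAXONOMY[item]
                    out.add(item)
            return out

        ext = [extend(t) for t in TRANSACTIONS]
        items = sorted(set().union(*ext))
        support = lambda s: sum(s <= t for t in ext)

        MIN_SUP = 2
        frequent = [frozenset(c) for n in range(1, len(items) + 1)
                    for c in combinations(items, n) if support(set(c)) >= MIN_SUP]
        # Closed: no frequent proper superset with the same support.
        closed = [s for s in frequent
                  if not any(s < t and support(t) == support(s) for t in frequent)]
        print(sorted(map(sorted, closed)))  # [['cola', 'drink'], ['drink']]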

  • CLOCK: Clustering for Common Knowledge Extraction in a Set of Transactions

    Sang Hyun OH  Won Suk LEE  

     
    PAPER-Databases
    Vol: E86-D, No: 9, Page(s): 1845-1855

    Association mining extracts common relationships among a finite number of categorical data objects in a set of transactions. However, if the data objects are not categorical and are potentially unlimited, it is impossible to employ the association mining approach. On the other hand, clustering is suitable for modeling a large number of non-categorical data objects as long as a distance measure exists among them. Although clustering has been used to partition the objects of a data set into groups of similar objects based on data similarity, it can also be used to extract the properties of similar data objects that commonly appear in a set of transactions. In this paper, a new clustering method, CLOCK, is proposed to find common knowledge, such as the frequent ranges of similar objects, in a set of transactions. The common knowledge of data objects in the transactions can be represented by the occurrence frequency of similar data objects over the transactions as well as the repetitive ratio of similar data objects within each transaction. Furthermore, the proposed method also addresses how to maintain the identified common knowledge as a summarized profile. As a result, any difference between a newly collected transaction and the common knowledge of past transactions can be easily identified.
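
    As an illustration of this notion of common knowledge, the sketch below computes, for fixed-width value ranges, the occurrence frequency across transactions and the average repetition within the transactions that contain the range. CLOCK derives its ranges by clustering similar objects, so the fixed-width binning and toy data here are only stand-ins.

        # Frequent value ranges in transactions: occurrence frequency and
        # repetitive ratio per range. Binning stands in for clustering.
        from collections import defaultdict

        TRANSACTIONS = [[1.1, 1.3, 5.0], [1.2, 4.9], [1.25, 1.15]]
        WIDTH = 0.5

        per_bin = defaultdict(list)  # bin -> per-transaction counts
        for t in TRANSACTIONS:
            counts = defaultdict(int)
            for x in t:
                counts[int(x // WIDTH)] += 1
            for b, c in counts.items():
                per_bin[b].append(c)

        for b, counts in sorted(per_bin.items()):
            freq = len(counts) / len(TRANSACTIONS)   # share of transactions
            ratio = sum(counts) / len(counts)        # avg repeats when present
            print(f"[{b * WIDTH:.1f}, {(b + 1) * WIDTH:.1f}): "
                  f"freq={freq:.2f}, repeats={ratio:.2f}")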

  • Discovering Knowledge from Graph Structured Data by Using Refutably Inductive Inference of Formal Graph Systems

    Tetsuhiro MIYAHARA  Tomoyuki UCHIDA  Takayoshi SHOUDAI  Tetsuji KUBOYAMA  Kenichi TAKAHASHI  Hiroaki UEDA  

     
    PAPER
    Vol: E84-D, No: 1, Page(s): 48-56

    We present a new method for discovering knowledge from structured data represented by graphs, in the framework of Inductive Logic Programming. A graph, or network, is widely used for representing relations between various data and for expressing small, easily understandable hypotheses, so an analysis system that manipulates graphs directly is useful for knowledge discovery. Our method uses Formal Graph System (FGS) as a knowledge representation language for graph structured data. FGS is a kind of logic programming system that deals with graphs directly, just like first-order terms. Our method employs a refutably inductive inference algorithm as its learning algorithm; this is a special type of inductive inference algorithm with refutability of hypothesis spaces, which makes it suitable for knowledge discovery. We give a sufficiently large hypothesis space, the set of weakly reducing FGS programs, and show that this hypothesis space is refutably inferable from complete data. We have designed and implemented a prototype of a knowledge discovery system, KD-FGS, which is based on our method and acquires knowledge directly from graph structured data. Finally, we discuss the applicability of our method to graph structured data, with experimental results on some graph-theoretical notions.
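
    To make "dealing with graphs like first-order terms" concrete, the toy matcher below unifies a graph pattern containing node variables against a data graph of labeled edges, yielding variable bindings. The variable syntax and edge representation are invented for illustration and are not FGS's actual notation.

        # Toy unification of a graph pattern (with ?-variables) against a graph.
        DATA = {("a", "knows", "b"), ("b", "knows", "c"), ("a", "likes", "c")}
        PATTERN = [("?X", "knows", "?Y"), ("?Y", "knows", "?Z")]

        def match(pattern, binding=None):
            binding = binding or {}
            if not pattern:
                yield dict(binding)
                return
            (s, p, o), rest = pattern[0], pattern[1:]
            for ds, dp, do in DATA:
                if dp != p:
                    continue
                new, ok = dict(binding), True
                for var, val in ((s, ds), (o, do)):
                    if var.startswith("?"):
                        ok = ok and new.setdefault(var, val) == val
                    else:
                        ok = ok and var == val
                if ok:
                    yield from match(rest, new)

        print(list(match(PATTERN)))  # [{'?X': 'a', '?Y': 'b', '?Z': 'c'}]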

  • Design Aspects of Discovery Systems

    Osamu MARUYAMA  Satoru MIYANO  

     
    INVITED PAPER
    Vol: E83-D, No: 1, Page(s): 61-70

    This paper reviews design aspects of computational discovery systems through the analysis of some successful discovery systems. We first review the concept of viewscope/view on data, which provides an interpretation of raw data in a specific domain. We then relate this concept to the KDD process described by Fayyad et al. (1996) and to the developer's role in computational discovery due to Langley (1998). We emphasize that the integration of human experts and discovery systems is a crucial problem in designing discovery systems and, based on our analysis of these systems, claim that the concept of viewscope/view offers a way to approach this problem.

  • Data Analysis by Positive Decision Trees

    Kazuhisa MAKINO  Takashi SUDA  Hirotaka ONO  Toshihide IBARAKI  

     
    PAPER-Theoretical Aspects
    Vol: E82-D, No: 1, Page(s): 76-88

    Decision trees are used as a convenient means to explain given positive examples and negative examples, which is a form of data mining and knowledge discovery. Standard methods such as ID3 may produce non-monotonic decision trees, in the sense that data with larger values in all attributes are sometimes classified into a class with a smaller output value. (In the case of binary data, this is equivalent to saying that the discriminant Boolean function that the decision tree represents is not positive.) The motivation for this study comes from the observation that real-world data are often positive, and in such cases it is natural to build decision trees that represent positive (i.e., monotone) discriminant functions. To this end, we propose how to modify existing procedures such as ID3 so that the resulting decision tree represents a positive discriminant function. In this procedure, we add some new data to recover the positivity of the data, which the original data had but which was lost in the process of decomposing the data set by methods such as ID3. To compare the performance of our method with existing methods, we test (1) positive data, randomly generated from a hidden positive Boolean function after adding dummy attributes, and (2) breast cancer data as an example of real-world data. The experimental results on (1) show that, although positive decision trees are somewhat larger than those built without the positivity assumption, they exhibit higher accuracy and tend to choose the correct attributes, on which the hidden positive Boolean function is defined. For the breast cancer data set, we observe a similar tendency; i.e., positive decision trees are larger but give higher accuracy.
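
    The positivity property can be checked directly for small trees over binary attributes: the sketch below evaluates a decision tree, represented here as hypothetical nested tuples, on all inputs and verifies that x <= y componentwise implies f(x) <= f(y). The paper modifies tree construction itself (e.g., the ID3 procedure) rather than testing finished trees this way.

        # Exhaustive monotonicity check for a binary decision tree.
        from itertools import product

        # Tree = (attribute index, subtree if 0, subtree if 1), or a 0/1 leaf.
        TREE = (0, 0, (1, 0, 1))  # f(x) = x0 AND x1, which is positive

        def evaluate(tree, x):
            while isinstance(tree, tuple):
                attr, lo, hi = tree
                tree = hi if x[attr] else lo
            return tree

        def is_positive(tree, n):
            points = list(product((0, 1), repeat=n))
            return all(evaluate(tree, x) <= evaluate(tree, y)
                       for x in points for y in points
                       if all(a <= b for a, b in zip(x, y)))

        print(is_positive(TREE, 2))       # True
        print(is_positive((0, 1, 0), 1))  # False: f = NOT x0 is not positive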

  • Association Rule Filter for Data Mining in Call Tracking Data

    Kazunori MATSUMOTO  Kazuo HASHIMOTO  

     
    PAPER-Network Design, Operation, and Management
    Vol: E81-B, No: 12, Page(s): 2481-2486

    Call tracking data contains a calling address, called address, service type, and other attributes useful for predicting a customer's calling activity, and it is becoming a target of data mining for telecommunication carriers. Conventional data-mining programs control the number of association rules found with two types of thresholds (minimum confidence and minimum support); however, they often generate too many association rules because of the wide variety of patterns in call tracking data. This paper proposes a new method to reduce the number of generated rules. The proposed method tests each generated rule based on the Akaike Information Criterion (AIC) without using the conventional thresholds. Experiments with artificial call tracking data show the high performance of the proposed method.
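
    As a rough sketch of an AIC-style rule filter in this spirit, the code below compares a two-parameter model in which P(B) depends on whether the antecedent A holds against a one-parameter independence model, keeping the rule A -> B only when the dependent model attains the lower AIC. The counts are invented and the paper's exact criterion may differ.

        # AIC comparison for a candidate rule A -> B over N transactions.
        import math

        def loglik(hits, n):
            """Binomial log-likelihood at the MLE rate hits/n."""
            if hits in (0, n):
                return 0.0  # degenerate MLE: likelihood 1
            p = hits / n
            return hits * math.log(p) + (n - hits) * math.log(1 - p)

        def keep_rule(n_ab, n_a, n_b, n_total):
            """n_ab: #transactions with A and B; n_a: with A; n_b: with B."""
            ll_null = loglik(n_b, n_total)                    # one shared P(B)
            ll_rule = (loglik(n_ab, n_a)                      # P(B | A)
                       + loglik(n_b - n_ab, n_total - n_a))   # P(B | not A)
            return 2 * 2 - 2 * ll_rule < 2 * 1 - 2 * ll_null  # compare AICs

        print(keep_rule(n_ab=45, n_a=50, n_b=60, n_total=200))  # True: strong rule
        print(keep_rule(n_ab=16, n_a=50, n_b=60, n_total=200))  # False: ~independent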