
Keyword Search Result

[Keyword] association rule (20 hits)

Results 1-20 of 20
  • Characterization of Interestingness Measures Using Correlation Analysis and Association Rule Mining

    Rachasak SOMYANONTHANAKUL  Thanaruk THEERAMUNKONG  

     
    PAPER
    Publicized: 2020/01/09  Vol: E103-D No:4  Page(s): 779-788

    Objective interestingness measures play a vital role in association rule mining of large-scale databases because they are used for extracting, filtering, and ranking patterns. Several measures have been proposed in the past, but their similarities and relationships have not been sufficiently explored. This work investigates sixty-one objective interestingness measures on patterns of the form A → B to analyze their similarity, dissimilarity, and relationships. Triples of the three probabilities P(A), P(B), and P(AB) are enumerated on both linear and exponential scales, and each measure's value under those conditions is calculated, forming synthetic data for investigation. The behavior of each measure is explored by pairwise comparison based on these three-probability patterns. The relationships among the sixty-one interestingness measures are then characterized with correlation analysis and association rule mining. In the experiment, the relationships are summarized using a heat map and mined association rules. As a result, an appropriate interestingness measure can be selected using the generated heat map and association rules.
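    To ground the setup, the hedged sketch below enumerates feasible (P(A), P(B), P(AB)) triples on a linear grid, evaluates a few classical measures (confidence, lift, leverage; illustrative choices, not necessarily among the sixty-one studied), and correlates their behavior pairwise, loosely mirroring the heat-map analysis.

    ```python
    # Illustrative sketch: enumerate feasible (P(A), P(B), P(AB)) triples on a
    # linear grid, evaluate three classical interestingness measures, and
    # correlate them pairwise. Measures and grid are assumptions for illustration.
    import itertools
    import numpy as np

    def measures(pa, pb, pab):
        return {
            "confidence": pab / pa,        # confidence of A -> B
            "lift": pab / (pa * pb),       # lift
            "leverage": pab - pa * pb,     # leverage (Piatetsky-Shapiro)
        }

    grid = np.linspace(0.05, 0.95, 10)
    rows = [measures(pa, pb, pab)
            for pa, pb, pab in itertools.product(grid, repeat=3)
            if pab <= min(pa, pb)]         # keep only feasible probability triples

    names = list(rows[0])
    values = np.array([[r[n] for n in names] for r in rows])
    corr = np.corrcoef(values, rowvar=False)  # pairwise Pearson correlation
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                print(f"corr({a}, {b}) = {corr[i, j]:.3f}")
    ```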

  • An Empirical Study of README contents for JavaScript Packages

    Shohei IKEDA  Akinori IHARA  Raula Gaikovina KULA  Kenichi MATSUMOTO  

     
    PAPER-Software Engineering
    Publicized: 2018/10/24  Vol: E102-D No:2  Page(s): 280-288

    Contemporary software projects often utilize a README.md to share crucial information such as installation and usage examples related to their software. Furthermore, these files serve as an important source of updated and useful documentation for developers and prospective users of the software. Nonetheless, both novice and seasoned developers are sometimes unsure of what is required for a good README file. To understand README contents, we investigate 43,900 JavaScript packages. Results show that these packages contain common content themes (i.e., ‘usage’, ‘install’ and ‘license’). Furthermore, we find that application-specific packages more frequently included content themes such as ‘options’, while library-based packages more frequently included other specific content themes (i.e., ‘install’ and ‘license’).

  • Cross-Validation-Based Association Rule Prioritization Metric for Software Defect Characterization

    Takashi WATANABE  Akito MONDEN  Zeynep YÜCEL  Yasutaka KAMEI  Shuji MORISAKI  

     
    PAPER-Software Engineering
    Publicized: 2018/06/13  Vol: E101-D No:9  Page(s): 2269-2278

    Association rule mining discovers relationships among variables in a data set and represents them as rules. These rules are often expected to have predictive ability, that is, to be able to predict future events, but commonly used rule interestingness measures, such as support and confidence, do not directly assess their predictive power. This paper proposes a cross-validation-based metric that quantifies the predictive power of such rules for characterizing software defects. Evaluating this metric experimentally on four open-source data sets (Mylyn, NetBeans, Apache Ant and jEdit) shows that it can improve rule prioritization performance over conventional metrics (support, confidence and odds ratio) by 72.8% for Mylyn, 15.0% for NetBeans, 10.5% for Apache Ant and 0% for jEdit in terms of the SumNormPre(100) precision criterion. This suggests that the proposed metric can provide better rule prioritization performance than conventional metrics and, even in the worst case, at least comparable performance.
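    The abstract does not spell out the metric, so the following is only a hedged sketch of one plausible cross-validation scheme for scoring a rule's predictive power: re-estimate the rule's confidence on held-out folds and use the average held-out confidence as its priority. Function and variable names are assumptions, not the paper's definitions.

    ```python
    # Hedged sketch (not the paper's exact metric): score a rule
    # "antecedent -> consequent" by its average confidence on held-out folds.
    import random

    def cv_rule_score(transactions, antecedent, consequent, k=5, seed=0):
        """transactions: list of item sets; antecedent/consequent: item sets."""
        data = transactions[:]
        random.Random(seed).shuffle(data)
        folds = [data[i::k] for i in range(k)]
        scores = []
        for held_out in folds:
            fired = [t for t in held_out if antecedent <= t]
            if not fired:
                continue                       # rule never fires on this fold
            hits = sum(1 for t in fired if consequent <= t)
            scores.append(hits / len(fired))   # held-out confidence
        return sum(scores) / len(scores) if scores else 0.0

    db = [{"lines>100", "defect"}, {"lines>100"}, {"lines>100", "defect"}, {"small"}]
    print(cv_rule_score(db, {"lines>100"}, {"defect"}))
    ```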

  • Improvement in Method Verb Recommendation Technique Using Association Rule Mining

    Yuki KASHIWABARA  Takashi ISHIO  Katsuro INOUE  

     
    LETTER-Software Engineering
    Publicized: 2015/08/13  Vol: E98-D No:11  Page(s): 1982-1985

    In a previous study, we proposed a technique to recommend candidate verbs for a method name so that developers can use various verbs consistently. In this study, we improve the rule extraction technique proposed in that study. Moreover, we confirm that the rank of each correct verb recommended by the new technique is higher than that achieved by the previous technique.

  • Hide Association Rules with Fewer Side Effects

    Peng CHENG  Ivan LEE  Jeng-Shyang PAN  Chun-Wei LIN  John F. RODDICK  

     
    PAPER-Artificial Intelligence, Data Mining
    Publicized: 2015/07/14  Vol: E98-D No:10  Page(s): 1788-1798

    Association rule mining is a powerful data mining tool that can discover unknown patterns from large volumes of data. However, people often face the risk of disclosing sensitive information when data is shared with different organizations. Association rule mining techniques may be improperly used to find sensitive patterns that the owner is unwilling to disclose. One of the great challenges in association rule mining is how to protect the confidentiality of sensitive patterns when data is released. Association rule hiding refers to sanitizing a database so that certain sensitive association rules cannot be mined from the released database. In this study, we propose a new method that hides sensitive rules by removing some items in a database to reduce the support or confidence levels of sensitive rules below specified thresholds. Based on the information about positive border rules and negative border rules contained in transactions, the proposed method chooses suitable candidates for modification, aiming to reduce the side effects and the degree of data distortion. Comparative experiments on real and synthetic datasets demonstrate that the proposed method can hide sensitive rules with far fewer side effects and database modifications.
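    As a rough illustration of the sanitization idea only (not the paper's border-based candidate selection), the sketch below lowers a sensitive rule's support by deleting the consequent items from supporting transactions until the support falls below a threshold; the greedy victim choice and names are assumptions.

    ```python
    # Hedged sketch of support-based rule hiding: remove the consequent items
    # from transactions supporting a sensitive rule X -> Y until the support of
    # X union Y drops below min_sup. The paper additionally uses positive/negative
    # border information to pick victims with fewer side effects; this greedy
    # loop omits that.
    def hide_rule(transactions, antecedent, consequent, min_sup):
        itemset = antecedent | consequent
        supporting = [t for t in transactions if itemset <= t]
        while supporting and len(supporting) / len(transactions) >= min_sup:
            victim = supporting.pop()      # naive choice; the paper chooses carefully
            victim -= consequent           # delete the consequent items in place
        return transactions

    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
    hide_rule(db, {"a"}, {"b"}, min_sup=0.5)
    print(db)                              # support({a, b}) is now below 0.5
    ```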

  • A Novel Statistical Approach to Detect Card Frauds Using Transaction Patterns

    Chae Chang LEE  Ji Won YOON  

     
    PAPER-Information Network
    Vol: E98-D No:3  Page(s): 649-660

    In this paper, we present new methods for learning the individual patterns of a card user's transaction amount and the region in which he or she uses the card over a given period, and for determining whether a specified transaction is allowable in accordance with these learned transaction patterns. We then classify legitimate and fraudulent transactions by setting thresholds based on the learned individual patterns.
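    The abstract leaves the statistical model unspecified, so the example below is only a hedged sketch of the thresholding idea: learn a per-user mean and standard deviation of transaction amounts and flag amounts that deviate beyond a chosen multiple of the standard deviation; the region pattern would be handled analogously.

    ```python
    # Hedged sketch: flag a transaction as suspicious when its amount lies far
    # outside the user's learned amount distribution. A z-score threshold stands
    # in for the paper's learned per-user patterns and decision thresholds.
    import statistics

    def learn_amount_pattern(amounts):
        return statistics.mean(amounts), statistics.stdev(amounts)

    def is_suspicious(amount, pattern, z_threshold=3.0):
        mean, stdev = pattern
        if stdev == 0:
            return amount != mean
        return abs(amount - mean) / stdev > z_threshold

    history = [42.0, 55.0, 38.0, 61.0, 47.0]
    pattern = learn_amount_pattern(history)
    print(is_suspicious(49.0, pattern))    # False: consistent with past behaviour
    print(is_suspicious(900.0, pattern))   # True: far outside the learned pattern
    ```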

  • Method Verb Recommendation Using Association Rule Mining in a Set of Existing Projects

    Yuki KASHIWABARA  Takashi ISHIO  Hideaki HATA  Katsuro INOUE  

     
    PAPER-Software Engineering
    Publicized: 2014/12/16  Vol: E98-D No:3  Page(s): 627-636

    It is well known that program readability is important for maintenance tasks. Method names are important identifiers for program readability because they are used to understand the behavior of methods without reading part of the program. Although developers can create a method name by arbitrarily choosing a verb and objects, the names are expected to represent the behavior consistently. However, it is not easy for developers to choose verbs and objects consistently, since each developer may have a different notion of a suitable lexicon for method names. In this paper, we propose a technique to recommend candidate verbs for a method name so that developers can use various verbs consistently. We recommend candidate verbs likely to be used as part of a method name, using association rules extracted from existing methods. To evaluate our technique, we extracted rules from 445 open source projects written in Java and confirmed the accuracy of our approach by applying the extracted rules to several open source applications. As a result, we found that for 84.9% of the considered methods in four projects, the existing verb is among the recommendations. Moreover, for 73.2% of the actually renamed methods in six projects, the correct verb is recommended.
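    To make the recommendation step concrete, here is a hedged sketch in which rules of the form (signature features → verb) are mined from existing methods and candidate verbs are ranked by confidence; the feature choice (return type and parameter types) is an assumption, not the paper's exact rule schema.

    ```python
    # Hedged sketch: recommend verbs for a new method from (features -> verb)
    # association rules mined from existing methods. Features here are just the
    # return type and parameter types; the paper may use richer features.
    from collections import Counter, defaultdict

    def mine_verb_rules(methods, min_conf=0.3):
        """methods: list of (features frozenset, verb). Returns {features: [(verb, conf)]}."""
        by_features = defaultdict(Counter)
        for features, verb in methods:
            by_features[features][verb] += 1
        rules = {}
        for features, verb_counts in by_features.items():
            total = sum(verb_counts.values())
            rules[features] = [(v, c / total) for v, c in verb_counts.most_common()
                               if c / total >= min_conf]
        return rules

    existing = [
        (frozenset({"returns:boolean", "param:String"}), "is"),
        (frozenset({"returns:boolean", "param:String"}), "has"),
        (frozenset({"returns:void", "param:Listener"}), "add"),
    ]
    rules = mine_verb_rules(existing)
    print(rules[frozenset({"returns:boolean", "param:String"})])  # candidate verbs with confidence
    ```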

  • An Approach to Extract Informative Rules for Web Page Recommendation by Genetic Programming

    Jaekwang KIM  KwangHo YOON  Jee-Hyong LEE  

     
    PAPER
    Vol: E95-B No:5  Page(s): 1558-1565

    Clickstreams in users' navigation logs contain various data related to users' web surfing, such as visit counts, stay times, and product types. By observing these data, we can divide clickstreams into sub-clickstreams so that the pages in a sub-clickstream share more context with each other than with the pages in other sub-clickstreams. In this paper, we propose a method that extracts more informative rules from clickstreams for web page recommendation based on genetic programming and association rules. First, we split clickstreams into sub-clickstreams by context in order to generate more informative rules. To split clickstreams in consideration of context, we extract six features from users' navigation logs. A set of split rules is generated by combining those features through genetic programming, and informative rules for recommendation are then extracted with an association rule mining algorithm. Through experiments, we verify that the proposed method is more effective than the other methods under various conditions.

  • An Association Rule Based Grid Resource Discovery Method

    Yuan LIN  Siwei LUO  Guohao LU  Zhe WANG  

     
    LETTER-Computer System
    Vol: E94-D No:4  Page(s): 913-916

    A service-oriented grid environment contains a great number of resources described in many different ways, and traditional grid resource discovery methods cannot cope with more complex future grid systems. Therefore, this paper proposes a novel grid resource discovery method based on an association rule hypergraph partitioning algorithm, which analyzes user behavior in historical transaction records to provide personalized service for users. This resource discovery method offers a new way to improve resource retrieval and management in grid research.

  • News Relation Discovery Based on Association Rule Mining with Combining Factors

    Nichnan KITTIPHATTANABAWON  Thanaruk THEERAMUNKONG  Ekawit NANTAJEEWARAWAT  

     
    PAPER
    Vol: E94-D No:3  Page(s): 404-415

    Recently, association rule mining has been applied to track and relate news documents from several sources, owing to its performance and scalability. This paper presents an empirical investigation of how term representation basis, term weighting, and association measure affect the quality of relations discovered among news documents. Twenty-four combinations formed from two term representation bases, four term weightings, and three association measures are explored, with their results compared to human judgment of three-level relations: completely related, somehow related, and unrelated. The performance evaluation is conducted by comparing the top-k results of each combination to those of the others using a measure called rank-order mismatch (ROM). The experimental results indicate that a combination of bigram (BG), term frequency with inverse document frequency (TFIDF) and confidence (CONF), as well as a combination of BG, TFIDF and conviction (CONV), achieves the best performance in finding related documents by placing them in the upper ranks, with 0.41% ROM on the top-50 mined relations. However, a combination of unigram (UG), TFIDF and lift (LIFT) performs best at locating irrelevant relations in the lower ranks (top-1100), with 9.63% ROM. A detailed analysis of the number of three-level relations with regard to their rankings is also performed in order to examine the characteristics of the resultant relations. Finally, a discussion and an error analysis are given.
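    The three association measures being compared have standard definitions; for reference, the sketch below computes confidence, lift, and conviction for a rule A → B from co-occurrence counts. This is background illustration only, not the paper's ranking pipeline.

    ```python
    # Standard association measures used in the comparison, computed from counts:
    # n = total documents (or document pairs), n_a = count(A), n_b = count(B),
    # n_ab = count(A and B together).
    def association_measures(n, n_a, n_b, n_ab):
        p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
        conf = p_ab / p_a                                             # confidence of A -> B
        lift = p_ab / (p_a * p_b)                                     # lift
        conv = float("inf") if conf == 1 else (1 - p_b) / (1 - conf)  # conviction
        return {"CONF": conf, "LIFT": lift, "CONV": conv}

    print(association_measures(n=1000, n_a=120, n_b=200, n_ab=90))
    ```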

  • Privacy Preserving Association Rule Mining Revisited: Privacy Enhancement and Resources Efficiency

    Abedelaziz MOHAISEN  Nam-Su JHO  Dowon HONG  DaeHun NYANG  

     
    PAPER-Data Mining
    Vol: E93-D No:2  Page(s): 315-325

    Privacy-preserving association rule mining algorithms have been designed to discover the relations between variables in data while maintaining data privacy. In this article we revisit one of the recently introduced schemes for association rule mining using fake transactions (fs). In particular, our analysis shows that the fs scheme has excessive storage and high computation requirements for guaranteeing a reasonable level of privacy. We introduce a realistic definition of privacy that benefits from average-case privacy and motivates the study of a weakness in the structure of fs, namely fake-transaction filtering. To overcome this problem, we improve the fs scheme by presenting a hybrid scheme that considers both privacy and resources as two concurrent guidelines. Analytical and empirical results show the efficiency and applicability of our proposed scheme.

  • Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database

    Junbo CHEN  Bo ZHOU  Xinyu WANG  Yiqun DING  Lu CHEN  

     
    PAPER-Data Mining
    Vol: E92-D No:8  Page(s): 1523-1533

    Frequent Itemset (FI) mining is a popular and important first step in analyzing datasets across a broad range of applications. There are two main problems with the traditional approach to finding frequent itemsets. First, it may derive an undesirably huge set of frequent itemsets and association rules. Second, it is vulnerable to noise. Two approaches have been proposed to address these problems individually. The first problem is addressed by Frequent Closed Itemsets (FCI), which removes all redundant information from the result while ensuring no information loss. The second problem is addressed by Approximate Frequent Itemsets (AFI), which can identify and fix noise in the datasets. Each of these two concepts has its own limitations; however, the authors find that when FCI and AFI are put together, they help each other overcome those limitations and amplify the advantages. The new integrated approach is termed Noise-tolerant Frequent Closed Itemsets (NFCI). The experimental results demonstrate the advantages of the new approach: (1) it is noise tolerant; (2) the number of itemsets generated is dramatically reduced with almost no information loss, except for the noise and the infrequent patterns; (3) hence, it is both time and space efficient; and (4) the result contains no redundant information.
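    For readers unfamiliar with the closedness notion that FCI relies on, the sketch below checks whether an itemset is closed (no proper superset has the same support); NFCI's noise tolerance is not modelled here.

    ```python
    # Background sketch: an itemset is closed if no proper superset has the same
    # support. This checks closedness by trying to extend the itemset with each
    # remaining item; NFCI's noise handling is out of scope for this sketch.
    def support_count(itemset, transactions):
        return sum(1 for t in transactions if itemset <= t)

    def is_closed(itemset, transactions):
        base = support_count(itemset, transactions)
        all_items = set().union(*transactions)
        return all(support_count(itemset | {x}, transactions) < base
                   for x in all_items - itemset)

    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}]
    print(is_closed({"a"}, db))       # False: {a, b} has the same support (3)
    print(is_closed({"a", "b"}, db))  # True
    ```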

  • Weighted Association Rule Mining for Item Groups with Different Properties and Risk Assessment for Networked Systems

    Jungja KIM  Heetaek CEONG  Yonggwan WON  

     
    PAPER-Data Mining
    Vol: E92-D No:1  Page(s): 10-15

    In market-basket analysis, weighted association rule (WAR) discovery can mine rules that include more beneficial information by reflecting item importance for special products. In a point-of-sale database, each transaction is composed of items with similar properties, and item weights are pre-defined and fixed by a factor such as profit. However, when items are divided into more than one group and item importance must be measured independently for each group, traditional weighted association rule discovery cannot be used. To solve this problem, we propose a new weighted association rule mining methodology. The items are first divided into subgroups according to their properties, and the item importance, i.e. the item weight, is defined or calculated only with the items included in the subgroup. Then, transaction weight is measured by appropriately summing the item weights from each subgroup, and the weighted support is computed as the total weight of the transactions that contain the candidate items relative to the weight of all transactions. As an example, our proposed methodology is applied to assess the vulnerability to threats of computer systems that provide networked services. Our algorithm provides both quantitative risk-level values and qualitative risk rules for the security assessment of networked computer systems using WAR discovery. It can also be widely used for new applications with many data sets in which the data items are distinctly separated.
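    The weighted-support definition in the abstract can be written down directly; the sketch below is a minimal reading of it, assuming item weights are already defined within their subgroups and a transaction's weight is the sum of its items' weights. The item names are hypothetical.

    ```python
    # Hedged sketch of the weighted-support computation described in the abstract:
    # transaction weight = sum of item weights (each weight defined within the
    # item's own subgroup); weighted support of an itemset = (total weight of
    # transactions containing the itemset) / (total weight of all transactions).
    def transaction_weight(transaction, item_weights):
        """item_weights: {item: weight}, already defined per subgroup."""
        return sum(item_weights.get(item, 0.0) for item in transaction)

    def weighted_support(itemset, transactions, item_weights):
        total = sum(transaction_weight(t, item_weights) for t in transactions)
        containing = sum(transaction_weight(t, item_weights)
                         for t in transactions if itemset <= t)
        return containing / total if total else 0.0

    weights = {"firewall_off": 0.9, "old_kernel": 0.7, "ssh_open": 0.4}  # hypothetical weights
    db = [{"firewall_off", "ssh_open"}, {"old_kernel"}, {"firewall_off", "old_kernel", "ssh_open"}]
    print(weighted_support({"firewall_off", "ssh_open"}, db, weights))
    ```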

  • Finding Frequent Closed Itemsets in Sliding Window in Linear Time

    Junbo CHEN  Bo ZHOU  Lu CHEN  Xinyu WANG  Yiqun DING  

     
    PAPER-Data Mining
    Vol: E91-D No:10  Page(s): 2406-2418

    One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in the data mining community. Algorithms taking this approach include TITANIC [8], CLOSET+ [6], DCI-Closed [4], FCI-Stream [3], GC-Tree [5], and TGC-Tree [16]. Among these, FCI-Stream, GC-Tree and TGC-Tree are online algorithms that work in sliding-window environments. According to the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, whose computational complexity is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state-of-the-art algorithm, GC-Tree.

  • A Randomness Based Analysis on the Data Size Needed for Removing Deceptive Patterns

    Kazuya HARAGUCHI  Mutsunori YAGIURA  Endre BOROS  Toshihide IBARAKI  

     
    PAPER-Algorithm Theory
    Vol: E91-D No:3  Page(s): 781-788

    We consider a data set in which each example is an n-dimensional Boolean vector labeled as true or false. A pattern is a co-occurrence of a particular value combination of a given subset of the variables. If a pattern appears frequently in the true examples and infrequently in the false examples, we consider it a good pattern. In this paper, we discuss the problem of determining the data size needed for removing "deceptive" good patterns; in a data set of small size, many good patterns may appear superficially, simply by chance, independently of the underlying structure. Our hypothesis is that, in order to remove such deceptive good patterns, the data set should contain more examples than the size at which a random data set contains few good patterns. We justify this hypothesis by computational studies. We also derive a theoretical upper bound on the needed data size in view of our hypothesis.
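    The good-pattern criterion is directly checkable; the sketch below counts a pattern's frequency separately in the true and false examples and applies the "frequent in true, infrequent in false" test. The two thresholds are illustrative assumptions.

    ```python
    # Hedged sketch of the good-pattern criterion: a pattern (a value assignment
    # to a subset of the variables) is "good" if it occurs in at least theta_true
    # of the true examples and at most theta_false of the false examples. The
    # thresholds here are illustrative assumptions.
    def matches(example, pattern):
        """example: tuple of 0/1 values; pattern: {variable index: required value}."""
        return all(example[i] == v for i, v in pattern.items())

    def is_good_pattern(pattern, true_examples, false_examples,
                        theta_true=0.6, theta_false=0.2):
        freq_true = sum(matches(e, pattern) for e in true_examples) / len(true_examples)
        freq_false = sum(matches(e, pattern) for e in false_examples) / len(false_examples)
        return freq_true >= theta_true and freq_false <= theta_false

    true_ex = [(1, 0, 1), (1, 1, 1), (1, 0, 0)]
    false_ex = [(0, 0, 1), (0, 1, 0), (0, 1, 1)]
    print(is_good_pattern({0: 1}, true_ex, false_ex))  # True: "x0 = 1" is a good pattern
    ```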

  • Efficient Substructure Discovery from Large Semi-Structured Data

    Tatsuya ASAI  Kenji ABE  Shinji KAWASOE  Hiroshi SAKAMOTO  Hiroki ARIMURA  Setsuo ARIKAWA  

     
    PAPER-Data Mining
    Vol: E87-D No:12  Page(s): 2754-2763

    In this paper, we consider a data mining problem for semi-structured data. Modeling semi-structured data as labeled ordered trees, we present an efficient algorithm for discovering frequent substructures from a large collection of semi-structured data. By extending the enumeration technique developed by Bayardo (SIGMOD'98) for discovering long itemsets, our algorithm scales almost linearly in the total size of the maximal tree patterns contained in an input collection, depending only mildly on the size of the longest pattern. We also developed several pruning techniques that significantly speed up the search. Experiments on Web data show that our algorithm, combined with the proposed pruning techniques, runs efficiently on real-life datasets over a wide range of parameters.

  • DODDLE II: A Domain Ontology Development Environment Using a MRD and Text Corpus

    Masaki KUREMATSU  Takamasa IWADE  Naomi NAKAYA  Takahira YAMAGUCHI  

     
    PAPER-Knowledge Engineering and Robotics
    Vol: E87-D No:4  Page(s): 908-916

    In this paper, we describe how to exploit a machine-readable dictionary (MRD) and a domain-specific text corpus to support the construction of domain ontologies that specify taxonomic and non-taxonomic relationships among given domain concepts. In building taxonomic relationships (hierarchical structure) of domain concepts, hierarchical structure can be extracted from an MRD as marked subtrees that may be modified by a domain expert, using matching result analysis and trimmed result analysis. In building non-taxonomic relationships (specification templates) of domain concepts, we construct concept specification templates from pairs of concepts extracted from the text corpus, using WordSpace and an association rule algorithm. A domain expert modifies the taxonomic and non-taxonomic relationships afterwards. Through case studies with "the Contracts for the International Sales of Goods (CISG)" and "XML Common Business Library (xCBL)", we confirm that our system can support the process of constructing domain ontologies with an MRD and a text corpus.

  • Fast Algorithms for Mining Generalized Frequent Patterns of Generalized Association Rules

    Kritsada SRIPHAEW  Thanaruk THEERAMUNKONG  

     
    PAPER-Databases
    Vol: E87-D No:3  Page(s): 761-770

    Mining generalized frequent patterns of generalized association rules is an important process in knowledge discovery systems. In this paper, we propose a new approach for efficiently mining all frequent patterns using a novel set enumeration algorithm with two types of constraints on two generalized itemset relationships, called subset-superset and ancestor-descendant constraints. We also show a method to mine a smaller set of generalized closed frequent itemsets instead of a large set of conventional generalized frequent itemsets. To this end, we develop two algorithms called SET and cSET for mining generalized frequent itemsets and generalized closed frequent itemsets, respectively. In a number of experiments, the proposed algorithms outperform the previous well-known algorithms in both computational time and memory utilization. Furthermore, experiments with real datasets indicate that mining generalized closed frequent itemsets gains more merit in computational cost, since the number of generalized closed frequent itemsets is much smaller than the number of generalized frequent itemsets.

  • Computational Complexity of Finding Meaningful Association Rules

    Yeon-Dae KWON  Ryuichi NAKANISHI  Minoru ITO  Michio NAKANISHI  

     
    PAPER-Algorithms and Data Structures
    Vol: E82-A No:9  Page(s): 1945-1952

    Recent developments in computer technology allow us to analyze all the data in a huge database. Data mining is to analyze all the data in such a database and to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database that contains massive amounts of transactions. One way to find meaningful association rules is to first find all the large itemsets and then derive meaningful association rules from them. Although a number of algorithms for computing all the large itemsets have been proposed, their computational complexity has scarcely been discussed. In this paper, we show that it is NP-complete to decide whether there exists a large itemset with a given cardinality. We also propose subclasses of databases in which all the meaningful association rules can be computed in time polynomial in the size of the database.
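    To make the decision problem concrete, the sketch below is a naive exponential-time check of whether a database contains a large itemset of a given cardinality; the NP-completeness result indicates that no polynomial-time procedure for this decision is likely to exist.

    ```python
    # Naive exponential check of the decision problem studied in the paper:
    # "does the database contain a large (frequent) itemset of cardinality k?"
    # The brute force over all k-subsets illustrates the problem statement only.
    from itertools import combinations

    def has_large_itemset(transactions, k, min_sup):
        items = sorted(set().union(*transactions))
        n = len(transactions)
        for candidate in combinations(items, k):     # exponentially many candidates
            cand = set(candidate)
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_sup:
                return True
        return False

    db = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
    print(has_large_itemset(db, k=2, min_sup=0.5))   # True: {a, b} has support 0.75
    ```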

  • Association Rule Filter for Data Mining in Call Tracking Data

    Kazunori MATSUMOTO  Kazuo HASHIMOTO  

     
    PAPER-Network Design, Operation, and Management
    Vol: E81-B No:12  Page(s): 2481-2486

    Call tracking data contains the calling address, called address, service type, and other attributes useful for predicting a customer's calling activity, and it is becoming a target of data mining for telecommunication carriers. Conventional data-mining programs control the number of association rules found with two types of thresholds (minimum confidence and minimum support); however, they often generate too many association rules because of the wide variety of patterns found in call tracking data. This paper proposes a new method to reduce the number of generated rules. The proposed method tests each generated rule based on the Akaike Information Criterion (AIC) without using conventional thresholds. Experiments with artificial call tracking data show the high performance of the proposed method.
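    The abstract does not give the exact AIC formulation, so the sketch below is one plausible reading under binomial models: compare, via AIC, a model in which the consequent probability depends on the antecedent against an independence model, and keep the rule only when the dependence model has the lower AIC. The model pair and likelihood form are assumptions.

    ```python
    # Hedged sketch of an AIC-based rule test (one plausible formulation, not
    # necessarily the paper's): keep rule A -> B only if modelling P(B|A) and
    # P(B|not A) separately (2 parameters) has lower AIC than a single P(B)
    # (1 parameter). AIC = 2k - 2 * log-likelihood.
    import math

    def bernoulli_loglik(hits, trials):
        if trials == 0 or hits in (0, trials):
            return 0.0                       # degenerate cases contribute zero
        p = hits / trials
        return hits * math.log(p) + (trials - hits) * math.log(1 - p)

    def keep_rule(n, n_a, n_b, n_ab):
        """n: transactions, n_a: with A, n_b: with B, n_ab: with both A and B."""
        ll_indep = bernoulli_loglik(n_b, n)                       # single P(B)
        ll_rule = (bernoulli_loglik(n_ab, n_a) +                  # P(B | A)
                   bernoulli_loglik(n_b - n_ab, n - n_a))         # P(B | not A)
        aic_indep = 2 * 1 - 2 * ll_indep
        aic_rule = 2 * 2 - 2 * ll_rule
        return aic_rule < aic_indep

    print(keep_rule(n=1000, n_a=200, n_b=300, n_ab=150))  # strong dependence: kept
    ```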