The search functionality is under construction.

The search functionality is under construction.

Data mining is to analyze all the data in a huge database and to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database which contains massive amounts of sales transactions. The problem of mining meaningful association rules is to find all the large itemsets first, and then to construct meaningful association rules from the large itemsets. In our previous work, we have shown that it is NP-complete to decide whether there exists a large itemset with a given size. Also, we have proposed a subclass of databases, called *k*-sparse databases, for which we can efficiently find all the large itemsets. Intuitively, *k*-sparsity of a database means that the supports of itemsets of size *k* or more are sufficiently low in the database. In this paper, we introduce the notion of (*k*,*c*)-sparsity, which is strictly weaker than the *k*-sparsity in our previous work. The value of *c* represents a degree of sparsity. Using (*k*,*c*)-sparsity, we propose a larger subclass of databases for which we can still efficiently find all the large itemsets. Next, we propose alternative measures to the support. For each measure, an itemset is called highly co-occurrent if the value indicating the correlation among the items exceeds a given threshold. In this paper, we define the highly co-occurrent itemset problem formally as deciding whether there exists a highly co-occurrent itemset with a given size, and show that the problem is NP-complete under whichever measure. Furthermore, based on the notion of (*k*,*c*)-sparsity, we propose subclasses of databases for which we can efficiently find all the highly co-occurrent itemsets.

- Publication
- IEICE TRANSACTIONS on Fundamentals Vol.E83-A No.12 pp.2723-2735

- Publication Date
- 2000/12/25

- Publicized

- Online ISSN

- DOI

- Type of Manuscript
- PAPER

- Category
- General Fundamentals and Boundaries

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.

Copy

Yeon-Dae KWON, Yasunori ISHIHARA, Shougo SHIMIZU, Minoru ITO, "Computational Complexity of Finding Highly Co-occurrent Itemsets in Market Basket Databases" in IEICE TRANSACTIONS on Fundamentals,
vol. E83-A, no. 12, pp. 2723-2735, December 2000, doi: .

Abstract: Data mining is to analyze all the data in a huge database and to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database which contains massive amounts of sales transactions. The problem of mining meaningful association rules is to find all the large itemsets first, and then to construct meaningful association rules from the large itemsets. In our previous work, we have shown that it is NP-complete to decide whether there exists a large itemset with a given size. Also, we have proposed a subclass of databases, called *k*-sparse databases, for which we can efficiently find all the large itemsets. Intuitively, *k*-sparsity of a database means that the supports of itemsets of size *k* or more are sufficiently low in the database. In this paper, we introduce the notion of (*k*,*c*)-sparsity, which is strictly weaker than the *k*-sparsity in our previous work. The value of *c* represents a degree of sparsity. Using (*k*,*c*)-sparsity, we propose a larger subclass of databases for which we can still efficiently find all the large itemsets. Next, we propose alternative measures to the support. For each measure, an itemset is called highly co-occurrent if the value indicating the correlation among the items exceeds a given threshold. In this paper, we define the highly co-occurrent itemset problem formally as deciding whether there exists a highly co-occurrent itemset with a given size, and show that the problem is NP-complete under whichever measure. Furthermore, based on the notion of (*k*,*c*)-sparsity, we propose subclasses of databases for which we can efficiently find all the highly co-occurrent itemsets.

URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e83-a_12_2723/_p

Copy

@ARTICLE{e83-a_12_2723,

author={Yeon-Dae KWON, Yasunori ISHIHARA, Shougo SHIMIZU, Minoru ITO, },

journal={IEICE TRANSACTIONS on Fundamentals},

title={Computational Complexity of Finding Highly Co-occurrent Itemsets in Market Basket Databases},

year={2000},

volume={E83-A},

number={12},

pages={2723-2735},

abstract={Data mining is to analyze all the data in a huge database and to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database which contains massive amounts of sales transactions. The problem of mining meaningful association rules is to find all the large itemsets first, and then to construct meaningful association rules from the large itemsets. In our previous work, we have shown that it is NP-complete to decide whether there exists a large itemset with a given size. Also, we have proposed a subclass of databases, called *k*-sparse databases, for which we can efficiently find all the large itemsets. Intuitively, *k*-sparsity of a database means that the supports of itemsets of size *k* or more are sufficiently low in the database. In this paper, we introduce the notion of (*k*,*c*)-sparsity, which is strictly weaker than the *k*-sparsity in our previous work. The value of *c* represents a degree of sparsity. Using (*k*,*c*)-sparsity, we propose a larger subclass of databases for which we can still efficiently find all the large itemsets. Next, we propose alternative measures to the support. For each measure, an itemset is called highly co-occurrent if the value indicating the correlation among the items exceeds a given threshold. In this paper, we define the highly co-occurrent itemset problem formally as deciding whether there exists a highly co-occurrent itemset with a given size, and show that the problem is NP-complete under whichever measure. Furthermore, based on the notion of (*k*,*c*)-sparsity, we propose subclasses of databases for which we can efficiently find all the highly co-occurrent itemsets.},

keywords={},

doi={},

ISSN={},

month={December},}

Copy

TY - JOUR

TI - Computational Complexity of Finding Highly Co-occurrent Itemsets in Market Basket Databases

T2 - IEICE TRANSACTIONS on Fundamentals

SP - 2723

EP - 2735

AU - Yeon-Dae KWON

AU - Yasunori ISHIHARA

AU - Shougo SHIMIZU

AU - Minoru ITO

PY - 2000

DO -

JO - IEICE TRANSACTIONS on Fundamentals

SN -

VL - E83-A

IS - 12

JA - IEICE TRANSACTIONS on Fundamentals

Y1 - December 2000

AB - Data mining is to analyze all the data in a huge database and to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database which contains massive amounts of sales transactions. The problem of mining meaningful association rules is to find all the large itemsets first, and then to construct meaningful association rules from the large itemsets. In our previous work, we have shown that it is NP-complete to decide whether there exists a large itemset with a given size. Also, we have proposed a subclass of databases, called *k*-sparse databases, for which we can efficiently find all the large itemsets. Intuitively, *k*-sparsity of a database means that the supports of itemsets of size *k* or more are sufficiently low in the database. In this paper, we introduce the notion of (*k*,*c*)-sparsity, which is strictly weaker than the *k*-sparsity in our previous work. The value of *c* represents a degree of sparsity. Using (*k*,*c*)-sparsity, we propose a larger subclass of databases for which we can still efficiently find all the large itemsets. Next, we propose alternative measures to the support. For each measure, an itemset is called highly co-occurrent if the value indicating the correlation among the items exceeds a given threshold. In this paper, we define the highly co-occurrent itemset problem formally as deciding whether there exists a highly co-occurrent itemset with a given size, and show that the problem is NP-complete under whichever measure. Furthermore, based on the notion of (*k*,*c*)-sparsity, we propose subclasses of databases for which we can efficiently find all the highly co-occurrent itemsets.

ER -