Decision trees are a convenient means of explaining given positive and negative examples, and building them is a form of data mining and knowledge discovery. Standard methods such as ID3 may produce non-monotonic decision trees, in the sense that data with larger values in all attributes are sometimes classified into a class with a smaller output value. (In the case of binary data, this is equivalent to saying that the discriminant Boolean function represented by the decision tree is not positive.) This study is motivated by the observation that real-world data are often positive, and in such cases it is natural to build decision trees that represent positive (i.e., monotone) discriminant functions. To this end, we propose a modification of existing procedures such as ID3 so that the resulting decision tree represents a positive discriminant function. In this procedure, we add some new data to recover the positivity that the original data had but that was lost when methods such as ID3 decomposed the data set. To compare the performance of our method with existing methods, we test (1) positive data, randomly generated from a hidden positive Boolean function after adding dummy attributes, and (2) breast cancer data as an example of real-world data. The experimental results on (1) show that, although positive decision trees are somewhat larger than trees built without the positivity assumption, they exhibit higher accuracy and tend to choose the correct attributes, i.e., those on which the hidden positive Boolean function is defined. For the breast cancer data set, we observe a similar tendency: positive decision trees are larger but give higher accuracy.
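The notion of positivity used in the abstract can be made concrete for binary data. The following is a minimal sketch, not the authors' procedure: it only checks whether a data set or a discriminant function is positive, and does not implement the modified ID3. All names (`dominates`, `data_is_positive`, `function_is_positive`, `non_monotone_tree`) are hypothetical and chosen for illustration.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Vector = Tuple[int, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Return True if a >= b in every attribute (componentwise order)."""
    return all(x >= y for x, y in zip(a, b))


def data_is_positive(positives: Sequence[Vector], negatives: Sequence[Vector]) -> bool:
    """A data set is consistent with some positive function only if
    no negative example dominates a positive example."""
    return not any(dominates(n, p) for p in positives for n in negatives)


def function_is_positive(f: Callable[[Vector], int], n_attrs: int) -> bool:
    """Brute-force check over {0,1}^n that a <= b implies f(a) <= f(b).
    Exponential in n_attrs, so only suitable for small illustrations."""
    points = list(product((0, 1), repeat=n_attrs))
    return all(f(a) <= f(b) for a in points for b in points if dominates(b, a))


def non_monotone_tree(x: Vector) -> int:
    """A hand-written two-attribute tree (assumed example):
    it outputs 1 on (0, 1) but 0 on the larger vector (1, 1)."""
    if x[0] == 0:
        return x[1]
    return 0


if __name__ == "__main__":
    print(data_is_positive([(1, 1, 0)], [(0, 1, 0)]))        # positive data
    print(data_is_positive([(1, 1, 0)], [(1, 1, 1)]))        # violated by (1, 1, 1)
    print(function_is_positive(lambda x: x[0] & x[1], 2))    # AND is monotone
    print(function_is_positive(non_monotone_tree, 2))        # this tree is not
```

The second data set fails the check because the negative example (1, 1, 1) is at least as large as the positive example (1, 1, 0) in every attribute, which is exactly the kind of violation a positive decision tree must avoid.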

- Publication
- IEICE TRANSACTIONS on Information Vol.E82-D No.1 pp.76-88

- Publication Date
- 1999/01/25

- Type of Manuscript
- Special Section PAPER (Special Issue on New Generation Database Technologies)

- Category
- Theoretical Aspects

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.

Kazuhisa MAKINO, Takashi SUDA, Hirotaka ONO, Toshihide IBARAKI, "Data Analysis by Positive Decision Trees," in IEICE TRANSACTIONS on Information,
vol. E82-D, no. 1, pp. 76-88, January 1999.

URL: https://global.ieice.org/en_transactions/information/10.1587/e82-d_1_76/_p

@ARTICLE{e82-d_1_76,
  author={Kazuhisa MAKINO and Takashi SUDA and Hirotaka ONO and Toshihide IBARAKI},
  journal={IEICE TRANSACTIONS on Information},
  title={Data Analysis by Positive Decision Trees},
  year={1999},
  volume={E82-D},
  number={1},
  pages={76-88},
  month={January},
}

TY - JOUR
TI - Data Analysis by Positive Decision Trees
T2 - IEICE TRANSACTIONS on Information
SP - 76
EP - 88
AU - Kazuhisa MAKINO
AU - Takashi SUDA
AU - Hirotaka ONO
AU - Toshihide IBARAKI
PY - 1999
JO - IEICE TRANSACTIONS on Information
VL - E82-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 1999
ER -