Feature selection (FS) plays an important role in pattern recognition and machine learning. FS is applied to dimensionality reduction, and its purpose is to select a subset of the original features of a data set that retains the most useful information. Most existing FS methods based on rough set theory focus on the dependency function, which uses the lower approximation to evaluate the goodness of a feature subset. However, by considering only information from the positive region and neglecting the boundary region, much relevant information may be overlooked. This paper focuses on the maximal lower approximation (Max-Certainty) and minimal boundary region (Min-Uncertainty) criterion for feature selection methods based on rough sets and mutual information, which use the difference between the lower approximation information and the information contained in the boundary region. Using this idea can yield higher predictive accuracy than that obtained using the measure based on the positive region (certainty region) alone, which demonstrates that much valuable information can be extracted in this way. Experimental results are reported for discrete, continuous, and microarray data and compared with other FS methods in terms of subset size and classification accuracy.
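To make the terms in the abstract concrete, the following is a minimal sketch of the standard rough set notions it refers to: the positive region (objects whose equivalence class falls inside a single decision class), the boundary region (objects whose equivalence class mixes decision classes), and the classical dependency function. The toy decision table and the function names `partition`, `regions`, and `dependency` are assumptions made for illustration. The sketch does not reproduce the paper's mutual-information-based Max-Certainty and Min-Uncertainty criterion, which the abstract does not define.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def regions(rows, labels, attrs):
    """Return (positive_region, boundary_region) as sets of row indices."""
    positive, boundary = set(), set()
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive |= block  # block lies inside one decision class
        else:
            boundary |= block  # block mixes several decision classes
    return positive, boundary


def dependency(rows, labels, attrs):
    """Classical dependency degree: |positive region| / |universe|."""
    positive, _ = regions(rows, labels, attrs)
    return len(positive) / len(rows)


# Hypothetical toy decision table: two condition attributes, one decision.
rows = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 0},
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
]
labels = ["no", "no", "yes", "yes", "no"]

for subset in (["a"], ["b"], ["a", "b"]):
    pos, bnd = regions(rows, labels, subset)
    print(subset, sorted(pos), sorted(bnd), dependency(rows, labels, subset))
```

In this toy table, the subset `["a", "b"]` leaves two objects in the boundary region. A dependency-only measure scores the subset purely by the size of the positive region and ignores how those boundary objects are distributed, which is the limitation the abstract points to.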
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Sombut FOITONG, Ouen PINNGERN, Boonwat ATTACHOO, "Rough-Mutual Feature Selection Based on Min-Uncertainty and Max-Certainty," in IEICE TRANSACTIONS on Information, vol. E95-D, no. 4, pp. 970-981, April 2012, doi: 10.1587/transinf.E95.D.970.
Abstract: Feature selection (FS) plays an important role in pattern recognition and machine learning. FS is applied to dimensionality reduction, and its purpose is to select a subset of the original features of a data set that retains the most useful information. Most existing FS methods based on rough set theory focus on the dependency function, which uses the lower approximation to evaluate the goodness of a feature subset. However, by considering only information from the positive region and neglecting the boundary region, much relevant information may be overlooked. This paper focuses on the maximal lower approximation (Max-Certainty) and minimal boundary region (Min-Uncertainty) criterion for feature selection methods based on rough sets and mutual information, which use the difference between the lower approximation information and the information contained in the boundary region. Using this idea can yield higher predictive accuracy than that obtained using the measure based on the positive region (certainty region) alone, which demonstrates that much valuable information can be extracted in this way. Experimental results are reported for discrete, continuous, and microarray data and compared with other FS methods in terms of subset size and classification accuracy.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E95.D.970/_p
@ARTICLE{e95-d_4_970,
author={Sombut FOITONG and Ouen PINNGERN and Boonwat ATTACHOO},
journal={IEICE TRANSACTIONS on Information},
title={Rough-Mutual Feature Selection Based on Min-Uncertainty and Max-Certainty},
year={2012},
volume={E95-D},
number={4},
pages={970--981},
abstract={Feature selection (FS) plays an important role in pattern recognition and machine learning. FS is applied to dimensionality reduction, and its purpose is to select a subset of the original features of a data set that retains the most useful information. Most existing FS methods based on rough set theory focus on the dependency function, which uses the lower approximation to evaluate the goodness of a feature subset. However, by considering only information from the positive region and neglecting the boundary region, much relevant information may be overlooked. This paper focuses on the maximal lower approximation (Max-Certainty) and minimal boundary region (Min-Uncertainty) criterion for feature selection methods based on rough sets and mutual information, which use the difference between the lower approximation information and the information contained in the boundary region. Using this idea can yield higher predictive accuracy than that obtained using the measure based on the positive region (certainty region) alone, which demonstrates that much valuable information can be extracted in this way. Experimental results are reported for discrete, continuous, and microarray data and compared with other FS methods in terms of subset size and classification accuracy.},
keywords={},
doi={10.1587/transinf.E95.D.970},
ISSN={1745-1361},
month={April},}
TY - JOUR
TI - Rough-Mutual Feature Selection Based on Min-Uncertainty and Max-Certainty
T2 - IEICE TRANSACTIONS on Information
SP - 970
EP - 981
AU - Sombut FOITONG
AU - Ouen PINNGERN
AU - Boonwat ATTACHOO
PY - 2012
DO - 10.1587/transinf.E95.D.970
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E95-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2012
AB - Feature selection (FS) plays an important role in pattern recognition and machine learning. FS is applied to dimensionality reduction, and its purpose is to select a subset of the original features of a data set that retains the most useful information. Most existing FS methods based on rough set theory focus on the dependency function, which uses the lower approximation to evaluate the goodness of a feature subset. However, by considering only information from the positive region and neglecting the boundary region, much relevant information may be overlooked. This paper focuses on the maximal lower approximation (Max-Certainty) and minimal boundary region (Min-Uncertainty) criterion for feature selection methods based on rough sets and mutual information, which use the difference between the lower approximation information and the information contained in the boundary region. Using this idea can yield higher predictive accuracy than that obtained using the measure based on the positive region (certainty region) alone, which demonstrates that much valuable information can be extracted in this way. Experimental results are reported for discrete, continuous, and microarray data and compared with other FS methods in terms of subset size and classification accuracy.
ER -