Class imbalance has drawn much attention from researchers in software defect prediction. In practice, the performance of defect prediction models may be affected by the class imbalance problem. In this paper, we present an approach to evaluating the performance stability of defect prediction models on imbalanced datasets. First, random sampling is applied to convert the original imbalanced dataset into a set of new datasets with different levels of imbalance ratio. Second, typical prediction models are selected to make predictions on these newly constructed datasets, and the coefficient of variation (CV) is used to evaluate the performance stability of the different models. Finally, an empirical study is designed to evaluate the performance stability of six prediction models that are widely used in software defect prediction. The results show that the performance of C4.5 is unstable on imbalanced datasets, while the performances of Naive Bayes and Random Forest are more stable than those of the other models.
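The stability metric at the heart of the study, the coefficient of variation (CV = standard deviation / mean of a model's scores), is straightforward to compute. Below is a minimal Python sketch of the two building blocks the abstract describes: randomly sampling a dataset down to a target imbalance ratio, and scoring stability across the resulting datasets. The helper names are illustrative, not taken from the paper.

```python
import random
import statistics

def sample_to_ratio(defective, clean, ratio, rng):
    """Randomly undersample the clean (majority) class so that
    len(clean) / len(defective) equals the target imbalance ratio."""
    k = min(len(clean), int(len(defective) * ratio))
    return defective, rng.sample(clean, k)

def coefficient_of_variation(scores):
    """CV = standard deviation / mean of a model's performance scores
    across datasets; a lower CV indicates a more stable model."""
    mean = statistics.mean(scores)
    return statistics.stdev(scores) / mean if mean else float("inf")

# Example: a model whose score barely changes across imbalance ratios
# has a low CV and would be judged stable by this criterion.
stable = coefficient_of_variation([0.80, 0.81, 0.79, 0.80])
unstable = coefficient_of_variation([0.80, 0.55, 0.30, 0.70])
```

In this scheme, each prediction model is trained and evaluated once per constructed dataset, and the CV of its scores across those datasets is compared against the other models' CVs.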
Qiao YU
China University of Mining and Technology
Shujuan JIANG
China University of Mining and Technology
Yanmei ZHANG
China University of Mining and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Qiao YU, Shujuan JIANG, Yanmei ZHANG, "The Performance Stability of Defect Prediction Models with Class Imbalance: An Empirical Study," IEICE Transactions on Information and Systems,
vol. E100-D, no. 2, pp. 265-272, February 2017, doi: 10.1587/transinf.2016EDP7204.
Abstract: Class imbalance has drawn much attention from researchers in software defect prediction. In practice, the performance of defect prediction models may be affected by the class imbalance problem. In this paper, we present an approach to evaluating the performance stability of defect prediction models on imbalanced datasets. First, random sampling is applied to convert the original imbalanced dataset into a set of new datasets with different levels of imbalance ratio. Second, typical prediction models are selected to make predictions on these newly constructed datasets, and the coefficient of variation (CV) is used to evaluate the performance stability of the different models. Finally, an empirical study is designed to evaluate the performance stability of six prediction models that are widely used in software defect prediction. The results show that the performance of C4.5 is unstable on imbalanced datasets, while the performances of Naive Bayes and Random Forest are more stable than those of the other models.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDP7204/_p
@ARTICLE{e100-d_2_265,
author={Qiao YU and Shujuan JIANG and Yanmei ZHANG},
journal={IEICE Transactions on Information and Systems},
title={The Performance Stability of Defect Prediction Models with Class Imbalance: An Empirical Study},
year={2017},
volume={E100-D},
number={2},
pages={265-272},
abstract={Class imbalance has drawn much attention from researchers in software defect prediction. In practice, the performance of defect prediction models may be affected by the class imbalance problem. In this paper, we present an approach to evaluating the performance stability of defect prediction models on imbalanced datasets. First, random sampling is applied to convert the original imbalanced dataset into a set of new datasets with different levels of imbalance ratio. Second, typical prediction models are selected to make predictions on these newly constructed datasets, and the coefficient of variation (CV) is used to evaluate the performance stability of the different models. Finally, an empirical study is designed to evaluate the performance stability of six prediction models that are widely used in software defect prediction. The results show that the performance of C4.5 is unstable on imbalanced datasets, while the performances of Naive Bayes and Random Forest are more stable than those of the other models.},
keywords={},
doi={10.1587/transinf.2016EDP7204},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - The Performance Stability of Defect Prediction Models with Class Imbalance: An Empirical Study
T2 - IEICE Transactions on Information and Systems
SP - 265
EP - 272
AU - Qiao YU
AU - Shujuan JIANG
AU - Yanmei ZHANG
PY - 2017
DO - 10.1587/transinf.2016EDP7204
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E100-D
IS - 2
JA - IEICE Trans. Inf. & Syst.
Y1 - 2017/02
AB - Class imbalance has drawn much attention from researchers in software defect prediction. In practice, the performance of defect prediction models may be affected by the class imbalance problem. In this paper, we present an approach to evaluating the performance stability of defect prediction models on imbalanced datasets. First, random sampling is applied to convert the original imbalanced dataset into a set of new datasets with different levels of imbalance ratio. Second, typical prediction models are selected to make predictions on these newly constructed datasets, and the coefficient of variation (CV) is used to evaluate the performance stability of the different models. Finally, an empirical study is designed to evaluate the performance stability of six prediction models that are widely used in software defect prediction. The results show that the performance of C4.5 is unstable on imbalanced datasets, while the performances of Naive Bayes and Random Forest are more stable than those of the other models.
ER -