Software defect prediction (SDP) plays a significant part in allocating testing resources reasonably, reducing testing costs, and ensuring software quality. Naive Bayes (NB) is one of the most widely used algorithms for SDP models because of its simplicity, effectiveness, and robustness. When a data set has continuous or numeric attributes, NB generally assumes that they follow normal distributions and incorporates the probability density function of the normal distribution into its conditional probability estimates. However, after conducting a Kolmogorov-Smirnov test, we find that the 21 main software metrics follow non-normal distributions at the 5% significance level. Therefore, this paper proposes an improved NB approach that estimates the conditional probabilities of NB with kernel density estimation of the training data sets, to help improve the prediction accuracy of NB for SDP. To evaluate the proposed method, we carry out experiments on 34 software releases obtained from 10 open-source projects provided by the PROMISE repository. Four well-known classification algorithms are included for comparison, namely Naive Bayes, Support Vector Machine, Logistic Regression, and Random Tree. The obtained results show that the new method outperforms the four well-known classification algorithms in most software releases.
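The core idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the Silverman rule-of-thumb bandwidth, and the density floor are assumptions made for this sketch, and the paper may choose these details differently.

```python
import numpy as np

class KDENaiveBayes:
    """Naive Bayes where each class-conditional feature density p(x_j | c)
    is a univariate Gaussian kernel density estimate fitted to the training
    data, instead of a fitted normal distribution."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_, self.samples_, self.bw_ = {}, {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.priors_[c] = len(Xc) / len(X)
            self.samples_[c] = Xc
            # Silverman's rule-of-thumb bandwidth per feature (an assumption
            # of this sketch); small guard avoids a zero bandwidth
            n = len(Xc)
            self.bw_[c] = 1.06 * Xc.std(axis=0, ddof=1) * n ** (-1 / 5) + 1e-9
        return self

    def _log_kde(self, samples, bw, x):
        # Log of a Gaussian kernel density estimate evaluated at scalar x
        z = (x - samples) / bw
        dens = np.exp(-0.5 * z ** 2).sum() / (len(samples) * bw * np.sqrt(2 * np.pi))
        return np.log(dens + 1e-300)  # floor avoids log(0) far from the data

    def predict(self, X):
        X = np.asarray(X, float)
        preds = []
        for x in X:
            # Naive independence assumption: sum per-feature log densities
            scores = {
                c: np.log(self.priors_[c])
                + sum(self._log_kde(self.samples_[c][:, j], self.bw_[c][j], x[j])
                      for j in range(len(x)))
                for c in self.classes_
            }
            preds.append(max(scores, key=scores.get))
        return np.array(preds)
```

Replacing the normal-distribution assumption with a KDE lets the class-conditional densities follow whatever shape the training data exhibits, which is the motivation given by the Kolmogorov-Smirnov result above.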
Haijin JI
Huaiyin Normal University, Army Engineering University of PLA
Song HUANG
Army Engineering University of PLA
Xuewei LV
Huaiyin Normal University, Army Engineering University of PLA
Yaning WU
Army Engineering University of PLA
Yuntian FENG
Army Engineering University of PLA
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Haijin JI, Song HUANG, Xuewei LV, Yaning WU, Yuntian FENG, "Empirical Studies of a Kernel Density Estimation Based Naive Bayes Method for Software Defect Prediction" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 1, pp. 75-84, January 2019, doi: 10.1587/transinf.2018EDP7177.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDP7177/_p
@ARTICLE{e102-d_1_75,
author={Haijin JI and Song HUANG and Xuewei LV and Yaning WU and Yuntian FENG},
journal={IEICE TRANSACTIONS on Information},
title={Empirical Studies of a Kernel Density Estimation Based Naive Bayes Method for Software Defect Prediction},
year={2019},
volume={E102-D},
number={1},
pages={75-84},
abstract={Software defect prediction (SDP) plays a significant part in allocating testing resources reasonably, reducing testing costs, and ensuring software quality. Naive Bayes (NB) is one of the most widely used algorithms for SDP models because of its simplicity, effectiveness, and robustness. When a data set has continuous or numeric attributes, NB generally assumes that they follow normal distributions and incorporates the probability density function of the normal distribution into its conditional probability estimates. However, after conducting a Kolmogorov-Smirnov test, we find that the 21 main software metrics follow non-normal distributions at the 5% significance level. Therefore, this paper proposes an improved NB approach that estimates the conditional probabilities of NB with kernel density estimation of the training data sets, to help improve the prediction accuracy of NB for SDP. To evaluate the proposed method, we carry out experiments on 34 software releases obtained from 10 open-source projects provided by the PROMISE repository. Four well-known classification algorithms are included for comparison, namely Naive Bayes, Support Vector Machine, Logistic Regression, and Random Tree. The obtained results show that the new method outperforms the four well-known classification algorithms in most software releases.},
keywords={},
doi={10.1587/transinf.2018EDP7177},
ISSN={1745-1361},
month={January}
}
TY - JOUR
TI - Empirical Studies of a Kernel Density Estimation Based Naive Bayes Method for Software Defect Prediction
T2 - IEICE TRANSACTIONS on Information
SP - 75
EP - 84
AU - Haijin JI
AU - Song HUANG
AU - Xuewei LV
AU - Yaning WU
AU - Yuntian FENG
PY - 2019
DO - 10.1587/transinf.2018EDP7177
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2019
AB - Software defect prediction (SDP) plays a significant part in allocating testing resources reasonably, reducing testing costs, and ensuring software quality. Naive Bayes (NB) is one of the most widely used algorithms for SDP models because of its simplicity, effectiveness, and robustness. When a data set has continuous or numeric attributes, NB generally assumes that they follow normal distributions and incorporates the probability density function of the normal distribution into its conditional probability estimates. However, after conducting a Kolmogorov-Smirnov test, we find that the 21 main software metrics follow non-normal distributions at the 5% significance level. Therefore, this paper proposes an improved NB approach that estimates the conditional probabilities of NB with kernel density estimation of the training data sets, to help improve the prediction accuracy of NB for SDP. To evaluate the proposed method, we carry out experiments on 34 software releases obtained from 10 open-source projects provided by the PROMISE repository. Four well-known classification algorithms are included for comparison, namely Naive Bayes, Support Vector Machine, Logistic Regression, and Random Tree. The obtained results show that the new method outperforms the four well-known classification algorithms in most software releases.
ER -