
Keyword Search Result

[Keyword] software defect prediction (3 hits)

Results 1-3 of 3
  • Cost-Sensitive and Sparse Ladder Network for Software Defect Prediction

    Jing SUN, Yi-mu JI, Shangdong LIU, Fei WU

    LETTER-Software Engineering

    Publicized: 2020/01/29   Vol: E103-D No:5   Page(s): 1177-1180

    Software defect prediction (SDP) plays a vital role in allocating testing resources reasonably and ensuring software quality. When labeled historical modules are scarce, many semi-supervised SDP methods have been proposed; these methods exploit the limited labeled modules and the abundant unlabeled modules simultaneously. Nevertheless, most of them rely on traditional features rather than powerful deep feature representations. Moreover, misclassifying a defective module costs more than misclassifying a defect-free one, and the number of defective modules available for training is small. Taking these issues into account, we propose a cost-sensitive and sparse ladder network (CSLN) for SDP. We first introduce the semi-supervised ladder network to extract deep feature representations. We then introduce cost-sensitive learning, which assigns different misclassification costs to defect-prone and defect-free instances to alleviate the class imbalance problem. When the number of hidden nodes is large, a sparsity constraint is added on the hidden nodes of the ladder network, which enables the model to find robust structures in the data. Extensive experiments on the AEEEM dataset show that CSLN outperforms several state-of-the-art semi-supervised SDP methods.
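
    What follows is a minimal, hypothetical sketch of the two ideas this abstract combines: a cost-sensitive (class-weighted) classification loss and an L1 sparsity penalty on hidden activations. A plain feed-forward network stands in for the ladder network, and the cost ratio, layer sizes, and data are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch (not the authors' implementation): a cost-sensitive
    # classification loss plus an L1 sparsity penalty on hidden activations,
    # with a plain feed-forward network standing in for the ladder network.
    import torch
    import torch.nn as nn

    class SmallNet(nn.Module):
        def __init__(self, n_features, n_hidden=64):
            super().__init__()
            self.hidden = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
            self.out = nn.Linear(n_hidden, 2)        # defect-free vs. defective

        def forward(self, x):
            h = self.hidden(x)                       # hidden representation
            return self.out(h), h

    # Assumed cost ratio: misclassifying a defective module (class 1) costs
    # five times as much as misclassifying a defect-free one (class 0).
    class_costs = torch.tensor([1.0, 5.0])
    ce_loss = nn.CrossEntropyLoss(weight=class_costs)

    model = SmallNet(n_features=20)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    sparsity_coeff = 1e-4                            # strength of the L1 penalty

    # Toy labeled batch: 32 modules described by 20 static-code metrics.
    x = torch.randn(32, 20)
    y = torch.randint(0, 2, (32,))

    logits, hidden = model(x)
    loss = ce_loss(logits, y) + sparsity_coeff * hidden.abs().mean()
    loss.backward()
    optimizer.step()
    print(float(loss))

    In the semi-supervised setting described in the abstract, the ladder network would additionally add a denoising reconstruction loss over the unlabeled modules; only the supervised, cost-sensitive and sparse part is sketched here.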

  • Empirical Studies of a Kernel Density Estimation Based Naive Bayes Method for Software Defect Prediction

    Haijin JI, Song HUANG, Xuewei LV, Yaning WU, Yuntian FENG

    PAPER-Software Engineering

    Publicized: 2018/10/03   Vol: E102-D No:1   Page(s): 75-84

    Software defect prediction (SDP) plays a significant part in allocating testing resources reasonably, reducing testing costs, and ensuring software quality. Naive Bayes (NB) is one of the most widely used algorithms for SDP models because of its simplicity, effectiveness, and robustness. When a data set has continuous or numeric attributes, NB generally assumes that they follow normal distributions and plugs the normal probability density function into its conditional probability estimates. However, after conducting a Kolmogorov-Smirnov test, we find that the 21 main software metrics do not follow a normal distribution at the 5% significance level. Therefore, this paper proposes an improved NB approach that estimates the conditional probabilities with kernel density estimation over the training data, to improve the prediction accuracy of NB for SDP. To evaluate the proposed method, we carry out experiments on 34 software releases obtained from 10 open-source projects in the PROMISE repository. Four well-known classification algorithms are included for comparison, namely Naive Bayes, Support Vector Machine, Logistic Regression, and Random Tree. The results show that the new method outperforms these four algorithms on most software releases.
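
    What follows is a minimal sketch of the core idea: a naive Bayes classifier whose per-feature class-conditional densities come from Gaussian kernel density estimation instead of a fitted normal distribution. The class name, synthetic data, and smoothing constant are illustrative assumptions, not the paper's implementation.

    # Illustrative sketch: naive Bayes with kernel density estimates of the
    # class-conditional densities (one univariate KDE per class and feature).
    import numpy as np
    from scipy.stats import gaussian_kde

    class KDENaiveBayes:                             # hypothetical helper class
        def fit(self, X, y):
            self.classes_ = np.unique(y)
            self.priors_ = {c: np.mean(y == c) for c in self.classes_}
            self.kdes_ = {
                c: [gaussian_kde(X[y == c, j]) for j in range(X.shape[1])]
                for c in self.classes_
            }
            return self

        def predict(self, X):
            scores = []
            for c in self.classes_:
                # Sum per-feature log densities (the "naive" independence step).
                log_lik = sum(
                    np.log(self.kdes_[c][j](X[:, j]) + 1e-12)
                    for j in range(X.shape[1])
                )
                scores.append(np.log(self.priors_[c]) + log_lik)
            return self.classes_[np.argmax(np.vstack(scores), axis=0)]

    # Toy usage: 200 modules, 3 skewed (non-normal) metrics, ~20% defective.
    rng = np.random.default_rng(0)
    X = rng.lognormal(size=(200, 3))
    y = (rng.random(200) < 0.2).astype(int)
    print(KDENaiveBayes().fit(X, y).predict(X[:5]))

    Replacing the normal density with a KDE lets the likelihoods follow whatever shape the training data actually has, which is the motivation the abstract gives for skewed, non-normal software metrics.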

  • The Performance Stability of Defect Prediction Models with Class Imbalance: An Empirical Study

    Qiao YU, Shujuan JIANG, Yanmei ZHANG

    PAPER-Software Engineering

    Publicized: 2016/11/04   Vol: E100-D No:2   Page(s): 265-272

    Class imbalance has drawn much attention from researchers in software defect prediction, because in practice it can affect the performance of defect prediction models. In this paper, we present an approach to evaluating the performance stability of defect prediction models on imbalanced datasets. First, random sampling is applied to convert the original imbalanced dataset into a set of new datasets with different imbalance ratios. Second, typical prediction models make predictions on these newly constructed datasets, and the Coefficient of Variation (CV) is used to evaluate the performance stability of each model. Finally, an empirical study is designed to evaluate the performance stability of six prediction models that are widely used in software defect prediction. The results show that the performance of C4.5 is unstable on imbalanced datasets, while Naive Bayes and Random Forest are more stable than the other models.
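
    What follows is a minimal sketch of the evaluation idea: resample one dataset to several imbalance ratios, score a model at each ratio, and summarize stability with the coefficient of variation (CV = standard deviation / mean) of the scores. The dataset, ratios, classifier, and metric below are illustrative assumptions, not the paper's exact protocol.

    # Illustrative sketch: vary the imbalance ratio by subsampling the
    # defective (minority) class, then measure performance stability with CV.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

    def subsample_to_ratio(X, y, defective_fraction):
        """Keep all defect-free modules and subsample defective ones so that
        they make up roughly `defective_fraction` of the returned dataset."""
        pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
        n_pos = int(len(neg) * defective_fraction / (1 - defective_fraction))
        keep = np.concatenate([neg, rng.choice(pos, size=min(n_pos, len(pos)),
                                               replace=False)])
        return X[keep], y[keep]

    scores = []
    for frac in [0.05, 0.10, 0.20, 0.30, 0.40]:      # assumed imbalance levels
        Xs, ys = subsample_to_ratio(X, y, frac)
        Xtr, Xte, ytr, yte = train_test_split(Xs, ys, test_size=0.3,
                                              stratify=ys, random_state=42)
        model = RandomForestClassifier(random_state=42).fit(Xtr, ytr)
        scores.append(f1_score(yte, model.predict(Xte)))

    cv = np.std(scores) / np.mean(scores)            # lower CV = more stable
    print(f"F1 per ratio: {np.round(scores, 3)}, CV = {cv:.3f}")

    A lower CV across imbalance levels indicates a model whose performance is less sensitive to the class imbalance, which is the stability notion the abstract describes.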