A Hybrid Feature Selection Method for Software Fault Prediction

Yiheng JIAN; Xiao YU; Zhou XU; Ziyi MA

doi:10.1587/transinf.2019EDP7033

IEICE TRANSACTIONS on Information

Summary:

Fault prediction aims to identify whether a software module is defect-prone, based on metrics mined from software projects. These metrics, also known as features, may be irrelevant or redundant, which hurts the performance of fault prediction models. To filter out irrelevant and redundant features, a Hybrid Feature Selection (HFS) method for software fault prediction is proposed. HFS consists of two major stages. First, it groups features with hierarchical agglomerative clustering; second, it selects the most valuable features from each cluster, using two wrapper-based strategies, to remove irrelevant and redundant ones. The empirical evaluation was conducted on 11 widely studied NASA projects, using three different classifiers and four performance metrics (precision, recall, F-measure, and AUC). Comparison with six filter-based feature selection methods shows that HFS achieves higher average F-measure and AUC values. Compared with two classic wrapper feature selection methods, HFS obtains competitive prediction performance in terms of average AUC while significantly reducing the computation cost of the wrapper process.
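The two-stage idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify the distance measure, the linkage, the classifier, or the two wrapper strategies, so everything below is an assumption chosen for illustration. The sketch assumes features are clustered by average linkage on the distance 1 - |Pearson r|, and that the wrapper step visits clusters in order and keeps a cluster's best feature only if it strictly improves the leave-one-out accuracy of a nearest-centroid classifier. The function names (`cluster_features`, `loo_accuracy`, `hybrid_select`) and the toy data are hypothetical.

```python
import math
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def cluster_features(columns, n_clusters):
    """Stage 1 (assumed form): average-linkage agglomerative clustering
    of feature columns, using 1 - |r| as the distance between features."""
    n = len(columns)
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


def loo_accuracy(rows, labels, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the feature indices in `feats`."""
    correct = 0
    for t in range(len(rows)):
        centroids = {}
        for c in set(labels):
            members = [rows[i] for i in range(len(rows))
                       if i != t and labels[i] == c]
            if members:
                centroids[c] = [mean(m[f] for m in members) for f in feats]
        point = [rows[t][f] for f in feats]
        pred = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        correct += pred == labels[t]
    return correct / len(rows)


def hybrid_select(rows, labels, n_clusters):
    """Stage 2 (assumed form): per cluster, find the feature that scores best
    together with the already selected ones; keep it only if it helps."""
    columns = [list(col) for col in zip(*rows)]
    selected, best_score = [], 0.0
    for cluster in cluster_features(columns, n_clusters):
        cand = max(cluster, key=lambda f: loo_accuracy(rows, labels, selected + [f]))
        score = loo_accuracy(rows, labels, selected + [cand])
        if score > best_score:  # drop clusters that add nothing
            selected.append(cand)
            best_score = score
    return sorted(selected)


# Toy data: f0 carries the signal, f1 is a linear copy of f0 (redundant),
# f2 repeats the same pattern in both classes (irrelevant).
f0 = [1, 2, 1, 2, 1, 2, 8, 9, 8, 9, 8, 9]
f1 = [2 * x + 1 for x in f0]
f2 = [5, 1, 4, 2, 3, 6, 5, 1, 4, 2, 3, 6]
rows = [list(r) for r in zip(f0, f1, f2)]
labels = [0] * 6 + [1] * 6

print(hybrid_select(rows, labels, n_clusters=2))
```

In this toy run the redundant pair f0/f1 falls into one cluster and contributes a single feature, while the irrelevant f2 is skipped because it does not improve the score. The cost saving claimed for a hybrid approach comes from the wrapper step evaluating candidates per cluster instead of searching over all feature subsets; the greedy, order-dependent acceptance rule used here is only one simple way to realize that.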

Publication: IEICE TRANSACTIONS on Information Vol.E102-D No.10 pp.1966-1975

Publication Date: 2019/10/01

Publicized: 2019/07/09

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2019EDP7033

Type of Manuscript: PAPER

Category: Software Engineering

Authors

Yiheng JIAN
  Beijing Institute of Technology
Xiao YU
  Wuhan University
Zhou XU
  Wuhan University
Ziyi MA
  Huazhong University of Science and Technology

Keyword

fault prediction, feature selection, hierarchical agglomerative clustering

Cite this

Yiheng JIAN, Xiao YU, Zhou XU, Ziyi MA, "A Hybrid Feature Selection Method for Software Fault Prediction" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 10, pp. 1966-1975, October 2019, doi: 10.1587/transinf.2019EDP7033.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7033/_p

@ARTICLE{e102-d_10_1966,
author={Yiheng JIAN and Xiao YU and Zhou XU and Ziyi MA},
journal={IEICE TRANSACTIONS on Information},
title={A Hybrid Feature Selection Method for Software Fault Prediction},
year={2019},
volume={E102-D},
number={10},
pages={1966-1975},
abstract={Fault prediction aims to identify whether a software module is defect-prone or not according to metrics that are mined from software projects. These metric values, also known as features, may involve irrelevance and redundancy, which hurt the performance of fault prediction models. In order to filter out irrelevant and redundant features, a Hybrid Feature Selection (abbreviated as HFS) method for software fault prediction is proposed. The proposed HFS method consists of two major stages. First, HFS groups features with hierarchical agglomerative clustering; second, HFS selects the most valuable features from each cluster to remove irrelevant and redundant ones based on two wrapper based strategies. The empirical evaluation was conducted on 11 widely-studied NASA projects, using three different classifiers with four performance metrics (precision, recall, F-measure, and AUC). Comparison with six filter-based feature selection methods demonstrates that HFS achieves higher average F-measure and AUC values. Compared with two classic wrapper feature selection methods, HFS can obtain a competitive prediction performance in terms of average AUC while significantly reducing the computation cost of the wrapper process.},
keywords={fault prediction, feature selection, hierarchical agglomerative clustering},
doi={10.1587/transinf.2019EDP7033},
ISSN={1745-1361},
month={October}
}

TY - JOUR
TI - A Hybrid Feature Selection Method for Software Fault Prediction
T2 - IEICE TRANSACTIONS on Information
SP - 1966
EP - 1975
AU - Yiheng JIAN
AU - Xiao YU
AU - Zhou XU
AU - Ziyi MA
PY - 2019
DO - 10.1587/transinf.2019EDP7033
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2019
AB - Fault prediction aims to identify whether a software module is defect-prone or not according to metrics that are mined from software projects. These metric values, also known as features, may involve irrelevance and redundancy, which hurt the performance of fault prediction models. In order to filter out irrelevant and redundant features, a Hybrid Feature Selection (abbreviated as HFS) method for software fault prediction is proposed. The proposed HFS method consists of two major stages. First, HFS groups features with hierarchical agglomerative clustering; second, HFS selects the most valuable features from each cluster to remove irrelevant and redundant ones based on two wrapper based strategies. The empirical evaluation was conducted on 11 widely-studied NASA projects, using three different classifiers with four performance metrics (precision, recall, F-measure, and AUC). Comparison with six filter-based feature selection methods demonstrates that HFS achieves higher average F-measure and AUC values. Compared with two classic wrapper feature selection methods, HFS can obtain a competitive prediction performance in terms of average AUC while significantly reducing the computation cost of the wrapper process.
ER -
