Cross-project defect prediction (CPDP) has been an active research topic in recent years. Two factors make it challenging: the data distributions of the source and target projects are inconsistent, and most target instances lack labels. Researchers have developed several CPDP methods, but their prediction performance still leaves room for improvement. In this paper, we propose a novel approach called Joint Domain Adaption and Pseudo-Labeling (JDAPL). The network architecture consists of a feature mapping sub-network, which maps source and target instances into a common subspace, followed by a classification sub-network and an auxiliary classification sub-network. The classification sub-network uses the label information of labeled instances to generate pseudo-labels. The auxiliary classification sub-network learns to reduce the distribution difference and to improve the accuracy of pseudo-labels for unlabeled instances through loss maximization. Network training is guided by an adversarial scheme. Extensive experiments are conducted on 10 projects from the AEEEM and NASA datasets, and the results indicate that our approach outperforms the baselines.
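The layout described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the single dense layers, the hidden size, the random initialisation, the class name ThreeHeadSketch, and the confidence threshold used to accept pseudo-labels are all assumptions made for illustration, and the adversarial training with loss maximization is not implemented here.

```python
import math
import random


def linear(weights, bias, x):
    """Apply one dense layer; weights is a list of rows, x a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class ThreeHeadSketch:
    """Shared feature mapper followed by a main and an auxiliary classifier."""

    def __init__(self, n_features, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.mapper = init_layer(rng, n_hidden, n_features)
        self.classifier = init_layer(rng, 2, n_hidden)  # clean / defective
        self.auxiliary = init_layer(rng, 2, n_hidden)

    def forward(self, x):
        z = relu(linear(*self.mapper, x))  # instance mapped into the common subspace
        p_main = softmax(linear(*self.classifier, z))
        p_aux = softmax(linear(*self.auxiliary, z))
        return z, p_main, p_aux


def pseudo_label(model, instances, threshold=0.6):
    """Return (index, label) pairs where the main head is confident enough.

    The threshold rule is an assumption; the abstract does not say how
    pseudo-labels are accepted.
    """
    labels = []
    for i, x in enumerate(instances):
        _, p_main, _ = model.forward(x)
        confidence = max(p_main)
        if confidence >= threshold:
            labels.append((i, p_main.index(confidence)))
    return labels


# Usage with made-up target instances (four hypothetical metrics each).
model = ThreeHeadSketch(n_features=4)
target = [[0.1, 0.9, 0.3, 0.5], [0.7, 0.2, 0.8, 0.1]]
print(pseudo_label(model, target, threshold=0.5))
```

Both heads read the same mapped features, which is what would let an adversarial objective on the auxiliary head shape the shared subspace during training.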
Fei WU
Nanjing University of Posts and Telecommunications
Xinhao ZHENG
Nanjing University of Posts and Telecommunications
Ying SUN
Nanjing University of Posts and Telecommunications
Yang GAO
Nanjing University of Posts and Telecommunications
Xiao-Yuan JING
Wuhan University
Fei WU, Xinhao ZHENG, Ying SUN, Yang GAO, Xiao-Yuan JING, "Joint Domain Adaption and Pseudo-Labeling for Cross-Project Defect Prediction" in IEICE TRANSACTIONS on Information, vol. E105-D, no. 2, pp. 432-435, February 2022, doi: 10.1587/transinf.2021EDL8061.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDL8061/_p
@ARTICLE{e105-d_2_432,
author={Fei WU and Xinhao ZHENG and Ying SUN and Yang GAO and Xiao-Yuan JING},
journal={IEICE TRANSACTIONS on Information},
title={Joint Domain Adaption and Pseudo-Labeling for Cross-Project Defect Prediction},
year={2022},
volume={E105-D},
number={2},
pages={432-435},
keywords={},
doi={10.1587/transinf.2021EDL8061},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - Joint Domain Adaption and Pseudo-Labeling for Cross-Project Defect Prediction
T2 - IEICE TRANSACTIONS on Information
SP - 432
EP - 435
AU - Fei WU
AU - Xinhao ZHENG
AU - Ying SUN
AU - Yang GAO
AU - Xiao-Yuan JING
PY - 2022
DO - 10.1587/transinf.2021EDL8061
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2022
ER -