This paper focuses on the “data collection period” for training a better Just-In-Time (JIT) defect prediction model (early commit data vs. recent commit data) and conducts a large-scale comparative study to identify an appropriate data collection period. Because many machine learning algorithms could be used to train defect prediction models, the choice of algorithm can become a threat to validity. Hence, this study adopts an automatic machine learning method to mitigate this selection bias in the comparison. Empirical results from 122 open-source software projects show a trend that datasets composed of recent commits make better training sets for JIT defect prediction models.
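The early-versus-recent comparison described above can be illustrated with a minimal sketch of how two candidate training windows might be carved out of one project's commit history. The `Commit` record, the `split_by_period` helper, and the choice of two equal-size windows that both end before a shared test period are hypothetical assumptions for illustration, not the authors' protocol; the model-training step (performed with automatic machine learning in the study) is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record: a timestamp plus a defect-inducing label."""
    timestamp: int  # e.g., Unix time of the commit
    buggy: bool     # True if the commit was later found to induce a defect


def split_by_period(
    commits: List[Commit], window: int, test_size: int
) -> Tuple[List[Commit], List[Commit], List[Commit]]:
    """Return (early_train, recent_train, test) under assumed window rules.

    The newest `test_size` commits form the test set. The "early" window is
    the oldest `window` commits; the "recent" window is the `window` commits
    immediately preceding the test set. Both windows lie strictly before the
    test set in time. If 2 * window exceeds the remaining history, the two
    windows overlap.
    """
    if window <= 0 or test_size <= 0:
        raise ValueError("window and test_size must be positive")
    ordered = sorted(commits, key=lambda c: c.timestamp)
    if window + test_size > len(ordered):
        raise ValueError("not enough commits for the requested windows")
    test = ordered[-test_size:]
    history = ordered[:-test_size]
    early = history[:window]
    recent = history[-window:]
    return early, recent, test


if __name__ == "__main__":
    # Ten synthetic commits with timestamps 0..9 (illustrative data only).
    commits = [Commit(timestamp=t, buggy=(t % 3 == 0)) for t in range(10)]
    early, recent, test = split_by_period(commits, window=3, test_size=2)
    print([c.timestamp for c in early])   # [0, 1, 2]
    print([c.timestamp for c in recent])  # [5, 6, 7]
    print([c.timestamp for c in test])    # [8, 9]
```

Fixing a common test period for both windows keeps the comparison fair: any difference in prediction performance would then come from which period the training data was drawn from rather than from what it is evaluated on.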
Kosuke OHARA
Ehime University
Hirohisa AMAN
Ehime University
Sousuke AMASAKI
Okayama Prefectural University
Tomoyuki YOKOGAWA
Okayama Prefectural University
Minoru KAWAHARA
Ehime University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Kosuke OHARA, Hirohisa AMAN, Sousuke AMASAKI, Tomoyuki YOKOGAWA, Minoru KAWAHARA, "A Comparative Study of Data Collection Periods for Just-In-Time Defect Prediction Using the Automatic Machine Learning Method," in IEICE TRANSACTIONS on Information, vol. E106-D, no. 2, pp. 166-169, February 2023, doi: 10.1587/transinf.2022MPL0002.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022MPL0002/_p
@ARTICLE{e106-d_2_166,
author={Kosuke OHARA and Hirohisa AMAN and Sousuke AMASAKI and Tomoyuki YOKOGAWA and Minoru KAWAHARA},
journal={IEICE TRANSACTIONS on Information},
title={A Comparative Study of Data Collection Periods for Just-In-Time Defect Prediction Using the Automatic Machine Learning Method},
year={2023},
volume={E106-D},
number={2},
pages={166-169},
abstract={This paper focuses on the “data collection period” for training a better Just-In-Time (JIT) defect prediction model — the early commit data vs. the recent one —, and conducts a large-scale comparative study to explore an appropriate data collection period. Since there are many possible machine learning algorithms for training defect prediction models, the selection of machine learning algorithms can become a threat to validity. Hence, this study adopts the automatic machine learning method to mitigate the selection bias in the comparative study. The empirical results using 122 open-source software projects prove the trend that the dataset composed of the recent commits would become a better training set for JIT defect prediction models.},
keywords={},
doi={10.1587/transinf.2022MPL0002},
ISSN={1745-1361},
month={February},
}
TY  - JOUR
TI  - A Comparative Study of Data Collection Periods for Just-In-Time Defect Prediction Using the Automatic Machine Learning Method
T2  - IEICE TRANSACTIONS on Information
SP  - 166
EP  - 169
AU  - Kosuke OHARA
AU  - Hirohisa AMAN
AU  - Sousuke AMASAKI
AU  - Tomoyuki YOKOGAWA
AU  - Minoru KAWAHARA
PY  - 2023
DO  - 10.1587/transinf.2022MPL0002
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E106-D
IS  - 2
JA  - IEICE TRANSACTIONS on Information
Y1  - 2023/02//
AB  - This paper focuses on the “data collection period” for training a better Just-In-Time (JIT) defect prediction model — the early commit data vs. the recent one —, and conducts a large-scale comparative study to explore an appropriate data collection period. Since there are many possible machine learning algorithms for training defect prediction models, the selection of machine learning algorithms can become a threat to validity. Hence, this study adopts the automatic machine learning method to mitigate the selection bias in the comparative study. The empirical results using 122 open-source software projects prove the trend that the dataset composed of the recent commits would become a better training set for JIT defect prediction models.
ER  - 