A Comparative Study of Data Collection Periods for Just-In-Time Defect Prediction Using the Automatic Machine Learning Method

Kosuke OHARA, Hirohisa AMAN, Sousuke AMASAKI, Tomoyuki YOKOGAWA, Minoru KAWAHARA

Summary:

This paper focuses on the “data collection period” used to train a Just-In-Time (JIT) defect prediction model, that is, whether to train on early commit data or on recent commit data, and conducts a large-scale comparative study to identify an appropriate period. Because many machine learning algorithms can be used to train defect prediction models, the choice of algorithm can become a threat to validity. Hence, this study adopts an automatic machine learning method to mitigate selection bias in the comparison. The empirical results on 122 open-source software projects show a trend that a dataset composed of recent commits makes a better training set for JIT defect prediction models.

Publication
IEICE TRANSACTIONS on Information Vol.E106-D No.2 pp.166-169
Publication Date
2023/02/01
Publicized
2022/11/11
Online ISSN
1745-1361
DOI
10.1587/transinf.2022MPL0002
Type of Manuscript
Special Section LETTER (Special Section on Empirical Software Engineering)

Authors

Kosuke OHARA
  Ehime University
Hirohisa AMAN
  Ehime University
Sousuke AMASAKI
  Okayama Prefectural University
Tomoyuki YOKOGAWA
  Okayama Prefectural University
Minoru KAWAHARA
  Ehime University

Keyword