
Keyword Search Result

[Keyword] empirical software engineering(3hit)

  • Commit-Based Class-Level Defect Prediction for Python Projects

    Khine Yin MON  Masanari KONDO  Eunjong CHOI  Osamu MIZUNO  

     
    PAPER

      Publicized:
    2022/11/14
      Vol:
    E106-D No:2
      Page(s):
    157-165

    Defect prediction approaches have contributed greatly to software quality assurance activities such as code review and unit testing. Just-in-time defect prediction approaches predict whether a commit is defect-inducing. Prior research has shown that commit-level prediction is not enough in terms of effort, and that a defective commit may contain both defective and non-defective files. As the defect prediction community is promoting finer-grained prediction approaches, we propose a novel class-level prediction approach, which is finer-grained than file-level prediction and is based on the files of the commits. We designed our model for Python projects and tested it on ten open-source Python projects. We performed our experiment under two settings: product metrics only, and product metrics plus commit information. Our investigation was conducted with three different classifiers and two validation strategies. We found that the model built with a random forest classifier performs best, and that commit information contributes significantly on top of the product metrics in 10-fold cross-validation. We also created a commit-based file-level prediction model for Python files that do not contain classes. The file-level model showed results similar to those of the class-level model. However, the results showed a massive deviation in time-series validation at both levels, which highlights the challenge of predicting defects in Python classes and files in a realistic scenario.
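
The abstract above contrasts 10-fold cross-validation with time-series validation. The sketch below illustrates the general difference between the two splitting schemes using index lists only. It is not the paper's procedure: the function names, the number of time-series splits, and the assumption that samples are already sorted by commit time are all illustrative choices made here.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once.

    Assumption: samples are indexed 0..n_samples-1 in commit-time order,
    so training folds may contain samples that are *later* than the test fold.
    """
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def time_series_splits(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) pairs where training data always precedes test data.

    Assumption: an expanding training window with equally sized test blocks;
    this is one common scheme, chosen here only for illustration.
    """
    block = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train = list(range(0, i * block))
        test = list(range(i * block, min((i + 1) * block, n_samples)))
        yield train, test


if __name__ == "__main__":
    for train, test in time_series_splits(25, n_splits=5):
        print(len(train), "training samples ->", test)
```

In the k-fold scheme a model can be trained on commits made after the ones it is tested on, which is not possible in practice. The time-ordered scheme removes that leakage, which is one plausible reason the two strategies can give very different results.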

  • Empirical Evaluation of Mimic Software Project Data Sets for Software Effort Estimation

    Maohua GAN  Zeynep YÜCEL  Akito MONDEN  Kentaro SASAKI  

     
    PAPER-Software Engineering

      Publicized:
    2020/07/03
      Vol:
    E103-D No:10
      Page(s):
    2094-2103

    To conduct empirical research on industry software development, it is necessary to obtain data of real software projects from industry. However, only a few such industry data sets are publicly available, and unfortunately, most of them are very old. In addition, most of today's software companies cannot make their data open, because software development involves many stakeholders, and thus the confidentiality of its data must be strongly preserved. To that end, this study proposes a method for artificially generating a “mimic” software project data set, whose characteristics (such as average, standard deviation and correlation coefficients) are very similar to those of a given confidential data set. Instead of using the original (confidential) data set, researchers are expected to use the mimic data set to produce results similar to those of the original data set. The proposed method uses the Box-Muller transform for generating normally distributed random numbers, and exponential transformation and number reordering for data mimicry. To evaluate the efficacy of the proposed method, effort estimation is considered as a potential application domain for employing mimic data. Estimation models are built from 8 reference data sets and their corresponding mimic data sets. Our experiments confirmed that models built from mimic data sets show effort estimation performance similar to that of models built from the original data sets, which indicates the capability of the proposed method in generating representative samples.
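
The Box-Muller transform mentioned in the abstract above turns two independent uniform random numbers into two independent standard normal numbers. The following is a minimal sketch of that single step, using only the Python standard library. The function names, the seed, and the `mean`/`std` parameters are assumptions made for illustration; the exponential transformation and number reordering steps of the proposed method are not described in enough detail here and are not shown.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniforms (u1 in (0, 1], u2 in [0, 1)) to two standard normal values."""
    r = math.sqrt(-2.0 * math.log(u1))   # radius from the first uniform
    theta = 2.0 * math.pi * u2           # angle from the second uniform
    return r * math.cos(theta), r * math.sin(theta)


def normal_samples(n, mean=0.0, std=1.0, seed=0):
    """Generate n normally distributed values with the given mean and std."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()          # shift [0, 1) to (0, 1] to avoid log(0)
        u2 = rng.random()
        z0, z1 = box_muller(u1, u2)
        out.extend([mean + std * z0, mean + std * z1])
    return out[:n]


if __name__ == "__main__":
    samples = normal_samples(10000, mean=100.0, std=15.0)
    avg = sum(samples) / len(samples)
    print("sample mean:", avg)
```

Scaling by `std` and shifting by `mean` is the usual way to match a target average and standard deviation. Matching correlation coefficients between variables would need an additional step that this sketch does not attempt.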

  • Customizing GQM Models for Software Project Monitoring

    Akito MONDEN  Tomoko MATSUMURA  Mike BARKER  Koji TORII  Victor R. BASILI  

     
    PAPER

      Vol:
    E95-D No:9
      Page(s):
    2169-2182

    This paper customizes Goal/Question/Metric (GQM) project monitoring models for various projects and organizations, both to take advantage of the data from the software tool EPM and to allow the interpretation models to be tailored to the context and success criteria of each project and organization. The basic idea is to build less concrete models that do not include explicit baseline values for interpreting metric values. Instead, we add hypothesis and interpretation layers to the models to help people in different projects make decisions in their own context. We applied the models to two industrial projects and found that our less concrete models could successfully identify typical problems in software projects.