IEICE global.ieice.org Site

Keyword Search Result

[Keyword] mining software repositories(5hit)

1-5hit

Commit-Based Class-Level Defect Prediction for Python Projects
Khine Yin MON Masanari KONDO Eunjong CHOI Osamu MIZUNO

PAPER

Pubricized:
2022/11/14
Vol:
E106-D No:2
Page(s):
157-165
Defect prediction approaches have been greatly contributing to software quality assurance activities such as code review or unit testing. Just-in-time defect prediction approaches are developed to predict whether a commit is a defect-inducing commit or not. Prior research has shown that commit-level prediction is not enough in terms of effort, and a defective commit may contain both defective and non-defective files. As the defect prediction community is promoting fine-grained granularity prediction approaches, we propose our novel class-level prediction, which is finer-grained than the file-level prediction, based on the files of the commits in this research. We designed our model for Python projects and tested it with ten open-source Python projects. We performed our experiment with two settings: setting with product metrics only and setting with product metrics plus commit information. Our investigation was conducted with three different classifiers and two validation strategies. We found that our model developed by random forest classifier performs the best, and commit information contributes significantly to the product metrics in 10-fold cross-validation. We also created a commit-based file-level prediction for the Python files which do not have the classes. The file-level model also showed a similar condition as the class-level model. However, the results showed a massive deviation in time-series validation for both levels and the challenge of predicting Python classes and files in a realistic scenario.
An Exploratory Study of Copyright Inconsistency in the Linux Kernel
Shi QIU Daniel M. GERMAN Katsuro INOUE

PAPER-Software Engineering

Pubricized:
2020/11/17
Vol:
E104-D No:2
Page(s):
254-263
Software copyright claims an exclusive right for the software copyright owner to determine whether and under what conditions others can modify, reuse, or redistribute this software. For Free and Open Source Software (FOSS), it is very important to identify the copyright owner who can control those activities with license compliance. Copyright notice is a few sentences mostly placed in the header part of a source file as a comment or in a license document in a FOSS project, and it is an important clue to establish the ownership of a FOSS project. Repositories of FOSS projects contain rich and varied information on the development including the source code contributors who are also an important clue to establish the ownership. In this paper, as a first step of understanding copyright owner, we will explore the situation of the software copyright in the Linux kernel, a typical example of FOSS, by analyzing and comparing two kinds of datasets, copyright notices in source files and source code contributors in the software repositories. The discrepancy between two kinds of analysis results is defined as copyright inconsistency. The analysis result has indicated that copyright inconsistencies are prevalent in the Linux kernel. We have also found that code reuse, affiliation change, refactoring, support function, and others' contributions potentially have impacts on the occurrence of the copyright inconsistencies in the Linux kernel. This study exposes the difficulty in managing software copyright in FOSS, highlighting the usefulness of future work to address software copyright problems.
Understanding Developer Commenting in Code Reviews
Toshiki HIRAO Raula GAIKOVINA KULA Akinori IHARA Kenichi MATSUMOTO

PAPER

Pubricized:
2019/09/11
Vol:
E102-D No:12
Page(s):
2423-2432
Modern code review is a well-known practice to assess the quality of software where developers discuss the quality in a web-based review tool. However, this lightweight approach may risk an inefficient review participation, especially when comments becomes either excessive (i.e., too many) or underwhelming (i.e., too few). In this study, we investigate the phenomena of reviewer commenting. Through a large-scale empirical analysis of over 1.1 million reviews from five OSS systems, we conduct an exploratory study to investigate the frequency, size, and evolution of reviewer commenting. Moreover, we also conduct a modeling study to understand the most important features that potentially drive reviewer comments. Our results find that (i) the number of comments and the number of words in the comments tend to vary among reviews and across studied systems; (ii) reviewers change their behaviours in commenting over time; and (iii) human experience and patch property aspects impact the number of comments and the number of words in the comments.
Software Engineering Data Analytics: A Framework Based on a Multi-Layered Abstraction Mechanism
Chaman WIJESIRIWARDANA Prasad WIMALARATNE

LETTER-Software Engineering

Pubricized:
2018/12/04
Vol:
E102-D No:3
Page(s):
637-639
This paper presents a concept of a domain-specific framework for software analytics by enabling querying, modeling, and integration of heterogeneous software repositories. The framework adheres to a multi-layered abstraction mechanism that consists of domain-specific operators. We showcased the potential of this approach by employing a case study.
Fostering Real-Time Software Analysis by Leveraging Heterogeneous and Autonomous Software Repositories
Chaman WIJESIRIWARDANA Prasad WIMALARATNE

PAPER-Software Engineering

Pubricized:
2018/08/06
Vol:
E101-D No:11
Page(s):
2730-2743
Mining software repositories allow software practitioners to improve the quality of software systems and to support maintenance based on historical data. Such data is scattered across autonomous and heterogeneous information sources, such as version control, bug tracking and build automation systems. Despite having many tools to track and measure the data originated from such repositories, software practitioners often suffer from a scarcity of the techniques necessary to dynamically leverage software repositories to fulfill their complex information needs. For example, answering a question such as “What is the number of commits between two successful builds?” requires tiresome manual inspection of multiple repositories. As a solution, this paper presents a conceptual framework and a proof of concept visual query interface to satisfy distinct software quality related information needs of software practitioners. The data originated from repositories is integrated and analyzed to perform systematic investigations, which helps to uncover hidden relationships between software quality and trends of software evolution. This approach has several significant benefits such as the ability to perform real-time analyses, the ability to combine data from various software repositories and generate queries dynamically. The framework evaluated with 31 subjects by using a series of questions categorized into three software evolution scenarios. The evaluation results evidently show that our framework surpasses the state of the art tools in terms of correctness, time and usability.