1-9hit |
Shi QIU Daniel M. GERMAN Katsuro INOUE
Software copyright claims an exclusive right for the software copyright owner to determine whether and under what conditions others can modify, reuse, or redistribute this software. For Free and Open Source Software (FOSS), it is very important to identify the copyright owner who can control those activities with license compliance. Copyright notice is a few sentences mostly placed in the header part of a source file as a comment or in a license document in a FOSS project, and it is an important clue to establish the ownership of a FOSS project. Repositories of FOSS projects contain rich and varied information on the development including the source code contributors who are also an important clue to establish the ownership. In this paper, as a first step of understanding copyright owner, we will explore the situation of the software copyright in the Linux kernel, a typical example of FOSS, by analyzing and comparing two kinds of datasets, copyright notices in source files and source code contributors in the software repositories. The discrepancy between two kinds of analysis results is defined as copyright inconsistency. The analysis result has indicated that copyright inconsistencies are prevalent in the Linux kernel. We have also found that code reuse, affiliation change, refactoring, support function, and others' contributions potentially have impacts on the occurrence of the copyright inconsistencies in the Linux kernel. This study exposes the difficulty in managing software copyright in FOSS, highlighting the usefulness of future work to address software copyright problems.
Shi QIU German M. DANIEL Katsuro INOUE
For Free and Open Source Software (FOSS), identifying the copyright notices is important. However, both the collaborative manner of FOSS project development and the large number of source files increase its difficulty. In this paper, we aim at automatically identifying the copyright notices in source files based on machine learning techniques. The evaluation experiment shows that our method outperforms FOSSology, the only existing method based on regular expression.
Yasutaka KAMEI Takahiro MATSUMOTO Kazuhiro YAMASHITA Naoyasu UBAYASHI Takashi IWASAKI Shuichi TAKAYAMA
Nowadays, open source software (OSS) systems are adopted by proprietary software projects. To reduce the risk of using problematic OSS systems (e.g., causing system crashes), it is important for proprietary software projects to assess OSS systems in advance. Therefore, OSS quality assessment models are studied to obtain information regarding the quality of OSS systems. Although the OSS quality assessment models are partially validated using a small number of case studies, to the best of our knowledge, there are few studies that empirically report how industrial projects actually use OSS quality assessment models in their own development process. In this study, we empirically evaluate the cost and effectiveness of OSS quality assessment models at Fujitsu Kyushu Network Technologies Limited (Fujitsu QNET). To conduct the empirical study, we collect datasets from (a) 120 OSS projects that Fujitsu QNET's projects actually used and (b) 10 problematic OSS projects that caused major problems in the projects. We find that (1) it takes average and median times of 51 and 49 minutes, respectively, to gather all assessment metrics per OSS project and (2) there is a possibility that we can filter problematic OSS systems by using the threshold derived from a pool of assessment metrics. Fujitsu QNET's developers agree that our results lead to improvements in Fujitsu QNET's OSS assessment process. We believe that our work significantly contributes to the empirical knowledge about applying OSS assessment techniques to industrial projects.
Xin YANG Norihiro YOSHIDA Raula GAIKOVINA KULA Hajimu IIDA
Software peer review is regarded as one of the most important approaches to preserving software quality. Due to the distributed collaborations in Open Source Software (OSS) development, the review techniques and processes conducted in OSS environment differ from the traditional review method that based on formal face-to-face meetings. Unlike other related works, this study investigates peer review processes of OSS projects from the social perspective: communication and interaction in peer review by using social network analysis (SNA). Moreover, the relationship between peer review contributors and their activities is studied. We propose an approach to evaluating contributors' activeness and social relationship using SNA named Peer Review Social Network (PeRSoN). We evaluate our approach by empirical case study, 326,286 review comments and 1,745 contributors from three representative industrial OSS projects have been extracted and analyzed. The results indicate that the social network structure influences the realistic activeness of contributors significantly. Based on the results, we suggest our approach can support project leaders in assigning review tasks, appointing reviewers and other activities to improve current software processes.
Yoji YAMATO Shinichiro KATSURAGI Shinji NAGAO Norihiro MIURA
We evaluated software maintenance of an open source cloud platform system we developed using an agile software development method. We previously reported on a rapid service launch using the agile software development method in spite of large-scale development. For this study, we analyzed inquiries and the defect removal efficiency of our recently developed software throughout one-year operation. We found that the defect removal efficiency of our recently developed software was 98%. This indicates that we could achieve sufficient quality in spite of large-scale agile development. In term of maintenance process, we could answer all enquiries within three business days and could conduct version-upgrade fast. Thus, we conclude that software maintenance of agile software development is not ineffective.
Tao WANG Huaimin WANG Gang YIN Cheng YANG Xiang LI Peng ZOU
The large amounts of freely available open source software over the Internet are fundamentally changing the traditional paradigms of software development. Efficient categorization of the massive projects for retrieving relevant software is of vital importance for Internet-based software development such as solution searching, best practices learning and so on. Many previous works have been conducted on software categorization by mining source code or byte code, but were verified on only relatively small collections of projects with coarse-grained categories or clusters. However, Internet-based software development requires finer-grained, more scalable and language-independent categorization approaches. In this paper, we propose a novel approach to hierarchically categorize software projects based on their online profiles. We design a SVM-based categorization framework and adopt a weighted combination strategy to aggregate different types of profile attributes from multiple repositories. Different basic classification algorithms and feature selection techniques are employed and compared. Extensive experiments are carried out on more than 21,000 projects across five repositories. The results show that our approach achieves significant improvements by using weighted combination. Compared to the previous work, our approach presents competitive results with more finer-grained and multi-layered category hierarchy with more than 120 categories. Unlike approaches that use source code or byte code, our approach is more effective for large-scale and language-independent software categorization. In addition, experiments suggest that hierarchical categorization combined with general keyword-based searching improves the retrieval efficiency and accuracy.
Eunjong CHOI Norihiro YOSHIDA Katsuro INOUE
Although code clones (i.e. code fragments that have similar or identical code fragments in the source code) are regarded as a factor that increases the complexity of software maintenance, tools for supporting clone refactoring (i.e. merging a set of code clones into a single method or function) are not commonly used. To promote the development of refactoring tools that can be more widely utilized, we present an investigation of clone refactoring carried out in the development of open source software systems. In the investigation, we identified the most frequently used refactoring patterns and discovered how merged code clone token sequences and differences in token sequence lengths vary for each refactoring pattern.
Passakorn PHANNACHITTA Akinori IHARA Pijak JIRAPIWONG Masao OHIRA Ken-ichi MATSUMOTO
Nowadays, software development societies have given more precedence to Open Source Software (OSS). There is much research aimed at understanding the OSS society to sustain the OSS product. To lead an OSS project to a successful conclusion, researchers study how developers change source codes called patches in project repositories. In existing studies, we found an argument in the conventional patch acceptance detection procedure. It was so simplified that it omitted important cases from the analysis, and would lead researchers to wrong conclusions. In this research, we propose an algorithm to overcome the problem. To prove out our algorithm, we constructed a framework and conducted two case studies. As a result, we came to a new and interesting understanding of patch activities.
Anakorn JONGYINDEE Masao OHIRA Akinori IHARA Ken-ichi MATSUMOTO
There are many roles to play in the bug fixing process in open source software development. A developer called “Committer”, who has a permission to submit a patch into a software repository, plays a major role in this process and holds a key to the successfulness of the project. Despite the importance of committer's activities, we suspect that sometimes committers can make mistakes which have some consequences to the bug fixing process (e.g., reopened bugs after bug fixing). Our research focuses on studying the consequences of each committer's activities to this process. We collected each committer's historical data from the Eclipse-Platform's bug tracking system and version control system and evaluated their activities using bug status in the bug tracking system and commit log in the version control system. Then we looked deeper into each committer's characteristics to see the reasons why some committers tend to make mistakes more than the others.