1-15hit |
Saya ONOUE Hideaki HATA Akito MONDEN Kenichi MATSUMOTO
GitHub is a developers' social networking service that hosts a great number of open source software (OSS) projects. Although some of the hosted projects are growing and have many developers, most projects are organized by a few developers and face difficulties in terms of sustainability. OSS projects depend mainly on volunteer developers, and attracting and retaining these volunteers are major concerns of the project stakeholders. To investigate the population structures of OSS development communities in detail and conduct software analytics to obtain actionable information, we apply a demographic approach. Demography is the scientific study of population and seeks to identify the levels and trends in the size and components of a population. This paper presents a case study, investigating the characteristics of the population structures of OSS projects on GitHub, and shows population projections generated with the well-known cohort component method. We found that there are four types of population structures in OSS development communities in terms of experiences and contributions. In addition, we projected the future population accurately using a cohort component population projection method. This method predicts a population of the next period using a survival rate calculated from past population. To the best of our knowledge, this is the first study that applied demography to the field of OSS research. Our approach addressing OSS-related problems based on demography will bring new insights, since studying population is novel in OSS research. Understanding current and future structures of OSS projects can help practitioners to monitor a project, gain awareness of what is happening, manage risks, and evaluate past decisions.
Toshiki HIRAO Raula GAIKOVINA KULA Akinori IHARA Kenichi MATSUMOTO
Modern code review is a well-known practice to assess the quality of software where developers discuss the quality in a web-based review tool. However, this lightweight approach may risk an inefficient review participation, especially when comments becomes either excessive (i.e., too many) or underwhelming (i.e., too few). In this study, we investigate the phenomena of reviewer commenting. Through a large-scale empirical analysis of over 1.1 million reviews from five OSS systems, we conduct an exploratory study to investigate the frequency, size, and evolution of reviewer commenting. Moreover, we also conduct a modeling study to understand the most important features that potentially drive reviewer comments. Our results find that (i) the number of comments and the number of words in the comments tend to vary among reviews and across studied systems; (ii) reviewers change their behaviours in commenting over time; and (iii) human experience and patch property aspects impact the number of comments and the number of words in the comments.
Kohei YOSHIGAMI Taishi HAYASHI Masateru TSUNODA Hidetake UWANO Shunichiro SASAKI Kenichi MATSUMOTO
Recently, many studies have applied gamification to software engineering education and software development to enhance work results. Gamification is defined as “the use of game design elements in non-game contexts.” When applying gamification, we make various game rules, such as a time limit. However, it is not clear whether the rule affects working time or not. For example, if we apply a time limit to impatient developers, the working time may become shorter, but the rule may negatively affect because of pressure for time. In this study, we analyze with subjective experiments whether the rules affects work results such as working time. Our experimental results suggest that for the coding tasks, working time was shortened when we applied a rule that made developers aware of working time by showing elapsed time.
Syful ISLAM Raula GAIKOVINA KULA Christoph TREUDE Bodin CHINTHANET Takashi ISHIO Kenichi MATSUMOTO
The package manager (PM) is crucial to most technology stacks, acting as a broker to ensure that a verified dependency package is correctly installed, configured, or removed from an application. Diversity in technology stacks has led to dozens of PMs with various features. While our recent study indicates that package management features of PM are related to end-user experiences, it is unclear what those issues are and what information is required to resolve them. In this paper, we have investigated PM issues faced by end-users through an empirical study of content on Stack Overflow (SO). We carried out a qualitative analysis of 1,131 questions and their accepted answer posts for three popular PMs (i.e., Maven, npm, and NuGet) to identify issue types, underlying causes, and their resolutions. Our results confirm that end-users struggle with PM tool usage (approximately 64-72%). We observe that most issues are raised by end-users due to lack of instructions and errors messages from PM tools. In terms of issue resolution, we find that external link sharing is the most common practice to resolve PM issues. Additionally, we observe that links pointing to useful resources (i.e., official documentation websites, tutorials, etc.) are most frequently shared, indicating the potential for tool support and the ability to provide relevant information for PM end-users.
Keitaro NAKASAI Masateru TSUNODA Kenichi MATSUMOTO
Software developers often use a web search engine to improve work efficiency. However, web search strategies (e.g., frequently changing web search keywords) may be different for each developer. In this study, we attempted to define a better web search strategy. Although many previous studies analyzed web search behavior in programming, they did not provide guidelines for web search strategies. To suggest guidelines for web search strategies, we asked 10 subjects four questions about programming which they had to solve, and analyzed their behavior. In the analysis, we focused on the subjects' task time and the web search metrics defined by us. Based on our experiment, to enhance the effectiveness of the search, we suggest (1) that one should not go through the next search result pages, (2) the number of keywords in queries should be suppressed, and (3) previously used keywords must be avoided when creating a new query.
Passakorn PHANNACHITTA Akito MONDEN Jacky KEUNG Kenichi MATSUMOTO
Analogy-based software effort estimation has gained a considerable amount of attention in current research and practice. Its excellent estimation accuracy relies on its solution adaptation stage, where an effort estimate is produced from similar past projects. This study proposes a solution adaptation technique named LSA-X that introduces an approach to exploit the potential of productivity factors, i.e., project variables with a high correlation with software productivity, in the solution adaptation stage. The LSA-X technique tailors the exploitation of the productivity factors with a procedure based on the Linear Size Adaptation (LSA) technique. The results, based on 19 datasets show that in circumstances where a dataset exhibits a high correlation coefficient between productivity and a related factor (r≥0.30), the proposed LSA-X technique statistically outperformed (95% confidence) the other 8 commonly used techniques compared in this study. In other circumstances, our results suggest using any linear adaptation technique based on software size to compensate for the limitations of the LSA-X technique.
Kanyakorn JEWMAIDANG Takashi ISHIO Akinori IHARA Kenichi MATSUMOTO Pattara LEELAPRUTE
This paper proposes a method to extract and visualize a library update history in a project. The method identifies reused library versions by comparing source code in a product with existing versions of the library so that developers can understand when their own copy of a library has been copied, modified, and updated.
Teruki HAYAKAWA Masateru TSUNODA Koji TODA Keitaro NAKASAI Amjed TAHIR Kwabena Ebo BENNIN Akito MONDEN Kenichi MATSUMOTO
Various software fault prediction models have been proposed in the past twenty years. Many studies have compared and evaluated existing prediction approaches in order to identify the most effective ones. However, in most cases, such models and techniques provide varying results, and their outcomes do not result in best possible performance across different datasets. This is mainly due to the diverse nature of software development projects, and therefore, there is a risk that the selected models lead to inconsistent results across multiple datasets. In this work, we propose the use of bandit algorithms in cases where the accuracy of the models are inconsistent across multiple datasets. In the experiment discussed in this work, we used four conventional prediction models, tested on three different dataset, and then selected the best possible model dynamically by applying bandit algorithms. We then compared our results with those obtained using majority voting. As a result, Epsilon-greedy with ϵ=0.3 showed the best or second-best prediction performance compared with using only one prediction model and majority voting. Our results showed that bandit algorithms can provide promising outcomes when used in fault prediction.
Dong WANG Patanamon THONGTANUNAM Raula GAIKOVINA KULA Kenichi MATSUMOTO
Contemporary development projects benefit from code review as it improves the quality of a project. Large ecosystems of inter-dependent projects like OpenStack generate a large number of reviews, which poses new challenges for collaboration (improving patches, fixing defects). Review tools allow developers to link between patches, to indicate patch dependency, competing solutions, or provide broader context. We hypothesize that such patch linkage may also simulate cross-collaboration. With a case study of OpenStack, we take a first step to explore collaborations that occur after a patch linkage was posted between two patches (i.e., cross-patch collaboration). Our empirical results show that although patch linkage that requests collaboration is relatively less prevalent, the probability of collaboration is relatively higher. Interestingly, the results also show that collaborative contributions via patch linkage are non-trivial, i.e, contributions can affect the review outcome (such as voting) or even improve the patch (i.e., revising). This work opens up future directions to understand barriers and opportunities related to this new kind of collaboration, that assists with code review and development tasks in large ecosystems.
Syful ISLAM Dong WANG Raula GAIKOVINA KULA Takashi ISHIO Kenichi MATSUMOTO
Third-party package usage has become a common practice in contemporary software development. Developers often face different challenges, including choosing the right libraries, installing errors, discrepancies, setting up the environment, and building failures during software development. The risks of maintaining a third-party package are well known, but it is unclear how information from Stack Overflow (SO) can be useful. This paper performed an empirical study to explore npm package co-usage examples from SO. From over 30,000 SO question posts, we extracted 2,100 posts with package usage information and matched them against the 217,934 npm library package. We find that, popular and highly used libraries are not discussed as often in SO. However, we can see that the accepted answers may prove useful, as we believe that the usage examples and executable commands could be reused for tool support.
Bodin CHINTHANET Raula GAIKOVINA KULA Rodrigo ELIZA ZAPATA Takashi ISHIO Kenichi MATSUMOTO Akinori IHARA
It has become common practice for software projects to adopt third-party dependencies. Developers are encouraged to update any outdated dependency to remain safe from potential threats of vulnerabilities. In this study, we present an approach to aid developers show whether or not a vulnerable code is reachable for JavaScript projects. Our prototype, SōjiTantei, is evaluated in two ways (i) the accuracy when compared to a manual approach and (ii) a larger-scale analysis of 780 clients from 78 security vulnerability cases. The first evaluation shows that SōjiTantei has a high accuracy of 83.3%, with a speed of less than a second analysis per client. The second evaluation reveals that 68 out of the studied 78 vulnerabilities reported having at least one clean client. The study proves that automation is promising with the potential for further improvement.
Yuki UEDA Akinori IHARA Takashi ISHIO Toshiki HIRAO Kenichi MATSUMOTO
Peer code review is key to ensuring the absence of software defects. To reduce review costs, software developers adopt code convention checking tools that automatically identify maintainability issues in source code. However, these tools do not always address the maintainability issue for a particular project. The goal of this study is to understand how code review fixes conditional statement issues, which are the most frequent changes in software development. We conduct an empirical study to understand if-statement changes through code review. Using review requests in the Qt and OpenStack projects, we analyze changes of the if-conditional statements that are (1) requested to be reviewed, and are (2) revised through code review. We find the most frequently changed symbols are “( )”, “.”, and “!”. We also find project-specific fixing patterns for improving code readability by association rule mining. For example “!” operator is frequently replaced with a function call. These rules are useful for improving a coding convention checker tailored for the projects.
Shohei IKEDA Akinori IHARA Raula Gaikovina KULA Kenichi MATSUMOTO
Contemporary software projects often utilize a README.md to share crucial information such as installation and usage examples related to their software. Furthermore, these files serve as an important source of updated and useful documentation for developers and prospective users of the software. Nonetheless, both novice and seasoned developers are sometimes unsure of what is required for a good README file. To understand the contents of README, we investigate the contents of 43,900 JavaScript packages. Results show that these packages contain common content themes (i.e., ‘usage’, ‘install’ and ‘license’). Furthermore, we find that application-specific packages more frequently included content themes such as ‘options’, while library-based packages more frequently included other specific content themes (i.e., ‘install’ and ‘license’).
Masateru TSUNODA Akito MONDEN Kenichi MATSUMOTO Sawako OHIWA Tomoki OSHINO
Software maintenance is an important activity in the software lifecycle. Software maintenance does not only mean removing faults found after software release. Software needs extensions or modifications of its functions owing to changes in the business environment and software maintenance also refers to them. To help users and service suppliers benchmark work efficiency for software maintenance, and to clarify the relationships between software quality, work efficiency, and unit cost of staff, we used a dataset that includes 134 data points collected by the Economic Research Association in 2012, and analyzed the factors that affected the work efficiency of software maintenance. In the analysis, using a multiple regression model, we clarified the relationships between work efficiency and programming language and productivity factors. To analyze the influence to the quality, relationships of fault ratio was analyzed using correlation coefficients. The programming language and productivity factors affect work efficiency. Higher work efficiency and higher unit cost of staff do not affect the quality of software maintenance.
Kenichi ONO Masateru TSUNODA Akito MONDEN Kenichi MATSUMOTO
When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.