Toshiki HIRAO Raula GAIKOVINA KULA Akinori IHARA Kenichi MATSUMOTO
Modern code review is a well-known practice to assess the quality of software where developers discuss the quality in a web-based review tool. However, this lightweight approach may risk an inefficient review participation, especially when comments becomes either excessive (i.e., too many) or underwhelming (i.e., too few). In this study, we investigate the phenomena of reviewer commenting. Through a large-scale empirical analysis of over 1.1 million reviews from five OSS systems, we conduct an exploratory study to investigate the frequency, size, and evolution of reviewer commenting. Moreover, we also conduct a modeling study to understand the most important features that potentially drive reviewer comments. Our results find that (i) the number of comments and the number of words in the comments tend to vary among reviews and across studied systems; (ii) reviewers change their behaviours in commenting over time; and (iii) human experience and patch property aspects impact the number of comments and the number of words in the comments.
Jin MITSUGI Yuki SATO Yuusuke KAWAKITA Haruhisa ICHIKAWA
Backscatter wireless communications offer advantages such as batteryless operations, small form factor, and radio regulatory exemption sensors. The major challenge ahead of backscatter wireless communications is synchronized multicarrier data collection, which can be realized by rejecting mutual harmonics among backscatters. This paper analyzes the mutual interferences of digitally modulated multicarrier backscatter to find interferences from higher frequency subcarriers to lower frequency subcarriers, which do not take place in analog modulated multicarrier backscatters, is harmful for densely populated subcarriers. This reverse interference distorts the harmonics replica, deteriorating the performance of the existing method, which rejects mutual interference among subcarriers by 5dB processing gain. To solve this problem, this paper analyzes the relationship between subcarrier spacing and reverse interference, and reveals that an alternate channel spacing, with channel separation twice the bandwidth of a subcarrier, can provide reasonably dense subcarrier allocation and can alleviate reverse interference. The idea is examined with prototype sensors in a wired experiment and in an indoor propagation experiment. The results reveal that with alternate channel spacing, the reverse interference practically becomes negligible, and the existing interference rejection method achieves the original processing gain of 5dB with one hundredth packet error rate reduction.
Ryo ISHIZUKA Naohiko TSUDA Hironori WASHIZAKI Yoshiaki FUKAZAWA Shunsuke SUGIMURA Yuichiro YASUDA
Deterioration of software quality developed by multiple organizations has become a serious problem. To predict software degradation after an organizational change, this paper investigates the influence of quality deterioration on software metrics by analyzing three software projects. To detect factors indicating a low evolvability, we focus on the relationships between the change in software metric values and refactoring tendencies. Refactoring after an organization change impacts the quality.
Biao WU Xiaoan BAO Na ZHANG Hiromu MORITA Mitsuru NAKATA Qi-Wei GE
Software testing is an important problem to design a large software system and it is difficult to be solved due to its computational complexity. We try to use program nets to approach this problem. As the first step towards solving software testing problem, this paper provides a technique to generate subnets of a program net and applies this technique to software testing. Firstly, definitions and properties of program nets are introduced based on our previous works, and the explanation of software testing problem is given. Secondly, polynomial algorithms are proposed to generate subnets that can cover all the given program net. Finally, a case study is presented to show how to find subnets covering a given program net by using the proposed algorithms, as well as to show the input test data of the program net for software testing.
Jung-Been LEE Taek LEE Hoh Peter IN
Mining software artifacts is a useful way to understand the source code of software projects. Topic modeling in particular has been widely used to discover meaningful information from software artifacts. However, software artifacts are unstructured and contain a mix of textual types within the natural text. These software artifact characteristics worsen the performance of topic modeling. Among several natural language pre-processing tasks, removing stop words to reduce meaningless and uninteresting terms is an efficient way to improve the quality of topic models. Although many approaches are used to generate effective stop words, the lists are outdated or too general to apply to mining software artifacts. In addition, the performance of the topic model is sensitive to the datasets used in the training for each approach. To resolve these problems, we propose an automatic stop word generation approach for topic models of software artifacts. By measuring topic coherence among words in the topic using Pointwise Mutual Information (PMI), we added words with a low PMI score to our stop words list for every topic modeling loop. Through our experiment, we proved that our stop words list results in a higher performance of the topic model than lists from other approaches.
Kozo OKANO Satoshi HARAUCHI Toshifusa SEKIZAWA Shinpei OGATA Shin NAKAJIMA
Java is one of important program language today. In Java, in order to build sound software, we have to carefully implement two fundamental methods hashCode and equals. This requirement, however, is not easy to follow in real software development. Some existing studies for ensuring the correctness of these two methods rely on static analysis, which are limited to loop-free programs. This paper proposes a new solution to this important problem, using software analysis workbench (SAW), an open source tool. The efficiency is evaluated through experiments. We also provide a useful situation where cost of regression testing is reduced when program refactoring is conducted.
Hoang-Viet TRAN Ngoc Hung PHAM Viet Ha NGUYEN
Since software becomes more complex during its life cycle, the verification cost becomes higher, especially for such methods which are using model checking in general and assume-guarantee reasoning in specific. To address the problem of reducing the assume-guarantee verification cost, this paper presents a method to generate locally minimum and strongest assumptions for verification of component-based software. For this purpose, we integrate a variant of membership queries answering technique to an algorithm which considers candidate assumptions that are smaller and stronger first, larger and weaker later. Because the algorithm stops as soon as it reaches a conclusive result, the generated assumptions are the locally minimum and strongest ones. The correctness proof of the proposed algorithm is also included in the paper. An implemented tool, test data, and experimental results are presented and discussed.
Cheng LUO Wei CAO Lingli WANG Philip H. W. LEONG
With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
Chaman WIJESIRIWARDANA Prasad WIMALARATNE
This paper presents a concept of a domain-specific framework for software analytics by enabling querying, modeling, and integration of heterogeneous software repositories. The framework adheres to a multi-layered abstraction mechanism that consists of domain-specific operators. We showcased the potential of this approach by employing a case study.
Takahiro HIRAYAMA Masahiro JIBIKI Hiroaki HARAI
Software-defined networking (SDN) technology enables us to flexibly configure switches in a network. Previously, distributed SDN control methods have been discussed to improve their scalability and robustness. Distributed placement of controllers and backing up each other enhance robustness. However, these techniques do not include an emergency measure against large-scale failures such as network separation induced by disasters. In this study, we first propose a network partitioning method to create a robust control plane (C-Plane) against large-scale failures. In our approach, networks are partitioned into multiple sub-networks based on robust topology coefficient (RTC). RTC denotes the probability that nodes in a sub-network isolate from controllers when a large-scale failure occurs. By placing a local controller onto each sub-network, 6%-10% of larger controller-switch connections will be retained after failure as compared to other approaches. Furthermore, we discuss reactive emergency reconstruction of a distributed SDN C-plane. Each node detects a disconnection to its controller. Then, C-plane will be reconstructed by isolated switches and managed by the other substitute controller. Meanwhile, our approach reconstructs C-plane when network connectivity recovers. The main and substitute controllers detect network restoration and merge their C-planes without conflict. Simulation results reveal that our proposed method recovers C-plane logical connectivity with a probability of approximately 90% when failure occurs in 100 node networks. Furthermore, we demonstrate that the convergence time of our reconstruction mechanism is proportional to the network size.
Xiaoqiong ZHAO Shanping LI Huan YU Ye WANG Weiwei QIU
Background: The applying of third-party libraries is an integral part of many applications. But the libraries choosing is time-consuming even for experienced developers. The automated recommendation system for libraries recommendation is widely researched to help developers to choose libraries. Aim: from software engineering aspect, our research aims to give developers a reliable recommended list of third-party libraries at the early phase of software development lifecycle to help them build their development environment faster; and from technical aspect, our research aims to build a generalizable recommendation system framework which combines collaborative filtering and topic modeling techniques, in order to improve the performance of libraries recommendation significantly. Our works on this research: 1) we design a hybrid methodology to combine collaborative filtering and LDA text mining technology; 2) we build a recommendation system framework successfully based on the above hybrid methodology; 3) we make a well-designed experiment to validate the methodology and framework which use the data of 1,013 mobile application projects; 4) we do the evaluation for the result of the experiment. Conclusions: 1) hybrid methodology with collaborative filtering and LDA can improve the performance of libraries recommendation significantly; 2) based on the hybrid methodology, the framework works very well on the libraries recommendation for helping developers' libraries choosing. Further research is necessary to improve the performance of the libraries recommendation including: 1) use more accurate NLP technologies improve the correlation analysis; 2) try other similarity calculation methodology for collaborative filtering to rise the accuracy; 3) on this research, we just bring the time-series approach to the framework and make an experiment as comparative trial, the result shows that the performance improves continuously, so in further research we plan to use time-series data-mining as the basic methodology to update the framework.
Yutaro HAYAKAWA Kenichi YASUKATA Jin NAKAZAWA Michio HONDA
Increasing hardware resources, such as multi-core and multi-socket CPUs, memory capacity and high-speed NICs, impose significant challenges on Network Function Virtualization (NFV) backends. They increase the potential numbers of per-server NFs or tenants, which requires a packet switching architecture that is not only scalable to large number of virtual ports, but also robust to attacks on the data plane. This is a real problem; a recent study has reported that Open vSwitch, a widely used software switch, had a buffer-overflow bug in its data plane that results the entire SDN domain to be hijacked by worms propagated in the network. In order to address this problem, we propose REdge. It scales to thousands of virtual ports or NFs (as opposed to hundreds in the current state-of-the art), and protect modular, flexible packet switching logic against various bugs, such as buffer overflow and other unexpected operations using static program checking. When 2048 NFs are active and packets are distributed to them based on the MAC or IP addresses, REdge achieves 3.16 Mpps or higher packet forwarding rates for 60 byte packets and achieves the wire rate for 1500 byte packets in the 25 Gbps link.
Modern file systems, such as ext4, btrfs, and XFS, are evolving and enable the introduction of new features to meet ever-changing demands and improve reliability. File system developers are struggling to eliminate all software bugs, but the operating system community points out that file systems are a hotbed of critical software bugs. This paper analyzes the code coverage of xfstests, a widely used suite of file system tests, on three major file systems (ext4, btrfs, and XFS). The coverage is 72.34%, and the uncovered code runs into 23,232 lines of code. To understand why the code coverage is low, the uncovered code is manually examined line by line. We identified three major causes, peculiar to file systems, that hinder higher coverage. First, covering all the features is difficult because each file system provides a wide variety of file-system specific features, and some features can be tested only on special storage devices. Second, covering all the execution paths is difficult because they depend on file system configurations and internal on-disk states. Finally, the code for maintaining backward-compatibility is executed only when a file system encounters old formats. Our findings will help file system developers improve the coverage of test suites and provide insights into fostering the development of new methodologies for testing file systems.
Zi Hao ONG Takahide SATO Satomi OGAWA
A design method of the differential N-path filter with sampling computation is proposed. It enables the scale of the whole filter to be reduced by approximately half for easier realization. On top of that, the proposed method offers the ability to eliminate the harmonic passbands of the clock frequency and an increase of harmonic rejection. By using the proposed method, previous work involving an 8-path filter can be reduced to 5-path. The proposed differential 5-path filter reduces the scale of the circuit and at the same time has the performance of a 10-path filter from previous work. An example of differential 7-path filter using the same proposed design method is also stated in comparison of the differential 5-path filter. The differential 7-path filter offers the ability to eliminate all the passbands below 10 times the clock frequency with a tradeoff of an increase in circuit scale.
Haijin JI Song HUANG Xuewei LV Yaning WU Yuntian FENG
Software defect prediction (SDP) plays a significant part in allocating testing resources reasonably, reducing testing costs, and ensuring software quality. One of the most widely used algorithms of SDP models is Naive Bayes (NB) because of its simplicity, effectiveness and robustness. In NB, when a data set has continuous or numeric attributes, they are generally assumed to follow normal distributions and incorporate the probability density function of normal distribution into their conditional probabilities estimates. However, after conducting a Kolmogorov-Smirnov test, we find that the 21 main software metrics follow non-normal distribution at the 5% significance level. Therefore, this paper proposes an improved NB approach, which estimates the conditional probabilities of NB with kernel density estimation of training data sets, to help improve the prediction accuracy of NB for SDP. To evaluate the proposed method, we carry out experiments on 34 software releases obtained from 10 open source projects provided by PROMISE repository. Four well-known classification algorithms are included for comparison, namely Naive Bayes, Support Vector Machine, Logistic Regression and Random Tree. The obtained results show that this new method is more successful than the four well-known classification algorithms in the most software releases.
Naoto ISHIDA Takashi ISHIO Yuta NAKAMURA Shinji KAWAGUCHI Tetsuya KANDA Katsuro INOUE
Defects in spacecraft software may result in loss of life and serious economic damage. To avoid such consequences, the software development process incorporates code review activity. A code review conducted by a third-party organization independently of a software development team can effectively identify defects in software. However, such review activity is difficult for third-party reviewers, because they need to understand the entire structure of the code within a limited time and without prior knowledge. In this study, we propose a tool to visualize inter-module dataflow for source code of spacecraft software systems. To evaluate the method, an autonomous rover control program was reviewed using this visualization. While the tool does not decreases the time required for a code review, the reviewers considered the visualization to be effective for reviewing code.
Kernel updates are a part of daily life in contemporary computer systems. They usually require an OS reboot that involves restarting not only the kernel but also all of the running applications, causing downtime that can disrupt software services. This downtime issue has been tackled by numerous approaches. Although dynamic translation of the running kernel image, which is a representative approach, can conduct kernel updates at runtime, its applicability is inherently limited. This paper describes Dwarf, which shortens downtime during kernel updates and covers more types of updates. Dwarf launches the newer kernel in the background on the same physical machine and forces the kernel to inherit the running states of the older kernel. We implemented a prototype of Dwarf on Xen 4.5.2, Linux 2.6.39, Linux 3.18.35, and Linux 4.1.6. Also, we conducted experiments using six applications, such as Apache, MySQL, and memcached, and the results demonstrate that Dwarf's downtime is 1.8 seconds in the shortest case and up to 10× shorter than that of the normal OS reboot.
The security of a software program critically depends on the prevention of vulnerabilities in the source code; however, conventional computer programs lack the ability to identify vulnerable code in another program. Our research was aimed at developing a technique capable of generating substitution code for the detection of buffer overflow vulnerability in C/C++ programs. The technique automatically verifies and sanitizes code instrumentation by comparing the result of each candidate variable with that expected from the input data. Our results showed that statements containing buffer overflow vulnerabilities could be detected and prevented by using a substitution variable and by sanitizing code vulnerabilities based on the size of the variables. Thus, faults can be detected prior to execution of the statement, preventing malicious access. Our approach is particularly useful for enhancing software security monitoring, and for designing retrofitting techniques in applications.
Yasutaka KAMEI Takahiro MATSUMOTO Kazuhiro YAMASHITA Naoyasu UBAYASHI Takashi IWASAKI Shuichi TAKAYAMA
Nowadays, open source software (OSS) systems are adopted by proprietary software projects. To reduce the risk of using problematic OSS systems (e.g., causing system crashes), it is important for proprietary software projects to assess OSS systems in advance. Therefore, OSS quality assessment models are studied to obtain information regarding the quality of OSS systems. Although the OSS quality assessment models are partially validated using a small number of case studies, to the best of our knowledge, there are few studies that empirically report how industrial projects actually use OSS quality assessment models in their own development process. In this study, we empirically evaluate the cost and effectiveness of OSS quality assessment models at Fujitsu Kyushu Network Technologies Limited (Fujitsu QNET). To conduct the empirical study, we collect datasets from (a) 120 OSS projects that Fujitsu QNET's projects actually used and (b) 10 problematic OSS projects that caused major problems in the projects. We find that (1) it takes average and median times of 51 and 49 minutes, respectively, to gather all assessment metrics per OSS project and (2) there is a possibility that we can filter problematic OSS systems by using the threshold derived from a pool of assessment metrics. Fujitsu QNET's developers agree that our results lead to improvements in Fujitsu QNET's OSS assessment process. We believe that our work significantly contributes to the empirical knowledge about applying OSS assessment techniques to industrial projects.
Chaman WIJESIRIWARDANA Prasad WIMALARATNE
Mining software repositories allow software practitioners to improve the quality of software systems and to support maintenance based on historical data. Such data is scattered across autonomous and heterogeneous information sources, such as version control, bug tracking and build automation systems. Despite having many tools to track and measure the data originated from such repositories, software practitioners often suffer from a scarcity of the techniques necessary to dynamically leverage software repositories to fulfill their complex information needs. For example, answering a question such as “What is the number of commits between two successful builds?” requires tiresome manual inspection of multiple repositories. As a solution, this paper presents a conceptual framework and a proof of concept visual query interface to satisfy distinct software quality related information needs of software practitioners. The data originated from repositories is integrated and analyzed to perform systematic investigations, which helps to uncover hidden relationships between software quality and trends of software evolution. This approach has several significant benefits such as the ability to perform real-time analyses, the ability to combine data from various software repositories and generate queries dynamically. The framework evaluated with 31 subjects by using a series of questions categorized into three software evolution scenarios. The evaluation results evidently show that our framework surpasses the state of the art tools in terms of correctness, time and usability.