Yan LEI Min ZHANG Bixin LI Jingan REN Yinhua JIANG
Many recent studies have focused on leveraging rich information types to increase the useful information available for improving fault localization effectiveness. However, they rarely investigate the impact of information richness on fault localization, and so give little guidance on how to enrich information to improve localization effectiveness. This paper presents the first systematic study to fill this void. Our study chooses four representative information types and investigates the relationship between their richness and localization effectiveness. The results show that information richness related to execution frequency counts carries a high risk of degrading localization effectiveness, whereas backward slicing is effective in improving it.
Xi CHANG Zhuo ZHANG Yan LEI Jianjun ZHAO
Concurrency bugs significantly affect system reliability. Although many efforts have been made to address this problem, many bugs still cannot be detected because of the complexity of concurrent programs. Compared with atomicity violations, order violations are often neglected. Efficient and effective approaches to detecting order violations are therefore urgently needed. This paper presents a bidirectional predictive trace analysis approach, BIPED, which can detect order violations in parallel based on a recorded program execution. BIPED collects an expected-order execution trace into a layered bidirectional prediction model, which intensively represents two types of expected-order data flows in the bottom layer and combines the lock sets and the bidirectional order constraints in the upper layer. BIPED then recognizes two types of candidate violation intervals driven by the bottom-layer model and checks these intervals bidirectionally based on the upper-layer constraint model. Consequently, concrete schedules can be generated to expose order violation bugs. Our experimental results show that BIPED can effectively detect real order violation bugs, and that its analysis is 2.3x-10.9x and 1.24x-1.8x as fast as state-of-the-art predictive dynamic analysis approaches and hybrid-model-based static prediction analysis approaches, respectively, on order violation bugs.
Ping ZENG Qingping TAN Haoyu ZHANG Xiankai MENG Zhuo ZHANG Jianjun XU Yan LEI
Deep neural named entity recognition models automatically learn and extract the features of entities, which removes the heavy reliance of traditional models on complex feature engineering and obscure professional knowledge. This topic has attracted much attention in recent years. Existing deep neural models involve only simple character learning and extraction methods, which limits their capability. To further explore the performance of deep neural models, we propose two character feature learning models based on a convolutional neural network and a long short-term memory network. These two models consider the local semantic and position features of word characters. Experiments conducted on the CoNLL-2003 dataset show that the proposed models outperform traditional ones and demonstrate excellent performance.
Zhuo ZHANG Yan LEI Qingping TAN Xiaoguang MAO Ping ZENG Xi CHANG
Fault localization is essential for fixing software faults. To improve fault localization, this paper proposes a deep learning-based fault localization approach that uses contextual information. Specifically, our approach uses a deep neural network to construct a suspiciousness evaluation model that evaluates the suspiciousness of a statement being faulty, and then leverages dynamic backward slicing to extract contextual information. The empirical results show that our approach significantly outperforms the state-of-the-art technique Dstar.
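The Dstar baseline named in this abstract ranks statements with a spectrum-based formula. The following is a minimal sketch of that baseline only, not of the deep learning model the paper proposes. It assumes the commonly cited form ef^star / (ep + nf) with star = 2, where ef and ep count the failed and passed tests that execute a statement and nf counts the failed tests that do not. The function name and the coverage-matrix layout are illustrative assumptions.

```python
def dstar_scores(coverage, failed, star=2):
    """Return one Dstar suspiciousness score per statement.

    coverage[t][s] is truthy if test t executed statement s (assumed layout).
    failed[t] is True if test t failed.
    """
    n_stmts = len(coverage[0])
    scores = []
    for s in range(n_stmts):
        ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        nf = sum(1 for row, f in zip(coverage, failed) if not row[s] and f)
        denom = ep + nf
        if denom == 0:
            # Executed by every failed test and by no passed test.
            scores.append(float("inf") if ef > 0 else 0.0)
        else:
            scores.append(ef ** star / denom)
    return scores


coverage = [
    [1, 1, 0],  # test 0 (fails)
    [1, 0, 1],  # test 1 (passes)
    [1, 1, 1],  # test 2 (passes)
]
failed = [True, False, False]
print(dstar_scores(coverage, failed))  # [0.5, 1.0, 0.0]
```

Statements are then inspected in descending order of score, so the second statement would be examined first in this toy example.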
Zhuo ZHANG Yan LEI Jianjun XU Xiaoguang MAO Xi CHANG
Existing fault localization approaches based on neural networks use the information of whether a statement is executed or not to identify suspicious statements potentially responsible for a failure. However, this information shows only the binary execution state of a statement and cannot show how important the statement is in an execution. Consequently, it may degrade fault localization effectiveness. To address this issue, this paper proposes TFIDF-FL, which uses term frequency-inverse document frequency to identify the high or low degree of influence of a statement in an execution. Our empirical results on 8 real-world programs show that TFIDF-FL significantly improves fault localization effectiveness.
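To illustrate the idea of replacing binary coverage with a TF-IDF weight, here is a minimal sketch. It assumes that each test execution is treated as a document and each statement as a term, with the statement's execution count as its raw term frequency. The abstract does not give the exact weighting used by TFIDF-FL, so the textbook formula below and the name `tfidf_weights` are assumptions.

```python
import math


def tfidf_weights(counts):
    """Turn execution counts into TF-IDF weights.

    counts[t][s] is how many times test t executed statement s (assumed input).
    Returns a matrix of the same shape holding tf * idf values.
    """
    n_tests = len(counts)
    n_stmts = len(counts[0])
    # Document frequency: number of executions that cover each statement.
    df = [sum(1 for row in counts if row[s] > 0) for s in range(n_stmts)]
    weights = []
    for row in counts:
        total = sum(row)
        weights.append([
            (row[s] / total) * math.log(n_tests / df[s]) if total and df[s] else 0.0
            for s in range(n_stmts)
        ])
    return weights


counts = [
    [2, 0, 2],  # execution 0
    [1, 3, 0],  # execution 1
]
for row in tfidf_weights(counts):
    print([round(w, 3) for w in row])
```

In this toy input the first statement is executed in every run, so its inverse document frequency is zero and it receives no weight, while statements executed often in only one run receive the highest weight.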
Lixin WANG Yutong LU Wei ZHANG Yan LEI
One of the problems that parallel file system design has to solve stems from the difficulty of handling the metadata-intensive I/O generated by parallel applications accessing a single large directory. We present a middleware design called SFS that provides existing parallel file systems with a distributed and scalable directory service. SFS distributes directory entries over data servers instead of metadata servers to offer increased scalability and performance. Firstly, SFS exploits adaptive directory partitioning based on extendible hashing to support concurrent and unsynchronized partition splitting. Secondly, SFS uses an optimization based on recursive split-ordering that speeds up the splitting process. Thirdly, SFS applies a write-optimized index structure to convert slow, small, random metadata updates into fast, large, sequential writes. Finally, SFS gracefully tolerates stale mappings at the clients while maintaining the correctness and consistency of the system. Our performance results on a 32-server cluster show that our implementation can deliver more than 250,000 file creations per second on average.
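Extendible hashing, which the abstract names as the basis of directory partitioning, can be sketched in a single process as follows. This is a generic textbook version and not the SFS implementation: the class names, the bucket capacity, and the use of CRC32 as the hash are assumptions, and the distribution of partitions over data servers is left out entirely.

```python
import zlib


class Bucket:
    def __init__(self, depth):
        self.depth = depth   # local depth: number of hash bits this bucket owns
        self.entries = {}    # file name -> value (e.g. an inode number)


class ExtendibleDirectory:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.global_depth = 0
        self.table = [Bucket(0)]  # 2**global_depth slots

    def _index(self, name):
        h = zlib.crc32(name.encode("utf-8"))
        return h & ((1 << self.global_depth) - 1)  # low-order bits

    def lookup(self, name):
        return self.table[self._index(name)].entries.get(name)

    def insert(self, name, value):
        while True:
            bucket = self.table[self._index(name)]
            if name in bucket.entries or len(bucket.entries) < self.capacity:
                bucket.entries[name] = value
                return
            self._split(bucket)  # only the overflowing partition is split

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the table; both halves initially share the same buckets.
            self.table = self.table + self.table
            self.global_depth += 1
        bit = 1 << bucket.depth
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        # Slots whose newly significant bit is set now point to the sibling.
        for i, b in enumerate(self.table):
            if b is bucket and i & bit:
                self.table[i] = sibling
        # Redistribute the old entries between the bucket and its sibling.
        old, bucket.entries = bucket.entries, {}
        for k, v in old.items():
            self.table[self._index(k)].entries[k] = v


d = ExtendibleDirectory(capacity=4)
for i in range(100):
    d.insert(f"file{i}", i)
print(d.lookup("file42"), d.global_depth, len(d.table))
```

A split touches only the overflowing bucket and the table slots that reference it, which is the property that makes independent, unsynchronized partition splits plausible in a distributed setting.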
Lixin WANG Yutong LU Wei ZHANG Yan LEI
File system workloads are increasingly write-heavy. The growing capacity of RAM in modern nodes allows many reads to be satisfied from memory, while writes must be persisted to disk. Today's sophisticated local file systems such as Ext4, XFS and Btrfs optimize for reads but suffer under workloads dominated by microdata (including metadata and tiny files). In this paper we present an LSM-tree-based file system, RFS, which aims to take advantage of the write optimization of the LSM-tree to provide enhanced microdata performance while offering matching performance for large files. RFS incrementally partitions the namespace into several metadata columns on a per-directory basis, preserving disk locality for directories and reducing the write amplification of LSM-trees. A write-ordered log-structured layout is used to store small files efficiently, rather than embedding the contents of small files into inodes. We also propose an optimization of global bloom filters for efficient point lookups. Experiments show that our library version of RFS can handle microwrite-intensive workloads 2-10 times faster than existing solutions such as Ext4, Btrfs and XFS.
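A Bloom filter lets a point lookup skip storage that cannot contain the key. The sketch below is a generic Bloom filter, not the global filter optimization proposed for RFS, whose details the abstract does not give. The class name, the bit-array size, the number of hashes, and the SHA-256-based double hashing are all assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key):
        # Derive n_hashes bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.n_bits for i in range(self.n_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


bf = BloomFilter()
for name in ["/home/a/notes.txt", "/home/a/todo.txt"]:
    bf.add(name)
print(bf.might_contain("/home/a/notes.txt"))  # True
```

Because the filter never produces false negatives, a lookup that gets False can skip reading the corresponding on-disk table; a True answer still requires the real read.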
Ang LI Xiaoguang MAO Yan LEI Tao JI
Fault localization is essential for effective program repair. However, preliminary studies have shown that existing fault localization approaches do not take the requirements of automatic repair into account and therefore restrict repair performance. To address this issue, this paper presents the first study on designing fault localization approaches for automatic program repair: we propose a fault localization approach that uses failure-related contexts to improve automatic program repair. The proposed approach first uses a program slicing technique to construct a failure-related context, then evaluates the suspiciousness of each element in this context, and finally passes the evaluation result to automatic program repair techniques, which repair the faulty programs. The experimental results demonstrate that the proposed approach effectively improves automatic repair performance.
Chengsong WANG Xiaoguang MAO Yan LEI Peng ZHANG
In recent years, hybrid typestate analysis has been proposed to eliminate unnecessary monitoring instrumentations for runtime monitors at compile time. Nop-shadows Analysis (NSA) is one of these hybrid typestate analyses. Before generating residual monitors, NSA performs a data-flow analysis that is intra-procedural, flow-sensitive, and partially context-sensitive to improve runtime performance. Although NSA is precise, there are some cases on which it has little effect. In this paper, we propose three optimizations to further improve the precision of NSA. The first two optimizations try to filter interferential states of objects when determining whether a monitoring instrumentation is necessary. The third optimization refines the inter-procedural data-flow analysis induced by method invocations. We have integrated our optimizations into Clara and conducted extensive experiments on the DaCapo benchmark. The experimental results demonstrate that our first two optimizations can remove further unnecessary instrumentations after the original NSA in more than half of the cases, without significant overhead. In addition, all the instrumentations can be removed in two cases, which implies that the program satisfies the typestate property and needs no runtime monitoring. To our surprise, the third optimization is effective in only 8.7% of the cases. Finally, we analyze the experimental results and discuss why our optimizations fail to further eliminate unnecessary instrumentations in some special situations.
Jiang WU Jianjun XU Xiankai MENG Yan LEI
We propose ROICF, a new reinforcement-learning-based framework for generating reliability-oriented compilation optimization sequences. Building on the standard LLVM compilation optimization passes, it obtains an effective phase ordering specific to each program in order to improve program reliability.
Zhuo ZHANG Xiaoguang MAO Yan LEI Peng ZHANG
Existing fault localization approaches usually do not provide a context for developers to understand the problem. Thus, this paper proposes a novel approach using the dynamic backward slicing technique to enrich contexts for existing approaches. Our empirical results show that our approach significantly outperforms five state-of-the-art fault localization techniques.
Ziying DAI Xiaoguang MAO Yan LEI Xiaomin WAN Kerong BEN
A garbage collector relieves programmers of manual memory management and improves productivity and program reliability. However, there are many other finite system resources that programmers must manage by themselves, such as sockets and database connections. Growing resource leaks can lead to performance degradation and even program crashes. This paper presents an automatic resource collection approach called Resco (RESource COllector) to tolerate non-memory resource leaks. Resco prevents performance degradation and crashes due to resource leaks in two steps. First, it uses monitors to count resource consumption and to request resource collections, independently of memory usage, when resource limits are about to be violated. Second, it responds to a resource collection request by safely releasing leaked resources. We implement Resco on a Java Virtual Machine for Java programs. The performance evaluation against standard benchmarks shows that Resco has a very low overhead, around 1% or 3%. Experiments on resource leak bugs show that Resco successfully prevents most of these programs from crashing with little increase in execution time.
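The two steps described above (count consumption, then collect leaked resources before the limit is hit) can be sketched as follows. This is an illustration only and not the Resco implementation, which lives inside a Java Virtual Machine. The class name, the `headroom` parameter, and the `is_reachable` callback that stands in for the runtime's reachability information are all hypothetical.

```python
class ResourceMonitor:
    """Counts open handles and collects leaked ones near the limit."""

    def __init__(self, limit, is_reachable, headroom=2):
        self.limit = limit
        self.trigger = limit - headroom   # request a collection at this count
        self.is_reachable = is_reachable  # stand-in for GC reachability
        self.open = {}                    # handle -> close callback
        self.collections = 0

    def acquire(self, handle, close):
        # Step 1: count consumption and request a collection near the limit.
        if len(self.open) >= self.trigger:
            self.collect()
        if len(self.open) >= self.limit:
            raise RuntimeError("resource limit reached")
        self.open[handle] = close

    def release(self, handle):
        close = self.open.pop(handle, None)
        if close is not None:
            close()

    def collect(self):
        # Step 2: release only handles the program can no longer reach.
        self.collections += 1
        for handle in [h for h in self.open if not self.is_reachable(h)]:
            self.release(handle)


live, closed = set(), []
monitor = ResourceMonitor(limit=10, is_reachable=lambda h: h in live)
for i in range(20):
    live = {i}  # the program keeps only its newest handle and leaks the rest
    monitor.acquire(i, close=lambda i=i: closed.append(i))
print(len(monitor.open), monitor.collections, len(closed))  # 4 2 16
```

Without the monitor, the loop above would exhaust a limit of 10 handles on its eleventh acquisition; with it, leaked handles are closed in two collections and the program keeps running.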
Yan LEI Xiaoguang MAO Ziying DAI Dengping WEI
During software debugging, effective interaction between debugging engineers and fault localization techniques can greatly improve fault localization performance. However, most fault localization approaches ignore this interaction and merely use the information from testing. Because testing and fault localization have different goals, the lack of interaction may lead to information inadequacy, which can substantially degrade fault localization performance. In addition, human work is costly and error-prone. It is vital to study and simulate the pattern of debugging engineers as they apply their knowledge and experience to this interaction, in order to promote fault localization effectiveness and reduce their workload. Thus, this paper proposes an effective fault localization approach that simulates this interaction via feedback. Based on results obtained from fault localization techniques, this approach uses test data generation techniques to automatically produce feedback for interacting with these fault localization techniques, and then iterates this process to improve fault localization performance until a specific stopping condition is satisfied. Experiments on two standard benchmarks demonstrate the significant improvement of our approach over a promising fault localization technique, namely the spectrum-based fault localization technique.