Bodin CHINTHANET Raula GAIKOVINA KULA Rodrigo ELIZA ZAPATA Takashi ISHIO Kenichi MATSUMOTO Akinori IHARA
It has become common practice for software projects to adopt third-party dependencies. Developers are encouraged to update any outdated dependency to remain safe from potential threats of vulnerabilities. In this study, we present an approach to aid developers show whether or not a vulnerable code is reachable for JavaScript projects. Our prototype, SōjiTantei, is evaluated in two ways (i) the accuracy when compared to a manual approach and (ii) a larger-scale analysis of 780 clients from 78 security vulnerability cases. The first evaluation shows that SōjiTantei has a high accuracy of 83.3%, with a speed of less than a second analysis per client. The second evaluation reveals that 68 out of the studied 78 vulnerabilities reported having at least one clean client. The study proves that automation is promising with the potential for further improvement.
Hiroto TANAKA Shinsuke MATSUMOTO Shinji KUSUMOTO
Over the past recent decades, numerous programming languages have expanded to embrace multi-paradigms such as the fusion of object-oriented and functional programming. For example, Java, one of the most famous object-oriented programming languages, introduced a number of functional idioms in 2014. This evolution enables developers to achieve various benefits from both paradigms. However, we do not know how Java developers use functional idioms actually. Additionally, the extent to which, while there are several criticisms against the idioms, the developers actually accept and/or use the idioms currently remains unclear. In this paper, we investigate the actual use status of three functional idioms (Lambda Expression, Stream, and Optional) in Java projects by mining 100 projects containing approximately 130,000 revisions. From the mining results, we determined that Lambda Expression is utilized in 16% of all the examined projects, whereas Stream and Optional are only utilized in 2% to 3% of those projects. It appears that most Java developers avoid using functional idioms just because of keeping compatibility Java versions, while a number of developers accept these idioms for reasons of readability and runtime performance improvements. Besides, when they adopt the idioms, Lambda Expression frequently consists of a single statement, and Stream is used to operate the elements of a collection. On the other hand, some developers implement Optional using deprecated methods. We can say that good usage of the idioms should be widely known among developers.
Kozo OKANO Satoshi HARAUCHI Toshifusa SEKIZAWA Shinpei OGATA Shin NAKAJIMA
Java is one of important program language today. In Java, in order to build sound software, we have to carefully implement two fundamental methods hashCode and equals. This requirement, however, is not easy to follow in real software development. Some existing studies for ensuring the correctness of these two methods rely on static analysis, which are limited to loop-free programs. This paper proposes a new solution to this important problem, using software analysis workbench (SAW), an open source tool. The efficiency is evaluated through experiments. We also provide a useful situation where cost of regression testing is reduced when program refactoring is conducted.
Shohei IKEDA Akinori IHARA Raula Gaikovina KULA Kenichi MATSUMOTO
Contemporary software projects often utilize a README.md to share crucial information such as installation and usage examples related to their software. Furthermore, these files serve as an important source of updated and useful documentation for developers and prospective users of the software. Nonetheless, both novice and seasoned developers are sometimes unsure of what is required for a good README file. To understand the contents of README, we investigate the contents of 43,900 JavaScript packages. Results show that these packages contain common content themes (i.e., ‘usage’, ‘install’ and ‘license’). Furthermore, we find that application-specific packages more frequently included content themes such as ‘options’, while library-based packages more frequently included other specific content themes (i.e., ‘install’ and ‘license’).
Yuta TAKATA Mitsuaki AKIYAMA Takeshi YAGI Takeo HARIU Kazuhiko OHKUBO Shigeki GOTO
Security researchers/vendors detect malicious websites based on several website features extracted by honeyclient analysis. However, web-based attacks continue to be more sophisticated along with the development of countermeasure techniques. Attackers detect the honeyclient and evade analysis using sophisticated JavaScript code. The evasive code indirectly identifies vulnerable clients by abusing the differences among JavaScript implementations. Attackers deliver malware only to targeted clients on the basis of the evasion results while avoiding honeyclient analysis. Therefore, we are faced with a problem in that honeyclients cannot analyze malicious websites. Nevertheless, we can observe the evasion nature, i.e., the results in accessing malicious websites by using targeted clients are different from those by using honeyclients. In this paper, we propose a method of extracting evasive code by leveraging the above differences to investigate current evasion techniques. Our method analyzes HTTP transactions of the same website obtained using two types of clients, a real browser as a targeted client and a browser emulator as a honeyclient. As a result of evaluating our method with 8,467 JavaScript samples executed in 20,272 malicious websites, we discovered previously unknown evasion techniques that abuse the differences among JavaScript implementations. These findings will contribute to improving the analysis capabilities of conventional honeyclients.
Tomomi HATANO Takashi ISHIO Joji OKADA Yuji SAKATA Katsuro INOUE
For the maintenance of a business system, developers must understand the business rules implemented in the system. One type of business rules defines computational business rules; they represent how an output value of a feature is computed from the valid inputs. Unfortunately, understanding business rules is a tedious and error-prone activity. We propose a program-dependence analysis technique tailored to understanding computational business rules. Given a variable representing an output, the proposed technique extracts the conditional statements that may affect the computation of the output. To evaluate the usefulness of the technique, we conducted an experiment with eight developers in one company. The results confirm that the proposed technique enables developers to accurately identify conditional statements corresponding to computational business rules. Furthermore, we compare the number of conditional statements extracted by the proposed technique and program slicing. We conclude that the proposed technique, in general, is more effective than program slicing.
Hao HAN Yinxing XUE Keizo OYAMA Yang LIU
The rendering mechanism plays an indispensable role in browser-based Web application. It generates active webpages dynamically and provides human-readable layout through template engines, which are used as a standard programming model to separate the business logic and data computations from the webpage presentation. The client-side rendering mechanism, owing to the advances of rich application technologies, has been widely adopted. The adoption of client side rendering brings not only various merits but also new problems. In this paper, we propose and construct “pagelet”, a segment-based template engine for developing flexible and extensible Web applications. By presenting principles, practice and usage experience of pagelet, we conduct a comprehensive analysis of possible advantages and disadvantages brought by client-side rendering mechanism from the viewpoints of both developers and end-users.
Kuo-Yi CHEN Chin-Yang LIN Tien-Yan MA Ting-Wei HOU
With more digital home appliances and network devices having OSGi as the software management platform, the power-saving capability of the OSGi platform has become a critical issue. This paper is aimed at improving the power-efficiency of the OSGi platform, i.e. reducing the energy consumption with minimum performance degradation. The key to this study is an efficient power-saving technique which exploits the runtime information already available in a Java virtual machine (JVM), the base software of the OSGi platform, to best determine the timing of performing DVFS (Dynamic Voltage and Frequency Scaling). This, technically, involves a phase detection scheme that identifies the memory phase of the OSGi-enabled device/server in a correct and almost effortless way. The overhead of the power-saving procedure is thus minimized, and the system performance is well maintained. We have implemented and evaluated the proposed power-saving approach on an OSGi server, where the Apache Felix OSGi implementation and the DaCapo benchmarks were applied. The results show that this approach can achieve real power-efficiency for the OSGi platform, in which the power consumption is significantly reduced and the performance remains highly competitive, compared with the other power-saving techniques.
This paper presents a method for comparing and detecting clones of Java programs by analyzing program stack flows. A stack flow denotes an operational behavior of a program by describing individual instructions and stack movements for performing specific operations. We analyze stack flows by simulating the operand stack movements during execution of a Java program. Two programs for detection of clones of Java programs are compared by matching similar pairs of stack flows in the programs. Experiments were performed on the proposed method and compared with the earlier approaches of comparing Java programs, the Tamada, k-gram, and stack pattern based methods. Their performance was evaluated with real-world Java programs in several categories collected from the Internet. The experimental results show that the proposed method is more effective than earlier methods of comparing and detecting clones of Java programs.
Marc DOMINGO-PRIETO Joan ARNEDO-MORENO
With the evolution of the P2P research field, new problems, such as those related with information security, have arisen. It is important to provide security mechanisms to P2P systems, since it has already become one of the key issues when evaluating them. However, even though many P2P systems have been adapted to provide a security baseline to their underlying applications, more advanced capabilities are becoming necessary. Specifically, privacy preservation and anonymity are deemed essential to make the information society sustainable. Unfortunately, sometimes, it may be difficult to attain anonymity unless it is included into the system's initial design. The JXTA open protocols specification is a good example of this kind of scenario. This work studies how to provide anonymity to JXTA's architecture in a feasible manner and proposes an extension which allows deployed services to process two-way messaging without disclosing the endpoints' identities to third parties.
Gregory BLANC Youki KADOBAYASHI
Modern web applications incorporate many programmatic frameworks and APIs that are often pushed to the client-side with most of the application logic while contents are the result of mashing up several resources from different origins. Such applications are threatened by attackers that often attempts to inject directly, or by leveraging a stepstone website, script codes that perform malicious operations. Web scripting based malware proliferation is being more and more industrialized with the drawbacks and advantages that characterize such approach: on one hand, we are witnessing a lot of samples that exhibit the same characteristics which make these easy to detect, while on the other hand, professional developers are continuously developing new attack techniques. While obfuscation is still a debated issue within the community, it becomes clear that, with new schemes being designed, this issue cannot be ignored anymore. Because many proposed countermeasures confess that they perform better on unobfuscated contents, we propose a 2-stage technique that first relieve the burden of obfuscation by emulating the deobfuscation stage before performing a static abstraction of the analyzed sample's functionalities in order to reveal its intent. We support our proposal with evidence from applying our technique to real-life examples and provide discussion on performance in terms of time, as well as possible other applications of proposed techniques in the areas of web crawling and script classification. Additionally, we claim that such approach can be generalized to other scripting languages similar to JavaScript.
Despite the prevalence of Java workloads across a variety of processor architectures, there is very little published data on the impact of the various processor design decisions on Java performance. We attribute the lack of data to the large design space resulting from the complexity of the modern superscalar processor and the additional complexities associated with executing Java bytecode using a virtual machine. To address this shortcoming, we use a statistically rigorous methodology to systematically quantify the the impact of the various processor microarchitecture parameters on Java execution performance. The adopted methodology enables efficient screening of significant factor effects in a large design space consisting of 35 factors (32-billion potential configurations) using merely 72 observations per benchmark application. We quantify and tabulate the significance of each of the 35 factors for 13 benchmark applications. While these tables provide various insights into Java performance, they consistently highlight the performance significance of the instruction delivery mechanism, especially the instruction cache and the ITLB design parameters. Furthermore, these tables enable the architect to identify processor bottlenecks for Java workloads by providing an estimate of the relative impact of various design decisions.
Chien-Tsun CHEN Yu Chin CHENG Chin-Yun HSIEH
Design by Contract (DBC), originated in the Eiffel programming language, is generally accepted as a practical method for building reliable software. Currently, however, few languages have built-in support for it. In recent years, several methods have been proposed to support DBC in Java. We compare eleven DBC tools for Java by analyzing their impact on the developer's programming activities, which are characterized by seven quality attributes identified in this paper. It is shown that each of the existing tools fails to achieve some of the quality attributes. This motivates us to develop ezContract, an open source DBC tool for Java that achieves all of the seven quality attributes. ezContract achieves streamlined integration with the working environment. Notably, standard Java language is used and advanced IDE features that work for standard Java programs can also work for the contract-enabled programs. Such features include incremental compilation, automatic refactoring, and code assist.
Hyun-il LIM Heewan PARK Seokwoo CHOI Taisook HAN
A software birthmark means the inherent characteristics of a program that can be used to identify the program. A comparison of such birthmarks facilitates the detection of software theft. In this paper, we propose a static Java birthmark based on a set of stack patterns, which reflect the characteristic of Java applications. A stack pattern denotes a sequence of bytecodes that share their operands through the operand stack. A weight scheme is used to balance the influence of each bytecode in a comparison of the birthmarks. We evaluate the proposed birthmark with respect to two properties required for a birthmark: credibility and resilience. The empirical results show that the proposed birthmark is highly credible and resilient to program transformation. We also compare the proposed birthmark with existing birthmarks, such as that of Tamada et al. and the k-gram birthmark. The experimental results show that the proposed birthmark is more stable than the birthmarks in terms of resilience to program transformation. Thus, the proposed birthmark can provide more reliable evidence of software theft when the software is modified by someone other than author.
Youngsun HAN Seok Joong HWANG Seon Wook KIM
In this paper, we present a reconfigurable processor infrastructure to accelerate Java applications, called Jaguar. The Jaguar infrastructure consists of a compiler framework and a runtime environment support. The compiler framework selects a group of Java methods to be translated into hardware for delivering the best performance under limited resources, and translates the selected Java methods into Verilog synthesizable code modules. The runtime environment support includes the Java virtual machine (JVM) running on a host processor to provide Java execution environment to the generated Java accelerator through communication interface units while preserving Java semantics. Our compiler infrastructure is a tightly integrated and solid compiler-aided solution for Java reconfigurable computing. There is no limitation in generating synthesizable Verilog modules from any Java application while preserving Java semantics. In terms of performance, our infrastructure achieves the speedup by 5.4 times on average and by up to 9.4 times in measured benchmarks with respect to JVM-only execution. Furthermore, two optimization schemes such as an instruction folding and a live buffer removal can reduce 24% on average and up to 39% of the resource consumption.
Sunae SEO Youil KIM Hyun-Goo KANG Taisook HAN
Correctness of Java programs is important because they are executed in distributed computing environments. The object initialization scheme in the Java programming language is complicated, and this complexity may lead to undesirable semantic bugs. Various tools have been developed for detecting program patterns that might cause errors during program execution. However, current tools cannot identify code patterns in which an uninitialized field is accessed when an object is initialized. We refer to such erroneous patterns as uninitialized field references. In this paper, we propose a static pattern detection algorithm for identifying uninitialized field references. We design a sound analysis for this problem and implement an analyzer using the Soot framework. In addition, we apply our algorithm to some real Java applications. From the experiments, we identify 12 suspicious field references in the applications, and among those we find two suspected errors by manual inspection.
Tetsuya YAMADA Naohiko IRIE Takanobu TSUNODA Takahiro IRITA Kenji KITAGAWA Ryohei YOSHIDA Keisuke TOYAMA Motoaki SATOYAMA
We have developed a hardware accelerator for Java platforms, integrated on a SuperH microprocessor core, using a 130-nm CMOS process. The Java accelerator, a bytecode translation unit (BTU), is tightly coupled with the CPU to share resources. The BTU supports 159 basic bytecodes and 5 or 6 optional bytecodes. It supports both connected device configuration (CDC) 1.0 and connected limited device configuration (CLDC) 1.0.4 technologies. The BTU corresponds to the dual-issued superscalar CPU and applies a new method, control-sharing. With this method, the BTU always grasps the pipeline status of the CPU, and the Java program is processed by both the BTU and the CPU. To implement this method, we developed some acceleration techniques: fast branch requests, enhanced CPU instructions, Java runtime exception detection hardware, and fewer overhead cycles of handover between the BTU and the CPU. In particular, the BTU can detect Java runtime exceptions in parallel with other processing, such as an array access. With previous methods, there is a disadvantage in that CPU efficiency decreases for Java-specific processing, such as array index bounds checking. The sample chip was fabricated in Renesas 130-nm, five-layer Cu, dual-vth low-power CMOS technology. The chip runs at 216 MHz and 1.2 V. The BTU has 75 kG. The benchmark on an evaluation board showed 6.55 embedded caffeine marks (ECM)/MHz on the CLDC 1.0.4 configuration, a tenfold speed increase without the BTU for roughly the same power consumption. In other words, power savings of 90 percent with the same performance were achieved.
Thepparit BANDITWATTANAWONG Soichiro HIDAKA Hironori WASHIZAKI Katsumi MARUYAMA
Object caching is a common feature in the scalable distributed object systems. Fine-grained replication optimizes the performance and resource utilization in object caching by enabling a remote object-oriented application to be partially and incrementally on-demand replicated in units of cluster. Despite these benefits, the lack of common and simple implementation framework makes the fine-grained replication scheme not extensively used. This paper proposes the novel frameworks for dynamic, transparent, partial and automatically incremental replication of distributed Java objects based on three techniques that are lazy-object creation, proxy and hook. One framework enables the fine-grained replication of server-side stateful in-memory application, and the other framework enables the fine-grained replication of server-side stateless in-memory application, client-side program, or standalone application. The experimental evaluation demonstrates that the efficiency in terms of response time of both frameworks are relatively practical to the extent of a local method invocation.
Katsuhisa MARUYAMA Shinichiro YAMAMOTO
Recent IDEs have become more extensible tool platforms but do not concern themselves with how other tools running on them collaborate with each other. They compel developers to use proprietary representations or the classical abstract syntax tree (AST) to build source code tools. Although these representations contain sufficient information, they are neither portable nor extensible. This paper proposes a tool platform that manages commonly used, fined-grained, information about Java source code by using an XML representation. Our representation is suitable for developing tools which browse and manipulate actual source code, since the original code is annotated with tags based on its structure and retained within the tags. Additionally, it exposes information resulting from global semantic analysis, which is never provided by the typical AST. Our proposed platform allows the developers to extend the representation for the purpose of sharing or exchanging various kinds of information about the source code, and also enables them to build new tools by using existing XML utilities.
Hironori WASHIZAKI Daiki HOSHI Yoshiaki FUKAZAWA
A component connection enables a component to use the functionality of other components directly, without generating adapters or other mechanisms at run-time. In conventional component connection models, the connection between components, particularly third-party components, is very costly for code reuse because the component source code must be modified if the types of requester-side and provider-side are different. This paper proposes a new component model, built upon an existing component architecture, which abandons a component service type and connects components based on a method type collection of the provider and requester components. Our model enables flexible connections owing to relaxed component matching, in which the system that implements our model automatically converts values of parameters, return values, and exceptions between required methods and provided ones within a well-defined range. As a result of experimental evaluations, it is found that our model is superior to conventional models in terms of the component-use cost and the capability of changing connections.