Modern memory devices such as DRAM are prone to errors caused by unintended bit flips during operation. Since memory errors severely impact in-memory key-value stores (KVSes), software mechanisms for hardening them against memory errors are being explored. However, it is hard to test the memory error handling code efficiently due to its characteristics: the code is event-driven, the handlers depend on the memory object, and in-memory KVSes manage various objects in a huge memory space. This paper presents MemFI, which supports runtime tests of the memory error handlers of in-memory KVSes. Our approach performs software fault injection of memory errors at the memory object level to trigger the target handler while smoothly carrying out tests on the same running state. To show the effectiveness of MemFI, we integrate error handling mechanisms into real-world in-memory KVSes, memcached 1.6.9 and Redis 6.2.7, and check their behavior using the MemFI prototypes. The results show that MemFI-based runtime tests allow us to check the behavior of the error handling mechanisms. We also show its efficiency by comparing it to other fault injection approaches based on a trial model.
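To make object-level fault injection concrete, the following is a minimal user-space sketch, assuming a hypothetical key-value object with a checksum-based error handler; it illustrates the testing idea only and is not MemFI's implementation.

```c
/* Minimal sketch (not MemFI's code): emulate a memory error by flipping one
 * bit inside a chosen key-value object so its error handler can be exercised
 * at runtime. The struct, checksum, and handler are hypothetical. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct kv_item {              /* hypothetical in-memory KVS object */
    char     key[16];
    char     value[32];
    uint32_t checksum;        /* simple integrity code over value  */
};

static uint32_t checksum(const char *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31 + (uint8_t)buf[i];
    return sum;
}

/* Fault injector: flip bit `bit` of byte `off` inside the value field. */
static void inject_bit_flip(struct kv_item *item, size_t off, int bit)
{
    item->value[off] ^= (char)(1u << bit);
}

/* Error handler under test: detect corruption and "repair" by discarding. */
static void handle_memory_error(struct kv_item *item)
{
    if (checksum(item->value, sizeof(item->value)) != item->checksum) {
        fprintf(stderr, "corruption detected in key %s: discarding item\n",
                item->key);
        memset(item->value, 0, sizeof(item->value));
        item->checksum = checksum(item->value, sizeof(item->value));
    }
}

int main(void)
{
    struct kv_item item = { "user:42", "hello world", 0 };
    item.checksum = checksum(item.value, sizeof(item.value));

    inject_bit_flip(&item, 3, 5);   /* emulate an uncorrected DRAM bit flip */
    handle_memory_error(&item);     /* the handler we want to test          */
    return 0;
}
```

The point of injecting at the object level is that the test targets exactly the handler bound to that object, without waiting for a real hardware error event.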
The issue of copying values or references has historically been studied for managing memory objects, especially in distributed systems. In this paper, we explore a new aspect of copying values vs. references: memory page compaction on virtualized systems. Memory page compaction moves target physical pages to a contiguous memory region at the operating system kernel level to create huge pages. Memory virtualization provides an opportunity to perform memory page compaction by copying the references of the physical pages. That is, instead of copying page values, we can move guest physical pages by changing the mappings of guest-physical to machine-physical pages. The goal of this paper is a quantitative comparison between value- and reference-based memory page compaction. To do so, we developed a software mechanism that achieves memory page compaction by appropriately updating the references of guest-physical pages. We prototyped the mechanism on Linux 4.19.29, and the experimental results show that the prototype's page compaction is up to 78% faster and achieves up to 17% higher performance on memory-intensive real-world applications compared with the default value-copy compaction scheme.
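The contrast between the two compaction styles can be sketched as follows; the page table here is just an array of indices for illustration, not the Linux/hypervisor data structures used in the paper.

```c
/* Minimal sketch contrasting value-copy and reference-update compaction. */
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NPAGES    8

static char machine_pages[NPAGES][PAGE_SIZE];   /* machine-physical memory   */
static int  gfn_to_mfn[NPAGES];                 /* guest-physical -> machine */

/* Value-based compaction: copy the source page's contents into the
 * destination guest page's backing frame (4 KiB of data movement). */
static void compact_by_copy(int gfn_dst, int gfn_src)
{
    memcpy(machine_pages[gfn_to_mfn[gfn_dst]],
           machine_pages[gfn_to_mfn[gfn_src]], PAGE_SIZE);
}

/* Reference-based compaction: re-point the destination guest page at the
 * source's machine frame; only a mapping entry changes, no data is copied. */
static void compact_by_remap(int gfn_dst, int gfn_src)
{
    gfn_to_mfn[gfn_dst] = gfn_to_mfn[gfn_src];
}

int main(void)
{
    for (int i = 0; i < NPAGES; i++)
        gfn_to_mfn[i] = i;

    compact_by_copy(2, 6);    /* moves 4 KiB of page data       */
    compact_by_remap(3, 7);   /* moves only a mapping reference */
    printf("gfn 2 -> mfn %d, gfn 3 -> mfn %d\n", gfn_to_mfn[2], gfn_to_mfn[3]);
    return 0;
}
```

The performance gap in the paper comes from exactly this difference: reference-based compaction replaces per-page memcpy work with a mapping update.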
Hiroshi YAMADA Shuntaro TONOSAKI Kenji KONO
Infrastructure as a Service (IaaS), a form of cloud computing, is gaining attention for its ability to enable efficient server administration in dynamic workload environments. In such environments, however, updating the software stack or content files of virtual machines (VMs) is a time-consuming task, discouraging administrators from frequently enhancing their services and fixing security holes. This is because the administrator has to upload the whole new disk image to the cloud platform via the Internet, which is not yet fast enough to transfer large amounts of data smoothly. Although the administrator can apply incremental updates directly to the running VMs, he or she has to carefully consider the type of update and perform operations on all running VMs, such as application restarts. This is a tedious and error-prone task. This paper presents a technique for synchronizing VMs in less time and with a lower administrative burden. We introduce the Virtual Disk Image Repository, which runs on the cloud platform and automatically updates the virtual disk image and the running VMs with only the incremental update information. We also show a mechanism that performs necessary operations on the running VM, such as restarting server processes, based on the types of files that are updated. We implemented a prototype on Linux 2.6.31.14 and Amazon Elastic Compute Cloud. An experiment shows that our technique can synchronize VMs in an order-of-magnitude shorter time than the conventional disk-image-based VM method. We also discuss limitations of our technique and directions for more efficient VM updates.
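A minimal sketch of the type-based post-update step: map the path of an updated file to the operation the running VM should perform. The table entries and commands are illustrative assumptions, not the paper's actual rules.

```c
/* Hypothetical mapping from updated file locations to guest-side actions. */
#include <stdio.h>
#include <string.h>

struct update_rule {
    const char *path_prefix;   /* which part of the image was touched  */
    const char *action;        /* operation to run inside the guest VM */
};

static const struct update_rule rules[] = {
    { "/etc/httpd/",    "reload web server configuration" },
    { "/usr/lib/",      "restart dependent server processes" },
    { "/var/www/html/", "none (static content)" },
    { "/boot/",         "schedule guest reboot" },
};

static const char *action_for(const char *updated_path)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (strncmp(updated_path, rules[i].path_prefix,
                    strlen(rules[i].path_prefix)) == 0)
            return rules[i].action;
    return "none";
}

int main(void)
{
    printf("%s\n", action_for("/etc/httpd/conf/httpd.conf"));
    printf("%s\n", action_for("/var/www/html/index.html"));
    return 0;
}
```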
Operating system (OS) reboots are an essential part of updating kernels and applications on laptops and desktop PCs. Long downtime during OS reboots severely disrupts users' computational activities. This long disruption discourages users from rebooting, which in turn delays software updates. Although dynamic update techniques have been widely studied, making a system “reboot-free” is still difficult due to several limitations. As a result, users cannot benefit from new functionality or better performance, and even worse, unfixed vulnerabilities can be exploited by attackers. This paper presents ShadowReboot, a virtual machine monitor (VMM)-based approach that shortens the downtime of OS reboots in software updates. ShadowReboot conceals OS reboot activities from users' applications by spawning a VM dedicated to the OS reboot and systematically producing the rebooted state in which the updated kernel and applications are ready for use. ShadowReboot gives users the illusion that the guest OS travels forward in time to the rebooted state. ShadowReboot offers the following advantages. First, it can be used to apply kernel patches and even system configuration updates. Second, it does not require any special patches that demand detailed knowledge of the target kernels. Third, it does not require any modification to the target kernel. We implemented a prototype in VirtualBox 4.0.10 OSE. Our experimental results show that ShadowReboot successfully updated software on unmodified commodity OS kernels and shortened the downtime of commodity OS reboots on five Linux distributions (Fedora, Ubuntu, Gentoo, CentOS, and SUSE) by 91 to 98%.
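As a rough process-level analogy of the idea (not the VMM implementation), the "reboot" can be prepared in the background while the old instance keeps serving, so the visible downtime shrinks to the final switchover:

```c
/* Process-level analogy of preparing a rebooted state in the background. */
#include <poll.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int ready[2];
    if (pipe(ready) < 0) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                    /* child: the "shadow" instance       */
        close(ready[0]);
        sleep(3);                      /* stands in for update + full reboot */
        write(ready[1], "ok", 2);      /* rebooted state is now ready        */
        _exit(0);
    }

    close(ready[1]);
    struct pollfd pfd = { .fd = ready[0], .events = POLLIN };
    while (poll(&pfd, 1, 500) == 0)    /* old instance keeps serving         */
        printf("old instance: still serving requests\n");

    printf("switching to the prepared, updated state (short downtime)\n");
    return 0;
}
```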
Shigeo URUSHIDANI Shunji ABE Kenjiro YAMANAKA Kento AIDA Shigetoshi YOKOYAMA Hiroshi YAMADA Motonori NAKAMURA Kensuke FUKUDA Michihiro KOIBUCHI Shigeki YAMADA
This paper describes the architectural design and related services of a new Japanese academic backbone network, called SINET5, which will be launched in April 2016. The network will cover all 47 prefectures with 100-Gigabit Ethernet technology and connect each pair of prefectures with minimized latency. This will enable users to leverage evolving cloud-computing power as well as draw on a high-performance platform for data-intensive applications. The transmission layer will form a fully meshed, SDN-friendly, and reliable network. The services will evolve to be more dynamic and cloud-oriented in response to user demands. Cyber-security measures for the backbone network and tools for performance acceleration and visualization are also discussed.
Reboot-based recovery is a simple but powerful method for recovering applications from failures and unstable states. However, it faces a challenge when applied to a new type of application: in-memory databases (DBs). Unlike legacy applications, rebooting an in-memory DB loses memory objects, including key-value pairs and DB blocks, which must then be restored, causing severe performance degradation after the reboot. This paper presents an approach that allows us to perform reboot-based recovery of in-memory DBs with lower performance degradation. Our key insight is to decouple data content objects from the rest of the memory objects. Our approach treats data items as data content objects, preserves data content objects in memory across reboots, and makes restarted in-memory DBs attach to them. To show the effectiveness of our approach, we apply the idea to two real-world DBs, MyRocks and memcached. The prototypes successfully mitigate performance degradation after reboot-based recovery.
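A minimal sketch of keeping data content in memory across a restart: the key-value content lives in a POSIX shared-memory object that a restarted process simply re-attaches instead of repopulating. The segment name and layout are illustrative, not MyRocks/memcached internals.

```c
/* Content survives the process: run once to populate, run again to re-attach. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/kv_content"     /* hypothetical segment name */
#define SHM_SIZE 4096

int main(void)
{
    /* O_CREAT zero-initializes the segment only the first time. */
    int fd = shm_open(SHM_NAME, O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, SHM_SIZE);

    char *kv = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (kv == MAP_FAILED) { perror("mmap"); return 1; }

    if (kv[0] == '\0') {                       /* first run: populate        */
        strcpy(kv, "user:42=hello");
        printf("cold start: populated content\n");
    } else {                                   /* after restart: re-attach   */
        printf("restart: reattached content '%s' without reloading\n", kv);
    }

    munmap(kv, SHM_SIZE);
    close(fd);
    return 0;                                  /* segment outlives the process */
}
```

On older glibc, link with -lrt; removing the segment with shm_unlink corresponds to deliberately discarding the preserved content.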
In infrastructure-as-a-service platforms, cloud users can adjust their database (DB) service scale to dynamic workloads by changing the number of virtual machines running a DB management system (DBMS), called DBMS instances. Replicating a DBMS instance is a non-trivial task because DBMS replication is time-consuming, a problem exacerbated by the trend of cloud vendors offering high-spec DBMS instances. This paper presents BalenaDB, which performs urgent DBMS replication for handling sudden workload increases. Unlike conventional replication schemes, which implicitly assume that DBMS replicas are generated on remote machines, BalenaDB generates a warmed-up DBMS replica on an instance running on the local machine where the master DBMS instance runs, by leveraging the master DBMS's resources. We prototyped BalenaDB on MySQL 5.6.21, Linux 3.17.2, and Xen 4.4.1. The experimental results show that the time for generating a warmed-up DBMS replica instance on BalenaDB is up to 30× shorter than with an existing DBMS instance replication scheme, while achieving significantly more efficient memory utilization.
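As an analogy for generating a warm replica locally (this is not BalenaDB's implementation), fork() gives a child a copy-on-write view of the parent's in-memory state, so the replica starts with the master's warm buffer contents instead of rebuilding them from disk:

```c
/* Fork-based analogy of producing a warmed-up local replica. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define POOL_BYTES ((size_t)64 * 1024 * 1024)   /* stand-in buffer pool */

int main(void)
{
    /* "Master DBMS": warm up an in-memory buffer pool. */
    char *pool = malloc(POOL_BYTES);
    if (!pool) return 1;
    memset(pool, 0xab, POOL_BYTES);

    pid_t pid = fork();
    if (pid == 0) {
        /* "Replica": inherits the warm pool via copy-on-write pages,
         * so no cold-start re-population is needed. */
        printf("replica: first cached byte = 0x%02x\n", (unsigned char)pool[0]);
        _exit(0);
    }

    waitpid(pid, NULL, 0);      /* master keeps running afterwards */
    free(pool);
    return 0;
}
```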
Shuhei ENOMOTO Hiroki KUZUNO Hiroshi YAMADA
CPU flush instruction-based cache side-channel attacks (cache instruction attacks) target a wide range of machines. For instance, Meltdown and Spectre combined with FLUSH+RELOAD gain read access to arbitrary data in the operating system kernel and user processes running on cloud virtual machines, laptops, desktops, and mobile devices. Fault injection attacks also use the CPU cache: Rowhammer is a cache instruction attack that attempts to obtain write access to arbitrary data in physical memory and affects machines with DDR3 DRAM. To protect against existing cache instruction attacks, various mechanisms that modify hardware or software have been proposed; however, when new cache instruction attacks are disclosed, these mechanisms cannot prevent them, and designing and developing additional countermeasures takes a long time. This paper proposes a novel mechanism, termed FlushBlocker, to protect against all types of cache instruction attacks and to mitigate cache instruction attacks that exploit the latest side-channel vulnerabilities until additional countermeasures are released. FlushBlocker restricts the issuing of cache flush instructions, causing such attacks to fail by limiting control over the CPU cache. To demonstrate the effectiveness of this study, FlushBlocker was implemented in the latest Linux kernel, and its security and performance were evaluated. The results show that FlushBlocker successfully prevents existing cache instruction attacks (e.g., Meltdown, Spectre, and Rowhammer), incurs no performance overhead, and is transparent to real-world applications.
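For context, this is the x86-only FLUSH+RELOAD measurement primitive that cache instruction attacks build on and that flush-restricting defenses aim to take away: flush a line with clflush, then time a later reload to learn whether it was touched. This is an illustrative timing demo, not attack code and not FlushBlocker itself.

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

static uint64_t time_access(volatile char *p)
{
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);   /* timestamp before the reload */
    (void)*p;                          /* the reload being timed      */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

int main(void)
{
    static char probe[64] __attribute__((aligned(64)));

    probe[0] = 1;                      /* bring the line into the cache  */
    uint64_t hit = time_access(probe);

    _mm_clflush(probe);                /* the flush instruction at issue */
    _mm_mfence();
    uint64_t miss = time_access(probe);

    printf("cached reload: %llu cycles, flushed reload: %llu cycles\n",
           (unsigned long long)hit, (unsigned long long)miss);
    return 0;
}
```

The large gap between the two timings is the signal an attacker exploits; restricting who may issue the flush removes the attacker's control over that gap.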
Hiroki SHIRAYANAGI Hiroshi YAMADA Kenji KONO
Current network elements consume 10-20% of the total power in data centers. Today's network elements are not energy-proportional and consume a constant amount of energy regardless of the amount of traffic. Thus, turning off unused network switches is the most effective way to reduce the energy consumption of data center networks. This paper presents Honeyguide, an energy optimizer for data center networks that not only turns off inactive switches but also increases the number of inactive switches for better energy efficiency. To this end, Honeyguide combines two techniques: 1) virtual machine (VM) and traffic consolidation, and 2) a slight extension to existing tree-based topologies. Honeyguide has the following advantages. The VM consolidation, which is gracefully combined with traffic consolidation, can satisfy strict fault-tolerance requirements. It can be introduced into existing data centers without replacing the already-deployed tree-based topologies. Our simulation results demonstrate that Honeyguide reduces the energy consumption of network elements more than conventional VM migration schemes, with savings of up to 7.8% in a fat tree with k=12.
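To put the k=12 result in context, the element counts of a standard k-ary fat tree can be computed as below; the paper's slight topology extension is not modeled here.

```c
/* Element counts of a standard k-ary fat tree (core = (k/2)^2,
 * k pods with k/2 aggregation and k/2 edge switches each, k^3/4 hosts). */
#include <stdio.h>

int main(void)
{
    int k = 12;
    int core        = (k / 2) * (k / 2);   /* 36 for k=12  */
    int aggregation = k * (k / 2);         /* 72 for k=12  */
    int edge        = k * (k / 2);         /* 72 for k=12  */
    int hosts       = k * k * k / 4;       /* 432 for k=12 */

    printf("k=%d fat tree: %d core + %d aggregation + %d edge switches, %d hosts\n",
           k, core, aggregation, edge, hosts);
    return 0;
}
```

With 180 switches in total, every additional switch that consolidation allows to be powered off contributes a measurable share of the network's energy budget.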
Yusuke SUZUKI Hiroshi YAMADA Shinpei KATO Kenji KONO
Graphics processing units (GPUs) have become an attractive platform for general-purpose computing (GPGPU) in various domains. Making GPUs a time-multiplexed resource is a key to consolidating GPGPU applications (apps) in multi-tenant cloud platforms. However, advanced GPGPU apps pose a new challenge for consolidation. Such highly functional GPGPU apps, referred to as GPU eaters, can easily monopolize a shared GPU and starve collocated GPGPU apps. This paper presents GLoop, a software runtime that enables us to consolidate GPGPU apps, including GPU eaters. GLoop offers an event-driven programming model, which allows GLoop-based apps to inherit the GPU eaters' high functionality while proportionally scheduling them on a shared GPU in an isolated manner. We implemented a prototype of GLoop and ported eight GPU eaters to it. The experimental results demonstrate that our prototype successfully schedules the consolidated GPGPU apps according to its scheduling policy and isolates resources among them.
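The shape of such an event-driven model can be sketched in plain C: a long-running "eater" loop is split into short, bounded callbacks that a scheduler interleaves across apps. This illustrates the programming-model idea only and is not GLoop's GPU runtime API.

```c
#include <stdio.h>

typedef void (*event_cb)(int app_id, int *remaining);

/* A bounded slice of what used to be an unbounded kernel loop. */
static void chunk_of_work(int app_id, int *remaining)
{
    (*remaining)--;
    printf("app %d: processed one chunk, %d left\n", app_id, *remaining);
}

int main(void)
{
    int remaining[2] = { 3, 2 };          /* two consolidated "GPU eaters" */
    event_cb cb = chunk_of_work;

    /* Scheduler loop: visit apps in turn, run one bounded callback each. */
    int pending = remaining[0] + remaining[1];
    while (pending > 0) {
        for (int app = 0; app < 2; app++) {
            if (remaining[app] > 0) {
                cb(app, &remaining[app]);
                pending--;
            }
        }
    }
    return 0;
}
```

Because each callback is bounded, the scheduler regains control between events, which is what makes proportional, isolated sharing of the GPU possible.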
Kernel updates are a part of daily life in contemporary computer systems. They usually require an OS reboot that involves restarting not only the kernel but also all running applications, causing downtime that can disrupt software services. This downtime issue has been tackled by numerous approaches. Although dynamic translation of the running kernel image, a representative approach, can conduct kernel updates at runtime, its applicability is inherently limited. This paper describes Dwarf, which shortens downtime during kernel updates and covers more types of updates. Dwarf launches the newer kernel in the background on the same physical machine and forces the newer kernel to inherit the running states of the older kernel. We implemented a prototype of Dwarf on Xen 4.5.2, Linux 2.6.39, Linux 3.18.35, and Linux 4.1.6. We also conducted experiments using six applications, including Apache, MySQL, and memcached, and the results demonstrate that Dwarf's downtime is 1.8 seconds in the shortest case and up to 10× shorter than that of a normal OS reboot.
Naiwala P. CHANDRASIRI Ryuta SUZUKI Nobuyuki WATANABE Hiroshi YAMADA
Face perception and recognition have recently attracted attention in multidisciplinary fields such as engineering, psychology, and neuroscience, with advances in physical/physiological measurement and data analysis technologies. In this paper, our main interest is building computational models of human face recognition based on psychological experiments. We especially focus on modeling the characteristics of human recognition of average faces along the dimension of distinctiveness. Psychological experiments were carried out to measure the distinctiveness of face images, and their results are explained by computer analysis of the images. Two psychological experiments were performed: 1) a classical distinctiveness-rating experiment and 2) a novel experiment on recognition of an average face. In the latter experiment, we examined how the average face of two face images was recognized by humans in a similarity test with respect to the original images used to compute the average face. To explain the results of the psychological experiments, eigenface spaces were constructed based on Principal Component Analysis (PCA). A significant correlation was found between the human and PCA-based computer recognition results. Emulation of human face recognition is one of the expected applications of this research.
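For reference, the eigenface construction underlying a PCA-based analysis can be sketched as follows; this is standard PCA, with notation chosen here for illustration rather than taken from the paper.

```latex
% Face images x_1, ..., x_N are vectorized; \bar{x} is the mean ("average") face.
\begin{align}
  \bar{x} &= \frac{1}{N}\sum_{i=1}^{N} x_i, &
  C &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^{\top},\\
  C u_j &= \lambda_j u_j, &
  w_i &= \bigl(u_1^{\top}(x_i - \bar{x}),\,\dots,\,u_m^{\top}(x_i - \bar{x})\bigr),
\end{align}
% Each face is represented by its coordinates w_i in the eigenface space, and
% a face's distinctiveness can be related to its distance \|w_i\| from the mean face.
```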