Masato KITAKAMI Bochuan CAI Hideo ITO
The cost of checkpointing consists of checkpoint overhead and checkpoint latency. The former is the time to stop the process for checkpointing. The latter is the time to complete the checkpointing including background checkpointing which stores memory pages. The large checkpoint latency increases the possibility that the error occurs in background checkpointing, which leads to long rollback distance. The method for small checkpoint latency has not been proposed yet. This paper proposes a checkpointing method which achieves small checkpoint latency. The proposed method divides a checkpoint interval into several subcheckpoint intervals. By using the history of memory page modification in subcheckpoint intervals, the proposed method saves some pages which are not expected to be modified in the rest of checkpoint interval in advance. Computer simulation says that the proposed method can reduce the checkpoint latency by 25% comparing to the existing methods.
Fault-tolerance is an important design issue in building a reliable mobile computing system. This paper considers checkpointing recovery services for a mobile computing system based on the ad-hoc network environment. Since potential problems of this new environment are insufficient power and limited storage capacity, the proposed scheme tries to reduce disk access frequency for saving recovery information, and also the amount of information saved for recovery. A brief simulation study has been performed and the results show that the proposed scheme takes advantage of the existing checkpointing recovery schemes.
Ann-Chen CHANG Chun HSU Ing-Jiunn SU
This letter deals with adaptive array beamforming based on a minimum variance distortionless response (MVDR) technique with robust capabilities for code-division multiple access signals. It has been shown that the MVDR beamformer suffers from the drawback of being very sensitive to pointing error over the eigenspace-based beamformers. For the purpose of efficient estimation and calibration, a highly efficient approach has been proposed that is implemented on polynomial rooting rather than spectral searching. However, this rooting method is suboptimal in the presence of the noise and multiple access interference (MAI). In this letter, we propose an improved polynomial rooting calibration method that is robust in both of the low signal-to-noise ratio and large MAI scenarios. Several computer simulations are provided for illustrating the effectiveness of the proposed method.
In human-computer interaction, Fitts' law has been applied in one-dimensional pointing task evaluation for some decades, and the usage of effective target width (We) in Fitts' law has been accepted as an international standard in ISO standards 9241-9 [4]. However, the discussion on the concrete methods for calculating We has not been developed comprehensively nor have the different methods of calculation been integrated. Therefore, this paper focuses on a detailed description and a comparison of the two main We calculation methods. One method is mapping all the abscissa data in one united relative coordinate system to perform the calculation (called CC method) and the other is dividing the data into two groups and mapping them in two separate coordinate systems (called SC method). We tested the accuracy of each method and compared both methods in a highly controlled experiment. The experiments' results and data analysis show that the CC method is better than the SC method for human computer interface modeling. These results will be instrumental for future application of Fitts' law.
Jing KONG Xiangshi REN Keizo SHINOMORI
Fitts' law has been applied in many studies to evaluate pointing tasks. However, the quantitative effect of using color in the interfaces has not been discussed in the literature. This paper introduces research on the effects of color in pointing tasks using Fitts' law as the evaluation method. Different colors and color presentation styles are applied in the experiments which are similar in design to the paradigmatic Fitts' law pointing task. The experimental results show that when the subjects use a mouse as the input device, there is no significant difference between an interface with a colored target and one with a white target in the mean performance time. The results also reveal that color presentation styles will offer no significant difference to pointing tasks when the mouse is applied. However, when the interface of tablet PC and pen was applied, subjects without much experience in tablet personal computer usage needed more time to perform the task with colored targets than with a white target. Furthermore, when the colors are changed randomly during the selection process, the difference is even more obvious. These results are confirmed by a Checking Experiment and a Learning Effect Experiment which we performed on different groups of subjects.
Expressions are presented for the probability of target detection and the measurement accuracy of the detection, taking into account the effects of antenna beam-pointing error. Evaluation of these expressions requires numerical integration, which is computationally expensive. Approximate but analytic and efficient expressions are also presented. Numerical examples are given to present the relative accuracy of our analytic approximations.
In Pyo HONG Byung In MOON Yong Surk LEE
The latest processors employ a large instruction window and longer pipelines to achieve higher performance. Although current branch predictors show high accuracy, the misprediction penalty is getting larger in proportion to the number of pipeline stages and pipeline width. This negative effect also happens in case of exceptions or interrupts. Therefore, it is important to recover processor state quickly and restart processing immediately. In this letter, we propose a low-cost recovery mechanism for processors with large instruction windows.
Gabriel RODRIGUEZ María J. MARTIN Patricia GONZALEZ Juan TOURIÑO
This paper presents CPPC (Controller/Precompiler for Portable Checkpointing), a checkpointing tool designed for heterogeneous clusters and Grid infrastructures through the use of portable protocols, portable checkpoint files and portable code. It works at variable level being user-directed, thus generating small checkpoint files. It allows parallel processes to checkpoint independently, without runtime coordination or message-logging. Consistency is achieved at restart time by negotiating the restart point. A directive-based checkpointing precompiler has also been implemented to ease up user's effort. CPPC was designed to work with parallel MPI programs, though it can be used with sequential ones, and easily extended to parallel programs written using different message-passing libraries, due to its highly modular design. Experimental results are shown using CPPC with different test applications.
Because it has desirable features such as no cascading rollback, fast output commit and asynchronous logging, causal message logging needs a consistent recovery algorithm to tolerate concurrent failures. For this purpose, Elnozahy proposed a centralized recovery algorithm to have two practical benefits, i.e. reducing the number of stable storage accesses and imposing no restriction on the execution of live processes during recovery. However, the algorithm with independent checkpointing may force the system to be in an inconsistent state when processes fail concurrently. In this paper, we identify these inconsistent cases and then present a recovery algorithm to have the two benefits and ensure the system consistency when integrated with any kind of checkpointing protocol. Also, our algorithm requires no additional message compared with Elnozahy's algorithm.
Determining consistent global checkpoints of a distributed computation has applications in the areas such as rollback recovery, distributed debugging, output commit and others. Netzer and Xu introduced the notion of zigzag paths and presented necessary and sufficient conditions for a set of checkpoints to be part of a consistent global checkpoint. This result also reveals that determining the existence of zigzag paths between checkpoints is crucial for determining consistent global checkpoints. Recent research also reveals that determining zigzag paths on-line is not possible. In this paper, we present an off-line method for determining the existence of zigzag paths between checkpoints.
Masaaki KONDO Takuro HAYASHIDA Masashi IMAI Hiroshi NAKAMURA Takashi NANYA Atsushi HORI
Cluster systems are getting widely used because of good performance / cost ratio. However, their reliability has not been well discussed in practical environment so far. As the number of commodity components in a cluster system gets increased, it is indispensable to support reliability by system software. SCore cluster system software is a parallel programming environment for High Performance Computing (HPC). SCore provides checkpointing and rollback-recovery mechanism for high availability. In this paper, we analyze and evaluate the checkpointing and rollback-recovery mechanisms of SCore quantitively. The experimental results reveal that the required time for checkpointing scales very well in respect to the number of computing nodes. However, the required time is quite long due to the low effective network bandwidth. Based on the results, we modify SCore and successfully make checkpointing and recovery 1.8 2.8 times and 3.7 5.0 times faster respectively. This is very helpful for cluster systems to achieve high performance and high availability.
Fault-tolerant execution of a mobile agent is an important design issue to build a reliable mobile agent system. Several fault-tolerant schemes for a single agent system have been proposed, however, there has been little research result on the multi-agent system. For the cooperating mobile agents, fault-tolerant schemes should consider the inter-agent dependency as well as the mobility; and try to localize the effect of a failure. In this paper, we investigate properties of inter-agent dependency and agent mobility; and then characterize rollback propagation caused by the dependency and the mobility. We then suggest some schemes to localize rollback propagation.
Jiman HONG Sangsu KIM Yookun CHO
Forked checkpointing scheme is proposed to achieve low checkpoint overhead. When a process wants to take a checkpoint in the forked checkpointing scheme, it creates a child process and continues its normal computation. Two recovery models can be used for forked checkpointing when the parent process fails before the child process establishes the checkpoint. One is the pessimistic recovery model where the recovery process rolls back to the previous checkpoint state. The other is the optimistic recovery model where a recovery process waits for the checkpoint to be established by the child process. In this paper, we present the recovery models for forked checkpointing by deriving the expected execution time of a process with and without checkpointing and also show that the expected recovery time of the optimistic recovery model is smaller than that of the pessimistic recovery model.
Kyue-Sup BYUN Sung-Hwa LIM Jai-Hoon KIM
This paper presents a two-tier coordinated checkpointing algorithm which can reduce the number of messages by being composed of two levels in mobile computing. Thus mobile devices have a high mobility and are lack of resources (e.g., storage, bandwidth, and battery power), traditional distributed algorithms like coordinated checkpointing algorithms could not be applied properly in mobile environment. In our proposed two-tier coordinated checkpointing algorithm, the messages to be transferred are requested by the mobile hosts and are handled by the appropriate MSS's (Mobile Support Stations). And the broadcast messages are handled by MSS instead of relaying the messages to all the mobile hosts directly as with the previous algorithms. This can reduce the communication cost and maintain the overall system consistency. In wireless cellular network, mobile computing based on a two-tier coordinated checkpointing algorithm reduces the number of synchronization messages. We perform performance comparisons by parametric analysis to show that a two-tier coordinated checkpointing algorithm can reduce communication cost compared to the previous algorithms in which the messages are directly sent to the mobile hosts.
Using eigenstructure approach to form interference canceler is very sensitive to pointing error, especially when the interference number is overestimated. This Letter presents an effective technique to correct the pointing error by the projection matrix of noise subspace. Based on the corrected steering angle, a proper blocking matrix of the eigenstructure interference canceler can be obtained to suppress the leakage of desired signal. Therefore, signal cancellation does not occur, even the interference number is overestimated in constructing the interference subspace.
Hyochang NAM Jong KIM Sung Je HONG Sunggu LEE
For checkpointing to be practical, it has to introduce low overhead for the targeted application. As a means of reducing the overhead of checkpointing, this paper proposes a probabilistic checkpointing method, which uses block encoding to detect the modified memory area between two consecutive checkpoints. Since the proposed technique uses block encoding to detect the modified area, the possibility of aliasing exists in encoded words. However, this paper shows that the aliasing probability is near zero when an 8-byte encoded word is used. The performance of the proposed technique is analyzed and measured by using experiments. An analytic model which predicts the checkpointing overhead is first constructed. By using this model, the block size that produces the best performance for a given target program is estimated. In most cases, medium block sizes, i.e., 128 or 256 bytes, show the best performance. The proposed technique has also been implemented on Unix based systems, and its performance has been measured in real environments. According to the experimental results, the proposed technique reduces the overhead by 11.7% in the best case and increases the overhead by 0.5% in the worst case in comparison with page-based incremental checkpointing.
Eigenstructure-based beamformers suffer form performance degradation due to pointing errors when the number of the incident signals is incorrectly detected or when the desired signal is much stronger than the interferences. We present a robust beamformer with the self-correction of look direction errors, based on the Newton method. Even though there are errors in the detection of the incident signal number as well as in the presumed look direction, it can achieve optimum performance with no errors.
Distributed domino effect-free checkpointing techniques can be divided into two categories: coordinated and communication-induced checkpointing. The former is inappropriate for mobile computing systems because it either forces every mobile host to take a new checkpoint or blocks the underlying computation during the checkpointing process. The latter makes every mobile host take the checkpoint independently. However, each mobile host may need to store multiple local checkpoints in stable storage. This investigation presents a novel three level synchronous checkpointing algorithm that combines the advantages of above two methods for mobile computing systems. The algorithm utilizes pre-synchronization, quasi-synchronization, and post-synchronization techniques and has the following merits: (1) Consistent global checkpoints can be ensured. (2) No mobile host is blocked during checkpointing. (3) Only twice the checkpoint size is required. (4) Power consumption is low. (5) The disconnection problem of mobile hosts can be resolved. (6) Very few mobile hosts in doze mode are disturbed. (7) It is simple and easy to implement. The proposed algorithm's numerical results are also provided in this work for comparison. The comparison reveals that our algorithm outperforms other algorithms in terms of checkpoint overhead, maintained checkpoints, power consumption, and disturbed mobile hosts.
This work presents two novel algorithms to prevent rollback propagation for independent checkpointing: an efficient adaptive independent checkpointing algorithm and an optimized adaptive independent checkpointing algorithm. The last opportunity strategy that yields a better performance than the conservation strategy is also employed to prevent useless checkpoints for both causal rewinding paths and non-causal rewinding paths. The two methods proposed herein are domino effect-free and require only a limited amount of control information. They also take less unnecessary adaptive checkpoints than other algorithms. Furthermore, experimental results indicate that the checkpoint overhead of our techniques is lower than that of the coordinated checkpointing and domino effect-free algorithms for service-providing applications.
Tadashi DOHI Kouji NOMURA Naoto KAIO Shunji OSAKI
This paper considers two simulation models for simple unreliable file systems with checkpointing and rollback recovery. In Model 1, the checkpoint is generated at a pre-specified time and the information on the main memory since the last checkpoint is back-uped in a secondary medium. On the other hand, in Model 2, the checkpointing is executed at the time when the number of transactions completed for processing is achieved at a pre-determined level. However, it is difficult to treat such models analytically without employing any approximation method, if queueing effects related with arrival and processing of transactions can not be ignored. We apply the generalized stochastic Petri net (GSPN) to represent the stochastic behaviour of systems under two checkpointing schemes. Throughout GSPN simulation, we evaluate quantitatively the maintainability of checkpoint models under consideration and examine the dependence of model parameters in the optimal checkpoint policies and their associated system availabilities.