1-7hit |
Because it has desirable features such as no cascading rollback, fast output commit and asynchronous logging, causal message logging needs a consistent recovery algorithm to tolerate concurrent failures. For this purpose, Elnozahy proposed a centralized recovery algorithm to have two practical benefits, i.e. reducing the number of stable storage accesses and imposing no restriction on the execution of live processes during recovery. However, the algorithm with independent checkpointing may force the system to be in an inconsistent state when processes fail concurrently. In this paper, we identify these inconsistent cases and then present a recovery algorithm to have the two benefits and ensure the system consistency when integrated with any kind of checkpointing protocol. Also, our algorithm requires no additional message compared with Elnozahy's algorithm.
In this paper, we present a hybrid message logging protocol consisting of three modules for two-level hierarchical and distributed architectures to address the drawbacks of sender-based message logging. The first module reduces the number of in-group control messages and, the rest, the number of inter-group control messages while localizing recovery. In addition, it can distribute the load of logging and keeping inter-group messages to group members as evenly as possible. The simulation results show the proposed protocol considerably outperforms the traditional protocol in terms of message logging overhead and scalability.
Sender-based message logging (SBML) with checkpointing has its well-known beneficial feature, lowering highly failure-free overhead of synchronous logging with volatile logging at sender's memory. This feature encourages it to be applied into many distributed systems as a low-cost transparent rollback recovery technique. However, the original SBML recovery algorithm may no longer be progressing in some transient communication error cases. This paper proposes a consistent recovery algorithm to solve this problem by piggybacking small log information for unstable messages received on each acknowledgement message for returning the receive sequence number assigned to a message by its receiver. Our algorithm also enables all messages scheduled to be sent, but delayed because of some preceding unstable messages to be actually transmitted out much earlier than the existing ones.
ChaYoung KIM JinHo AHN ChongSun HWANG
Gossip-based reliable broadcast protocols with reasonably weak reliability properties scale well to large groups and degrade system performance gracefully even if node failure or message loss rates increase compared with traditional protocols. However, although many distributed applications require highly steady performance only by allowing causality to be used asynchronously, there is no existing gossip-based protocol offering causally ordered delivery property more lightweight than totally ordered delivery one. This paper presents an application-level broadcast algorithm to guarantee causally-ordered delivery semantics based on peer to peer interaction models for scalability, reasonable reliability and stable throughput. Processes propagate each message with a vector time stamp much like the spread of rumor in society for a fixed number of rounds. Upon receipt of these messages, correct processes immediately deliver the corresponding messages to the application layers in a causal order. Simulation results show that the proposed algorithm outperforms the existing ones in terms of delivery throughput.
This paper presents a new scalable method to considerably reduce the rollback propagation effect of the conventional optimistic message logging by utilizing positive features of reliable FIFO group communication links. To satisfy this goal, the proposed method forces group members to replicate different receive sequence numbers (RSNs), which they assigned for each identical message to their group respectively, into their volatile memories. As the degree of redundancy of RSNs increases, the possibility of local recovery for each crashed process may significantly be higher. Experimental results show that our method can outperform the previous one in terms of the rollback distance of non-faulty processes with a little normal time overhead.
All the existing sender-based message logging (SBML) protocols share a well-known limitation that they cannot tolerate concurrent failures. In this paper, we analyze the cause for this limitation in a unicast network environment, and present an enhanced SBML protocol to overcome this shortcoming while preserving the strengths of SBML. When the processes on different nodes execute a distributed application together in a broadcast network, this new protocol replicates the log information of each message to volatile storages of other processes within the same broadcast network. It may reduce the communication overhead for the log replication by taking advantage of the broadcast nature of the network. Simulation results show our protocol performs better than the traditional one modified to tolerate concurrent failures in terms of failure-free execution time regardless of distributed application communication pattern.
The previous communication-induced checkpointing may considerably induce worthless forced checkpoints because each process receiving messages cannot obtain sufficient information related to non-causal Z-paths. This paper presents an enhanced sender-based message logging protocol applicable to any communication-induced checkpointing to lead to a high decrease of the forced checkpointing overhead of communication-induced checkpointing in an effective way while permitting no useless checkpoint. The protocol allows each process sending a message to know the exact timestamp of the receiver of the message in its logging procedures without any extra message. Simulation verifies their great efficiency of overhead alleviation regardless of communication patterns.