Hiroaki HIGAKI Terunao SONEOKA
This paper proposes a group-to-group communication algorithm that extends the range of distributed systems in which active replication can provide fault tolerance to partner-model distributed systems, in which all processes communicate with each other on an equal footing. The active replication approach, in which all replicated processes are active, achieves fault tolerance with low overhead because checkpointing and rollback are not required for recovery from process failure. The algorithm guarantees that every replicated process in a process group has the same execution history and that communication between process groups remains consistent even in the presence of process failures and message loss. The number of control messages that must be transmitted for one communication between process groups is only linear in the number of replicated processes in each process group. Furthermore, the algorithm reduces the overhead of reconfiguring a process group by keeping process failure and recovery information local to each process group.
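To make the "same execution history" property concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that every message between groups carries a sequence number, that each replica discards duplicate copies arriving from the replicated senders, and that delivery happens strictly in sequence order. The names `Replica` and `group_send` are hypothetical, and the naive all-to-all sending used here does not reproduce the linear control-message count claimed in the abstract.

```python
class Replica:
    """One replicated process in the receiving group (hypothetical model)."""

    def __init__(self):
        self.next_seq = 0    # next sequence number to deliver
        self.pending = {}    # out-of-order messages waiting for delivery
        self.history = []    # delivered payloads, in delivery order

    def receive(self, seq, payload):
        # Ignore copies already delivered or already buffered.
        if seq < self.next_seq or seq in self.pending:
            return
        self.pending[seq] = payload
        # Deliver every message that is now in sequence.
        while self.next_seq in self.pending:
            self.history.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


def group_send(sender_count, receivers, seq, payload, lost=()):
    """Every sender replica sends to every receiver replica.

    `lost` is a set of (sender_index, receiver_index) pairs whose copy
    is dropped, modelling message loss (an assumption of this sketch).
    """
    for s in range(sender_count):
        for r, replica in enumerate(receivers):
            if (s, r) not in lost:
                replica.receive(seq, payload)


receivers = [Replica(), Replica()]
group_send(2, receivers, 1, "b")                  # arrives out of order
group_send(2, receivers, 0, "a", lost={(0, 1)})   # one copy is lost
```

Because each receiver gets at least one copy of every message and delivers in sequence order, both replicas end with the same history despite the lost copy and the out-of-order arrival.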
Alberto PALACIOS PAWLOVSKY Makoto HANAWA
This paper describes a new method for the concurrent detection of faults in instruction level parallel (ILP) processors. The method uses the No OPeration (NOP) instruction slots that, under branches, resource conflicts, and some kinds of data dependencies, fill some of the pipelines (stages) in an ILP processor. Each NOP is replaced by a copy of an effective instruction running in another pipeline. This allows the pipelines running the original instruction and its copy (or copies) to be checked by comparing the outputs of their stages during execution of the replicated instruction. We show some figures obtained by applying this method to a two-pipeline superscalar processor.
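The NOP-replacement idea can be illustrated with a small sketch for a two-pipeline machine. This is an illustration under simplifying assumptions, not the authors' implementation: instructions are plain strings, each pipeline is modelled as a function from instruction to result, and only final results are compared rather than per-stage outputs. The names `fill_nops` and `check` are hypothetical.

```python
NOP = "nop"


def fill_nops(bundles):
    """Replace a lone NOP in a two-slot issue bundle with a copy of the
    effective instruction in the other slot.

    Returns a list of (slot0, slot1, duplicated) tuples.
    """
    filled = []
    for a, b in bundles:
        if a == NOP and b != NOP:
            filled.append((b, b, True))
        elif b == NOP and a != NOP:
            filled.append((a, a, True))
        else:
            # Two effective instructions, or two NOPs: nothing to copy.
            filled.append((a, b, False))
    return filled


def check(filled, pipe0, pipe1):
    """Return the indices of duplicated bundles whose two results differ."""
    return [
        idx
        for idx, (a, b, dup) in enumerate(filled)
        if dup and pipe0(a) != pipe1(b)
    ]


bundles = [("add", "nop"), ("nop", "nop"), ("sub", "mul"), ("nop", "ld")]
filled = fill_nops(bundles)

healthy = len                                                  # stand-in for a fault-free pipeline
faulty = lambda ins: len(ins) + 1 if ins == "ld" else len(ins)  # corrupts "ld" only

faults = check(filled, healthy, faulty)
```

Only bundles that originally contained a NOP are checked, so the detection coverage in this model depends on how often NOP slots occur.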
Alberto Palacios PAWLOVSKY Makoto HANAWA Osamu NISHII Tadahiko NISHIMUKAI
Advances in semiconductor technology have made it possible to develop an experimental 1000 MIPS superscalar RISC processor. The high performance of this processor was obtained using a multiple-CPU configuration, a superscalar microarchitecture, and high-speed device technology. This paper focuses on the novel features of this RISC processor, its device technology, its architectural characteristics, and a technique devised to make its integer CPU cores fault-tolerant.
Runlength-limited block codes are investigated. These codes are useful for storing data in storage devices. Since most devices are not noiseless, the codes are often required to have some error-control capability. We consider runlength-limited codes that can correct or detect unidirectional byte errors. Some constructions of such codes are presented.
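The two properties named above can be checked mechanically. The sketch below assumes the common (d, k) convention for runlength-limited sequences, in which every run of zeros between two consecutive ones has length at least d and at most k, and it treats an error pattern as unidirectional when all bit flips go in the same direction. The abstract does not state which constraint the paper uses, and the sketch ignores byte boundaries and leading or trailing zero runs. The function names are hypothetical.

```python
def zero_runs(bits):
    """Lengths of the zero runs between consecutive ones."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    return [j - i - 1 for i, j in zip(ones, ones[1:])]


def satisfies_dk(bits, d, k):
    """True if every interior zero run has length between d and k."""
    return all(d <= run <= k for run in zero_runs(bits))


def is_unidirectional(sent, received):
    """True if all bit errors are 0->1 or all are 1->0 (or there are none)."""
    ups = any(s == 0 and r == 1 for s, r in zip(sent, received))
    downs = any(s == 1 and r == 0 for s, r in zip(sent, received))
    return not (ups and downs)


word = [1, 0, 0, 1, 0, 0, 0, 1]
ok = satisfies_dk(word, 2, 3)                                 # interior zero runs are 2 and 3
same_dir = is_unidirectional([1, 0, 1, 0], [0, 0, 0, 0])      # only 1->0 flips
mixed = is_unidirectional([1, 0, 1, 0], [0, 1, 1, 0])         # one flip in each direction
```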