1-1hit |
Hiroshi YOSHIDA Hiroyuki SUZUKI Kotaro OKAZAKI
In developing the SXO operating system for the SURE SYSTEM 2000 continuous operation system, we aimed to create an unprecedentedly high software and hardware fault tolerance. We devised a fault tolerant architecture and various methodologies to ensure fault tolerance. We implemented these techniques systematically throughout operating system development. In the design stage, we developed a design methodology called the recovery process chart to verify that recovery mechanisms were complete. In the manufacturing stage, we applied the concept of critical routes to recovery and other processes essential to high dependability. We also developed a method of finding critical routes in a recovery process chart. In the test stage, we added an artificial software fault injection mechanism to the operating system. It generates various reproducible errors at appropriate times and reduces the number of personnel needed for test, making system reliability evaluation easy.