Journal of Computers 2011
A Novel Failure Detection Algorithm for Reliable Distributed Systems
DOI: 10.4304/jcp.6.10.2013-2020
Keywords: failure detection, distributed system, quality of service

Abstract: A failure detection service is perfect if it eventually detects all failures and every detection correctly identifies a failure that has occurred. Such a perfect failure detection service serves as a basic building block for many reliable distributed systems, for example, distributed lock services. In this paper, we introduce a perfect failure detection scheme to improve the fault tolerance of the service. We give a precise system model and specification for a failure detection service. We present two novel algorithms that implement the failure detection service. We further develop a set of quality-of-service (QoS) metrics for perfect failure detection services and apply probabilistic analysis to quantify these QoS metrics for the two algorithms.
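The definition of a perfect failure detection service combines two properties: every failure is eventually detected (completeness), and every detection refers to a failure that has actually occurred (accuracy). The following Python sketch is an illustration of that definition, not one of the paper's two algorithms. It checks a finite, recorded trace of crashes and detections against both properties. The names `Detection` and `check_perfect` are hypothetical. On a finite trace, "eventually" is approximated as "before the trace ends". The crash-to-detection delay is reported as one illustrative QoS figure, which is an assumption and not necessarily one of the metrics the paper defines.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Detection:
    """Hypothetical event type: the detector declares `process` failed at `time`."""
    process: str
    time: float


def check_perfect(crash_times: Dict[str, Optional[float]],
                  detections: List[Detection]) -> Dict[str, object]:
    """Check a finite trace against the two properties of a perfect detector.

    crash_times maps each monitored process to its crash time, or None if it
    never crashed. Completeness is approximated (an assumption) as "every
    crashed process is detected before the trace ends".
    """
    first_detection: Dict[str, float] = {}
    false_detections: List[Detection] = []
    # Process events in time order so the earliest correct detection is kept.
    for d in sorted(detections, key=lambda e: e.time):
        crash = crash_times.get(d.process)
        # Accuracy: a detection is correct only if the process has already crashed.
        if crash is None or d.time < crash:
            false_detections.append(d)
        elif d.process not in first_detection:
            first_detection[d.process] = d.time

    crashed = {p for p, t in crash_times.items() if t is not None}
    undetected = sorted(crashed - set(first_detection))

    # Illustrative QoS figure (assumption): delay from crash to first correct detection.
    detection_times = {p: first_detection[p] - crash_times[p] for p in first_detection}

    return {
        "complete": not undetected,
        "accurate": not false_detections,
        "undetected": undetected,
        "false_detections": false_detections,
        "detection_times": detection_times,
    }
```

For example, with `crash_times = {"a": 10.0, "b": None, "c": 5.0}` and detections of `a` at 12.5 and `b` at 3.0, the trace is neither complete (`c` is never detected) nor accurate (`b` is suspected without having crashed), and the detection time for `a` is 2.5. A perfect detector must produce traces for which both checks pass.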