%0 Journal Article
%T Fault-Tolerant Grid Architecture and Practice
%A Hai Jin
%A DeQing Zou
%A HanHua Chen
%A JianHua Sun
%A Song Wu
%A 金海
%A 邹德清
%A 陈汉华
%A 孙建华
%A 吴松
%J Journal of Computer Science and Technology
%D 2003
%X Grid computing has emerged as an effective technology for coupling geographically distributed resources and solving large-scale computational problems over wide area networks. Fault tolerance is a significant and complex issue in grid computing systems. Various techniques have been investigated to detect and correct faults in distributed computing systems, and unreliable fault detection is one of the most effective. Globus, a grid middleware, manages resources in a wide area network. The Globus fault detection service uses well-known techniques based on unreliable fault detectors to detect and report component failures. However, more powerful techniques are required to detect and correct both system-level and application-level faults in a grid system, and a convenient toolkit is also needed to maintain consistency in the grid. This paper presents a fault-tolerant grid platform (FTGP) based on an unreliable fault detector and the Globus fault detection service. The platform offers effective strategies in three areas: key grid components, user tasks, and high-level applications.
%K grid computing
%K fault tolerance
%K middleware
%K Globus
%K distributed computing
%K fault-tolerant grid
%K FTGP
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=F57FEF5FAEE544283F43708D560ABF1B&aid=4ECBA81DC11244C3E7474406654A9D2C&yid=D43C4A19B2EE3C0A&vid=13553B2D12F347E8&iid=E158A972A605785F&sid=2B25C5E62F83A049&eid=2B25C5E62F83A049&journal_id=1000-9000&journal_name=计算机科学技术学报&referenced_num=10&reference_num=22