FRASystem: fault tolerant system using agents in distributed computing systems |
| |
Authors: | HwaMin Lee, DooSoon Park, HeonChang Yu, Giyeol Lee |
| |
Institution: | 1. Division of Computer Science and Engineering, Soonchunhyang University, Asan-si, Korea; 2. Dept. of Computer Science Education, Korea University, Seoul, Korea; 3. Research and Development Center, Saman Corporation, Anyang, Korea |
| |
Abstract: | In this paper, we present a fault tolerance and recovery system called FRASystem (Fault Tolerant & Recovery Agent System) that uses
multiple agents in distributed computing systems. Previous rollback-recovery protocols depended on the inherent communication
mechanism and the underlying operating system, which degraded computing performance. We propose a rollback-recovery protocol
that works independently of the operating system, which improves portability and extensibility. We define four types
of agents: (1) a recovery agent, which performs the rollback-recovery protocol after a failure; (2) an information agent, which constructs
domain knowledge as fault tolerance rules and information during failure-free operation; (3) a facilitator agent, which controls
communication between agents; and (4) a garbage collection agent, which performs garbage collection of useless fault tolerance
information. Since agent failures may lead to inconsistent system states and a domino effect, we propose an agent recovery
algorithm. A garbage collection protocol addresses the performance degradation caused by the growth of fault tolerance
information saved in stable storage. We implemented a prototype of FRASystem using Java and CORBA and evaluated the proposed
rollback-recovery protocol experimentally. The simulation results indicate that our protocol performs better than previous
rollback-recovery protocols that use independent checkpointing and pessimistic message logging without agents. Our
contributions are as follows: (1) this is the first rollback-recovery protocol using agents, (2) FRASystem does not depend
on the operating system, and (3) FRASystem provides portability and extensibility. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|