Authors: Lyu, M.R., Xinyu Chen, and Tsz Yeung Wong From : Intelligent Systems, IEEE [see also IEEE...
-
date post
20-Dec-2015 -
Category
Documents
-
view
220 -
download
1
Transcript of Authors: Lyu, M.R., Xinyu Chen, and Tsz Yeung Wong From : Intelligent Systems, IEEE [see also IEEE...
Authors: Lyu, M.R., Xinyu Chen, and Tsz Yeung Wong From : Intelligent Systems, IEEE [see also IEEE Intelligent Systems and Their Applications vol.19,Issue 5,Sep- Oct 2004 Date : 2007/5/24Teacher : Jong-Shin ChenStudent number: 9530618Name : 施宏達
Design and evaluation of a fault-tolerant mobile-agent system
OutlineIntroduction
System architecture and protocol design
Agent failure detection and recovery
Failures of witness agents and the recovery strategy
Simplifying the witnessing dependency
An example model
Experimental results
Conclusion
IntroductionMobile agents create a new paradigm for data exchange and resource sharing in rapidly growing and
continually changing computer networks.
Therefore, survivability and fault tolerance are vital issues for deploying mobile-agent systems.
System architecture and protocol design(1/5)
The agent server should provide three types of stable storage—for logs, checkpoints, and messages.
Each agent contains its internal data, which could also be lost due to the failure.
If the agent renews its computation from the starting point of its itinerary, it will violate the exactly-once property.
An actual agent is a common mobile agent that performs specific computations for its owner.
Witness agents monitor the actual agent and detect whether it’s lost.
System architecture and protocol design(2/5)
System architecture and protocol design(3/5)
(1)Log entry logiarrive (2)Send message msgi
arrive to serverSi-1
(3)After computation,checkpoint the data (4)Log entry logileave
(5)Send message msgileave to serverSi-1 (6)Spawn a witness agent
Witness agent wi–1 is more passive than the actual agent in this protocol.
Two messages are expected: msgi
arrive and msgileave.
After receiving these two messages,wi–1 waits for the direct heartbeat message, msgi
alive, which the witness agent at server Si sends.
System architecture and protocol design(4/5)
Agent failure detection and recovery(1/3)wi–1 fails to receive msgi
arrive:
1. The message is lost due to an unreliable network.2. The message arrives after the timeout period of wi–1.3. Actual agent α gets lost when it’s ready to leave Si–1
and is heading for Si.4. Actual agent α gets lost when it arrives at Si without
logging.5. Actual agent α gets lost when it arrives at Si with
logging.
Agent failure detection and recovery(2/3)
(1)The witness agent spawns a probe, which travels to Si. (2)The probe carries the checkpointed data.(3) The probe inspects the log in Si. (4) If logiarrive is found, the probe retransmits msgiarrive toSi–1.(5) If not, it recovers the agent from the checkpointed data.
Agent failure detection and recovery(3/3)
wi–1 fails to receive msgileave:
1. The message is lost due to an unreliable network.2. The message arrives after the timeout
period of wi–1.3. Actual agent α gets lost just after sending
message msgiarrive.
4. Actual agent α gets lost just after logging entry logi
leave.5. Actual agent α gets lost after spawning witness agent wi.
Failures of witness agents and the recovery strategy(1/2)
w0→w1→w2→… wi–1→ wi →α
Failures of witness agents:1. The network is congested or unreliable.2. The system load of Si is too high.3.Witness agent wi was not created or is lost.
Failures of witness agents and the recovery strategy(2/2)
(1)A failure strikes Si–1, and the witnessing dependency is broken. (2) A failure strikes Si, and the actual agent is terminated.(3) The witness agent at Si–2 recovers the witness agent at Si–1. (4) The witness agent at Si–1 recovers the actual agent at Si.
Simplifying the witnessing dependency(1/2)The actual agent creates witness agents
along its itinerary, and the witness agents exchange heartbeat messages.
These procedures consume considerable resources.
No more than k servers can fail at the same time,we can simplify our mechanism by shortening.
Simplifying the witnessing dependency(2/2)
If (i ≤ k) 0→w1→… wi–1→α Else wi–k→ wi–k+1→…→ wi–1→α
Finally, when α successfully logs entry logi+1
arrive, the system can terminate wi–k by sending message msgi+1
kill from Si+1 to Si–k.
Experimental results
Network transmission rate: 100 for agents,200 for messages
Server repair rate, t_s_r: 0.1All message log rates: 100Arrival, leave, and heartbeat message bound times: 1, 100, and 20, respectivelyHeartbeat interval: 5
Experimental results(Agent survivability a)
server failure rate is 0.001
job completion rate is 0.01
Experimental results(Agent survivability b)
server failure rate is 0.005
job completion rate is 0.01
Experimental results(Agent survivability c)
server failure rate is 0.005
job completion rate is 0.05