Ziv Dayan 200388130, Tom Afek Kafka 200637247, Instructor: Ittay Eyal.
Posted: 19-Dec-2015
DISTRIBUTED FAILURE DETECTOR
Ziv Dayan 200388130
Tom Afek Kafka 200637247
Instructor: Ittay Eyal
Reminder
- What is a failure detector?
- Our failure detector:
  - Software implementation
  - Gossip style
  - Independent local unit
Model
Implementation
- Communication: by messages
- Each message contains a list of heartbeats
- Each heartbeat contains:
  - IP of its creator
  - Time since creation
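The following is a minimal Python sketch of this message model. The class names are assumptions made for illustration, as is the merge rule: a receiver keeps, for each creator IP, the heartbeat with the smallest time since creation. None of this is the authors' code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Heartbeat:
    creator_ip: str  # IP of the node that created this heartbeat
    age: float       # time since creation, in seconds


@dataclass
class Message:
    heartbeats: list[Heartbeat] = field(default_factory=list)


def merge_heartbeats(known: dict[str, Heartbeat], msg: Message) -> dict[str, Heartbeat]:
    """Hypothetical gossip merge: keep the freshest heartbeat per creator."""
    merged = dict(known)  # do not mutate the caller's table
    for hb in msg.heartbeats:
        current = merged.get(hb.creator_ip)
        # A smaller age means the heartbeat was created more recently.
        if current is None or hb.age < current.age:
            merged[hb.creator_ip] = hb
    return merged
```

Under this rule, a node that receives a message keeps its own entry for a creator unless the message carries a fresher heartbeat from that creator.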
Each node contains its own Local Node, which holds:
- Net Members: a list of Nodes
- Neighbors: a list of Neighbor entries
- Versions: a list of Version entries
Network Construction
Failure Detection Method
Repeat periodically:
1. Choose the node whose threshold is closest to expiration.
2. Wait until the threshold has expired.
3. Check the local time of creation of the last heartbeat received from the suspected node:
   - If it has changed, the node is OK.
   - Otherwise, the suspected node has crashed.
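The detection round can be sketched in Python as follows. This sketch rests on several assumptions:
- Each node's threshold restarts whenever a fresher heartbeat from it is recorded.
- The clock and sleep functions are injectable so the logic can be exercised without real waiting.
- The class and method names are hypothetical.
- Locking between the message-handler thread and the detector thread is omitted.

```python
import time
from typing import Callable, Optional


class FailureDetector:
    """Sketch of the periodic detection step; all names are hypothetical.

    last_hb[ip] is the local time of creation of the last heartbeat
    received from ip. Thread safety is ignored here for brevity.
    """

    def __init__(self, threshold: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.threshold = threshold
        self.clock = clock
        self.sleep = sleep
        self.last_hb: dict[str, float] = {}   # ip -> local creation time of last heartbeat
        self.deadline: dict[str, float] = {}  # ip -> when its threshold expires
        self.crashed: set[str] = set()

    def on_heartbeat(self, ip: str, local_creation_time: float) -> None:
        # Record a fresher heartbeat and restart that node's threshold.
        if local_creation_time > self.last_hb.get(ip, float("-inf")):
            self.last_hb[ip] = local_creation_time
            self.deadline[ip] = self.clock() + self.threshold

    def check_once(self) -> Optional[str]:
        """One detection round; returns the IP declared crashed, or None."""
        pending = {ip: d for ip, d in self.deadline.items() if ip not in self.crashed}
        if not pending:
            return None
        # 1. Choose the node whose threshold is closest to expiration.
        suspect = min(pending, key=pending.__getitem__)
        seen = self.last_hb[suspect]
        # 2. Wait until the threshold has expired.
        self.sleep(max(0.0, pending[suspect] - self.clock()))
        # 3. Changed meanwhile: the node is OK. Unchanged: declare it crashed.
        if self.last_hb[suspect] != seen:
            return None
        self.crashed.add(suspect)
        return suspect

    def run(self, period: float) -> None:
        # Repeat periodically (never returns).
        while True:
            self.check_once()
            self.sleep(period)
```

Always picking the earliest deadline means the detector sleeps only as long as necessary before the next possible detection.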
Thread Diagram
[Diagram: threads Computer Listener, Main, Message Handler, Message Sender, Sender and Detector]
Version Handling
A new abstract class is added: NetMessage
- Method 1: Handle() decodes the received message using the proper version and returns a Message
- Method 2: toString() is used for serialization
[Diagram: classes NetMessage, SHA1Message, NormalMessage and Message]
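The following is a Python sketch of the NetMessage idea, under stated assumptions:
- Handle() becomes a static `handle()` method and toString() becomes `__str__`.
- A Message is modeled as a dict from creator IP to time since creation.
- The wire format `ip=age;ip=age` is made up for illustration.
- SHA1Message is assumed to prefix the payload with its SHA-1 digest and to reject messages whose digest does not match. The slides do not say what the SHA1 version actually does.

```python
import hashlib
from abc import ABC, abstractmethod

# Hypothetical Message: creator IP -> time since creation (seconds).
Message = dict[str, float]


def _encode(msg: Message) -> str:
    return ";".join(f"{ip}={age}" for ip, age in sorted(msg.items()))


def _decode(payload: str) -> Message:
    if not payload:
        return {}
    return {ip: float(age) for ip, age in (item.split("=") for item in payload.split(";"))}


class NetMessage(ABC):
    """One subclass per wire-format version."""

    def __init__(self, msg: Message) -> None:
        self.msg = msg

    @abstractmethod
    def __str__(self) -> str:
        """Serialization (the slide's toString())."""

    @staticmethod
    @abstractmethod
    def handle(raw: str) -> Message:
        """Decode a received string of this version into a Message."""


class NormalMessage(NetMessage):
    def __str__(self) -> str:
        return _encode(self.msg)

    @staticmethod
    def handle(raw: str) -> Message:
        return _decode(raw)


class SHA1Message(NetMessage):
    # Assumption: this version prefixes the payload with its SHA-1 digest.
    def __str__(self) -> str:
        payload = _encode(self.msg)
        return hashlib.sha1(payload.encode()).hexdigest() + "|" + payload

    @staticmethod
    def handle(raw: str) -> Message:
        digest, _, payload = raw.partition("|")
        if hashlib.sha1(payload.encode()).hexdigest() != digest:
            raise ValueError("corrupted message: SHA-1 mismatch")
        return _decode(payload)
```

With this design, the receiver only has to select the subclass matching the agreed version and call its `handle()`. The rest of the node works with plain Messages regardless of the wire version.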
Version Agreement Protocol
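[Diagram: message sequence between the initiator and the responder. The initiator sends $(addr_i, V_i)$; $v_{i,r}$ is then exchanged between them; subsequent NetMsg messages (msg) are sent with $v_{i,r}$.]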
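The slide shows an initiator/responder exchange that starts from the initiator's address $addr_i$ and versions $V_i$ and yields a version $v_{i,r}$. The following is a minimal in-process sketch of such an exchange. It assumes versions are integers, and that the responder picks the highest version both sides support and the initiator records it as confirmed. This selection rule and the names `Peer` and `agree` are assumptions, not the authors' protocol.

```python
from typing import Optional


class Peer:
    """Hypothetical endpoint; `versions` plays the role of V_i."""

    def __init__(self, addr: str, versions: set[int]) -> None:
        self.addr = addr
        self.versions = versions
        self.agreed: dict[str, int] = {}  # peer address -> agreed version v_{i,r}

    def on_hello(self, addr: str, their_versions: set[int]) -> Optional[int]:
        # Responder: choose the newest version both sides support (assumption).
        common = self.versions & their_versions
        if not common:
            return None
        v = max(common)
        self.agreed[addr] = v
        return v


def agree(initiator: Peer, responder: Peer) -> Optional[int]:
    # Initiator sends (addr_i, V_i); responder answers with v_{i,r}.
    v = responder.on_hello(initiator.addr, initiator.versions)
    if v is None:
        return None
    # Initiator confirms v_{i,r}; later NetMsg traffic is encoded with it.
    initiator.agreed[responder.addr] = v
    return v
```

Agreeing per pair of nodes lets old and new versions coexist in the network during an upgrade.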
Readers Writers Problem
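The slide only names the readers-writers problem. One plausible source, which is an assumption, is that several threads share the membership and heartbeat tables. For example, the message handler and the detector read those tables far more often than they write them. Python's standard library has no read-write lock, so the following is a minimal readers-preference lock built on `threading.Condition`. The class name is hypothetical, and under constant read load writers can starve.

```python
import threading


class ReadWriteLock:
    """Readers-preference lock: many concurrent readers, one exclusive writer."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0      # readers currently holding the lock
        self._writing = False  # True while a writer holds the lock

    def acquire_read(self) -> None:
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a waiting writer

    def acquire_write(self) -> None:
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self) -> None:
        with self._cond:
            self._writing = False
            self._cond.notify_all()  # wake readers and writers
```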
Heartbeat Rate
H = f(P, n, threshold), where H is the heartbeat rate
Assumptions required:
- Simplicity vs. efficiency
- Full topology
- Spread time << threshold
Heartbeat Rate – Take I
Assumption: local information (a strong assumption)
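Reliability
- x: number of messages
- P: probability of false detection
We want:
$$\left(\frac{n-1}{n} + \frac{1}{n}\,PLR\right)^{x} = P$$
Result:
$$x = \frac{h\,t_{thresh}}{2} \quad\Rightarrow\quad \left(\frac{n-1}{n} + \frac{1}{n}\,PLR\right)^{h\,t_{thresh}/2} = P$$
$$h = \frac{2}{t_{thresh}}\,\log_{\left(\frac{n-1}{n} + \frac{1}{n}PLR\right)} P$$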
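The following sketch evaluates the Take I result numerically. It reads the slide's formula as $h = \frac{2}{t_{thresh}}\log_q P$ with $q = \frac{n-1}{n} + \frac{PLR}{n}$, where $q$ is the probability that a single message fails to reach the checking node. This reading of the garbled original is an assumption, and the function name is hypothetical.

```python
import math


def heartbeat_rate_take1(p: float, n: int, plr: float, t_thresh: float) -> float:
    """Heartbeat rate h such that the false-detection probability equals p.

    Follows the assumed reading q**x == p with x = h * t_thresh / 2.
    """
    if not (0 < p < 1 and 0 <= plr < 1 and n >= 2 and t_thresh > 0):
        raise ValueError("need 0 < p < 1, 0 <= plr < 1, n >= 2, t_thresh > 0")
    # Probability that a single message fails to reach the checking node.
    q = (n - 1) / n + plr / n
    x = math.log(p) / math.log(q)  # number of messages so that q**x == p
    return 2 * x / t_thresh        # invert x = h * t_thresh / 2
```

For example, `heartbeat_rate_take1(0.01, 10, 0.1, 5.0)` gives the rate for 10 nodes, 10% packet loss, a 5-second threshold and a 1% false-detection target. Increasing `n` or `plr` pushes `q` toward 1, so the required rate grows.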
Take I Performance
Linear performance
The bigger P is, the bigger the slope.
Heartbeat Rate – Take II
Assumptions:
- Synchrony
- Consistency
Calculation for average case
Take II Performance
High Performance
Which Method Is Better?
Comparison categories:
- Efficiency
- Scalability
- Dynamism
- Reliability
Thank you for listening