Ziv Dayan 200388130, Tom Afek Kafka 200637247. Instructor: Ittay Eyal.

Posted on 19-Dec-2015

DISTRIBUTED FAILURE DETECTOR

Ziv Dayan 200388130

Tom Afek Kafka 200637247

Instructor: Ittay Eyal

Reminder

What is a failure detector?
Our failure detector:
  Software implementation
  Gossip style
  Independent local unit

Model

Implementation

Communication: by messages
Each message contains a list of heartbeats
Each heartbeat contains:
  IP of creator
  Time since creation
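
The message layout described above can be sketched as two small records. This is only an illustration: the names `Heartbeat`, `GossipMessage`, and `merge` are hypothetical, and the merge rule (keep the heartbeat with the smallest time since creation for each creator) is an assumption typical of gossip protocols, not something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    creator_ip: str   # IP of the node that created the heartbeat
    age: float        # time since creation, in seconds


@dataclass
class GossipMessage:
    heartbeats: list  # list of Heartbeat records carried by one message


def merge(known: dict, message: GossipMessage) -> dict:
    """Return a new table keeping the freshest heartbeat per creator.

    Assumption: "freshest" means the smallest time since creation.
    """
    merged = dict(known)
    for hb in message.heartbeats:
        current = merged.get(hb.creator_ip)
        if current is None or hb.age < current.age:
            merged[hb.creator_ip] = hb
    return merged
```

A receiver would call `merge` on its local table each time a message arrives, so that fresher heartbeats gradually spread through the network.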

Each node contains its own Local Node:

[Diagram: a Local Node holding three lists: Net Members (Node entries), Neighbors (Neighbor entries), and Versions (Version entries).]

Network Construction

Failure Detection Method

Repeat periodically:
  Choose the node whose threshold is closest to expiration
  Wait until the threshold has expired
  Check the local time of creation of the last heartbeat received from the suspected node:
    If changed: the node is OK
    Else: the suspected node has crashed
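
One round of the detection loop above might look like the following sketch. It assumes a single threshold shared by all nodes and a table `last_heartbeat` that maps each node to the local creation time of its latest heartbeat and is updated by another thread; the function names and the injectable `clock` and `sleep` parameters are hypothetical.

```python
import time


def next_suspect(last_heartbeat, threshold):
    """Return (node, deadline) for the node whose threshold expires first."""
    node = min(last_heartbeat, key=last_heartbeat.get)
    return node, last_heartbeat[node] + threshold


def check_suspect(last_heartbeat, threshold,
                  clock=time.monotonic, sleep=time.sleep):
    """Run one detection round and return (node, alive)."""
    node, deadline = next_suspect(last_heartbeat, threshold)
    seen_before = last_heartbeat[node]
    remaining = deadline - clock()
    if remaining > 0:
        sleep(remaining)  # wait until the threshold has expired
    # The table may have been refreshed by the message handler meanwhile.
    alive = last_heartbeat[node] != seen_before
    return node, alive
```

A detector thread would call `check_suspect` in a loop and remove or report any node for which `alive` is false.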

Thread Diagram

[Diagram: the threads Computer Listener, Main, Message Handler, Message Sender, Sender, and Detector.]

Version Handling

A new abstract class is added: NetMessage
  Method 1, Handle(): decodes the received message using the proper version and returns a Message
  Method 2, toString(): used for serialization

[Class diagram: NetMessage; SHA1Message, NormalMessage; Message.]
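
The class layout named above could be rendered roughly as follows. This is a sketch under assumptions, not the project's code: the slides give only the class and method names, so the wire formats here are invented for illustration (`NormalMessage` carries the payload as is, `SHA1Message` prefixes it with a SHA-1 hex digest and a colon), `handle` and `to_string` are Python spellings of `Handle()` and `toString()`, and `wrap` is a hypothetical helper.

```python
import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Message:
    payload: str  # decoded, version-independent content


class NetMessage(ABC):
    def __init__(self, raw: str):
        self.raw = raw  # the string as it travels on the wire

    @abstractmethod
    def handle(self) -> Message:
        """Decode the wire string according to this version."""

    def to_string(self) -> str:
        """Serialize for sending."""
        return self.raw


class NormalMessage(NetMessage):
    def handle(self) -> Message:
        return Message(self.raw)


class SHA1Message(NetMessage):
    @classmethod
    def wrap(cls, payload: str) -> "SHA1Message":
        digest = hashlib.sha1(payload.encode()).hexdigest()
        return cls(f"{digest}:{payload}")

    def handle(self) -> Message:
        digest, _, payload = self.raw.partition(":")
        if hashlib.sha1(payload.encode()).hexdigest() != digest:
            raise ValueError("SHA-1 digest mismatch")
        return Message(payload)
```

With this shape, the rest of the program only ever sees `Message` objects, and adding a new protocol version means adding one more `NetMessage` subclass.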

Version Agreement Protocol

[Sequence diagram between initiator and responder; messages in order:
  $addr_i, V_i$
  $v_{i,r}$
  $v_{i,r}$
  $NetMsg_{v_{i,r}}(msg)$
  $NetMsg_{v_{i,r}}(msg)$]
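
The exchange on this slide appears to have the initiator announce its address and its supported versions, and the responder answer with a version for the pair. Below is a minimal sketch of the responder side. The selection rule (the highest version both sides support) is an assumption the slides do not state, and `Responder`, `on_hello`, and `agreed` are hypothetical names.

```python
class Responder:
    def __init__(self, supported):
        self.supported = set(supported)  # versions this node can decode
        self.agreed = {}                 # initiator address -> agreed version

    def on_hello(self, initiator_addr, initiator_versions):
        """Pick a version for this initiator and remember it."""
        common = self.supported & set(initiator_versions)
        if not common:
            raise ValueError("no common version with " + initiator_addr)
        version = max(common)  # assumed rule: highest common version wins
        self.agreed[initiator_addr] = version
        return version
```

After the handshake, both sides would encode and decode every message between them with the agreed version.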

Readers Writers Problem

Heartbeat Rate

H = f(P, n, threshold)
Assumptions required
Simplicity vs. efficiency
Full topology
Spread time << threshold

Heartbeat Rate – Take I

Assumption: local information
  Strong assumption
Reliability:
  x: number of messages
  Probability for false detection
We want

Result:

[Equations garbled in extraction: the condition following "We want" and the expression following "Result". Surviving symbols: x, n, PLR, P, t_thresh, h, and a logarithm.]
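
Because the formulas on this slide did not survive, the following is not a reconstruction of them. It is one simple model that uses the same quantities, offered only to show how a heartbeat interval can follow from a reliability target. Assumptions: full topology, each heartbeat message goes to one of n nodes chosen uniformly at random, each message is lost independently with probability PLR, and a false detection happens when none of the x messages sent within the threshold arrives. The function and parameter names are hypothetical.

```python
import math


def heartbeat_interval(p_false, n, plr, t_thresh):
    """Return (interval, x) under the assumed model.

    p_false  -- target probability of a false detection, 0 < p_false < 1
    n        -- number of nodes, n >= 2
    plr      -- packet loss rate, 0 <= plr < 1
    t_thresh -- detection threshold, in seconds
    """
    if not (0 < p_false < 1 and n >= 2 and 0 <= plr < 1):
        raise ValueError("parameters out of range")
    # Probability that a single message does not reach the monitoring node.
    p_miss = 1.0 - (1.0 - plr) / n
    # Smallest x with p_miss ** x <= p_false.
    x = math.ceil(math.log(p_false) / math.log(p_miss))
    return t_thresh / x, x
```

In this model the required number of messages grows roughly linearly with n, since log(1 - (1 - PLR)/n) is close to -(1 - PLR)/n for large n.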

Take I Performance

Linear performance
The larger P is, the steeper the slope

Heartbeat Rate – Take II

Assumptions:
  Synchrony
  Consistency
Calculation for the average case

Take II Performance

High Performance

Which Method Is better?

Comparison categories:
  Efficiency
  Scalability
  Dynamism
  Reliability

Thank you for listening