Shepherd: Node Monitors for Fault-Tolerant Distributed Process Execution in OSIRIS



Nenad Stojnić, Databases & Information Systems Group

Outline

Self-organizing properties in OSIRIS and current limitations

The Shepherd approach to fault tolerance: novel migration algorithm

Shepherd ring: herds, shepherd pools, routing

Binding ring: Service lookup, late binding, load balancing

Summary

We present ongoing work aimed at improving OSIRIS' fault-tolerance capabilities.

OSIRIS: Open Service Infrastructure for Reliable and Integrated process Support

Decentralized P2P execution of processes

Web service invocation

Fault-tolerant, self-* properties

Late binding & load balancing

Safe continuation-passing (2PC)

Pub/Sub Meta-data repositories

Processes can be imagined as programs that coordinate the invocation of distributed web services.

Late binding of service instances, in conjunction with load balancing strategies, already offers self-* properties.

Transactional guarantees: the system is completely resilient to temporary node failures.

Also, thanks to late binding, permanent failures of nodes that participate in the execution of a process instance but are not involved in a computation at the moment of failure do not affect the execution.

OSIRIS-Process execution example

[Figure: process definition with activities A, B, C; service instances on the OSIRIS layer; whiteboard.]

The node migrates control of process execution to one or more successor nodes by delivering an activation token containing flow-control information and the whiteboard.
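The activation token above is described only as flow-control information plus the whiteboard; as a rough, hypothetical illustration (field names are assumptions, not OSIRIS types), it can be pictured like this:

# Hypothetical sketch of an activation token: flow-control information plus the
# whiteboard (shared process data). Field names are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class ActivationToken:
    process_id: str                 # which process instance is being executed
    next_activities: List[str]      # flow control: activities to activate next
    whiteboard: Dict[str, Any] = field(default_factory=dict)   # process data passed along

# Example: after finishing activity A, a node hands control to B and C.
token = ActivationToken(process_id="p-1", next_activities=["B", "C"],
                        whiteboard={"result_of_A": 42})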

OSIRIS-Process migration

[Figure: the current node delivers the activation token and the whiteboard to the OSIRIS layer of a successor node.]

OSIRIS-Activity execution

[Figure: the successor node executes the activity on one of its service instances.]

OSIRIS-Late binding

[Figure: the next activity is bound at runtime to an available service instance of the required type.]

OSIRIS-Successor failure

[Figure: the chosen successor node fails (X) before taking over the activity.]

Replacement node found (late binding)

OSIRIS-Migration failure

[Figure: the 2PC migration of the activation token to the successor fails (X) and is aborted.]

OSIRIS-Predecessor failure

[Figure: a predecessor node fails (X) after handing over control; execution continues unaffected.]


OSIRIS-Current node failure

[Figure: the node currently executing the activity fails (X); process execution stops and the state held on that node is lost.]

OSIRIS failure handling

Failure case              Handling
Successor failure         Late binding
Migration failure         2PC abort
Predecessor failure       No handling necessary
Temporary node failure    Recovery from local stable storage
Current node failure      Process execution stops/hangs; state is lost; no notification

Hardware, network or service failures

If the node becomes temporarily disconnected from the network, the system is still able to recover: the node keeps retrying to pass on the results until it succeeds.

Works very well in controlled environments

Outline

Self-organizing properties in OSIRIS and current limitations

The Shepherd approach to fault tolerance: novel migration algorithm

Shepherd ring: herds, shepherd pools, routing

Binding ring: Service lookup, late binding, load balancing

Summary

We present ongoing work aimed at improving OSIRIS' fault-tolerance capabilities.

Our solution: Shepherd

[Figure: three-layer architecture: the OSIRIS layer with worker nodes, the Shepherd layer with shepherds that monitor the workers, and a Shared Memory layer that shepherds and workers read and write.]

WN assigned to Shepherds (herds)

Shepherds organized in pools, each with a leader

Shepherds in the pool share state

Persistence of process state

Triggering of process activity
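A minimal sketch of this layering, with hypothetical class and method names (the monitor and read/write roles as in the figure above):

# Minimal, hypothetical sketch of the Shepherd layering described above.
# Names (SharedMemory, Shepherd) are illustrative, not the OSIRIS/Shepherd API.
from typing import Any, Dict, List

class SharedMemory:
    """Shared memory layer: persists process state under activation keys."""
    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
    def write(self, key: str, state: Any) -> None:
        self._store[key] = state
    def read(self, key: str) -> Any:
        return self._store[key]

class Shepherd:
    """Shepherd layer: monitors its herd of worker nodes and triggers activities."""
    def __init__(self, shepherd_id: int, shared_memory: SharedMemory) -> None:
        self.shepherd_id = shepherd_id
        self.shared_memory = shared_memory
        self.herd: List[str] = []           # worker nodes assigned to this shepherd
        self.pool: List["Shepherd"] = []    # shepherds sharing supervision state

    def trigger_activity(self, worker: str, activation_key: str, state: Any) -> None:
        # Persist the process state, then hand the activation key to the worker.
        self.shared_memory.write(activation_key, state)
        print(f"shepherd {self.shepherd_id}: {worker} may start with key {activation_key}")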

Shepherd Migration Algorithm

1. The shepherd starts the activity: it picks a worker from the herd and sends it an activation key K0.

2. The worker acknowledges supervision by resending the activation key K0; monitoring starts.

3. The worker reads the whiteboard with the activation key K0.

4. The worker finishes execution, generates a new activation key K1 and determines the service type to continue the execution.

5. The worker writes the whiteboard with the activation key K1.

6. The worker acknowledges the write of the whiteboard (Wack); supervision ends.

7. The shepherd migrates to another shepherd, passing on the activation key K1 and the following service type.

Shepherd Migration Algorithm

[Figure: end-to-end message sequence between shepherd pools S1, S2, S3, workers A, B, C and the shared memory, passing activation keys K0 to K3.]

The leader of a pool communicates an activation key Ki to a WN. Using Ki, the WN gets the process state from the shared memory layer (SML). The WN writes the next process activity with a new key Ki+1 to the SML. The WN sends the new activation key Ki+1 to its assigned pool of shepherds. The leader of the pool forwards the activation key to another pool of shepherds. A further step deletes the consumed entries from the shared memory.
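A single-process sketch of one migration round under the steps above (hypothetical names; the real system exchanges asynchronous messages between nodes, pools and the shared memory):

# Hypothetical sketch of one round of the Shepherd migration algorithm (steps 1-7).
# Everything runs in-process here; the real system passes messages between nodes.
import uuid
from typing import Any, Dict, Tuple

shared_memory: Dict[str, Any] = {}   # stands in for the shared memory layer (SML)

def worker_execute(activation_key: str, service_type: str) -> Tuple[str, str]:
    """Worker side: read the whiteboard, run the activity, write it back under a new key."""
    whiteboard = shared_memory[activation_key]            # step 3: read with Ki
    whiteboard[f"result_{service_type}"] = "done"         # the activity's effect (placeholder)
    next_key = str(uuid.uuid4())                          # step 4: generate Ki+1
    next_type = "B" if service_type == "A" else "C"       # step 4: next service type (assumed)
    shared_memory[next_key] = whiteboard                  # step 5: write with Ki+1
    return next_key, next_type                            # step 6: acknowledge (Wack)

def shepherd_round(activation_key: str, service_type: str, worker: str) -> Tuple[str, str]:
    """Shepherd side: trigger the worker, collect Ki+1, then migrate to the next pool."""
    print(f"shepherd: sending key {activation_key} to worker {worker}")   # steps 1-2
    next_key, next_type = worker_execute(activation_key, service_type)
    del shared_memory[activation_key]     # cleanup of the consumed entry (later step)
    return next_key, next_type            # step 7: forward (Ki+1, next service type)

# Usage: start with key K0 on worker A, then migrate towards the next service type.
shared_memory["K0"] = {"input": 1}
k1, t1 = shepherd_round("K0", "A", worker="A")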

Shepherd failure cases

Failure of worker nodes

Failure of shepherds

Failures in the shared memory

Failure of worker nodes

Replacement node from the herd

Same service type

Fail-safe services

BUT: side effects that the failed worker wrote to the Shared Memory must be undone

[Figure: worker A fails (X) during supervision; shepherd S1 re-issues the activation key K0 to a replacement worker (A', A'') from the herd.]

The unique activation key provides independence of process activities.

Temporarily failed WNs that have been replaced are terminated

The side effects created by B that are not stored on the shared memory cannot be undone.
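A small sketch of this replacement rule (hypothetical helpers and a hypothetical "written_by" tag to identify the failed worker's shared-memory writes):

# Hypothetical sketch: on worker failure, the shepherd undoes the failed worker's
# writes in the shared memory and re-issues the *same* activation key to a
# replacement worker of the same service type from its herd.
from typing import Any, Dict, List, Optional

def handle_worker_failure(herd: List[Dict[str, Any]], failed: Dict[str, Any],
                          activation_key: str,
                          shared_memory: Dict[str, Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    # Undo side effects stored on the shared memory by the failed worker (assumed tagging).
    for key in list(shared_memory):
        if shared_memory[key].get("written_by") == failed["id"]:
            del shared_memory[key]
    # Pick a replacement worker providing the same service type.
    for candidate in herd:
        if candidate["id"] != failed["id"] and candidate["type"] == failed["type"]:
            print(f"re-issuing key {activation_key} to replacement {candidate['id']}")
            return candidate
    return None   # no replacement available in the herd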

Failure of shepherds

Shepherds organized in pools, state shared

WN speaks to the pool

Transactional writes: consistency guaranteed

New leader learns current state from the pool

[Figure: supervising shepherd S1 fails (X); shepherd S2 from the same pool takes over supervision of worker A.]

DHT-like structured overlay

Paxos commit protocol

Every shepherd in the pool has consistent information about the state of the activity it is supervising (maintained as a distributed transaction).

The DHT fault-detection mechanism is used to elect an appropriate shepherd replacement replica.
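As a rough sketch of the takeover (assumed names and an assumed lowest-ID election rule; the real protocol relies on the DHT's failure detector and the Paxos commit protocol), a surviving pool member can resume supervision because the state was written to the pool transactionally:

# Hypothetical sketch of leader takeover within a shepherd pool. The replicated,
# transactionally written pool state lets any surviving member resume supervision.
from typing import Any, Dict, List, Optional

def elect_new_leader(pool: List[Dict[str, Any]], failed_leader_id: int) -> Optional[Dict[str, Any]]:
    alive = [s for s in pool if s["id"] != failed_leader_id and s["alive"]]
    return min(alive, key=lambda s: s["id"]) if alive else None   # assumed rule: lowest ID wins

def take_over(pool_state: Dict[str, Any], new_leader: Dict[str, Any]) -> None:
    # The new leader learns the current activity state from the pool and continues monitoring.
    activity = pool_state["supervised_activity"]
    print(f"shepherd {new_leader['id']} resumes supervision of worker {activity['worker']} "
          f"with key {activity['activation_key']}")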

Failures in shared memory

Chord-based

Replicated transactional storage: successful writes are persistent

Failed reads/writes can always be retried


Beernet DHT implementation; with respect to the migration algorithm, the shared memory plays only a passive role.
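Because successful writes are persistent and failed operations can simply be retried, the client side reduces to a retry loop; a minimal sketch with hypothetical names (not Beernet's actual API):

# Hypothetical retry wrapper around shared-memory reads and writes. Since the
# storage is replicated and transactional, retrying a failed operation is safe.
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def retry(operation: Callable[[], T], attempts: int = 10, delay_s: float = 0.5) -> T:
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return operation()            # e.g. lambda: sml.read(key) or lambda: sml.write(key, value)
        except Exception as error:        # assumed: transient storage or network error
            last_error = error
            time.sleep(delay_s)
    raise RuntimeError("shared memory unreachable") from last_error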

Shepherd ring

Used for: worker node to shepherd assignment

Routing of messages from WN to shepherds

Pools construction

Based on the Chord structured overlay: identifier circle of shepherd node IDs and worker node IDs (consistent hashing)

Efficient routing: O(log NSh)

Routing mechanism

Failure-detection mechanism

Shepherds in a pool coordinate their state relative to the execution of the migration algorithm.

We use the ring to assign worker nodes to the herd of a shepherd, to let several shepherds coordinate and form a pool, and to drive leader election within a pool.

Communication between a worker node and a pool of shepherds

Shepherds are physical nodes and WNs the resources to be stored; worker node IDs in the circle lying between two shepherd IDs become the herd of the adjacent shepherd.
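A minimal sketch of that assignment rule, assuming plain SHA-1-based consistent hashing on a small identifier space (illustrative only):

# Hypothetical sketch of herd assignment on the shepherd ring: shepherd and worker
# identifiers are hashed onto one circle; a worker belongs to the herd of the first
# shepherd clockwise from its position (the adjacent shepherd).
import hashlib
from typing import List

ID_SPACE = 2 ** 16   # small identifier circle for illustration

def ring_id(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % ID_SPACE

def responsible_shepherd(worker_addr: str, shepherd_addrs: List[str]) -> str:
    wid = ring_id(worker_addr)
    shepherds = sorted(shepherd_addrs, key=ring_id)
    for s in shepherds:
        if ring_id(s) >= wid:    # first shepherd clockwise from the worker's position
            return s
    return shepherds[0]          # wrap around the circle

# Example: the worker from the slides joins with its IP address as identifier.
print(responsible_shepherd("96.76.89.12", ["S1", "S2", "S3", "S4", "S5"]))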

Shepherd ring

[Figure: Chord-style identifier circle with positions ID0..ID23, shepherds S1..S5 and a joining worker node WN.]

deliver(96.76.89.12,join())

Worker requests an assignment to a shepherd

Submits a join message to any known shepherd

If a shepherd leaves the ring the subsequent one takes over the herd


h(96.76.89.12) = ID17

IP16=98.x.x.x

deliver(96.76.89.12,join())

Shepherd hashes the worker ID

Routes the join message to another shepherd

Routing continues until the responsible shepherd is found


IP16=98.x.x.x

deliver(96.76.89.12,join())

IP17=96.76.89.12

Worker joins the herd

Exchanges heartbeats with its shepherd
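A rough sketch of the join routing just shown, assuming each shepherd only knows its ring successor (a simplification of Chord's O(log N) finger-table routing):

# Hypothetical sketch of routing a join() message around the shepherd ring until the
# responsible shepherd is found; here every shepherd simply forwards to its successor.
from typing import Dict, List

def in_interval(worker_id: int, low: int, high: int) -> bool:
    # True if worker_id lies in the clockwise interval (low, high] of the circle.
    if low < high:
        return low < worker_id <= high
    return worker_id > low or worker_id <= high   # interval wraps past zero

def route_join(start: str, worker_id: int, successor: Dict[str, str],
               shepherd_ids: Dict[str, int], herds: Dict[str, List[int]]) -> str:
    current = start
    while True:
        nxt = successor[current]
        if in_interval(worker_id, shepherd_ids[current], shepherd_ids[nxt]):
            herds[nxt].append(worker_id)   # worker joins nxt's herd; heartbeats start
            return nxt
        current = nxt                      # forward the join message along the ring

# Example: three shepherds on a 24-position circle; the worker hashed to ID17.
ids = {"S1": 4, "S2": 12, "S3": 20}
succ = {"S1": "S2", "S2": "S3", "S3": "S1"}
herds = {s: [] for s in ids}
print(route_join("S1", 17, succ, ids, herds))   # -> S3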

Shepherd pools

Symmetric replication strategy: node ID congruence-modulo equivalence classes

Responsible for x knows entire class of x

Pool = all responsibles for a class

Transactional guarantees: Paxos consensus
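A small sketch of the symmetric replication rule with the parameters used in the slides below (ring of 24 positions, congruence modulo 8, hence pool size 3); the helper names are assumptions:

# Hypothetical sketch of symmetric replication: ring positions that are congruent
# modulo m form an equivalence class; the pool of an ID is the set of shepherds
# responsible for the members of its class.
from typing import Dict, List, Set

RING_SIZE = 24   # positions ID0..ID23 as in the slides
MODULUS = 8      # congruence modulo 8 -> classes of size 3 (pool size 3)

def equivalence_class(node_id: int) -> List[int]:
    return [i for i in range(RING_SIZE) if i % MODULUS == node_id % MODULUS]

def pool(node_id: int, responsible: Dict[int, str]) -> Set[str]:
    # 'responsible' maps each ring position to the shepherd responsible for it.
    return {responsible[i] for i in equivalence_class(node_id)}

# Example from the slides: the class of ID1 is {1, 9, 17}; its pool is {S2, S3, S5}
# when those shepherds are responsible for positions 1, 9 and 17.
print(equivalence_class(1))   # [1, 9, 17]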

Shepherd pools

[Figure: identifier circle ID0..ID23 with shepherds S1..S5; symmetric replication groups positions into congruence classes.]

Equivalence class: ID1, ID9, ID17

Congruence modulo: 8

Pool: S2, S3, S5

Pool size: 3


Equivalence class: ID2, ID10, ID18

Pool: S2, S3, S5


Equivalence class: ID3, ID11, ID19

Pool: S2, S3, S1


Late binding

Locate a shepherd providing service type T. A shepherd provides type T if it monitors instances of type T.

Binding ring: physical nodes & service types (resources)

Distributed multimap data structure: service type → list of shepherds

Binding ring

[Figure: binding ring of service-type identifiers T0..T23 with overlay nodes O1..O8; a store(T, S5) request is routed into the ring.]

Storing shepherd S5 providing service type T

Query for number of fragments of type T


Fragment state for type T: Tfrag1 = {S1}, Tfrag2 = {S4}, Tfrag3 = {S2, S3}; fragment counter Cfrag = 3

Fragments of service type T in the ring

Each fragment is a multimap


rnd[1, Nfrag] → 2

storefrag(Tfrag2, S5): Tfrag2 becomes {S4, S5}

Random selection of fragment for storage

If storage is full, create a new fragment and add to it
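A compact, in-memory sketch of this store procedure (an assumed per-fragment capacity stands in for "storage is full"; this is not the distributed implementation):

# Hypothetical sketch of the binding ring's multimap: a service type maps to several
# fragments, each holding a set of shepherds. Storing picks a random fragment; if
# that fragment is full, a new fragment is created and the shepherd is added there.
import random
from typing import Dict, List, Set

FRAGMENT_CAPACITY = 2   # assumed stand-in for "storage is full"
fragments: Dict[str, List[Set[str]]] = {}   # service type -> list of fragments

def store(service_type: str, shepherd: str) -> None:
    frags = fragments.setdefault(service_type, [set()])
    chosen = random.choice(frags)             # rnd[1, Nfrag]
    if len(chosen) >= FRAGMENT_CAPACITY:      # fragment full -> open a new one
        chosen = set()
        frags.append(chosen)
    chosen.add(shepherd)

def lookup(service_type: str) -> Set[str]:
    # Late binding: any shepherd stored under the type can take over an instance of it.
    return set().union(*fragments.get(service_type, [set()]))

store("T", "S5")
print(lookup("T"))   # {'S5'}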

Load balancing

Optimize performance

Extended binding ring: shepherd average load

Publish/subscribe of load information

The binding ring as explained above is sufficient to guarantee the correctness of the routing and to enable late binding.

Load balancing additionally takes into consideration other factors to improve process execution.

Aggregate load

Load balancing

[Figure: shepherd ring with shepherds S1, S3, S5, S7 and their worker nodes.]

WN3 Load = 40%

WN5 Load = 60%


Worker nodes publish load to their shepherd

[Figure: binding ring storing per-type average load lists, e.g. Cfrag1 = {S1: 70%}, Cfrag2 = {S3: 75%, S2: 60%}, Afrag1 = {S1: 70%, S4: 55%}; Cfrag2 is currently marked as Cbest.]

Avg. load of a shepherd for a service type

Avg. load lists sorted in fragments

[Figure: after the contest the loads are Cfrag1 = {S1: 50%}, Cfrag2 = {S3: 75%, S2: 60%}, Afrag1 = {S4: 55%, S1: 50%}; Cbest = <Cfrag1, 50%>.]

Start contest

Least loaded type fragment becomes the best fragment
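A minimal sketch of that contest, assuming each fragment keeps a per-shepherd average load for the service type and the comparison rule is the fragment's least-loaded entry (an assumption):

# Hypothetical sketch of the load-balancing contest: the type fragment whose
# least-loaded shepherd has the lowest average load becomes Cbest for the type.
from typing import Dict, Tuple

def best_fragment(fragments: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    name, loads = min(fragments.items(), key=lambda item: min(item[1].values()))
    return name, min(loads.values())

# Example with the values from the slides.
c_fragments = {
    "Cfrag1": {"S1": 0.50},
    "Cfrag2": {"S3": 0.75, "S2": 0.60},
}
print(best_fragment(c_fragments))   # ('Cfrag1', 0.5)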

Outline

Self-organizing properties in OSIRIS and current limitations

The Shepherd approach to fault tolerance: novel migration algorithm

Shepherd ring: herds, shepherd pools, routing

Binding ring: Service lookup, late binding, load balancing

Summary

We present ongoing work aimed at improving OSIRIS' fault-tolerance capabilities.

Summary

Shepherd: Improved self-* properties in OSIRIS

Novel completely decentralized architecture

Future work: implementation & experimental evaluation

Extend to Stream-enabled services

Customize transactional protocols for efficiency

Economic cost model (trade-off performance vs. robustness)

Thank you for your attention!

Questions ?
