Checkpointing 2.0 Compiler-Assisted Checkpointing Uncoordinated Checkpointing.

Posted: 20-Dec-2015

Checkpointing 2.0

Compiler-Assisted Checkpointing

Uncoordinated Checkpointing

Compiler-Assisted Checkpointing

Compiler-Assisted Checkpointing (1994) Beck, Plank, Kingsley (UTK)

Compiler-Assisted Memory Exclusion for Fast Checkpointing (1995) Beck, Plank, Kingsley (UTK)

Motivation

We saw that memory exclusion can dramatically reduce the size (and overhead) of a checkpoint file.

Choosing what to exclude by hand can be time consuming, and wrong decisions can cause a program to be incorrect on recovery.

Use a compiler to determine what memory can be excluded, automating the process and ensuring its correctness.

Compiler Directives

The programmer adds the following directives to the code:

  • CHECKPOINT_HERE: translated directly to checkpoint_here()

  • EXCLUDE_HERE: exclude_bytes() and include_bytes() calls are inserted at that location

Directive Placement

Poor placement of directives might lead to inefficient checkpointing.

However, the program will still checkpoint and recover properly.

Directive Placement

Why not place EXCLUDE_HERE directly before all CHECKPOINT_HERE directives?

    for(…) {
        EXC…
        CHE…
    }

    EXC…
    for(…) {
        CHE…
    }

Overview of Technique

Perform data flow analysis of the program to determine which variables are clean or dead at each EXCLUDE_HERE statement.

Insert the appropriate exclude_bytes() calls at each location.

Build a Control Flow Graph

A control flow graph G=<N,E> is a directed graph, where each node represents a program statement, and each edge represents a possible flow of control from one statement to another

Example Program

            INTEGER I, X, Y, Z

    S1:     Z = 3
    S2:     X = 5
    S3:     FOR 100, I = 1,1000
    S4:     Y = X + Z
    S5:     X = X * Y
    S6:     EXCLUDE_HERE
    S7:     CHECKPOINT_HERE
    S8: 100 CONTINUE
    S9:     END

Example CFG

[Figure: control flow graph of the example program, with nodes S1 through S9; the edges are not recoverable from the transcript]
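The CFG of the example program can be written down as a successor map. This is a minimal Python sketch, not the figure itself: the edges are an assumption inferred from the program text (straight-line flow from S1 to S8, a back edge from the loop-closing S8 to S4, and a loop exit from S8 to S9). The names `CFG` and `predecessors` are illustrative.

```python
# Control flow graph of the example program as a successor map:
# each statement maps to the statements control may flow to next.
# ASSUMPTION: edges are inferred from the program text, because the
# figure's edges were lost. S8 is the loop-closing CONTINUE.
CFG = {
    "S1": ["S2"],
    "S2": ["S3"],
    "S3": ["S4"],
    "S4": ["S5"],
    "S5": ["S6"],
    "S6": ["S7"],
    "S7": ["S8"],
    "S8": ["S4", "S9"],  # next iteration, or fall out of the loop to END
    "S9": [],
}


def predecessors(cfg):
    """Invert the successor map: node -> list of nodes that flow into it."""
    preds = {node: [] for node in cfg}
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].append(node)
    return preds


if __name__ == "__main__":
    for node, preds in predecessors(CFG).items():
        print(node, "<-", preds)
```

S4 is the only node with two predecessors (S3 and S8), which is what makes the later analyses iterate to a fixed point instead of finishing in one pass.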

Find Sub-graphs

Given the CFG, G, of our program, find all sub-graphs G’, where G’ is rooted at an EXCLUDE_HERE and contains all paths reachable from that EXCLUDE_HERE that do not pass through another EXCLUDE_HERE.
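One way to collect the nodes of a sub-graph G’ is a breadth-first search from the EXCLUDE_HERE node that refuses to step onto any EXCLUDE_HERE node. The Python sketch below assumes the example CFG has the edges inferred from the program text; `subgraph_nodes` and `EXCLUDE_NODES` are hypothetical names.

```python
from collections import deque

# ASSUMPTION: edges inferred from the example program (S8 closes the loop).
CFG = {
    "S1": ["S2"], "S2": ["S3"], "S3": ["S4"], "S4": ["S5"], "S5": ["S6"],
    "S6": ["S7"], "S7": ["S8"], "S8": ["S4", "S9"], "S9": [],
}
EXCLUDE_NODES = {"S6"}  # statements that are EXCLUDE_HERE directives


def subgraph_nodes(cfg, root, exclude_nodes):
    """Nodes on paths from `root` that do not pass through an EXCLUDE_HERE."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for succ in cfg[node]:
            if succ in exclude_nodes:
                continue  # a path ends when it reaches any EXCLUDE_HERE
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


if __name__ == "__main__":
    print(sorted(subgraph_nodes(CFG, "S6", EXCLUDE_NODES)))
```

For the example, the search from S6 visits S7, S8, S4, S9 and S5, and stops when S5 leads back to S6. S1 through S3 are not part of G’.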

Example G’

[Figure: the sub-graph G’ of the example CFG, rooted at the EXCLUDE_HERE node S6; the drawing is not recoverable from the transcript]

Strategy

For each G’, calculate two sets:

  • DE(G’): all variables that are dead at every CHECKPOINT_HERE in G’

  • RO(G’): all variables that are read-only throughout G’

At each EXCLUDE_HERE, insert calls to:

  • exclude_bytes(v, CKPT_DEAD) for all v in DE(G’)

  • exclude_bytes(v, CKPT_READONLY) for all v in RO(G’)

  • include_bytes(v) for all v that are in neither DE(G’) nor RO(G’)

Determine Memory Accesses

For each statement, S, determine the membership of three sets:

  • MAY_REF(S): every location that may be referenced by some execution of S

  • MAY_DEF(S): every location that may be defined by some execution of S

  • MUST_DEF(S): every location that will be defined by every execution of S

An Aside

Because our example has no arrays, pointers, etc., MUST_DEF(S) and MAY_DEF(S) will be the same set.

Example

[Figure: the example CFG with each node annotated with its {REF},{DEF} sets]

    Statement   REF      DEF
    S1          {}       {Z}
    S2          {}       {X}
    S3          {}       {I}
    S4          {X,Z}    {Y}
    S5          {X,Y}    {X}
    S6          {}       {}
    S7          {}       {}
    S8          {I}      {I}
    S9          {}       {}

Liveness/Deadness

v is ‘live’ at S if there is a path from S to some S’ such that v ∈ MAY_REF(S’) and, for all S’’ on the path, v ∉ MUST_DEF(S’’).

  • v must be live at S if it is read at some (later) S’ without being redefined anywhere between the two.

If v is not live at S, we say it is dead at S.

DEAD(S)

The set DEAD(S) is the set of variables that are dead immediately before the execution of S.

  • v ∈ DEAD(S) if v is dead everywhere below S or it is redefined at S, except if it is referenced at S.

We calculate DEAD(S) with an iterative algorithm:

1. For all S, set DEAD(S) = V, where V is the set of all variables.

2. For every statement S, set

   $$\mathrm{DEAD}(S) = \Bigl(\bigl(\bigcap_{S'} \mathrm{DEAD}(S')\bigr) \cup \mathrm{MUST\_DEF}(S)\Bigr) - \mathrm{MAY\_REF}(S)$$

   where S’ ranges over the successors of S.

3. Repeat step 2 until all DEAD(S) converge.

Data Flow Eqn. for DEAD

$$\mathrm{DEAD}(S) = F_S(X) = \begin{cases} V & \text{if } S \text{ is END} \\ \bigl(X \cup \mathrm{MUST\_DEF}(S)\bigr) - \mathrm{MAY\_REF}(S) & \text{otherwise} \end{cases}$$

where $X = \bigcap_{S'} \mathrm{DEAD}(S')$.
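The iteration for DEAD can be sketched directly in Python. The block below is a minimal illustration under stated assumptions: the CFG edges are inferred from the example program, the intersection is taken over the CFG successors of S, and the REF/DEF sets are read off the program statements. The names `compute_dead`, `ACCESS` and `CFG` are hypothetical.

```python
# Variables of the example program.
V = frozenset("IXYZ")

# ASSUMPTION: edges inferred from the example program (S8 closes the loop).
CFG = {
    "S1": ["S2"], "S2": ["S3"], "S3": ["S4"], "S4": ["S5"], "S5": ["S6"],
    "S6": ["S7"], "S7": ["S8"], "S8": ["S4", "S9"], "S9": [],
}

# statement -> (MAY_REF, MUST_DEF); MAY_DEF equals MUST_DEF in this example.
ACCESS = {
    "S1": (set(), {"Z"}), "S2": (set(), {"X"}), "S3": (set(), {"I"}),
    "S4": ({"X", "Z"}, {"Y"}), "S5": ({"X", "Y"}, {"X"}),
    "S6": (set(), set()), "S7": (set(), set()),
    "S8": ({"I"}, {"I"}), "S9": (set(), set()),
}


def compute_dead(cfg, access, variables, end="S9"):
    """Iterate the DEAD equation until no set changes."""
    dead = {s: set(variables) for s in cfg}          # step 1: DEAD(S) = V
    changed = True
    while changed:                                   # step 3: repeat
        changed = False
        for s, succs in cfg.items():                 # step 2: every statement
            if s == end:
                new = set(variables)                 # F_S(X) = V at END
            else:
                x = set(variables)
                for succ in succs:                   # X = intersection over successors
                    x &= dead[succ]
                may_ref, must_def = access[s]
                new = (x | must_def) - may_ref
            if new != dead[s]:
                dead[s] = new
                changed = True
    return dead


if __name__ == "__main__":
    for s, names in sorted(compute_dead(CFG, ACCESS, V).items()):
        print(s, sorted(names))
```

Under these assumptions the iteration settles on DEAD(S7) = {Y}: Y is recomputed at S4 before it is read again, while X, Z and I are all read on the next trip around the loop.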

The Set DE(S)

The set DE(S) is the set of all variables that are dead at every CHECKPOINT_HERE below S, in the same sub-graph.

Calculate iteratively, as before:

$$F_S(X) = \begin{cases} X \cap \mathrm{DEAD}(S) & \text{if } S \text{ is CHECKPOINT\_HERE} \\ V & \text{if } S \text{ is EXCLUDE\_HERE or END} \\ X & \text{otherwise} \end{cases}$$

where $X = \bigcap_{S'} \mathrm{DE}(S')$.

The Set RO(S)

v is read-only at S if v ∉ MAY_DEF(S).

The set RO(S) is the set of variables that are read-only along all paths from S in the same sub-graph.

$$F_S(X) = \begin{cases} V & \text{if } S \text{ is EXCLUDE\_HERE or END} \\ X - \mathrm{MAY\_DEF}(S) & \text{otherwise} \end{cases}$$

Solution to Example

DE(G’) and RO(G’) are defined to be DE(S) and RO(S), where S is the statement directly following the EXCLUDE_HERE.

For our example, DE(S7) = {Y} and RO(S7) = {Z}, so S6 would become:

    exclude_bytes(Y, CKPT_DEAD)
    exclude_bytes(Z, CKPT_READONLY)
    include_bytes(everything else)
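The solution can be checked mechanically. The following Python sketch runs the three data flow equations with one generic fixed-point helper and then prints the calls that would replace S6. It assumes CFG edges inferred from the program text, takes X for RO as the intersection over successors (by analogy with DE), and emits the calls as plain strings; `solve`, `f_dead`, `f_de` and `f_ro` are hypothetical names, not part of any checkpointing library.

```python
V = frozenset("IXYZ")
# ASSUMPTION: edges inferred from the example program (S8 closes the loop).
CFG = {
    "S1": ["S2"], "S2": ["S3"], "S3": ["S4"], "S4": ["S5"], "S5": ["S6"],
    "S6": ["S7"], "S7": ["S8"], "S8": ["S4", "S9"], "S9": [],
}
# statement -> (MAY_REF, DEF); MAY_DEF and MUST_DEF coincide in this example.
ACCESS = {
    "S1": (set(), {"Z"}), "S2": (set(), {"X"}), "S3": (set(), {"I"}),
    "S4": ({"X", "Z"}, {"Y"}), "S5": ({"X", "Y"}, {"X"}),
    "S6": (set(), set()), "S7": (set(), set()),
    "S8": ({"I"}, {"I"}), "S9": (set(), set()),
}
END, EXCLUDE, CHECKPOINT = "S9", {"S6"}, {"S7"}


def solve(cfg, transfer):
    """Start every set at V and apply value[S] = transfer(S, X) until stable."""
    value = {s: set(V) for s in cfg}
    changed = True
    while changed:
        changed = False
        for s, succs in cfg.items():
            x = set(V)
            for succ in succs:          # X = intersection over successors
                x &= value[succ]
            new = transfer(s, x)
            if new != value[s]:
                value[s], changed = new, True
    return value


def f_dead(s, x):
    may_ref, must_def = ACCESS[s]
    return set(V) if s == END else (x | must_def) - may_ref


DEAD = solve(CFG, f_dead)


def f_de(s, x):
    if s in EXCLUDE or s == END:
        return set(V)
    return x & DEAD[s] if s in CHECKPOINT else x


def f_ro(s, x):
    if s in EXCLUDE or s == END:
        return set(V)
    return x - ACCESS[s][1]


DE, RO = solve(CFG, f_de), solve(CFG, f_ro)
after = "S7"  # the statement directly following EXCLUDE_HERE
calls = (
    [f"exclude_bytes({v}, CKPT_DEAD)" for v in sorted(DE[after])]
    + [f"exclude_bytes({v}, CKPT_READONLY)" for v in sorted(RO[after] - DE[after])]
    + [f"include_bytes({v})" for v in sorted(V - DE[after] - RO[after])]
)

if __name__ == "__main__":
    print("\n".join(calls))
```

With these inputs the sketch reproduces DE(S7) = {Y} and RO(S7) = {Z}, and "everything else" comes out as I and X.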

Uncoordinated Distributed Checkpoints

Q: How can we extend our uniprocessor checkpointing to a distributed system?

A: Each process in the distributed system takes an independent checkpoint

Global State

The global state is a collection of the states of each of the individual processes (and of the communication channels).

A consistent global state is one that may occur during a failure-free, correct run of the computation.

Consistent States

Consistent states are states that may have occurred.

[Figure: two example timelines of processes p and q; the drawing is not recoverable from the transcript]

Inconsistent States

Inconsistent states are states that could not have occurred.

Here process p has received a message that has not been sent.

[Figure: timelines of processes p and q; the drawing is not recoverable from the transcript]
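The "received but not sent" test can be expressed as a small predicate over per-process states. This is a minimal Python sketch under an assumed representation: each process state lists the message ids it has sent and the (sender, id) pairs it has received. The function name `is_consistent` and the message id `m1` are illustrative.

```python
def is_consistent(states):
    """Return False if any process has received a message its sender has not sent.

    states: process name -> {"sent": set of message ids,
                             "received": set of (sender, message id)}
    """
    for state in states.values():
        for sender, msg in state["received"]:
            if msg not in states[sender]["sent"]:
                return False  # receipt recorded without a matching send
    return True


# q has sent m1 and p has received it: this could have occurred.
delivered = {
    "p": {"sent": set(), "received": {("q", "m1")}},
    "q": {"sent": {"m1"}, "received": set()},
}

# q has sent m1 but p has not received it yet: the message is in transit,
# which is still a state that could have occurred.
in_transit = {
    "p": {"sent": set(), "received": set()},
    "q": {"sent": {"m1"}, "received": set()},
}

# p has received m1 but q's state has no record of sending it.
orphan = {
    "p": {"sent": set(), "received": {("q", "m1")}},
    "q": {"sent": set(), "received": set()},
}

if __name__ == "__main__":
    for name, states in [("delivered", delivered),
                         ("in_transit", in_transit),
                         ("orphan", orphan)]:
        print(name, is_consistent(states))
```

Only the third case is rejected. A message that is sent but not yet received belongs to the channel state and does not make the global state inconsistent.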

Inconsistent States

Inconsistent states can only occur where there have been failures and the processes have been restarted from their checkpoints.

A rollback-recovery system must ensure that the system is restarted in a consistent state, but not necessarily a state that has ever occurred.

Consistent Global Checkpoint

A consistent global checkpoint is a set of checkpoints, one from each process, that corresponds to a consistent global state.

If processes take their checkpoints independently, they must search for a consistent global state upon restart.

The Domino Effect

In the event of a failure, ideally we would like to roll back only the failed process; however, doing so might leave the system in an inconsistent state, necessitating that others be rolled back as well.

Example

If r fails and restarts at C, message 8 must be invalidated, forcing q to roll back to B. Message 7 is now invalidated, forcing p to roll back to A, and so on, all the way to the beginning.

[Figure: timelines of processes p, q and r exchanging messages 1 through 8, with checkpoints A (on p), B (on q) and C (on r); the failure of r is marked *]

Calculating the Recovery Line

The recovery line for an uncoordinated system is the latest set of checkpoints, one for each process in the system, that is consistent.

In order to calculate the recovery line (RL) after a failure, the processes record the dependencies among their checkpoints during failure-free operation.

Protocol

Let c_{i,x} be the xth checkpoint of process P_i.

Let I_{i,x} denote the interval between checkpoints c_{i,x-1} and c_{i,x}.

If P_i sends a message, m, to P_j during interval I_{i,x}, P_i will piggy-back (i,x) on m.

If P_j receives m during I_{j,y}, it will record the dependence of I_{j,y} on I_{i,x}, and later save it in checkpoint c_{j,y}.

If P_i fails, then on recovery all the other processes will send their dependency information to P_i, which will use that information to calculate the recovery line.
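The failure-free part of the protocol amounts to tagging each outgoing message and remembering the tags of incoming ones. The Python sketch below is illustrative only: the `Process` class, its method names, and the dictionary message format are assumptions, and it treats the initial state as checkpoint 0 so that the first interval has index 1.

```python
class Process:
    """Tracks checkpoint dependencies for one process (illustrative sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.interval = 1       # index x of the current interval
        self.deps = set()       # (i, x) tags received in the current interval
        self.checkpoints = {}   # x -> dependencies saved with checkpoint x

    def send(self, payload):
        # Piggy-back (i, x) on the outgoing message.
        return {"payload": payload, "tag": (self.pid, self.interval)}

    def receive(self, message):
        # Record that the current interval depends on the sender's interval.
        self.deps.add(message["tag"])

    def checkpoint(self):
        # Save the recorded dependencies with the checkpoint that ends this
        # interval, then start the next interval with an empty record.
        self.checkpoints[self.interval] = set(self.deps)
        self.deps.clear()
        self.interval += 1


if __name__ == "__main__":
    p, q = Process("p"), Process("q")
    q.receive(p.send("first"))    # sent in p's interval 1, received in q's interval 1
    q.checkpoint()                # q's checkpoint 1 stores {("p", 1)}
    p.checkpoint()                # p moves on to interval 2
    q.receive(p.send("second"))   # sent in p's interval 2, received in q's interval 2
    q.checkpoint()
    print(q.checkpoints)
```

On recovery, each process would hand its `checkpoints` map (plus the dependencies of its current interval, if it did not fail) to the recovering process.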

Checkpoint Dependency Graph

P_i takes the dependency information and constructs a dependency graph.

The nodes of the graph are all of the c_{a,b}, plus the current state of each un-failed process.

A directed edge is drawn from c_{i,x-1} to c_{j,y} if either:

  • i ≠ j and a message was sent from I_{i,x} to I_{j,y}, or

  • i = j and y = x

For i ≠ j, an edge from c_{i,x-1} to c_{j,y} implies that c_{j,y} records the receipt of a message that is not marked as sent in c_{i,x-1}.
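The edge rules translate into a short graph builder. This Python sketch rests on several assumptions that are not in the slides: the initial state of each process is checkpoint 0, the current state of an un-failed process is represented as one extra node after its last checkpoint, and a message received by a failed process after its last checkpoint is dropped because that interval is lost. The name `build_graph` and the input format are hypothetical.

```python
def build_graph(last_ckpt, failed, messages):
    """Build the checkpoint dependency graph as node -> set of successor nodes.

    last_ckpt: process -> index of its latest checkpoint (0 is the initial state)
    failed:    set of failed processes
    messages:  iterable of ((i, x), (j, y)) pairs, meaning a message was sent
               in interval x of process i and received in interval y of process j
    A node (i, x) stands for checkpoint x of process i. For an un-failed
    process, (i, last + 1) stands for its current state.
    """
    top = {p: n if p in failed else n + 1 for p, n in last_ckpt.items()}
    edges = {(p, x): set() for p in top for x in range(top[p] + 1)}

    # Rule "i = j and y = x": each checkpoint points to the next one
    # of the same process.
    for p in top:
        for x in range(1, top[p] + 1):
            edges[(p, x - 1)].add((p, x))

    # Rule "i != j": the checkpoint that starts the sending interval points
    # to the checkpoint (or current state) that ends the receiving interval.
    for (i, x), (j, y) in messages:
        if i != j and (j, y) in edges:
            edges[(i, x - 1)].add((j, y))
    return edges


if __name__ == "__main__":
    graph = build_graph(
        last_ckpt={"p": 1, "q": 1},
        failed={"q"},
        messages=[(("p", 1), ("q", 1)),   # received before q's checkpoint 1
                  (("q", 2), ("p", 2))],  # sent by q after its last checkpoint
    )
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
```

In the usage example, the second message creates an edge from q's last checkpoint to p's current state, which is exactly the kind of edge that forces p to roll back.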

Example

[Figure: example checkpoint dependency graph for processes p, q and r, with the failure marked *; the drawing is not recoverable from the transcript]

Algorithm

    Include last ckpt of each failed P in RecoverySet
    Include current state of each un-failed P in RecoverySet
    Mark all ckpts. reachable from any node in RS
    While (at least one node in RS is marked)
        Replace each marked RS element with the latest unmarked ckpt of the same process
        Mark all ckpts. reachable from any node in RS
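The marking loop can be sketched in Python over a graph given as a successor map. The example graph is hypothetical (it is not the one from the slides): process q has failed after checkpoint 1, and a message q sent after that checkpoint was received in the current state of p, represented here as node ("p", 2). The sketch also assumes that checkpoint 0 of a process is never marked, which holds when no edge points into an initial state.

```python
def reachable(edges, roots):
    """Nodes reachable from `roots` by following one or more edges."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def recovery_line(edges, start):
    """Run the marking algorithm.

    edges: (process, index) -> set of successor nodes
    start: process -> index of its initial RecoverySet node (last checkpoint
           of a failed process, current state of an un-failed one)
    Returns process -> index of the checkpoint on the recovery line.
    """
    rs = dict(start)
    marked = reachable(edges, rs.items())
    while any(node in marked for node in rs.items()):
        for proc, idx in list(rs.items()):
            # ASSUMPTION: index 0 (the initial state) is never marked.
            while (proc, idx) in marked:
                idx -= 1        # latest unmarked ckpt of the same process
            rs[proc] = idx
        marked |= reachable(edges, rs.items())
    return rs


# Hypothetical graph: q failed after checkpoint 1; ("p", 2) is p's current state.
EDGES = {
    ("p", 0): {("p", 1)},
    ("p", 1): {("p", 2)},
    ("p", 2): set(),
    ("q", 0): {("q", 1)},
    ("q", 1): {("p", 2)},   # message sent by q after c_{q,1}, received by p
}

if __name__ == "__main__":
    print(recovery_line(EDGES, {"p": 2, "q": 1}))
```

Here p's current state is reachable from q's last checkpoint, so it is marked and replaced by p's checkpoint 1; nothing in the new set is marked, and the loop stops with both processes at checkpoint 1.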

Finding the Recovery Set

[Figures: two slides showing the marking algorithm applied to the example; marked nodes are shown with an X. The drawings are not recoverable from the transcript]

The Recovery Line

[Figure: the resulting recovery line across processes p, q and r, with the failure marked *; the drawing is not recoverable from the transcript]