Failure Handling in a Modal Language

Nels Eric Beckman. Research Talk, Institute for Software Research. October 30, 2006.

Page 1: Failure Handling in a modal Language

Failure Handling in a Modal Language

Nels Eric Beckman
Research Talk

Institute for Software Research
October 30, 2006

Page 2: Failure Handling in a modal Language

Claims Made in this Talk

• ML5 is an elegant language for programming distributed systems.

• In the face of node failure, the meaning of ML5 programs becomes unclear.

• We propose extensions to ML5 that make their meaning clear.
  • (In reality, this research is a work in progress.)

Page 3: Failure Handling in a modal Language

ML5

• A Programming Language for Distributed Systems

• Based on a Modal Logic
  • i.e. A Logic With an Embedded Notion of Place

• Tom Murphy’s Thesis Work

• Targeted for Grid Programming

Page 4: Failure Handling in a modal Language

ML5, Briefly...

• Allows Hosts to Send ‘Thunks’ to One Another for Execution
  • In practice, code can be more cleanly decomposed.

• Has an Advanced Type System
  • Location-specific resources can be typed as such.

Page 5: Failure Handling in a modal Language

RPC-Style Distributed Programming

[Diagram, animated over slides 5 to 14. Legend: PC, Host, Active thread, Blocked thread, Message.]

Code shown on the two hosts:

fun a =
  rpc(“b”,19.x.x.x) + r

fun b =
  return x;

Messages shown during the animation, in order:

• rpc “b” (slide 8)
• ret x (slides 12 to 14)

On slides 9 to 11 a PC marker is shown on both hosts.

Page 15: Failure Handling in a modal Language

ML5 Illustration

[Diagram, animated over slides 15 to 22. Legend: PC, Host, Location of thread, Migration of thread.]

Page 23: Failure Handling in a modal Language

Example

• Remotely Finding List’s Sum (RPC)

Server Code:

class ListServ {
  List<Integer> myList = new ...
  List<Integer> getList() {
    return myList;
  }
}

Page 24: Failure Handling in a modal Language

Example

• Remotely Finding List’s Sum (RPC)

Client Code:

class ListClient {
  ListServerStub myServ = new ...
  public void foo() {
    // The whole list crosses the network before it is summed locally.
    List<Integer> list = myServ.getList();
    int count = 0;
    for (Integer item : list) {
      count += item.intValue();
    }
    if (count >= 40)
      ...
  }
}

Page 25: Failure Handling in a modal Language

Example

• Remotely Finding List’s Sum (RPC)

• To Fix, Should We:
  • Add a new server operation that returns true if a list’s sum is greater than 40?
    • Weird if the operation is only used once.
    • We wouldn’t structure the application this way in a centralized setting.
  • Bite the performance bullet and send the whole list?
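
The two options on this slide can be contrasted in a small Java sketch. It is not from the talk: ListServSketch, ListClientSketch, sumGreaterThan, and the integersSent counter are hypothetical names, and the "network" is an ordinary method call with a counter standing in for the data that would be transferred.

```java
import java.util.List;

// Hypothetical stand-in for the slide's ListServ.
final class ListServSketch {
    private final List<Integer> myList;
    int integersSent = 0; // how many list elements would have crossed the network

    ListServSketch(List<Integer> myList) {
        this.myList = myList;
    }

    // "Send the whole list": every element is shipped to the client.
    List<Integer> getList() {
        integersSent += myList.size();
        return List.copyOf(myList);
    }

    // "Add a new server operation": the sum is computed where the list lives,
    // and only a boolean is returned.
    boolean sumGreaterThan(int threshold) {
        int sum = 0;
        for (int item : myList) {
            sum += item;
        }
        return sum > threshold;
    }
}

final class ListClientSketch {
    // Client sums the list itself after fetching all of it.
    static boolean viaWholeList(ListServSketch serv, int threshold) {
        int count = 0;
        for (int item : serv.getList()) {
            count += item;
        }
        return count > threshold;
    }

    // Client asks the server to do the comparison.
    static boolean viaServerOp(ListServSketch serv, int threshold) {
        return serv.sumGreaterThan(threshold);
    }
}
```

Both calls give the same answer; they differ in how much data moves and in how specialised the server interface has to become, which is the trade-off the slide is pointing at.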

Page 26: Failure Handling in a modal Language

Example

• Remotely Finding List’s Sum (ML5)

Before:

fun foo remote_host remote_list_ref =
  let fun sum a_list =
        foldl op+ 0 a_list
  in
    (* the whole list is fetched, then summed locally *)
    if sum (
         get[remote_host]( !remote_list_ref )
       ) > 40
    then true
    else false
  end

Page 27: Failure Handling in a modal Language

Example

• Remotely Finding List’s Sum (ML5)

After:

fun foo remote_host remote_list_ref =
  let fun sum a_list =
        foldl op+ 0 a_list
  in
    (* the sum and comparison run on remote_host; only a bool returns *)
    get[remote_host](
      if sum ( !remote_list_ref ) > 40
      then true
      else false
    )
  end

Page 28: Failure Handling in a modal Language

Types

• ML5 Type System Embeds a Notion of Place
  • Some values can be used at any place.
    • e.g. Primitive data types, structures
  • Some values can only be used at the location where they make sense.
    • e.g. File descriptors, reference cells, printers

Page 29: Failure Handling in a modal Language

Just a Few Types…

• τ@w – “The type τ is well-typed on host w.”

Page 30: Failure Handling in a modal Language

Just a Few Types…

• get[w’,a]e – “Evaluate e on host w’ and return the result to the current host. Change e’s type from @w’ to @w.”

• Example:

fun foo (x: int ref @w’, a: w’ addr @w) =
  get[w’,a]( !x + !x )

Page 31: Failure Handling in a modal Language

Just a Few Types…

• Slides 31 and 32 repeat the get example from slide 30 with two callouts: the body ( !x + !x ) is typed int@w’, and the whole expression get[w’,a]( !x + !x ) is typed int@w.
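
One rough way to picture τ@w and get in a mainstream language is with phantom type parameters. The Java sketch below is an analogy under stated assumptions, not ML5's actual type system: World, Here, Addr, At, and ModalSketch are invented names, everything runs in one process, a plain Integer stands in for the slide's int ref, and the check that a returned value is allowed to move between hosts is not modelled.

```java
import java.util.function.Function;

// Phantom "world" markers: never instantiated, used only as type arguments.
interface World {}
final class LocalWorld implements World {}
final class RemoteWorld implements World {}

// Hypothetical capability: holding a Here<W> stands for "this code runs at W".
final class Here<W extends World> {}

// Hypothetical address of world W (the slide's "w' addr").
final class Addr<W extends World> {}

// A value of type T that may only be read at world W (roughly τ@w).
final class At<T, W extends World> {
    private final T value;

    At(T value) {
        this.value = value;
    }

    // Reading requires a capability for the same world W.
    T read(Here<W> here) {
        return value;
    }
}

final class ModalSketch {
    // Sketch of get[w', a] e: run the body "at" W2, then relabel the result as
    // living at W1. A real implementation would ship the body to another host.
    static <T, W1 extends World, W2 extends World>
    At<T, W1> get(Addr<W2> addr, Function<Here<W2>, T> body) {
        T result = body.apply(new Here<W2>());
        return new At<>(result);
    }

    public static void main(String[] args) {
        At<Integer, RemoteWorld> x = new At<>(21);
        At<Integer, LocalWorld> sum =
            get(new Addr<RemoteWorld>(), here -> x.read(here) + x.read(here));
        System.out.println(sum.read(new Here<LocalWorld>()));
        // x.read(new Here<LocalWorld>()) is rejected by the compiler: wrong world.
    }
}
```

The point of the analogy is the last comment: using a remote-only value from the wrong place is a type error, not a runtime error.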

Page 33: Failure Handling in a modal Language

Just a Few Types…

• □τ – “Suspended code that can be evaluated anywhere. Produces a value of type τ.”

• Example:

(let fun sum il = foldl op+ 0 il
 in
   box (sum [1,2,3,4,5])
 end) : □int @w

Page 34: Failure Handling in a modal Language

Just a Few Types…

• ◊τ – “A value of type τ that exists at some other location.”

• Example:

here (ref 5) : ◊(int ref) @w

Page 35: Failure Handling in a modal Language

But What About Host Failure?

• What happens here?

(* at host 1 *)
get[w_2, a_2](
  (* at host 2 *)
  !int_ref_at_w_2 +
  get[w_3, a_3](
    (* at host 3 *)
    !int_ref_at_w_3))

Page 36: Failure Handling in a modal Language

But What About Host Failure?

[Slides 36 to 38 repeat the code from slide 35 and add callouts in turn:]

• Host 2 dies!
• Throw an exception?
• Continue on from Host 3?

Page 39: Failure Handling in a modal Language

But What About Host Failure?

• What happens here?

(* at host 1 *)
get[w_2, a_2](
  (* at host 2 *)
  !int_ref_at_w_2 +
  get[w_3, a_3](
    (* at host 3 *)
    !int_ref_at_w_3
    or_if_i_cant_return (...)))

• Host 2 dies!
• Throw an exception?
• Continue on from Host 3?

[Slide 40 repeats this slide with the host 2 comment changed to (* at host 2 WHICH DOESN’T EXIST! *).]

Page 41: Failure Handling in a modal Language

What We Want (Intuitively)

callcc x =>
  (* at host 1 *)
  get[w_2, a_2](
    (* at host 2 *)
    !int_ref_at_h_2 +
    get[w_3, a_3](
      (* at host 3 *)
      !int_ref_at_h_3
      or_if_i_cant_return (throw (raise NetFail) to x)))

Page 42: Failure Handling in a modal Language

What We Want (Intuitively)

[Slides 42 and 43 repeat the code from slide 41 and add two callouts:]

• Don’t actually throw something through the network.
• Have host one detect the failure.

Page 44: Failure Handling in a modal Language

Isn’t This Just a ‘Timeout’ Exception?

• A Good Question:
  • “Why not just have the ‘get’ operation throw a timeout exception, like in Java?”
  • e.g.

get[w_2, a_2] (
  !int_on_w2
) handle TimeOut => (* do something *)

Page 45: Failure Handling in a modal Language

Answers

1. This is actually a little smarter than just ‘timeout.’

2. The ‘Implicit Spawn’ Problem

get[w_2, a_2] (
  (* extremely complicated op *)
) handle TimeOut => (* do something *)

[Slide 47 labels two threads on this code, T1 and T2. Presumably T1 is the handler running on the calling host after the timeout and T2 is the operation still running at w_2, so the timeout has implicitly spawned a second thread.]

Page 48: Failure Handling in a modal Language

What We Need

• Share the Fact that Host 1 Has ‘Given Up’

• Kill the Thread ASAP

• Make That Thread’s Actions Irrelevant
  • Each host gets a chance to ‘undo’ potential effects.

• All with ‘Best Effort’
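
The timeout question and the requirements on this slide have a close analogue in plain Java, sketched below. This is an illustration under assumptions, not the talk's design: a single-threaded executor stands in for host 2, a long sleep stands in for the "extremely complicated op", and ImplicitSpawnSketch and its latches are hypothetical names. It shows that a timeout alone leaves the remote work running, and that the caller has to announce it has given up (here with Future.cancel) so the worker gets a chance to stop and undo.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ImplicitSpawnSketch {
    // Returns { caller timed out, worker still running after the timeout,
    //           worker observed the kill request }.
    static boolean[] run() {
        ExecutorService host2 = Executors.newSingleThreadExecutor(); // stands in for host 2
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch killed = new CountDownLatch(1);
        try {
            Future<Integer> reply = host2.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(60_000); // the "extremely complicated op"
                    return 1;
                } catch (InterruptedException e) {
                    killed.countDown(); // the worker's chance to undo its effects
                    throw e;
                }
            });
            started.await(); // make sure the worker is really running

            boolean timedOut = false;
            try {
                reply.get(50, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                timedOut = true; // the caller has given up...
            }
            boolean stillRunning = !reply.isDone(); // ...but the worker has not

            reply.cancel(true); // best effort: interrupt the worker thread
            boolean sawKill = killed.await(5, TimeUnit.SECONDS);
            return new boolean[] { timedOut, stillRunning, sawKill };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            host2.shutdownNow();
        }
    }
}
```

In a single JVM the interrupt is reliable; across real hosts the same notification can be lost, which is why the slide asks only for best effort.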

Page 49: Failure Handling in a modal Language

One More Wrinkle

[Diagram, animated over slides 49 to 51. Two hosts, Catom 1 and Catom 2, with the step labels: Grab ‘continuation’ (slide 49); Assign ‘Catom1’ to ‘myLeader’ (slide 50).]

Page 52: Failure Handling in a modal Language

The Design, In Short

try
  e_1
continuing
  e_2
end

[The numbered points below are revealed one at a time over slides 53 to 55.]

1. Execute e_1

2. In the event of node failure... the entire expression will throw an exception on this host.

3. On the other hosts, e_2 will be executed, and its value discarded.

Page 56: Failure Handling in a modal Language

The Design, In Short

(* host 1 *)
try
  (* set all of my neighbors’ ‘myLeader’ to host 1 *)
continuing
  (* runs on each other host if a node fails; undoes the assignment *)
  if !myLeader = SOME host_1
  then myLeader := NONE
  else ()
end
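
The try ... continuing ... end behaviour described on the preceding slides can be imitated in a single process to make the three steps concrete. The Java sketch below is a simulation under assumptions, not ML5's implementation: HostSketch, NodeFailure, and tryContinuing are hypothetical names, hosts are plain objects with an alive flag, and "the other hosts" is taken to mean the live hosts the thread had already visited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical host: just a name, a liveness flag, and the 'myLeader' cell.
final class HostSketch {
    final String name;
    boolean alive = true;
    String myLeader = null;

    HostSketch(String name) {
        this.name = name;
    }
}

final class NodeFailure extends RuntimeException {
    NodeFailure(String message) {
        super(message);
    }
}

final class TryContinuingSketch {
    // e1 is run at each host along the route (step 1). If a dead host is hit,
    // e2 runs on every live host already visited and any result is discarded
    // (step 3), and the whole expression throws on the originating host (step 2).
    static void tryContinuing(List<HostSketch> route,
                              Consumer<HostSketch> e1,
                              Consumer<HostSketch> e2) {
        List<HostSketch> visited = new ArrayList<>();
        for (HostSketch host : route) {
            if (!host.alive) {
                for (HostSketch v : visited) {
                    if (v.alive) {
                        e2.accept(v);
                    }
                }
                throw new NodeFailure(host.name + " failed");
            }
            visited.add(host);
            e1.accept(host);
        }
    }

    public static void main(String[] args) {
        HostSketch n1 = new HostSketch("n1");
        HostSketch n2 = new HostSketch("n2");
        HostSketch n3 = new HostSketch("n3");
        n3.alive = false; // this neighbor has died
        try {
            tryContinuing(List.of(n1, n2, n3),
                h -> h.myLeader = "host_1",
                h -> { if ("host_1".equals(h.myLeader)) h.myLeader = null; });
        } catch (NodeFailure e) {
            System.out.println("origin saw: " + e.getMessage());
        }
        System.out.println(n1.myLeader + " " + n2.myLeader);
    }
}
```

The usage in main mirrors the myLeader example: neighbors that accepted host 1 as leader before the failure clear that assignment again.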

Page 57: Failure Handling in a modal Language

ML5-C: Error Continuations

[Diagram, animated over slides 57 to 67. Legend: Host, Visited Host, Location of thread, Migration of thread, PC. The code skeleton shown next to the PC is: try, continuing, l:, end.]

Labels that survive from the animation, in order:

• Store Cont(stack) (slide 58)
• Store Cont(▪;l) (slides 59 and 61)
• Error! (slide 64, shown twice)
• Restore Cont. (slide 65, shown twice), with a PC at l:
• raise Fail) handle... (slide 66), with a second PC still at l:
• raise Fail) handle... (slide 67)

Page 68: Failure Handling in a modal Language

Interesting Note

• In Failure Case, We Have to Reason About Client and Server.
  • (The avoidance of this was one of the touted benefits of ML5!)

Page 69: Failure Handling in a modal Language

Future Work

• This Work is Not Yet Finished

• More Restrictive Modal Basis
  • Only neighbor catoms are accessible
  • This would be a ‘lower level’ language in some sense.

Page 70: Failure Handling in a modal Language

Thanks!

Additional Questions?

Page 71: Failure Handling in a modal Language

Failure Handling is More Natural

• In Claytronics, Failure is Possible at Any Moment.

• Intuitively, it would be nice to say:

try {
  // a complex, multi host operation
}
catch (Failure v) {
  // take an alternate
  // course of action.
}

Page 72: Failure Handling in a modal Language

So You Want to See the Typing Rules...

Note: These rules represent just a snapshot of the work.