Pontus Boström and Marina Waldén Åbo Akademi University/ TUCS

Page 1: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Pontus Boström and Marina Waldén Åbo Akademi University/ TUCS

Development of Fault Tolerant Grid Applications Using Distributed B

Page 2: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Motivation

•Grids have become widespread in organizations
  • Handle large amounts of information
  • Manage computational resources

•Difficult to implement “correct” Grid applications
  • Formal methods are useful for ensuring the correctness of specifications
  • The specifications can be difficult to implement

•The specification language should take into account the features of the underlying platform

•Fault tolerance also important for correctness due to the nature of the grid environment

Page 3: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Grids

•Used for large-scale distributed systems
  • Scientific computing, e.g., in physics and engineering
  • Business applications

•Share information and computational resources over organizational boundaries

•Loosely coupled systems
  • Service-oriented architecture (SOA)

•Client – Server architecture

Page 4: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Grid services

•Services in a grid environment that can be accessed by clients

•Similar to remote objects in CORBA and RMI
  • Remote procedures used for communication

[Figure: a host, two clients, and two grid services]

Page 5: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Grid Services

•Based on Web Services
  • XML
  • SOAP
  • WSDL

•Extends Web Services with
  • Potentially transient services containing state
  • Service data
  • Notifications

•Globus Toolkit middleware

[Figure: a client and a grid service, connected by a remote procedure call and a notification]

Page 6: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

The architecture of a grid application

[Figure: Host 1 runs the client and Host 2 runs the grid service, each on top of the Globus Toolkit and an OS; the two communicate through RPC and notifications]

Page 7: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Faults

•Remote procedure calls can fail in four different ways
  1. The server grid service instance has crashed before the call
  2. The network connection fails when calling a remote procedure
  3. The server instance fails during the call
  4. The network connection fails when returning the result

•A notification can fail to arrive for two reasons
  1. The sending grid service crashed before sending the notification
  2. The network connection fails during sending

•The client crashes when using a server grid service
  • The server grid service becomes an orphan
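The four failure points of a remote procedure call can be made concrete with a small fault-injection sketch in Python. It is an illustration only, not Globus Toolkit code: `RpcFault`, `RemoteCallError`, `CounterService`, `call_increment`, and `outcome` are hypothetical names, and the sketch assumes that a server failing during the call crashes before it has changed its state.

```python
from enum import Enum


class RpcFault(Enum):
    """The four points at which a remote procedure call can fail."""
    SERVER_CRASHED_BEFORE_CALL = 1
    NETWORK_FAILED_ON_REQUEST = 2
    SERVER_FAILED_DURING_CALL = 3
    NETWORK_FAILED_ON_REPLY = 4


class RemoteCallError(Exception):
    """The single exception the caller sees, whatever the cause."""


class CounterService:
    """Hypothetical stateful grid service with one remote procedure."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


def call_increment(service, fault=None):
    """Simulate one remote call, optionally injecting a fault."""
    if fault in (RpcFault.SERVER_CRASHED_BEFORE_CALL,
                 RpcFault.NETWORK_FAILED_ON_REQUEST):
        # The request never reaches a live service: nothing is executed.
        raise RemoteCallError("remote call failed")
    if fault is RpcFault.SERVER_FAILED_DURING_CALL:
        # Assumption: the crash happens before the state change is made.
        raise RemoteCallError("remote call failed")
    result = service.increment()
    if fault is RpcFault.NETWORK_FAILED_ON_REPLY:
        # The procedure ran, but the result is lost on the way back.
        raise RemoteCallError("remote call failed")
    return result


def outcome(fault):
    """Return (what the caller observed, server state afterwards)."""
    service = CounterService()
    try:
        observed = call_increment(service, fault)
    except RemoteCallError:
        observed = "exception"
    return observed, service.value
```

In this sketch the caller observes the same exception in all four cases, even though in the last case the state of the service has already changed.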

Page 8: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Fault tolerance using the Globus Toolkit (GT)

•No support for advanced fault tolerance mechanisms such as replication or checkpointing

•An exception is raised in the caller when a call to a remote procedure fails
  • Not easy to know what caused the exception to be raised

•A lost notification can be discovered with timers in the client

•The most difficult error to handle is the removal of orphan grid service instances
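Detecting a lost notification with a client-side timer can be sketched as follows. This is a minimal Python illustration, not Globus Toolkit code: the in-process `queue.Queue` stands in for the notification channel, and `await_notification` and the message text are hypothetical.

```python
import queue


def await_notification(inbox, timeout_s):
    """Wait for one notification; report a timeout if none arrives in time."""
    try:
        return ("notified", inbox.get(timeout=timeout_s))
    except queue.Empty:
        return ("timeout", None)


inbox = queue.Queue()
inbox.put("result ready")
first = await_notification(inbox, timeout_s=0.05)   # a notification is waiting
second = await_notification(inbox, timeout_s=0.05)  # nothing arrives in time
```

The timeout only tells the client that no notification arrived; it cannot tell a crashed sender from a failed network connection.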

Page 9: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Orphan control

•Need to remove orphan grid service instances, since they waste resources

•New remote procedure isAlive

•Timer in both client and server grid service

•An exception is raised in the client when a call to isAlive fails

•The server grid service is deleted when a timeout occurs in it
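The interplay of isAlive and the two timers can be sketched in Python. This is a simplified model under stated assumptions, not the authors' implementation: `GridServiceInstance`, `tick`, and `client_poll` are hypothetical names, time is passed in explicitly instead of being read from a clock, and a failed call is represented by the built-in `ConnectionError`.

```python
class GridServiceInstance:
    """Hypothetical server-side instance guarded by an isAlive timer."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self.last_contact = now
        self.deleted = False

    def is_alive(self, now):
        """Remote procedure polled by the client; it restarts the timer."""
        if self.deleted:
            raise ConnectionError("instance no longer exists")
        self.last_contact = now
        return True

    def tick(self, now):
        """Server-side timer check: delete the instance once it is orphaned."""
        if now - self.last_contact > self.timeout_s:
            self.deleted = True


def client_poll(instance, now):
    """Client-side timer action: False means the instance is treated as failed."""
    try:
        return instance.is_alive(now)
    except ConnectionError:
        return False


inst = GridServiceInstance(timeout_s=10, now=0)
polled_ok = client_poll(inst, now=5)      # the client is alive, timer restarts
inst.tick(now=12)                         # 7 s since last contact: kept
kept = not inst.deleted
inst.tick(now=20)                         # 15 s of silence: deleted as an orphan
polled_after = client_poll(inst, now=21)  # the client sees a failed call
```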

Page 11: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Event B

•Extension of the B Method

•Developed by J.-R. Abrial

•Based on Action Systems by Back and Kurki-Suonio

•Related to B Action Systems

SYSTEM C
VARIABLES x
INVARIANT Inv_C
INITIALISATION x := x0
EVENTS
  C_Evt1 = ANY u WHERE G1(u,x) THEN S1 END;
  C_Evt2 = SELECT G2 THEN S2 END;
END
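The execution model behind such a system (repeatedly pick any enabled event, execute it, stop when no guard holds) can be sketched in Python. The concrete guards, actions, and invariant below are invented for illustration and do not come from the slides; the nondeterministic choices are resolved with a seeded random generator.

```python
import random


def invariant(x):
    """Toy invariant standing in for Inv_C."""
    return 0 <= x <= 10


def run_system(seed=0, x0=0):
    """Run a toy event system until no event is enabled."""
    rng = random.Random(seed)
    x = x0
    assert invariant(x)  # INITIALISATION must establish the invariant
    while True:
        enabled = []
        # C_Evt1 = ANY u WHERE u : 1..3 & x + u <= 10 THEN x := x + u END
        witnesses = [u for u in (1, 2, 3) if x + u <= 10]
        if witnesses:
            enabled.append(lambda x, ws=witnesses: x + rng.choice(ws))
        # C_Evt2 = SELECT x mod 2 = 1 THEN x := x + 1 END
        if x % 2 == 1:
            enabled.append(lambda x: x + 1)
        if not enabled:
            return x  # every guard is false: the system has terminated
        x = rng.choice(enabled)(x)  # nondeterministic choice among events
        assert invariant(x)  # each event must preserve the invariant


final_states = {run_system(seed=s) for s in range(20)}
```

Whatever order the events are chosen in, this toy system stops only when `x` has reached 10, which is the kind of property that the proof obligations establish for all executions rather than for sampled ones.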

Page 12: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Formal development of Grid applications

•We would like to have a formal method suitable for developing fault tolerant grid applications

•Difficult to create implementable specifications of grid applications in Event B
  • No grid communication mechanisms such as remote procedures and notifications
  • No fault tolerance mechanisms
  • Difficult to implement due to synchronization issues and atomicity of events

•We need to extend Event B with constructs for
  • Specifying grid services
  • Remote procedure calls and notifications
  • Fault tolerance

•Extensions should be introduced in a manner that simplifies implementation

Page 13: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Distributed B

•Provides two new types of B machines
  • GRIDSERVICE
  • GRID_REFINEMENT

•The new machines take into account grid-specific features
  • Remote procedures
  • Notifications
  • Timeouts due to lost notifications
  • Exceptions due to failed calls to isAlive

•Enables us to prove properties about the entire system

•The machines are translated to ordinary B for verification
  • New constructs get their semantics from the translation
  • Automatic generation of proof obligations

•Enables automatic or semi-automatic translation of the specification to a programming language

Page 14: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Grid service machine

•Abstract specification of a grid service

•A grid service machine is a template that clients obtain instances of

  • Compare to classes in OO

•Remote procedures
  • Ordinary B procedures called from a client

•Events
  • Executed independently of a client

•Notifications
  • Sent when all events have become disabled

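[Figure: a grid service with its remote procedure Proc(p), its events (guard J1 with action T1, guard J2 with action T2), and its notification, which guarantees Q once J1 and J2 are both disabled]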

Page 15: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Grid refinement machine (1)

•A client that uses grid service machine instances

•Refines GRIDSERVICE, ordinary SYSTEM or REFINEMENT

•Clause for enabling dynamic management of grid service machine instances

  • Instances are used as variables
  • When a failed instance is discovered, it is marked as no longer in use and deleted from the application

•Clause for refining remote procedures

•Clause for refining events

Page 16: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Grid refinement machine (2)

•Special substitution used in events for making remote procedure calls
  • Enables the exceptions for failed calls to be handled

•Special events that consist of two parts for handling notifications
  • First part enabled when a notification has been sent from a grid service
  • Second part enabled when a timeout occurs
  • Executed once for each notification/timeout

•Special event for handling failed calls to isAlive
  • Enabled for each grid service instance in use
  • Non-deterministically models failures of instances

Page 17: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

The behaviour of grid components

[Figure: behaviour of a grid service (remote procedure Proc(p); events J1/T1 and J2/T2; notification with guarantee Q) next to a grid refinement (events G1/S1 and G2/S2; notification handler NotifHandler, G3/S3)]

Page 18: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Grid service machine

GRIDSERVICE A
VARIABLES y
INVARIANT Inv_A
INITIALISATION y := y0
REMOTE_PROCEDURES
  Proc(p) = PRE P(p) THEN T END
EVENTS
  A_Evt1 = ANY u WHERE J1(u,y) THEN T1 END;
  A_Evt2 = SELECT J2 THEN T2 END;
NOTIFICATIONS
  Notif = GUARANTEES Q END
END
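The shape of a grid service machine can be mimicked with a small Python class. This is a toy analogue under stated assumptions, not a translation defined by Distributed B: the state (`pending`, `done`), the concrete guards and actions, and the `notify` callback that stands in for the middleware are all invented for illustration.

```python
class GridServiceA:
    """Toy instance of a grid service machine with state (pending, done)."""

    def __init__(self, notify):
        self.pending = 0      # INITIALISATION
        self.done = 0
        self.notify = notify  # callback standing in for the middleware

    def proc(self, p):
        """Remote procedure: PRE p >= 0 THEN pending := p END."""
        assert p >= 0, "precondition P(p) violated by the caller"
        self.pending = p

    def _evt1(self):
        """Event with guard J1 (pending >= 2): process two items."""
        self.pending -= 2
        self.done += 2

    def _evt2(self):
        """Event with guard J2 (pending = 1): process the last item."""
        self.pending -= 1
        self.done += 1

    def _enabled_events(self):
        events = []
        if self.pending >= 2:   # guard J1
            events.append(self._evt1)
        if self.pending == 1:   # guard J2
            events.append(self._evt2)
        return events

    def run(self):
        """Execute events independently of the client until all are disabled."""
        events = self._enabled_events()
        while events:
            events[0]()
            events = self._enabled_events()
        # All guards are false here, so the guarantee Q must hold.
        assert self.pending == 0  # Q
        self.notify(("Notif", self.done))


received = []
service = GridServiceA(notify=received.append)
service.proc(5)   # the client calls the remote procedure
service.run()     # the events run on their own, then Notif is sent
```

The notification is sent only after both guards have become false, at which point the guarantee can be relied on by whoever receives it.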

Page 19: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Grid refinement machine (1)

GRID_REFINEMENT C2
REFINES C1
REFERENCES A
VARIABLES z,x,a_inst
INVARIANT a_inst:A & Inv_C’
INITIALISATION x := x0 || z:=z0 || a_inst::A
EVENTS
  C_Evt1 =
    SELECT G1’ THEN
      CALL a_inst.Proc(x) EXCEPTION SE END || S1’
    END;
  C_Evt2 = SELECT G2’ THEN S2’ END

Page 20: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Grid refinement machine (2)

NOTIFICATION_HANDLERS
  NotifHandler =
    NOTIFICATION Notif SOURCE v:A
    THEN S3
    TIMEOUT ST
    END
IS_ALIVE_HANDLERS
  IAHandler =
    SOURCE v:A
    THEN SIA
    END
END
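The client-side constructs (the CALL substitution with its EXCEPTION part, the two-part notification handler, and the isAlive handler) can be pictured with a Python analogue. This is a hedged sketch rather than the semantics given by the translation to B: `RemoteCallError`, `FlakyInstance`, `Client`, and the logged tuples are hypothetical, and the substitutions SE, S3, ST, and SIA are reduced to log entries.

```python
class RemoteCallError(Exception):
    """Raised by the (hypothetical) middleware when a remote call fails."""


class FlakyInstance:
    """Stand-in for a grid service instance whose calls can be made to fail."""

    def __init__(self, failing=False):
        self.failing = failing

    def proc(self, x):
        if self.failing:
            raise RemoteCallError("Proc failed")
        return x * 2


class Client:
    """Toy counterpart of a grid refinement with a single instance a_inst."""

    def __init__(self, a_inst):
        self.a_inst = a_inst
        self.in_use = True
        self.log = []

    def c_evt1(self, x):
        """CALL a_inst.Proc(x) EXCEPTION SE END."""
        try:
            self.log.append(("result", self.a_inst.proc(x)))
        except RemoteCallError:
            self.log.append(("SE", x))  # exception substitution SE

    def notif_handler(self, notification):
        """NOTIFICATION Notif ... THEN S3 TIMEOUT ST END."""
        if notification is not None:
            self.log.append(("S3", notification))  # a notification arrived
        else:
            self.log.append(("ST", None))          # the timer expired instead

    def ia_handler(self):
        """IS_ALIVE handler: SIA runs when a call to isAlive has failed."""
        self.in_use = False  # the instance is marked as no longer in use
        self.a_inst = None   # and removed from the application
        self.log.append(("SIA", None))


client = Client(FlakyInstance())
client.c_evt1(21)               # the call succeeds
client.a_inst.failing = True
client.c_evt1(21)               # the call fails, SE is executed
client.notif_handler(None)      # no notification arrived: timeout part
client.ia_handler()             # isAlive failed: the instance is dropped
```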

Page 21: Pontus Boström and Marina Waldén  Åbo Akademi University/ TUCS

Conclusions

•Distributed B enables construction of correct fault tolerant grid applications

•Automatic generation of proof obligations

•Implementable architecture by construction

•These Event B extensions can also be used with other middleware for distributed systems