Model-Driven Engineering of Fault Tolerance Solutions in

17
Model-Driven Engineering of Fault Tolerance Solutions in Enterprise Distributed Real-time and Embedded Systems Sumant Tambe* Aniruddha Gokhale Jaiganesh Balasubramanian Krishnakumar Balasubramanian Douglas C. Schmidt Vanderbilt University, Nashville, TN, USA Contact : *[email protected] This work is supported by subcontracts from LMCO & BBN Presented at OMG RTWS 2006 Arlington, VA, July 2006

Transcript of Model-Driven Engineering of Fault Tolerance Solutions in

Page 1: Model-Driven Engineering of Fault Tolerance Solutions in

Model-Driven Engineering of Fault Tolerance Solutions in Enterprise Distributed Real-time

and Embedded SystemsSumant Tambe*

Aniruddha GokhaleJaiganesh Balasubramanian

Krishnakumar BalasubramanianDouglas C. Schmidt

Vanderbilt University, Nashville, TN, USAContact : *[email protected]

This work is supported by subcontracts from LMCO & BBN

Presented at OMG RTWS 2006Arlington, VA, July 2006

Page 2: Model-Driven Engineering of Fault Tolerance Solutions in

2

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

Component-based Enterprise DRE Systems! Characteristics of component-based

enterprise DRE systems! Applications composed of one or

more “operational string” of services! Composed of several services or

systems of systems! Dynamic (re)-deployment of

components into operational strings! Simultaneous QoS (Reliability,

Availability) requirements! Examples of Enterprise DRE systems

! Advanced air-traffic control systems! Unmanned aerospace mission systems! Continuous patient monitoring systems

Detector1

Detector2

Planner 3 Planner1

Error Recovery

Effector1

Effector2

Config

LEGEND

Receptacle

Event Sink

Event Source

Facet

Page 3: Model-Driven Engineering of Fault Tolerance Solutions in

3

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

QoS Issues in Enterprise DRE Systems

! Regulating & adapting to changes in runtime environments

! Satisfying tradeoffs between multiple (often tangled) QoS requirements! e.g., fault-tolerance, high

performance, real-time etc.! Satisfying QoS demands in the face

of fluctuating and/or insufficient resources! e.g., hardware/software failures,

mobile ad hoc networks

Tangled QoS Concerns

Resource Constrained

Changing Runtime

Page 4: Model-Driven Engineering of Fault Tolerance Solutions in

4

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

Tangled QoS-Concerns in Fault-Tolerance

Separation of Concerns &

Managing Variability is the Key

! Demonstrates numerous tangled para-functional concerns

! Significant sources of variability that affect end-to-end QoS (performance + availability)

Design-time Deployment-time Run-time

Page 5: Model-Driven Engineering of Fault Tolerance Solutions in

5

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

Fault-Tolerance Issues in DRE Systems

C1 C2 C3 C4 C5

! Per-component concern – choice of implementation! Depends of resources, compatibility with other components in assembly

! Availability concern – what is the degree of redundancy? What replication styles to use? Does it apply to whole assembly?

! Failure recovery concern – what is the unit of failover?! State synchronization concerns – What is data-sync frequency?! Deployment concern – how to place components? Minimize failure risk to the

system

Failover Unit

Page 6: Model-Driven Engineering of Fault Tolerance Solutions in

6

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

Model-Driven Engineering – A Promising Approach

! Higher level of abstraction than third generation programming languages

! Model-per-concern paradigm alleviates system complexity! Deployment model! Component assembly

model! System structural model! Different QoS models

(Fault-tolerance)! Generative and model

transformation techniques to weave in appropriate glue code

Complex System

Page 7: Model-Driven Engineering of Fault Tolerance Solutions in

7

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

Fault-tolerance Modeling Abstractions

! Fail-over Unit (FOU): Abstracts away details of granularity of protection (e.g. Component, Assembly, App-string)

! Replica Group (RPG): Abstracts away fault-tolerance policy details (e.g. Active/passive replication, state-synchronization, topology of replica)

! Shared Risk Group (SRG): Captures associations related to risk. (e.g. shared power supply among processors, shared LAN)

Protection granularity concerns

State-synchronization concerns

Component Placement constraints

Replica Distance Metric

Interpreter (component placement constraint solver): Encapsulates an algorithm for component-node assignment based on replica distance metric

PICML (Platform Independent Component Modeling Language) Metamodel Enhancements made to CoSMIC Modeling Tool suite

Page 8: Model-Driven Engineering of Fault Tolerance Solutions in

8

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

container/component servercontainer/component server“Client” primary IOR

Primary FOUPrimary FOU

A B C

Fail-over Unit Example

container/component servercontainer/component server container/component servercontainer/component server

Replica FOUReplica FOU

A’ B’ C’

secondary IOR

container/component servercontainer/component server container/component servercontainer/component server container/component servercontainer/component server

Primary Component

Replica Component

Page 9: Model-Driven Engineering of Fault Tolerance Solutions in

9

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

DataCenter1_SRGDataCenter1_SRG DataCenter2_SRGDataCenter2_SRG

Rack1_SRGRack1_SRG Rack2_SRGRack2_SRG Node1Node1(blade31)(blade31)

Node2Node2(blade32)(blade32)

Shelf1_SRGShelf1_SRG Shelf2_SRGShelf2_SRG

Blade30Blade30

Ship_SRGShip_SRG

Blade34Blade34 Blade29Blade29

Shelf1_SRGShelf1_SRG

Blade36Blade36Blade33Blade33

Shared Risk Group Example

Page 10: Model-Driven Engineering of Fault Tolerance Solutions in

10

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

R1

P

R2

R3

Define N orthogonal vectors, one for each of the distance valuescomputed for the N components (with respect to a primary) and vector-sum these to obtain a resultant. Compute the magnitude of the resultant as a representation of the composite distance captured by the placement .

Formulation of Replica Placement Problem

1. Compute the distance from each of the replicas to the primary for a placement.

2. Record each distance as a vector, where all vectors are orthogonal.

3. Add the vectors to obtain a resultant.4. Compute the magnitude of the resultant.5. Use the resultant in all comparisons (either among

placements or against a threshold)6. Apply a penalty function to the composite distance (e.g. pair

wise replica distance or uniformity)

Page 11: Model-Driven Engineering of Fault Tolerance Solutions in

11

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

Node2Node2(blade32)(blade32)

Shelf1_SRGShelf1_SRG

Ship_SRGShip_SRG

Blade34Blade34 Blade36Blade36Replica

1

Primary

Replica2

Replica3

Composite Composite DistanceDistance

Blade30Blade30

Node1Node1(blade31)(blade31)

DataCenter2_SRGDataCenter2_SRGDataCenter1_SRGDataCenter1_SRG

Rack1_SRGRack1_SRG Rack2_SRGRack2_SRG

Shelf1_SRGShelf1_SRGShelf2_SRGShelf2_SRG

Blade29Blade29 Blade33Blade33

Component Placement Example using SRGs

Page 12: Model-Driven Engineering of Fault Tolerance Solutions in

12

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

5. Distance-based constraint algorithm determines replica placement in deployment descriptors.

GME/PICMLGME/PICML

1. Model components and application strings in PICML2. Model Fail Over Units (FOUs) and Shared Risk Groups (SRGs)3. Generate deployment plan

4. Interpreter automatically injectsreplicas and associated CCM IOGRs

FT Modeling & Generative Steps

Model Model InformationInformation

Domain, Deployment,

SRG, and FOU injection

Replica Placement Algorithm

modelFT InterpreterFT Interpreter

Augmented Augmented Deployment Deployment

PlanPlan

Page 13: Model-Driven Engineering of Fault Tolerance Solutions in

13

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

FT Requirements Screenshot in CoSMIC (PICML)Component FOU

Replication Style = Active

Replica = 3Min Distance = 4

Deployment plan to be

augmented

Shared Risk Group (SRG)

hierarchy

LEGEND

FOU: Fail Over Unit

SRG: Shared Risk Group

Page 14: Model-Driven Engineering of Fault Tolerance Solutions in

14

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

Generative Capabilities for Provisioning FT

! Automatic Injection of replicas! Augmentation of

deployment plan based on number of replicas

! Automatic Injection of FT infrastructure components! E.g. Collocated “heartbeat”

(HB) component with every protected component.

! Automatic Injection of connection meta-data! Specialized connection

setup for protected components (e.g. Interoperable Group References IOGR)

C1C1C1C1C1

HB

Container

Page 15: Model-Driven Engineering of Fault Tolerance Solutions in

15

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

container/component servercontainer/component server

FPCFPC

“client”

periodic FPC heartbeat

primary IOR

Primary FOUPrimary FOU

A

HB

container/component servercontainer/component server

Bcontainer/component servercontainer/component server

C

container/component servercontainer/component server

Replica FOUReplica FOU

A’container/component servercontainer/component server

B’container/component servercontainer/component server

C’

secondary IOR

IOG

R

FPCFPC

HB HB

intra-FOU heartbeat

Click to return to concepts slide.

HBHB HB

Example of Automated Heartbeat Component Injection

Primary Component

Replica Component

Collocated heartbeat

component

ConnectionInjection

Page 16: Model-Driven Engineering of Fault Tolerance Solutions in

16

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

Future Work! Developing advanced constraint solver

algorithms to incorporate multiple dimensions of constraints in component placement decision (e.g. resources, communication latency)

! Optimizing the number of generated heartbeat components for collocated, protected application components.

! Enhancing the DSL and the tools to capture the configurability required by the new Lightweight RT/FT CORBA specification.! E.g. Enhancing the model interpreter to

support a wide spectrum of established fault-tolerance mechanisms

! Enhancing working prototypes and evaluating them in representative DRE systems

HB

Container

C1C1C1C1

Configurable FT Infrastructure

Page 17: Model-Driven Engineering of Fault Tolerance Solutions in

17

Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006

Concluding Remarks

! Model-Driven Engineering separates dependability concerns from other system development concerns

! Separation of concerns helps alleviate system complexity

! Model-based generative capabilities “compile” FT infrastructure (e.g. heartbeat components and connections) during model interpretation time and synthesize meta-data

Tools available for download fromwww.dre.vanderbilt.edu/cosmicwww.dre.vanderbilt.edu/CIAO