Model-Driven Engineering of Fault Tolerance Solutions in
Transcript of Model-Driven Engineering of Fault Tolerance Solutions in
Model-Driven Engineering of Fault Tolerance Solutions in Enterprise Distributed Real-time
and Embedded SystemsSumant Tambe*
Aniruddha GokhaleJaiganesh Balasubramanian
Krishnakumar BalasubramanianDouglas C. Schmidt
Vanderbilt University, Nashville, TN, USAContact : *[email protected]
This work is supported by subcontracts from LMCO & BBN
Presented at OMG RTWS 2006Arlington, VA, July 2006
2
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
Component-based Enterprise DRE Systems! Characteristics of component-based
enterprise DRE systems! Applications composed of one or
more “operational string” of services! Composed of several services or
systems of systems! Dynamic (re)-deployment of
components into operational strings! Simultaneous QoS (Reliability,
Availability) requirements! Examples of Enterprise DRE systems
! Advanced air-traffic control systems! Unmanned aerospace mission systems! Continuous patient monitoring systems
Detector1
Detector2
Planner 3 Planner1
Error Recovery
Effector1
Effector2
Config
LEGEND
Receptacle
Event Sink
Event Source
Facet
3
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
QoS Issues in Enterprise DRE Systems
! Regulating & adapting to changes in runtime environments
! Satisfying tradeoffs between multiple (often tangled) QoS requirements! e.g., fault-tolerance, high
performance, real-time etc.! Satisfying QoS demands in the face
of fluctuating and/or insufficient resources! e.g., hardware/software failures,
mobile ad hoc networks
Tangled QoS Concerns
Resource Constrained
Changing Runtime
4
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
Tangled QoS-Concerns in Fault-Tolerance
Separation of Concerns &
Managing Variability is the Key
! Demonstrates numerous tangled para-functional concerns
! Significant sources of variability that affect end-to-end QoS (performance + availability)
Design-time Deployment-time Run-time
5
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
Fault-Tolerance Issues in DRE Systems
C1 C2 C3 C4 C5
! Per-component concern – choice of implementation! Depends of resources, compatibility with other components in assembly
! Availability concern – what is the degree of redundancy? What replication styles to use? Does it apply to whole assembly?
! Failure recovery concern – what is the unit of failover?! State synchronization concerns – What is data-sync frequency?! Deployment concern – how to place components? Minimize failure risk to the
system
Failover Unit
6
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
Model-Driven Engineering – A Promising Approach
! Higher level of abstraction than third generation programming languages
! Model-per-concern paradigm alleviates system complexity! Deployment model! Component assembly
model! System structural model! Different QoS models
(Fault-tolerance)! Generative and model
transformation techniques to weave in appropriate glue code
Complex System
7
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
Fault-tolerance Modeling Abstractions
! Fail-over Unit (FOU): Abstracts away details of granularity of protection (e.g. Component, Assembly, App-string)
! Replica Group (RPG): Abstracts away fault-tolerance policy details (e.g. Active/passive replication, state-synchronization, topology of replica)
! Shared Risk Group (SRG): Captures associations related to risk. (e.g. shared power supply among processors, shared LAN)
Protection granularity concerns
State-synchronization concerns
Component Placement constraints
Replica Distance Metric
Interpreter (component placement constraint solver): Encapsulates an algorithm for component-node assignment based on replica distance metric
PICML (Platform Independent Component Modeling Language) Metamodel Enhancements made to CoSMIC Modeling Tool suite
8
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
container/component servercontainer/component server“Client” primary IOR
Primary FOUPrimary FOU
A B C
Fail-over Unit Example
container/component servercontainer/component server container/component servercontainer/component server
Replica FOUReplica FOU
A’ B’ C’
secondary IOR
container/component servercontainer/component server container/component servercontainer/component server container/component servercontainer/component server
Primary Component
Replica Component
9
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
DataCenter1_SRGDataCenter1_SRG DataCenter2_SRGDataCenter2_SRG
Rack1_SRGRack1_SRG Rack2_SRGRack2_SRG Node1Node1(blade31)(blade31)
Node2Node2(blade32)(blade32)
Shelf1_SRGShelf1_SRG Shelf2_SRGShelf2_SRG
Blade30Blade30
Ship_SRGShip_SRG
Blade34Blade34 Blade29Blade29
Shelf1_SRGShelf1_SRG
Blade36Blade36Blade33Blade33
Shared Risk Group Example
10
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
R1
P
R2
R3
Define N orthogonal vectors, one for each of the distance valuescomputed for the N components (with respect to a primary) and vector-sum these to obtain a resultant. Compute the magnitude of the resultant as a representation of the composite distance captured by the placement .
Formulation of Replica Placement Problem
1. Compute the distance from each of the replicas to the primary for a placement.
2. Record each distance as a vector, where all vectors are orthogonal.
3. Add the vectors to obtain a resultant.4. Compute the magnitude of the resultant.5. Use the resultant in all comparisons (either among
placements or against a threshold)6. Apply a penalty function to the composite distance (e.g. pair
wise replica distance or uniformity)
11
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
Node2Node2(blade32)(blade32)
Shelf1_SRGShelf1_SRG
Ship_SRGShip_SRG
Blade34Blade34 Blade36Blade36Replica
1
Primary
Replica2
Replica3
Composite Composite DistanceDistance
Blade30Blade30
Node1Node1(blade31)(blade31)
DataCenter2_SRGDataCenter2_SRGDataCenter1_SRGDataCenter1_SRG
Rack1_SRGRack1_SRG Rack2_SRGRack2_SRG
Shelf1_SRGShelf1_SRGShelf2_SRGShelf2_SRG
Blade29Blade29 Blade33Blade33
Component Placement Example using SRGs
12
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
5. Distance-based constraint algorithm determines replica placement in deployment descriptors.
GME/PICMLGME/PICML
1. Model components and application strings in PICML2. Model Fail Over Units (FOUs) and Shared Risk Groups (SRGs)3. Generate deployment plan
4. Interpreter automatically injectsreplicas and associated CCM IOGRs
FT Modeling & Generative Steps
Model Model InformationInformation
Domain, Deployment,
SRG, and FOU injection
Replica Placement Algorithm
modelFT InterpreterFT Interpreter
Augmented Augmented Deployment Deployment
PlanPlan
13
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
FT Requirements Screenshot in CoSMIC (PICML)Component FOU
Replication Style = Active
Replica = 3Min Distance = 4
Deployment plan to be
augmented
Shared Risk Group (SRG)
hierarchy
LEGEND
FOU: Fail Over Unit
SRG: Shared Risk Group
14
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
Generative Capabilities for Provisioning FT
! Automatic Injection of replicas! Augmentation of
deployment plan based on number of replicas
! Automatic Injection of FT infrastructure components! E.g. Collocated “heartbeat”
(HB) component with every protected component.
! Automatic Injection of connection meta-data! Specialized connection
setup for protected components (e.g. Interoperable Group References IOGR)
C1C1C1C1C1
HB
Container
15
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
container/component servercontainer/component server
FPCFPC
“client”
periodic FPC heartbeat
primary IOR
Primary FOUPrimary FOU
A
HB
container/component servercontainer/component server
Bcontainer/component servercontainer/component server
C
container/component servercontainer/component server
Replica FOUReplica FOU
A’container/component servercontainer/component server
B’container/component servercontainer/component server
C’
secondary IOR
IOG
R
FPCFPC
HB HB
intra-FOU heartbeat
Click to return to concepts slide.
HBHB HB
Example of Automated Heartbeat Component Injection
Primary Component
Replica Component
Collocated heartbeat
component
ConnectionInjection
16
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
Future Work! Developing advanced constraint solver
algorithms to incorporate multiple dimensions of constraints in component placement decision (e.g. resources, communication latency)
! Optimizing the number of generated heartbeat components for collocated, protected application components.
! Enhancing the DSL and the tools to capture the configurability required by the new Lightweight RT/FT CORBA specification.! E.g. Enhancing the model interpreter to
support a wide spectrum of established fault-tolerance mechanisms
! Enhancing working prototypes and evaluating them in representative DRE systems
HB
Container
C1C1C1C1
Configurable FT Infrastructure
17
Sumant Tambe, et. al MDE based Fault ToleranceOMG RTWS 2006
Concluding Remarks
! Model-Driven Engineering separates dependability concerns from other system development concerns
! Separation of concerns helps alleviate system complexity
! Model-based generative capabilities “compile” FT infrastructure (e.g. heartbeat components and connections) during model interpretation time and synthesize meta-data
Tools available for download fromwww.dre.vanderbilt.edu/cosmicwww.dre.vanderbilt.edu/CIAO