Modular Coordination of multiple autonomic managersModular Coordination: principle 4 Application...
Transcript of Modular Coordination of multiple autonomic managersModular Coordination: principle 4 Application...
Modular Coordination of multiple autonomic managersusing Modular Discrete Controller Synthesis
Soguy Mak-karé Gueye1 Gwenaël Delaval1
Noël de Palma1 Eric Rutten2
1Laboratoire Informatique de Grenoble (LIG)
2INRIA Grenoble - Rhône-Alpes
Ctrl-a / STAARS, January 13, 2014
1/30
Outline
1 Need for coordination in Self-managing systemsAutonomic Management of large-scale systemsAutonomic manager
2 ApproachSynchronous programmingDiscrete controller synthesisBZR programming language
3 Modular coordination of autonomic managersNeed for modular design of the coordination controlModular Coordination: principle
4 ApplicationMulti-tier systems in a Datacenter
Self-sizing, self-repair and server consolidation manager
Coordination Problem
Coordination policy
Modelling the managers behavioursCoordinating the managers
2/30
Outline
1 Need for coordination in Self-managing systemsAutonomic Management of large-scale systemsAutonomic manager
2 Approach
3 Modular coordination of autonomic managers
4 Application
3/30
Autonomic Management of large-scale systems
Di�cult to handle manually
Complexity in size
Degree of heterogeneity
Distributed architecture
Autonomic Computing
Self-managing System
Manages itself autonomously
Aware of its context
React and adapt to changes
functional areas
Self-Con�guration, Self-Healing, Self-Optimization, Self-Protection ...
4/30
Autonomic manager
Software (or Hardware) component
Implement management decisions
BehavioursMonitor system execution status
cpu load, response time, etc.
Analyse of the measuresPlan & execute recon�gurations
Ensuring global system management
Multiple autonomic managers
achieve overall objectives
Preventing
con�icting decisions, redundant operations
Coordination to get coherent and e�cient management
⇒ Problem of synchronization and logical control AMs actions
5/30
Outline
1 Need for coordination in Self-managing systems
2 ApproachSynchronous programmingDiscrete controller synthesisBZR programming language
3 Modular coordination of autonomic managers
4 Application
6/30
Synchronous programming
Modelling formalism and programming language
data-�ow nodes / equations : input �ows → output �ows
mode automata (FSM)
parallel and hierarchical composition
tools: compilers (e.g., Heptagon), code generation,veri�cation (model-checking), ...
example: computing task control, delayable
Idle Wait
e r and c/s
delayable(r,c,e) = a,s
Activec/s
r and not ca = false
a = true
a = false
node delayable(r,c,e:bool) returns (a,s:bool)
let automaton
state Idle do
a = false; s = r and c
until r and c then Active
| r and not c then Wait
state Wait do a = false; s = c
until c then Active
state Active do a = true; s=false
until e then Idle
end tel
7/30
Discrete controller synthesis
Purpose
Constrain system behaviours to satisfy a property Φcompute a controller
Principle (on implicit equational representation)
State memoryTrans transition functionOut output function
Partition of variables : controllable (Y c), uncontrollable (Y u)
Computation of a controller such that the controlled system satis�esΦ (invariance, reachability, attractivity, ...)
DCS tool: Sigali (H. Marchand e.a.)
8/30
Discrete controller synthesis
Purpose
Constrain system behaviours to satisfy a property Φcompute a controller
Principle (on implicit equational representation)
State memoryTrans transition functionOut output function
Trans State OutZY
Partition of variables : controllable (Y c), uncontrollable (Y u)
Computation of a controller such that the controlled system satis�esΦ (invariance, reachability, attractivity, ...)
DCS tool: Sigali (H. Marchand e.a.)
8/30
Discrete controller synthesis
Purpose
Constrain system behaviours to satisfy a property Φcompute a controller
Principle (on implicit equational representation)
State memoryTrans transition functionOut output function
Y c
Y u Trans State OutZY
Partition of variables : controllable (Y c), uncontrollable (Y u)
Computation of a controller such that the controlled system satis�esΦ (invariance, reachability, attractivity, ...)
DCS tool: Sigali (H. Marchand e.a.)
8/30
Discrete controller synthesis
Purpose
Constrain system behaviours to satisfy a property Φcompute a controller
Principle (on implicit equational representation)
State memoryTrans transition functionOut output function
Ctrlr Y c
Y u Trans State OutZY
Partition of variables : controllable (Y c), uncontrollable (Y u)
Computation of a controller such that the controlled system satis�esΦ (invariance, reachability, attractivity, ...)
DCS tool: Sigali (H. Marchand e.a.)
8/30
BZR programming language [http://bzr.inria.fr]
Built on top of nodes in Heptagon
to each contract, associate controllable variables, local to thecomponent
at compile-time (user-friendly DCS),Compute a controller for each component
When no controllable inputs : veri�cation by model-checking
Example (cont'd)
Idle Wait
e r and c/s
delayable(r,c,e) = a,s
Activec/s
r and not ca = false
a = true
a = false
twotasks(r1, e1, r2, e2) = a1, s1, a2, s2assume true
enforce not (a1 and a2)with c1, c2
(a1, s1) = delayable(r1, c1, e1);
(a2, s2) = delayable(r2, c2, e2)
9/30
Modular DCS with Heptagon/BZR
Use of contracts of each subnode
f (x1, . . . , xn) = (y1, . . . , yp)assume eAenforce eG
with c1, . . . , cq
f1(x11, . . . , x1n, c1, . . . , cq) = (y11, . . . , y1p)assume eA1
enforce eG 1· · ·
fp(xp1, . . . , xpn, c1, . . . , cq) = (yp1, . . . , ypp)
assume eApenforce eGp
Synthesis objective:
∀�(
(eA1 ⇒ eG1) ∧ . . . ∧ (eAp ⇒ eGp) ∧ eA)⇒
(eG ∧ eA1 ∧ . . . ∧ eAp
)Assuming eA and each subnodei enforces (eAi ⇒ eGi )
Compute controller enforcing eG and each eAiContracts of subnodes abstract their body
Modular code generation
10/30
Outline
1 Need for coordination in Self-managing systems
2 Approach
3 Modular coordination of autonomic managersNeed for modular design of the coordination controlModular Coordination: principle
4 Application
11/30
Monolithic Coordination: principle
Mm a
Mm a
csexhibit
controllability
M1m1 a1 M2m2 a2
ctrlrs1
c1 s2
c2
monolithicDCS
Scalability ?
State-space explosion problem
Re-usability ?
Contracts in the main node � recompilation when modi�cation
12/30
Modular Coordination: principle
Mi
ctrlr
mi ai
si ci
Mi
ctrlr
mi ai
si ci
ccsc
de�necontrollability
Mi
ctrlr1
mi ai
si ci
M ′i
ctrlr2
m′i a′i
s ′i c ′i
ctrlrsc
cc s ′c
c ′c
modular DCS
Scalable: contracts decomposition � break down complexity
Re-usable: not recompiled13/30
Outline
1 Need for coordination in Self-managing systems
2 Approach
3 Modular coordination of autonomic managers
4 ApplicationMulti-tier systems in a Datacenter
Self-sizing, self-repair and server consolidation manager
Coordination Problem
Coordination policy
Modelling the managers behavioursCoordinating the managers
14/30
Multi-tier systems in a Datacenter
Datacenter
Provide virtualized execution platform
Supported by several physical serversShared among multiple applications (e.g., Multi-tier JEE App)Virtual machines host application servers
Multi-tier JEE application
Apache Server Tomcat Server M−Proxy Server MySQL Server
app. host
Tomcat Server
Tomcat Server
tom1. host
tomM. host
tom2. host mproxy. host
MySQL Server
MySQL Server
mysql1. host
mysql2. host
mysqlN. host
Replicas Replicasm n
..................
Workload (W)
Workload (W’)
W / m
W / m
W / m
W’ / m
W’ / m
W’ / m
W’ = SQL Query
reading
Inter-connected tiers
Apache dispatches incoming requests to TomcatsTomcats access Database by Mysql-proxyMysql-proxy dispatches SQL queries reading toMySQLs
Replicated tiers
Tomcats and MySQLs
15/30
Autonomic managers
Self-Sizing � Management of the Degree of replication
Optimize resources usage and improve performance
Monitors cpu load of servers HostsSizes up when overloadSizes down when underload
Self-Repair � Management of service availability
Improve service availability
Monitors servers hosts HeartBeatRepairs when failure (fail-stop)
Server Consolidation �
Preserve e�cient usage of servers and applications performance
Resize the number of active servers based on the workload
16/30
Coordination Problem
Server failure leading to Workload misinterpretation
Failure in a replicated tier ⇒ temporary overload
Tomcats and MySQLs
Failure in a tier ⇒ temporary underload in its replicated successors
Apache ⇒ Tomcats and MySQLsTomcat ⇒ MySQLsMysql-Proxy ⇒ and MySQLs
Con�icting management actions
Consolidation decrease execution ⇒ size-up (repair) failure
no enough resource
Size-down ⇒ invalidate consolidation increase plan
additional resource not needed
Size-up (repair) ⇒ invalidate consolidation decrease plan
resource requested
17/30
Coordination policy
Multi-tier level
1 Within a replicated tier, avoid size-up when repairing.
2 Within a load-balanced replicated tier, avoid size-down when repairingthe load-balancer.
3 In multi-tiers, more generally, avoid size-down in a successor replicatedtier when repairing in a predecessor.
Datacenter level
1 When consolidating, avoid sizing or repairing.
2 When repairing or sizing up, delay consolidation decreasing
3 When sizing down, delay consolidation increasing
18/30
Modular control model : managers
Self-sizing control
UpDown Adding
add
remu /
Falseadding = adding = True
andcrm
ca and o /
(add, rem, adding) = self−sizing (ca, crm, o, u, na)
na /
States:
UpDown: awaitsAdding: adding
inputs:
o: overloadu: underloadna: adding completed
actions:
add: triggers addingrem: triggers removal
controllables:
ca: control addingcrm: control removal
Self-repair control
Wait Repair
repairing = Truerepairing = Falsenr /
fail / andnrfail / andcr rep
rep
(rep, repairing) = self−repair (cr, fail, nr) States:
Wait: awaits failureRepair: repairing
inputs:
fail: failurenr: repair completed
actions:
rep: triggers repair
controllables:
cr: control repair
19/30
Modular control model : managers
Consolidation control
consolidation(ci,cd,i,d,e)
Incr = falseDecr = falseIncr = false
Decr = falseIncr = false
Idle WaitDWaitI
DI
d and not cdi and not ci
ci / si cd / sdi and ci
/ sid and cd
/ sd
eeDecr = falseIncr = true
Decr = trueIncr = false
si,sd,Incr,Decr =
Decr = false
States:
Idle: awaitsWaitI: awaits authorisation to increaseI: increasing (Incr)WaitD: awaits authorisation to decreaseD: decreasing (Decr)
inputs:
i: increase noti�cationd: decrease noti�catione: completion noti�cation
actions:
si: triggers increasesd: triggers decrease
controllables:
ci: control increasecd: control decrease
20/30
Coordination objectives
Multi-tier
1 Replicated tier: not (repairing and add)
2 Load-balanced replicated tier: not (repairingL and rem)
3 In multi-tiers: not (repairingpred and remsucc)
Datacenter
1 not ((Incr or Decr) and (repairing* or adding* or rem*))
2 not ((repairing* or adding*) and sd)
3 not (rem* and si)
21/30
Exploiting models with DCS : monolithically
Repair
Apache
Repair Sizing
Tomcat repl. tier
Repair
Proxy
Repair Sizing
MySQL repl. tier
... Conso
DC. conso
Ctrlr
(. . .) = Main_node (. . . )enforce all contracts
with all controllable variables
(rep1, repairing1) = self-repair (c ′1, fail1, nr1);. . .(repN , repairingN) = self-repair (c ′
N, failN , nrN);
(add1, rem1, adding1) = self-sizing (ca1, . . . );. . .(addM , remM , addingM) = self-sizing (caM , . . . );(si, sd, Incr, Decr) = consolidation (ci, cd, i, d, e);
22/30
Modular DCS: Modelling architecture
Repair
Apache
Repair Sizing
CtrlrM1
Tomcat repl. tier
CtrlrM2
Repair
Proxy
Repair Sizing
CtrlrM1
MySQL repl. tier
CtrlrM2
CtrlrM3
... Conso
DC. conso
CtrlrDC(1,2,3)
Beside local control objectives
Enforce outside control objectives
control triggering of actions: (¬c ′i ⇒ ¬ai )short action: (c ′
i or not ai )
long action: longActCtrl(c ′i , ai , si )
def=
((c ′i or not ai ) and ((not (false fby si ) and not ai ) ⇒ not si )
23/30
Exploiting models with DCS : modularly
Replicated tier node
(. . .) = coord_repl-tier (cr′, fail, nr, ca′, crm′, o, u, na)enforce ((not (repairing and add))
and longActCtrl(cr′, rep, repairing))and longActCtrl(ca′, add, adding)and (crm′ or not rem))
with cr , ca, crm
(rep, repairing) = self-repair (cr , fail, nr);(add, remove, adding) = self-sizing (ca, crm, o, u, na);
Replicated tier
1 Control of a self-sizing and a self-repair2 Enforcement
1 local control: not (repairing and add)2 outside control: cr', ca' and crm' with associated objectives
24/30
Exploiting models with DCS : modularly (ii)
Load-balanced replicated tier node
(. . .) = coord_lb-repl-tier (cL′, failL, nrL, c′, fail, nr, ca′, crm′, o, u, na)enforce (not (repairingL and rem))
and longActCtrl(cL′, repL, repairingL))and longActCtrl(c′, rep, repairing))and longActCtrl(ca′, add, adding)and (crm′ or not rem))
with cL, c, ca, crm
(repL, repairingL) = self-repair (cL, failL, nrL);(rep, repairing, add, rem, adding) = coord_repl-tier (c, fail, nr, ca,
crm, o, u, na);
Load balanced replicated tier
1 Control of a self-repair and a coord_repl-tier2 Enforcement
1 local control: not (repairingL and rem)2 outside control: cL', c', ca' and crm' with associated objectives
25/30
Exploiting models with DCS : modularly (iii)
Multi-tier application node
(. . .) = coord_appli (cL′1, failL1, nrL1, c′1, fail1, nr1, ca′1, crm′1, o1, u1, na1,cL′2, failL2, nrL2, c′2, fail2, nr2, ca′2, crm′2, o2, u2, na2)
enforce ((not ((repairingL1 or repairing1) and rem2)))and longActCtrl(cL′i , repLi , repairingLi ))and longActCtrl(c′i , repi , repairingi ))and longActCtrl(ca′i , addi , addingi ) and (crm′ i or not remi ))
i = 1, 2with cL1, c1, ca1, crm1, cL2, c2, ca2, crm2
(repL1, repairingL1, rep1, repairing1, add1, rem1, adding1)= coord_lb-repl-tier (cL1, failL1, nrL1, c1, fail1, nr1, ca1, crm1 , o1, u1, na1);
(repL2, repairingL2, rep2, repairing2, add2, rem2, adding2)= coord_lb-repl-tier (cL2, failL2, nrL2, c2, fail2, nr2, ca2 , crm2 , o2, u2, na2);
Multitier1 Control of two coord_Lb-repl-tier2 Enforcement
1 local control: not (repairingL and rem)2 outside control: cL′i , c
′i , ca
′i and crm′
i with associated objectives26/30
Exploiting models with DCS : modularly (iv)
Two-application data-center
(. . .) = two-data-center (. . .)enforce ( (not ((Incr or Decr) and (repairingij or addingij or remij )))
and (not ((repairingij or addingij ) and sd)and longActCtrl(cL′ ij , repLij , repairingLij ))and longActCtrl(c′ ij , repij , repairingij ))and longActCtrl(ca′ ij , addij , addingij ) and (crm′ ij or not remij ))
i = 1, 2; j = 1, 2with cL11, c11, . . . , crm22, ci , cd
(. . .) = coord_appli (cL11, c11, ca11, crm11, . . . , cL21, c21, ca21, crm21, . . .)(. . .) = coord_appli (cL12, c12, ca12, crm12, . . . , cL22, c22, ca22, crm22, . . .)(si, sd, Incr, Decr) = consolidation (ci , cd , i, d, e);
Two application & consolidation
1 Control of two coord-appli and a consolidation2 Enforcement
1 local control: not (repairingL and rem)2 outside control: cL′ij , c
′ij , ca
′ij and crm′
ij with associated objectives
27/30
Exploiting models with DCS : comparisons
monolithic vs modular
nb. Synthesis time Memory usage
app. monolithic modular monolithic modular
1 0s 5s - -
2 49s 11s - -
3 42m24s 24s 34.81MB -
4 > 2 days 1m22s >149,56MB -
5 - 4m30s - 20,37MB
6 - 13m24s - 53,31MB
7 - 25m57s - 77,50MB
8 - 50m36s - 115,59MB
9 - 2h11m - 236,59MB
10 - 9h4m - 479,15MB
28/30
Implementation and Evaluation
Uncoordinated execution
0 10 20 30 40 50 60 70 80 90
100
0 5 10 15 20 25 30 35 40 0
1
2
3
4
5
CP
U_
Av
g (
%)
Act
ive
To
mca
t -
Act
ive
My
sql
time (min)
Application 1
Tom CPU_AvgMySQL CPU_Avg
Tom active
MySQL activeAp failure
0 10 20 30 40 50 60 70 80 90
100
0 5 10 15 20 25 30 35 40 0
1
2
3
4
5
CP
U_
Av
g (
%)
Act
ive
To
mca
t -
Act
ive
My
sql
time (min)
Application 2
Tom CPU_AvgMySql CPU_Avg
Active Tom
Active MySqlTom failure
Coordinated execution
0 10 20 30 40 50 60 70 80 90
100
0 5 10 15 20 25 30 35 40 0
1
2
3
4
5
CP
U_
Av
g (
%)
Act
ive
To
mca
t -
Act
ive
My
sql
time (min)
Application 1
Tom CPU_AvgMySQL CPU_Avg
Tom active
MySQL activeAp failure
0 10 20 30 40 50 60 70 80 90
100
0 5 10 15 20 25 30 35 40 0
1
2
3
4
5
CP
U_
Av
g (
%)
Act
ive
To
mca
t -
Act
ive
My
sql
time (min)
Application 2
Tom CPU_AvgMySql CPU_Avg
Active Tom
Active MySql
29/30
Conclusion
major challenge : consistent, e�cient and �exible coexistence betweenautonomic managers in the same system
approach: synchronous programming and DCS
automatic generation of the controller for cooperation of multipleautonomic managers from high-level policy,correctness by construction of the generated controller
perspectives
large scale coordination : several managers and multi-tier architecturescontrol beyond mutual exclusion : sequential aspects
30/30
Conclusion
major challenge : consistent, e�cient and �exible coexistence betweenautonomic managers in the same system
approach: synchronous programming and DCS
automatic generation of the controller for cooperation of multipleautonomic managers from high-level policy,correctness by construction of the generated controller
perspectives
large scale coordination : several managers and multi-tier architecturescontrol beyond mutual exclusion : sequential aspects
30/30
Conclusion
major challenge : consistent, e�cient and �exible coexistence betweenautonomic managers in the same system
approach: synchronous programming and DCS
automatic generation of the controller for cooperation of multipleautonomic managers from high-level policy,correctness by construction of the generated controller
perspectives
large scale coordination : several managers and multi-tier architecturescontrol beyond mutual exclusion : sequential aspects
30/30