Autonomic Computing: a new design principle for complex...
© 2014 D.A. Menasce. All Rights Reserved.
Autonomic Computing:
a new design principle for complex systems
Danny Menascé
Department of Computer Science
George Mason University
www.cs.gmu.edu/faculty/menasce.html
Layered Software Architecture
Outline
• Motivation for Autonomic Computing
• Techniques used in AC
  – Model-driven
    • Performance model
    • Control theory
  – Model-free
    • Machine learning (e.g., reinforcement learning)
    • Statistical learning
• Applications
  – Internet data centers, virtual machine management, e-commerce and Web systems, service-oriented computing, cloud computing resource management, databases, adaptive software systems, and emergency departments
• Concluding Remarks
Motivation for AC
• “…main obstacle to further progress in IT is a looming software complexity crisis.” (from an IBM manifesto, Oct. 2001).
  – Tens of millions of lines of code
  – Skilled IT professionals required to install, configure, tune, and maintain.
  – Need to integrate many heterogeneous systems
  – Limit of human capacity being achieved
Motivation for AC (cont’d)
• Harder to anticipate interactions between components at design time:
  – Need to defer decisions to run time
• Computer systems are becoming too massive and complex to be managed even by the most skilled IT professionals
• The workload and environment conditions tend to change very rapidly with time
[Figure: Multi-scale time workload variation of a Web Server, shown at 3600-sec, 60-sec, and 1-sec time scales]
Large Number of Configurations
• Complex middleware and database systems have a very large number of configurable parameters.
Web Server (IIS 5.0): HTTP KeepAlive, Application Protection Level, Connection Timeout, Number of Connections, Logging Location, Resource Indexing, Performance Tuning Level, Application Optimization, MemCacheSize, MaxCachedFileSize, ListenBacklog, MaxPoolThreads, worker.ajp13.cachesize
Application Server (Tomcat 4.1): acceptCount, minProcessors, maxProcessors
Database Server (SQL Server 7.0): Cursor Threshold, Fill Factor, Locks, Max Worker Threads, Min Memory Per Query, Network Packet Size, Priority Boost, Recovery Interval, Set Working Set Size, Max Server Memory, Min Server Memory, User Connections
Autonomic Computing
• Systems that can manage themselves given high-level objectives expressed in terms of service-level objectives or utility functions.
  – Average response time < 1.0 sec
  – Response time of 95% of transactions ≤ 0.5 sec
  – Search engine throughput ≥ 4600 queries/sec
  – Availability of the e-mail portal ≥ 99.978%
  – Percentage of phishing e-mails filtered by the e-mail portal ≥ 90%
Autonomic Computing
• Autonomic computing: inspired by the human autonomic nervous system:
  – Sensory and motor neurons that run between the central nervous system and various internal organs.
  – Monitors conditions in the internal environment and effects changes in them.
    • E.g., contraction of both smooth and cardiac muscles is controlled by motor neurons of the autonomic system.
  – Functions in an involuntary and reflexive manner.
Autonomic Systems
• Self-managing
  – Self-configuring
  – Self-optimizing
  – Self-healing
  – Self-protecting
• Self-* systems
IBM’s MAPE-K Model for AC
[Figure: the AUTONOMIC MANAGER, with Monitor, Analyze, Plan, Execute, and Knowledge components, and the Managed Element]
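As an illustration only, one pass through the Monitor, Analyze, Plan, and Execute steps can be sketched in Python. Everything concrete here is an assumption made for the sketch and is not from the slides: the managed element is a hypothetical server pool, its response time comes from a toy M/M/1 formula, and the plan step simply adds one server when the objective is violated.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedElement:
    """Hypothetical managed element: a server pool with a single knob."""
    servers: int = 1
    arrival_rate: float = 8.0   # requests/sec (assumed workload)
    service_rate: float = 5.0   # requests/sec per server (assumed)

    def response_time(self) -> float:
        # Toy model: load is split evenly and each server is an M/M/1 queue.
        per_server = self.arrival_rate / self.servers
        if per_server >= self.service_rate:
            return float("inf")
        return 1.0 / (self.service_rate - per_server)


@dataclass
class AutonomicManager:
    slo_response_time: float = 0.5                 # high-level objective (sec)
    knowledge: list = field(default_factory=list)  # history shared by all steps

    def monitor(self, element: ManagedElement) -> float:
        return element.response_time()

    def analyze(self, measured: float) -> bool:
        return measured > self.slo_response_time   # True if the SLO is violated

    def plan(self, element: ManagedElement) -> int:
        return element.servers + 1                 # assumed policy: add one server

    def execute(self, element: ManagedElement, new_servers: int) -> None:
        element.servers = new_servers

    def run_once(self, element: ManagedElement) -> None:
        measured = self.monitor(element)
        self.knowledge.append((element.servers, measured))
        if self.analyze(measured):
            self.execute(element, self.plan(element))


def run_loop(element: ManagedElement, manager: AutonomicManager,
             max_iterations: int = 10) -> int:
    for _ in range(max_iterations):
        manager.run_once(element)
    return element.servers
```

With the assumed defaults, `run_loop(ManagedElement(), AutonomicManager())` stops adding servers once the modeled response time drops below the 0.5 sec objective.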
Autonomic Controller
[Figure: the AUTONOMIC CONTROLLER and the system to be controlled]
How does the AC know the output of the system for a given combination of the knobs?
$$S_{out} = f(k_1, k_2, \ldots, k_n, S_{input})$$
The function f can be obtained by a model or can be learned by the AC controller by observing system inputs and outputs.
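One very simple way a controller might learn f from observations is to keep a running average of the observed output for each knob setting and input level. The Python sketch below assumes this table-based approach purely for illustration; the class name, the rounding of the input to the nearest integer, and the choice of a mean are all assumptions, not the method used in the talk.

```python
from collections import defaultdict


class LearnedModel:
    """Estimates S_out = f(k_1..k_n, S_input) as the mean of past observations.

    Assumption: the input (e.g., an arrival rate) is bucketed by rounding,
    so nearby input values share one table entry.
    """

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    @staticmethod
    def _key(knobs, s_input):
        return (tuple(knobs), round(s_input))

    def observe(self, knobs, s_input, s_out):
        """Record one (knobs, input, output) observation of the system."""
        key = self._key(knobs, s_input)
        self._sum[key] += s_out
        self._count[key] += 1

    def predict(self, knobs, s_input):
        """Return the estimated output, or None if nothing was observed."""
        key = self._key(knobs, s_input)
        if self._count.get(key, 0) == 0:
            return None
        return self._sum[key] / self._count[key]
```

A table like this only covers configurations that have already been tried; a model-driven controller can instead predict the output of untried configurations.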
What is the objective of the AC when determining a new set of knobs (i.e., configuration) for the system?
• The AC may want to maximize/minimize a performance metric:
  • Minimize response time
  • Maximize availability
  • Maximize throughput
  • Minimize energy consumption
Minimize ResponseTime = f (k1, …, kn)
Subject to
  EnergyConsumed = g1 (k1, …, kn) ≤ MaxEnergy
  Throughput = g2 (k1, …, kn) ≥ MinThroughput
  Availability = g3 (k1, …, kn) ≥ MinAvailability
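For a small number of knobs, this constrained optimization can be solved by enumerating every configuration. The Python sketch below assumes two hypothetical knobs (number of servers and threads per server) and made-up closed-form functions for f, g1, g2, and g3; in practice these would come from a performance model or from measurements.

```python
from itertools import product


# Hypothetical models of the system, for illustration only.
def response_time(servers, threads):        # f
    return 10.0 / (servers * threads)


def energy(servers, threads):               # g1
    return 100.0 * servers + 5.0 * threads


def throughput(servers, threads):           # g2
    return 20.0 * servers * min(threads, 4)


def availability(servers, threads):         # g3
    return 1.0 - 0.05 ** servers


def best_configuration(max_energy, min_throughput, min_availability):
    """Return (knobs, response time) of the best feasible configuration."""
    best, best_rt = None, float("inf")
    # Assumed knob ranges: 1 to 4 servers, 1 to 8 threads per server.
    for k in product(range(1, 5), range(1, 9)):
        feasible = (energy(*k) <= max_energy
                    and throughput(*k) >= min_throughput
                    and availability(*k) >= min_availability)
        if feasible and response_time(*k) < best_rt:
            best, best_rt = k, response_time(*k)
    return best, best_rt
```

Exhaustive enumeration is only practical when the configuration space is small; larger spaces call for a heuristic search.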
Utility Functions and the AC
• The AC may want to consider trade-offs between performance metrics.
  • Use a utility function.
• A utility function of an attribute a indicates the usefulness of a system as a function of the value of the attribute a.
Utility Function as a Function of Response Time
[Figure: utility (0 to 100) versus response time (0 to 14 sec); sigmoid function]
$$U = K \times \frac{e^{-R+\beta}}{1 + e^{-R+\beta}}$$
where K is the normalizing constant, R is the attribute (response time), and β is the SLO.
Utility Function as a Function of Throughput
[Figure: utility (0 to 100) versus throughput (0 to 7 tps); sigmoid function]
$$U = K \times \left( \frac{1}{1 + e^{-X+\beta}} - \frac{1}{1 + e^{\beta}} \right)$$
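The two sigmoid utility functions, U = K × e^(−R+β) / (1 + e^(−R+β)) for response time R and U = K × (1/(1 + e^(−X+β)) − 1/(1 + e^β)) for throughput X, can be sketched in Python as follows. The function names and the default K = 100 (chosen to match the 0 to 100 utility axis) are assumptions for illustration.

```python
import math


def response_time_utility(r, beta, k=100.0):
    """Decreasing sigmoid: equals k/2 when the response time r equals beta."""
    z = math.exp(-r + beta)
    return k * z / (1.0 + z)


def throughput_utility(x, beta, k=100.0):
    """Increasing sigmoid, shifted so that zero throughput has zero utility."""
    return k * (1.0 / (1.0 + math.exp(-x + beta))
                - 1.0 / (1.0 + math.exp(beta)))
```

For example, with β = 4 sec the response time utility is above K/2 for response times under 4 sec and below K/2 beyond it.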
Utility Functions and the AC
What if there is more than one relevant attribute?
• Specify a global utility function that is a function of the utility functions of each attribute:
$$U_{global} = f(U_1(a_1), \ldots, U_n(a_n))$$
e.g.,
$$U_{global} = w_r U_r(R) + w_x U_x(X) + w_a U_a(a), \qquad w_r + w_x + w_a = 1$$
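The weighted-sum form of the global utility can be sketched in Python. The dictionary-based interface and the attribute keys used in the example are assumptions for illustration; the only requirement taken from the slide is that the weights sum to 1.

```python
def global_utility(utilities, weights):
    """Weighted sum of per-attribute utilities; the weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * utilities[name] for name in weights)
```

For example, with utilities of 80 for response time, 60 for throughput, and 100 for availability, and weights 0.5, 0.3, and 0.2, the global utility is 0.5 × 80 + 0.3 × 60 + 0.2 × 100 = 78.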
Performance Model-Based Autonomic Computing
State: e.g., set of configuration parameters
Value: e.g., QoS metric, utility function value
Goal: find the state that optimizes the value subject to constraints
• State space is typically large
• Objective function does not have a closed form
Use performance models to compute the value at each state.
Use combinatorial search techniques to find a near-optimal solution.
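A minimal sketch of this idea in Python, under stated assumptions: the state is the number of servers assigned to each tier of a hypothetical multi-tier system, the performance model treats each server as an M/M/1 queue with the load split evenly, and the combinatorial search is plain hill climbing (one of many possible techniques, not necessarily the one used in the talk).

```python
def response_time(state, arrival_rate, service_rates):
    """Performance model: sum of per-tier M/M/1 response times (assumed)."""
    total = 0.0
    for n, mu in zip(state, service_rates):
        if n == 0 or arrival_rate / n >= mu:
            return float("inf")          # tier saturated: infeasible state
        total += 1.0 / (mu - arrival_rate / n)
    return total


def neighbors(state):
    """States reachable by moving one server from one tier to another."""
    for i in range(len(state)):
        for j in range(len(state)):
            if i != j and state[i] > 0:
                s = list(state)
                s[i] -= 1
                s[j] += 1
                yield tuple(s)


def hill_climb(start, arrival_rate, service_rates):
    """Move to the best neighbor until no neighbor improves the value."""
    current = start
    current_value = response_time(current, arrival_rate, service_rates)
    while True:
        best = min(neighbors(current),
                   key=lambda s: response_time(s, arrival_rate, service_rates))
        best_value = response_time(best, arrival_rate, service_rates)
        if best_value >= current_value:
            return current, current_value
        current, current_value = best, best_value
```

Hill climbing can stop at a local optimum, which is why the result is described as near-optimal rather than optimal.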
Dynamic Resource Allocation in Internet Data Centers
[Figure: Application Environments 1, 2, ..., M hosted on Servers 1, 2, ..., N]
Dynamic Resource Allocation: Two-level Controllers
[Figure: a Global Controller above the Local Controllers of Application Environments 1 to M, each with its servers]
• Global Controller: decides how many servers to assign to each AE.
• Local Controller: implements the Global Controller’s decisions.
Dynamic Resource Allocation: Utility Function
• The global controller uses a global utility function, Ug, to assess the adherence of the overall data center performance to the desired service-level objectives (SLOs):
$$U_g = h(U_1, \ldots, U_M)$$
$$U_i = \sum_{s=1}^{S_i} a_{i,s} \times U_{i,s}$$
$$0 < a_{i,s} < 1, \qquad \sum_{s=1}^{S_i} a_{i,s} = 1$$
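The data center utility computation can be sketched in Python: each U_i is a weighted sum of the utilities U_{i,s} with weights a_{i,s} that lie strictly between 0 and 1 and sum to 1, and Ug = h(U_1, ..., U_M). The slides do not say what h is, so the arithmetic mean used below is an assumption, as are the function names.

```python
def ae_utility(class_utilities, class_weights):
    """U_i: weighted sum of the utilities U_{i,s} of one application environment."""
    if len(class_utilities) != len(class_weights):
        raise ValueError("one weight per class is required")
    if any(not (0.0 < a < 1.0) for a in class_weights):
        raise ValueError("each weight must lie strictly between 0 and 1")
    if abs(sum(class_weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(a * u for a, u in zip(class_weights, class_utilities))


def data_center_utility(ae_utilities):
    """U_g = h(U_1, ..., U_M); h is ASSUMED here to be the arithmetic mean."""
    return sum(ae_utilities) / len(ae_utilities)
```

A global controller could evaluate `data_center_utility` for each candidate assignment of servers to AEs and keep the assignment with the highest value.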
Workload Variation for Online AEs
[Figure: arrival rate (tps) versus control interval (1 to 29) for Lambda1,1, Lambda1,2, Lambda1,3, Lambda2,1, and Lambda2,2]
Response Times for Class 1 of AE 1
[Figure: response time (sec) versus control interval (1 to 29) for R1,1 with and without the controller]
Variation of the Number of Servers
[Figure: number of servers versus control interval (1 to 29) for N1, N2, and N3, with and without the controller]
Variation of Global Utility
[Figure: global utility Ug versus control interval (1 to 29), with and without the controller]
CPU Allocation Problem for Autonomic Virtualized Environments
• Existing systems allow for manual allocation of CPU resources to VMs using CPU priorities or CPU shares.
• Need automated mechanism for the adjustment of CPU shares of the virtual machines in order to maximize the global utility of the entire virtualized environment
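A minimal sketch of such an automated mechanism in Python, for two VMs only. All modeling choices are assumptions made for illustration: each VM is an M/M/1 queue whose service rate is proportional to its CPU share, each VM's utility is a decreasing sigmoid of its response time around its SLA, the global utility is the mean of the VM utilities, and the controller tries every split in fixed percentage steps.

```python
import math


def vm_response_time(arrival_rate, share, full_cpu_rate):
    """Assumed model: M/M/1 queue with service rate proportional to CPU share."""
    mu = full_cpu_rate * share
    if arrival_rate >= mu:
        return float("inf")
    return 1.0 / (mu - arrival_rate)


def vm_utility(response_time, sla):
    """Decreasing sigmoid in response time; 0.5 exactly at the SLA."""
    if response_time - sla > 700.0:      # also avoids overflow in exp()
        return 0.0
    return 1.0 / (1.0 + math.exp(response_time - sla))


def best_shares(arrival_rates, slas, full_cpu_rate, step=5):
    """Try every split (in `step` percent) of the CPU between two VMs."""
    best_split, best_ug = None, -1.0
    for pct in range(step, 100, step):
        split = (pct, 100 - pct)
        utils = [vm_utility(vm_response_time(lam, p / 100.0, full_cpu_rate), sla)
                 for lam, p, sla in zip(arrival_rates, split, slas)]
        ug = sum(utils) / len(utils)     # assumed global utility: mean
        if ug > best_ug:
            best_split, best_ug = split, ug
    return best_split, best_ug
```

Rerunning `best_shares` at every control interval with the latest measured arrival rates would let the shares follow the workload.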
CPU Allocation Problem for Autonomic Virtualized Environments (Cont’d)
[Figure: Workloads 1, 2, ..., n run on Virtual Machines 1, 2, ..., M in the Virtualized Environment; workload utilities U1,1 ... Un,1 feed the VM utilities U1, U2, ..., UM, which feed the global utility Ug]
Virtualization Controller Architecture
[Figure: controller components: Performance Monitor, Performance Predictor, Utility Function Computation, and Autonomic Controller Algorithm; resources: CPU and disks 1 to k; inputs: workload and SLAs; output: to the VMM resource manager]
CPU Shares Based Allocation: Workload Variation
[Figure: arrival rate (tps) versus control interval (1 to 30) for Lambda1 (Workload 1) and Lambda2 (Workload 2)]
CPU Shares Based Allocation: CPU Shares Variation
[Figure: CPU_SHARE (in %) versus control interval (1 to 30) for VM1 (Workload 1) and VM2 (Workload 2), with the controller]
CPU Shares Based Allocation: Response Time for VM1
[Figure: VM1 response time (sec) versus control interval (1 to 30) for R1 with and without the controller, compared with the SLA]
CPU Shares Based Allocation: Global Utility
[Figure: virtual environment global utility Ug versus control interval (1 to 30), with and without the controller]
Self-Architecting SOA Software Systems (SASSY)
• SASSY allows domain experts to specify requirements using a high-level visual activity language.
• SASSY generates the initial software architecture optimized to maximize a utility function.
• The running system is constantly monitored and SASSY automatically re-architects the system when needed.
[Figure: SASSY overview. Domain Experts specify Service Activity Schemas (SASs) and SSSs. Software Engineers develop and register services (Service Directory), develop QoS architectural patterns (QoS Pattern Library), and develop software adaptation patterns (Adaptation Pattern Library). Steps: Generation of Base Architecture (Base Architecture), Self-Architecting (Near-optimal Architecture), and Service Binding and Coordination Logic Deployment, leading to the Running system with Service Discovery and Service Coordination]
SASSY: Run-time Adaptation
[Figure: the Running system and the Architecture of running system, connected to four activities: Monitor Running System; Analyze and Determine Need to Re-Architect; Plan for Re-architecting; Execute Software Adaptation Control. Key: model r/w, model read, communication, human-computer interaction]
Architecture Adaptation Search Trajectories
Each evaluation consists of a software architecture and a set of service providers for the components of the architecture.
Concluding Remarks
• Autonomic computing is a key design discipline for large, complex, and dynamic computer systems.
• AC uses models (queuing and/or control models) and optimization techniques (exact and/or approximate).
• AC may learn models of the controlled system.
• Utility functions are useful in dealing with trade-offs.
CREDITS
• Arwa Aldhalaan
• Serene Al-Momen
• Mahmoud Awad
• Firas Alomari
• Daniel Barbará
• Shouvik Bardhan
• Mohamed Bennani
• Alex Brodsky
• Ernesto Casalicchio
• Ronald Dodge
• Vinod Dubey
• Naeem Esfahani
• John Ewing
• Hassan Gomaa
• Gus Jabbour
• Mohan Krishnamoorthy
• Sam Malek
• Faisal Sibai
• João P. Sousa
NOT COVERED TODAY
• Emergency Departments
• Smart Manufacturing
• Energy Management
• Dynamic allocation of services in SOA architectures
• Cluster support for Big Data
• Load-balancing policies
• Computer networks
• Cloud computing
• Databases