Autonomic Computing: a new design principle for complex...

Post on 30-Jul-2020

1 views 0 download

Transcript of Autonomic Computing: a new design principle for complex...

© 2014 D.A. Menasce. All Rights Reserved.

1

Autonomic Computing:

a new design principle for complex systems

Danny Menascé Department of Computer Science

George Mason University www.cs.gmu.edu/faculty/menasce.html

© 2014 D.A. Menasce. All Rights Reserved.

2

© 2014 D.A. Menasce. All Rights Reserved.

3

4

Layered Software Architecture

© 2014 D.A. Menasce. All Rights Reserved.

© 2014 D.A. Menasce. All Rights Reserved.

5

#@*%#!

Outline •  Motivation for Autonomic Computing •  Techniques used in AC

–  Model-driven •  Performance model •  Control theory

–  Model-free •  Machine learning (e.g., reinforcement learning) •  Statistical learning

•  Applications –  Internet data centers, virtual machine management, e-

commerce and Web-systems, service oriented computing, cloud computing resource management, databases, adaptive software systems, and emergency departments

•  Concluding Remarks 6

© 2014 D.A. Menasce. All Rights Reserved.

Outline •  Motivation for Autonomic Computing •  Techniques used in AC

–  Model-driven •  Performance model •  Control theory

–  Model-free •  Machine learning (e.g., reinforcement learning) •  Statistical learning

•  Applications –  Internet data centers, virtual machine management, e-

commerce and Web-systems, service oriented computing, cloud computing resource management, databases, adaptive software systems, and emergency departments

•  Concluding Remarks 7

© 2014 D.A. Menasce. All Rights Reserved.

8

Motivation for AC •  “…main obstacle to further progress in

IT is a looming software complexity crisis.” (from an IBM manifesto, Oct. 2001). – Tens of millions of lines of code – Skilled IT professionals required to install,

configure, tune, and maintain. – Need to integrate many heterogeneous

systems – Limit of human capacity being achieved

© 2014 D.A. Menasce. All Rights Reserved.

9

Motivation for AC •  “…main obstacle to further progress in

IT is a looming software complexity crisis.” (from an IBM manifesto, Oct. 2001). – Tens of millions of lines of code – Skilled IT professionals required to install,

configure, tune, and maintain. – Need to integrate many heterogeneous

systems – Limit of human capacity being achieved

© 2014 D.A. Menasce. All Rights Reserved.

10

Motivation for AC •  “…main obstacle to further progress in

IT is a looming software complexity crisis.” (from an IBM manifesto, Oct. 2001). – Tens of millions of lines of code – Skilled IT professionals required to install,

configure, tune, and maintain. – Need to integrate many heterogeneous

systems – Limit of human capacity being achieved

© 2014 D.A. Menasce. All Rights Reserved.

11

Motivation for AC •  “…main obstacle to further progress in

IT is a looming software complexity crisis.” (from an IBM manifesto, Oct. 2001). – Tens of millions of lines of code – Skilled IT professionals required to install,

configure, tune, and maintain. – Need to integrate many heterogeneous

systems – Limit of human capacity being achieved

© 2014 D.A. Menasce. All Rights Reserved.

12

Motivation for AC (cont’d) •  Harder to anticipate interactions

between components at design time: – Need to defer decisions to run time

•  Computer systems are becoming too massive, complex, to be managed even by the most skilled IT professionals

•  The workload and environment conditions tend to change very rapidly with time

© 2014 D.A. Menasce. All Rights Reserved.

13

Motivation for AC (cont’d) •  Harder to anticipate interactions

between components at design time: – Need to defer decisions to run time

•  Computer systems are becoming too massive, complex, to be managed even by the most skilled IT professionals

•  The workload and environment conditions tend to change very rapidly with time

© 2014 D.A. Menasce. All Rights Reserved.

14

Motivation for AC (cont’d) •  Harder to anticipate interactions

between components at design time: – Need to defer decisions to run time

•  Computer systems are becoming too massive, complex, to be managed even by the most skilled IT professionals

•  The workload and environment conditions tend to change very rapidly with time

© 2014 D.A. Menasce. All Rights Reserved.

15

3600 sec

60 sec

1 sec

Multi-scale time workload variation of a Web Server

© 2014 D.A. Menasce. All Rights Reserved.

16

Large Number of Configurations •  Complex middleware and database systems have a

very large number of configurable parameters.

Web Server (IIS 5.0) Application Server

(Tomcat 4.1) Database Server

(SQL Server 7.0) HTTP KeepAlive acceptCount Cursor Threshold Application Protection Level minProcessors Fill Factor Connection Timeout maxProcessors Locks Number of Connections Max Worker Threads Logging Location Min Memory Per Query Resource Indexing Network Packet Size Performance Tuning Level Priority Boost Application Optimization Recovery Interval MemCacheSize Set Working Set Size MaxCachedFileSize Max Server Memory ListenBacklog Min Server Memory MaxPoolThreads User Connections worker.ajp13.cachesize

© 2014 D.A. Menasce. All Rights Reserved.

17

Autonomic Computing •  Systems that can manage themselves given high-level

objectives expressed in term of service-level objectives or utility functions.

–  Average response time < 1.0 sec –  Response time of 95% of transactions ≤ 0.5 sec –  Search engine throughput ≥ 4600 queries/sec –  Availability of the e-mail portal ≥ 99.978%. –  Percentage of phishing e-mails filtered by the e-mail portal ≥ 90%

© 2014 D.A. Menasce. All Rights Reserved.

18

Autonomic Computing •  Autonomic computing: inspired in the human

autonomic nervous system:

–  Sensory and motor neurons that run between the central nervous system and various internal organs.

–  Monitors conditions in the internal environment and effects

changes in them.

•  E.g., contraction of both smooth and cardiac muscles is controlled by motor neurons of the autonomic system.

–  Functions in an involuntary and reflexive manner.

© 2014 D.A. Menasce. All Rights Reserved.

19

Autonomic Computing

© 2014 D.A. Menasce. All Rights Reserved.

20

Autonomic Systems

•  Self-managing – Self-configuring – Self-optimizing – Self-healing – Self-protecting

•  Self-* systems

© 2014 D.A. Menasce. All Rights Reserved.

21

Autonomic Systems

•  Self-managing – Self-configuring – Self-optimizing – Self-healing – Self-protecting

•  Self-* systems

© 2014 D.A. Menasce. All Rights Reserved.

IBM’s MAPE-K Model for AC

22

Managed Element

Monitor

Analyze Plan

Execute Knowledge

AUTONOMIC MANAGER

© 2014 D.A. Menasce. All Rights Reserved.

Autonomic Controller

23

System to be controlled

AUTONOMIC

CONTROLLER

© 2014 D.A. Menasce. All Rights Reserved.

Autonomic Controller

24

System to be controlled

AUTONOMIC

CONTROLLER

How does the AC know the output of the system for a given combination of the knobs?

© 2014 D.A. Menasce. All Rights Reserved.

Autonomic Controller

25

How does the AC know the output of the system for a given combination of the knobs?

Sout = f (k1,k2,...,kn,Sinput )

The function f can be obtained by a model or can be learned by the AC controller by observing system inputs and outputs.

© 2014 D.A. Menasce. All Rights Reserved.

Autonomic Controller

26

System to be controlled

AUTONOMIC

CONTROLLER

What is the objective of the AC when determining a new set of knobs (i.e., configuration) for the system?

© 2014 D.A. Menasce. All Rights Reserved.

Autonomic Controller

27

What is the objective of the AC when determining a new set of knobs (i.e., configuration) for the system?

•  The AC may want to maximize/minimize a performance metric:

•  Minimize response time •  Maximize availability •  Maximize throughput •  Minimize energy consumption

© 2014 D.A. Menasce. All Rights Reserved.

Autonomic Controller

28

What is the objective of the AC when determining a new set of knobs (i.e., configuration) for the system?

Minimize ResponseTime = f (k1, …, kn) Subject to EnergyConsumed = g1 (k1, …, kn) ≤ MaxEnergy Throughput = g2 (k1, …, kn) ≥ MinThroughput Availability = g3 (k1, …, kn) ≥ MinAvailability

© 2014 D.A. Menasce. All Rights Reserved.

Utility Functions and the AC

29

What is the objective of the AC when determining a new set of knobs (i.e., configuration) for the system?

•  The AC may want to consider trade-offs between performance metrics. •  Use utility function.

•  A utility function of an attribute a indicates the usefulness of a system as a function of the value of the attribute a.

© 2014 D.A. Menasce. All Rights Reserved.

30

Utility Function as a Function of Response Time

0

10

20

30

40

50

60

70

80

90

100

0 2 4 6 8 10 12 14

Response Time (sec)

Util

ity

U =K × e−R +β

1+ e−R+β

Sigmoid function

normalizing constant

attribute SLO

© 2014 D.A. Menasce. All Rights Reserved.

31

Utility Function as a Function of Throughput

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7

Throughput (tps)

Util

ity

U = K × 11+ e−X +β

−1

1+ eβ#

$%

&

'(

Sigmoid function

© 2014 D.A. Menasce. All Rights Reserved.

Utility Functions and the AC

32

What if there is more than one relevant attribute?

•  Specify a global utility function that is a function of the utility functions of each attribute:

Uglobal = f (U1(a1),...,Un (an ))e.g.,

Uglobal = wrUr (R) + wxUx (X) + waUa (a)wr + wx + wa =1

© 2014 D.A. Menasce. All Rights Reserved.

33

Performance Model-Based Autonomic Computing

state

© 2014 D.A. Menasce. All Rights Reserved.

34

Performance Model-Based Autonomic Computing

state State: e.g., set of configuration parameters Value: e.g., QoS metric, utility function value Goal: find state that optimizes the value subject to constraints

•  State space is typically large •  Objective function does not have a closed form Use performance models to compute value at each state. Use combinatorial search techniques to find near- optimal solution.

© 2014 D.A. Menasce. All Rights Reserved.

Outline •  Motivation for Autonomic Computing •  Techniques used in AC

–  Model-driven •  Performance model •  Control theory

–  Model-less •  Machine learning •  Statistical learning

•  Applications –  Internet data centers, virtual machine management, e-

commerce and Web-systems, service oriented computing, cloud computing resource management, databases, adaptive software systems, and emergency departments

•  Concluding Remarks

© 2014 D.A. Menasce. All Rights Reserved.

35

36

Dynamic Resource Allocation in Internet Data Centers

Application Environment

1

Application Environment

2

Application Environment

M

Server 1 Server 2 Server N

. . .

. . .

© 2014 D.A. Menasce. All Rights Reserved.

37

Dynamic Resource Allocation Problem

Application Environment

1

Application Environment

2

Application Environment

M

Server 1 Server 2 Server N

. . .

. . .

© 2014 D.A. Menasce. All Rights Reserved.

38

Dynamic Resource Allocation Two-level Controllers

LocalController

LocalController

Server

Server

Server

Server

Server

Server

GlobalController

ApplicationEnvironment 1

ApplicationEnvironment M

Decides how many servers to assign to each AE.

Implements Global Controller’s decisions.

© 2014 D.A. Menasce. All Rights Reserved.

39

Dynamic Resource Allocation Utility Function

•  The global controller uses a global utility function, Ug, to assess the adherence of the overall data center performance to desired service levels objectives (SLOs)

1

10

),...,(

1,

,

1,,

1

=

<<

×=

=

=

=

i

i

S

ssi

si

S

ssisii

Mg

a

a

UaU

UUhU

© 2014 D.A. Menasce. All Rights Reserved.

40

Workload Variation for Online AEs

0

20

40

60

80

100

120

140

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Control Interval

Arr

ival

Rat

e (t

ps)

Lambda1,1 Lambda1,2 Lambda1,3 Lambda2,1 Lambda2,2

© 2014 D.A. Menasce. All Rights Reserved.

41

Response Times for Class 1 of AE 1

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Control Interval

Res

p. T

ime

(sec

)

R1,1 (no controller) R1,1 (controller)

© 2014 D.A. Menasce. All Rights Reserved.

42

Variation of the Number of Servers

0

2

4

6

8

10

12

14

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Control Interval

No.

Ser

vers

N1 (no controller) N1 (controller) N2 (no controller)N2 (controller) N3 (no controller) N3 (controller)

© 2014 D.A. Menasce. All Rights Reserved.

43

Variation of Global Utility

90

91

92

93

94

95

96

97

98

99

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Control Interval

Ug

No Controller With Controller

© 2014 D.A. Menasce. All Rights Reserved.

Outline •  Motivation for Autonomic Computing •  Techniques used in AC

–  Model-driven •  Performance model •  Control theory

–  Model-less •  Machine learning •  Statistical learning

•  Applications –  Internet data centers, virtual machine management, e-

commerce and Web-systems, service oriented computing, cloud computing resource management, databases, adaptive software systems, and emergency departments

•  Concluding Remarks

© 2014 D.A. Menasce. All Rights Reserved.

44

45

CPU Allocation Problem for Autonomic Virtualized Environments

•  Existing systems allow for manual allocation of CPU resources to VMs using CPU priorities or CPU shares.

•  Need automated mechanism for the adjustment of CPU shares of the virtual machines in order to maximize the global utility of the entire virtualized environment

© 2014 D.A. Menasce. All Rights Reserved.

46

CPU Allocation Problem for Autonomic Virtualized Environments (Cont’d)

Workload 1 Workload 2 …. Workload n

Virtual Machine 1 Virtual Machine 2 . . . Virtual Machine M

Virtualized Environment

U1,1

Un,1 U1

U2

UM

Ug

© 2014 D.A. Menasce. All Rights Reserved.

47

Virtualization Controller Architecture CPU

disk 1

disk k

. . . Performance

Predictor

Autonomic Controller Algorithm

Performance Monitor

Utility Function Computation

workload SLAs to the VMM resource manager

© 2014 D.A. Menasce. All Rights Reserved.

48

CPU Shares Based Allocation: Workload Variation

0

10

20

30

40

50

60

70

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Control Interval

Arriv

al R

ate

(tps)

Lambda1 Lambda2

© 2014 D.A. Menasce. All Rights Reserved.

Workload 1 Workload 2

49

CPU Shares Based Allocation: CPU Shares Variation

0

10

20

30

40

50

60

70

80

90

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Control Interval

CPU_

SHAR

E (in

%)

VM1 CPU_SHARE (controller) VM2 CPU_SHARE (controller)

© 2014 D.A. Menasce. All Rights Reserved.

Workload 1

Workload 2

50

CPU Shares Based Allocation: Response Time for VM1

0

1

2

3

4

5

6

7

8

9

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Control Interval

VM1 R

esp.

Time (

sec)

R1 (no controller) R1 (controller) SLA

no controller

with controller

© 2014 D.A. Menasce. All Rights Reserved.

51

CPU Shares Based Allocation: Global Utility

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Control Interval

Virtu

al En

viron

men

t Glo

bal U

tility

Ug (no controller) Ug (controller)

with controller

no controller

© 2014 D.A. Menasce. All Rights Reserved.

Outline •  Motivation for Autonomic Computing •  Techniques used in AC

–  Model-driven •  Performance model •  Control theory

–  Model-less •  Machine learning •  Statistical learning

•  Applications –  Internet data centers, virtual machine management, e-

commerce and Web-systems, service oriented computing, cloud computing resource management, databases, adaptive software systems, and emergency departments

•  Concluding Remarks

© 2014 D.A. Menasce. All Rights Reserved.

52

Outline •  Motivation for Autonomic Computing •  Techniques used in AC

–  Model-driven •  Performance model •  Control theory

–  Model-free •  Machine learning (e.g., reinforcement learning) •  Statistical learning

•  Applications –  Internet data centers, virtual machine management, e-

commerce and Web-systems, service oriented computing, cloud computing resource management, databases, adaptive software systems, and emergency departments

•  Concluding Remarks 53

© 2014 D.A. Menasce. All Rights Reserved.

Self-Architecting SOA Software Systems (SASSY)

•  SASSY allows domain experts to specify requirements using a high-level visual activity language.

•  SASSY generates the initial software architecture optimized to maximize a utility function.

•  The running system is constantly monitored and SASSY automatically re-architects the system when needed.

54

© 2014 D.A. Menasce. All Rights Reserved.

Specify SASs and

SSSs

Generation of Base

Architecture

Self- Architecting

Service Binding and Coordination

Logic Deployment

Service Activity Schema (SAS) + SSSs

Base Architecture

Near-optimal Architecture

Domain Experts

Software Engineers

Develop and

Register Services

Service Directory

Develop QoS

Architectural Patterns

QoS Pattern Library

Develop Software

Adaptation Patterns

Adaptation Pattern Library

Running system

Service Discovery

Service Coordination

© 2014 D.A. Menasce. All Rights Reserved.

55

Architecture of running system

Running system

Monitor Running System

Analyze and Determine Need to

Re-Architect

Plan for Re-architecting

Execute Software

Adaptation Control

model r/w model read communication

human- computer interaction

KEY

SASSY: Run-time Adaptation

© 2014 D.A. Menasce. All Rights Reserved.

56

Architecture Adaptation Search Trajectories

© 2014 D.A. Menasce. All Rights Reserved.

Each evaluation consists of a software architecture and a set of service providers for the components of the architecture.

57

Concluding Remarks •  Autonomic computing is a key design

discipline for large, complex, and dynamic computer systems

•  AC uses models (queuing and/or control models) and optimization techniques (exact and/or approximate).

•  AC may learn models of the controlled system.

•  Utility functions useful in dealing with tradeoffs.

58

© 2014 D.A. Menasce. All Rights Reserved.

•  Arwa Aldhalaan •  Serene Al-Momen •  Mahmoud Awad •  Firas Alomari •  Daniel Barbará •  Shouvik Bardhan •  Mohamed Bennani •  Alex Brodsky •  Ernesto Casalicchio •  Ronald Dodge •  Vinod Dubey •  Naeem Esfahani •  John Ewing •  Hassan Gomaa •  Gus Jabbour •  Mohan Krishnamoorthy •  Sam Malek •  Faisal Sibai •  João P. Sousa

© 2014 D.A. Menasce. All Rights Reserved.

59

•  Emergency Departments •  Smart Manufacturing •  Energy Management •  Dynamic allocation of services in

SOA architectures •  Cluster support for Big Data •  Load-balancing policies •  Computer networks •  Cloud computing •  Databases

CREDITS NOT COVERED TODAY