SLA Decomposition: Translating Service Level Objectives to System Level Thresholds
Yuan Chen, Subu Iyer, Xue Liu, Dejan Milojicic, Akhil Sahai
Enterprise Systems and Software Lab, Hewlett Packard Labs
2
Introduction
• Service Level Agreements (SLAs)
− service behavior guarantees, e.g., performance, availability, reliability, security
− penalties in case the guarantees are violated
• The ability to deliver according to pre-defined SLAs is a key to success
• SLA management
− capturing the guarantees between a service provider and a customer
− meeting these service level agreements by designing systems/services accordingly
− monitoring for violations of agreed SLAs
− enforcing SLAs in case of violations
3
SLA Management
[Diagram: the SLA life cycle (SLA specification, SLA negotiation, design, SLA monitoring, SLA enforcement) spanning clients, applications, virtual resources, and physical resources]
4
Design
• Services/systems need to be designed to meet the agreed SLAs
− to ensure that the system/service behaves satisfactorily before putting it in production
• Enterprise systems and services are composed of multiple sub-components
− each subsystem or component potentially affects the overall behavior
− any high-level goal specified for a service in an SLA potentially relates to low-level system components
• Traditional design usually involves domain experts
− manual and ad hoc
− costly, time-consuming, and often inflexible
5
Motivational Scenario
• Virtualized data center
− on-demand computing
− applications share resources using virtualization technologies
• Scenario
− a 3-tier (Apache-Tomcat-MySQL) application
− SLO: average response time < 10 secs
− determine the percentage of CPU assigned to each VM to meet the SLO with reasonable CPU utilization
[Chart (Scenario 1): average response time (sec.) vs. percentage of CPU assigned to Tomcat (20%, 50%, 90%), with the 10-sec SLO marked]
6
Problem Statement
SLA Decomposition: given high-level Service Level Objectives (SLOs), translate them into low-level system thresholds
The system thresholds are used to create an effective design to meet the SLOs
− determine resource allocation for each individual component
− determine software configuration
− SLO monitoring and assessment
[Diagram: SLA decomposition maps Service Level Objectives (response time, throughput) under a given workload to low-level system thresholds, i.e., healthy ranges of system metrics and application attributes]
7
Challenges
• The decomposition problem requires domain experts to be involved, which makes the process manual, complex, costly and time-consuming
• Complex and dynamic behavior of multi-component applications
− components interact with each other in complex ways
− multi-thread/multi-server structures, various configurations, caching & optimization
− various workloads
− different software architectures, e.g., 2- vs. 3-tier, 3-tier Servlet vs. 3-tier EJB
− different kinds of software components and performance behaviors, e.g., Microsoft IIS, Apache, JBoss, WebLogic, WebSphere; Oracle, MySQL, Microsoft SQL Server
• Impact of virtualization and application sharing, e.g., Xen, VMware
− granular allocation of resources
− environments are dynamic
• Different kinds of SLOs, e.g., performance, availability, security, …
8
Goal
Develop an SLA decomposition approach for multi-component applications that translates high-level SLOs into the state of each component involved in providing the service
− Effective: ensures that the overall SLO goals are met reasonably well
− Automated: eliminates the involvement of domain experts
− Extensible: applicable to commonly used multi-component applications
− Flexible: easily adapts to changes in SLOs, application topology, software configuration and infrastructure
9
Outline
• Problem Statement and Challenges
• Our Approach
− Overview
− Analytical Model for Multi-tier Applications
− Component Profile Creation
− SLA Decomposition
• Validation
• Related Work
• Summary and Future Work
10
Our Approach
• Combine a performance model and component characterization to create the decomposition model
− model the behavior of the service
− characterize the behavior of each component
− combine them to create the decomposition model
• Given a service instance and SLOs, use the decomposition model to derive low-level thresholds
• Create an effective design of the service to meet the SLOs based on the low-level thresholds
[Diagram: performance modeling and component profiling & regression analysis feed the decomposition step, which maps SLOs to low-level system thresholds used for resource allocation, configuration, and SLA monitoring/assessment]
11
SLA Decomposition
• Decomposition
− given SLOs R < r, X > x, find the set of cpu, mem, …, n_clients, n_threads, s_cache satisfying
g1(f1(cpu_http, mem_http, n_clients), f2(cpu_app, mem_app, n_threads), f3(cpu_db, mem_db, s_cache)) < r
− objective function, e.g., minimize (cpu_http + cpu_app + cpu_db)
[Diagram: decomposition data flow]
• Profiling & regression analysis of the application's components yields component profiles:
µ_http = f1(cpu_http, mem_http, n_clients)
µ_app = f2(cpu_app, mem_app, n_threads)
µ_db = f3(cpu_db, mem_db, s_cache)
• Performance modeling yields the performance model:
R = g1(µ_http, µ_app, µ_db, w)
X = g2(µ_http, µ_app, µ_db, w)
• Given the high-level goals R < r, X > x and workload characteristics w, decomposition produces
− system thresholds for resource allocation: cpu_http > υ1, mem_http > m1; cpu_app > υ2, mem_app > m2; cpu_db > υ3, mem_db > m3
− configuration parameters: n_clients = c, n_threads = n, s_cache = s
− per-component performance bounds for SLA monitoring and assessment: R_http < r1, R_app < r2, R_db < r3
• Substituting the component profiles into the performance model gives the decomposition equations:
R = g1(f1(cpu_http, mem_http, n_clients), f2(cpu_app, mem_app, n_threads), f3(cpu_db, mem_db, s_cache), w)
X = g2(f1(cpu_http, mem_http, n_clients), f2(cpu_app, mem_app, n_threads), f3(cpu_db, mem_db, s_cache), w)
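A minimal sketch of this composition in Python (the structure mirrors the equations above; the concrete model g and profiles f1, f2, f3 are assumed to be supplied, e.g., by the queueing model of slide 15 and the profiling of slide 16):

    def compose(g, f1, f2, f3, cpu, mem, conf, w):
        """End-to-end metric from per-tier profiles: g(f1(...), f2(...), f3(...), w)."""
        mu_http = f1(cpu['http'], mem['http'], conf['n_clients'])
        mu_app  = f2(cpu['app'],  mem['app'],  conf['n_threads'])
        mu_db   = f3(cpu['db'],   mem['db'],   conf['s_cache'])
        return g(mu_http, mu_app, mu_db, w)

    # decomposition then searches for the cheapest (cpu, mem, conf) with
    # compose(g1, f1, f2, f3, cpu, mem, conf, w) < r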
12
Outline
• Problem Statement and Challenges
• Our Approach
− Overview
− Analytical Model for Multi-tier Applications
− Component Profile Creation
− SLA Decomposition
• Validation
• Related Work
• Summary and Future Work
13
Modeling Multi-Tier Applications
• Multi-tier architecture
• Closed multi-station queueing network
− a general multi-station queue G/G/K represents each tier and the underlying server
− arbitrary service time distribution and visit rate at each tier
− captures multi-thread/multi-server structure and concurrency
− handles realistic user-session-based interactions
• Notation
− Si: mean service time
− Vi: visit rate
− Ki: number of stations
− N: number of users
− Z: think time
[Diagram: closed queueing network; N users with think time Z (visit rate V0) drive tiers T1 … TM, each modeled as a multi-station queue, Q1 (K1, S1, V1) through QM (KM, SM, VM)]
14
Approximate Model for Mean Value Analysis (MVA)
• Analytical performance model
− (M, N, Z, S1, V1, K1, …, SM, VM, KM) → R, X
• A queue with m stations and service demand D at each station is replaced with two tandem queues (worked example below)
− a single-station queue with service demand D/m
− a pure delay center, with delay D×(m-1)/m
[Diagram: approximated network; users (N, Z, V0) drive tandem queues Q1 … QM with visit rates Vi, queueing demands Di, and delay demands DDi]
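For example (illustrative numbers, not from the paper): a tier with K = 50 stations and total demand D = 2.0 s is replaced by a single-station queue with demand 2.0/50 = 0.04 s in tandem with a pure delay center with delay 2.0 × 49/50 = 1.96 s.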
15
Deriving Queueing Network Performance
• (M, N, Z, Ki, Si, Vi) → R, X
• Di = Si × (Vi / V0)
• Mean Value Analysis (MVA)
• Input
− N: number of users
− Z: think time
− M: number of tiers
− Ki: number of stations at tier i (i = 1,…, M)
− Si: mean service time at tier i (i = 1,…, M)
− Vi: mean request rate of tier i (i = 1,…, M)
• Output
− R: average response time
− X: throughput
− Ri: response time of tier i (i = 1,…, M)
− Qi: queue length of tier i (i = 1,…, M)
• Complexity: O(MN)
Rk(i) = Dk                      (delay resource)
Rk(i) = Dk × (1 + Qk(i-1))      (queueing resource)
X(i) = i / (Z + Σ k=1..K Rk(i))
Qk(i) = X(i) × Rk(i)
Input: N, Z, M, Ki, Si, Vi (i = 1,…, M)
Output: R, X
// initialization
R0 = Z; D0 = Z; Q0 = 0;
for i = 1 to M {
  // tandem approximation for each tier
  Qi = 0;
  Di = Si × (Vi / V0);
  qrDi = Di / Ki;               // queueing-resource demand
  drDi = Di × (Ki - 1) / Ki;    // delay-resource demand
}
// introduce N users one by one
for i = 1 to N {
  for j = 1 to M {
    Rj = qrDj × (1 + Qj);       // queueing resource
    RRj = drDj;                 // delay resource
  }
  X = i / (Z + Σ j=1..M (Rj + RRj));
  for j = 1 to M { Qj = X × Rj; }
}
R = Σ j=1..M (Rj + RRj)

Modified MVA algorithm for multi-station queues
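A direct Python transcription of the algorithm above (a sketch; the example parameters at the end are illustrative, not measured values):

    def mva(N, Z, M, K, S, V, V0=1.0):
        """Modified MVA for M multi-station tiers; returns (R, X).

        K[j], S[j], V[j] are the stations, mean service time, and visit
        rate of tier j+1 (0-indexed lists)."""
        D = [S[j] * (V[j] / V0) for j in range(M)]          # service demands
        qrD = [D[j] / K[j] for j in range(M)]               # queueing-resource demand
        drD = [D[j] * (K[j] - 1) / K[j] for j in range(M)]  # delay-resource demand
        Q = [0.0] * M
        R = [0.0] * M
        RR = [0.0] * M
        X = 0.0
        for n in range(1, N + 1):                           # introduce users one by one
            for j in range(M):
                R[j] = qrD[j] * (1 + Q[j])                  # queueing resource
                RR[j] = drD[j]                              # delay resource
            X = n / (Z + sum(R[j] + RR[j] for j in range(M)))
            for j in range(M):
                Q[j] = X * R[j]
        return sum(R[j] + RR[j] for j in range(M)), X

    # e.g., a 3-tier system with 100 users and 3.5 s think time (illustrative):
    R, X = mva(N=100, Z=3.5, M=3, K=[150, 50, 10], S=[0.005, 0.05, 0.02], V=[1.0, 1.0, 0.5])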
16
Component Profiling
• Capture component performance characteristics
− S = f(CPU, MEM, nConnections, CacheSize, …)
− independent of other components
• Profiling
− deploy the application on a testbed
− change the resource allocation and configuration of each component
− while profiling a component, configure the other components at their maximum capacity
− apply a given workload and collect the performance and workload data
− apply statistical analysis to derive the correlation between a component's performance and its resource assignments and configuration
− archive the result as the component's profile
• Capture workload characteristics, e.g., visit rate, think time
• Challenges (a regression sketch follows this list)
− measurement methodology: accurate, practical, general
− non-intrusive approach
− appropriate statistical analysis techniques, e.g., regression analysis
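A minimal sketch of the regression step, assuming an inverse-form fit S ≈ a/CPU + b (one plausible choice of functional form, not necessarily the authors'):

    import numpy as np

    def fit_profile(cpu_shares, service_times):
        """Least-squares fit of measured service times vs. CPU share;
        returns a callable profile S(cpu)."""
        A = np.column_stack([1.0 / np.asarray(cpu_shares, float),
                             np.ones(len(cpu_shares))])
        (a, b), *_ = np.linalg.lstsq(A, np.asarray(service_times, float), rcond=None)
        return lambda cpu: a / cpu + b

    # e.g., a Tomcat sweep from 10% to 60% CPU (illustrative numbers):
    tomcat_S = fit_profile([10, 20, 30, 40, 50, 60], [7.2, 3.1, 1.9, 1.4, 1.1, 0.9])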
17
Decomposition
• Performance model of M-tier applications
− R = g(N, Z, M, Ki, Si, Vi), X = N / (R + Z)
• Profiles for each tier/component
− Si = fi(CPUi, MEMi, nConnectionsi), i = 1,…, M
• Workload characteristics
− visit rate Vi, number of stations Ki, think time Z
• Decomposition
− (M, Ki, Vi, Z, N, R, X) → (CPU1, MEM1, nConnections1, …, CPUM, MEMM, nConnectionsM)
− substituting the profiles into the model:
R = g(N, Z, K1, …, KM, f1(CPU1, MEM1, nConnections1), …, fM(CPUM, MEMM, nConnectionsM), V1, …, VM)
X = N / (R + Z)
− given an M-tier application with SLOs R < r, X > x and N users, find the set of CPUi, MEMi, nConnectionsi satisfying
g(N, Z, K1, …, KM, f1(CPU1, MEM1, nConnections1), …, fM(CPUM, MEMM, nConnectionsM), V1, …, VM) < r
N / (R + Z) > x
− e.g., an optimization problem with objective function min Σ i=1..M CPUi (a search sketch follows)
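A minimal sketch of the decomposition as an exhaustive search over CPU shares, reusing the mva() sketch from slide 15 and fitted profiles such as fit_profile() above (illustrative only; a real solver would also search MEM and configuration parameters, and would use constraint solving or smarter optimization):

    import itertools

    def decompose(profiles, N, Z, K, V, r, x, shares=range(10, 65, 5)):
        """Cheapest CPU assignment (by total CPU) meeting R < r and X > x.

        profiles[j] maps a CPU share (%) to tier j's mean service time."""
        M = len(profiles)
        best = None
        for cpu in itertools.product(shares, repeat=M):
            S = [profiles[j](cpu[j]) for j in range(M)]
            R, X = mva(N, Z, M, K, S, V)
            if R < r and X > x and (best is None or sum(cpu) < sum(best[0])):
                best = (cpu, R, X)
        return best   # None if no assignment meets the SLOs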
18
Outline
• Problem Statement and Challenges
• Our Approach
− Overview
− Analytical Model for Multi-tier Applications
− Component Profile Creation
− Decomposition
• Validation
− Performance Model Validation
− Component Profile Creation
− SLA Decomposition Validation
• Related Work
• Summary and Future Work
19
Virtualized Data Center Testbed
• Setup
− a cluster of HP ProLiant servers with Xen virtual machines (VMs)
− each server node has two processors, 4 GB of RAM, and 1 Gb Ethernet interfaces
− each runs Fedora 4, kernel 2.6.12, and Xen 3.0-testing
− TPC-W and RUBiS
− VMs hosting different tiers run on different server nodes
• Estimate component service time (a sketch follows below)
− TS1: when an idle thread is assigned or a new thread is created
− TS2: when the thread is returned to the thread pool or destroyed
− T = TS2 - TS1, S = T - waiting time
− fine-grained; works well under both light-load and overload conditions
• Estimate the number of stations
− MaxClients for Apache, MaxThreads for Tomcat
− MySQL: average number of running threads
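A minimal sketch of this per-request estimate (the record fields ts1, ts2 and waiting are hypothetical names; in practice the timestamps come from server/container instrumentation):

    def mean_service_time(requests):
        """S = (TS2 - TS1) - waiting time, averaged over sampled requests."""
        samples = [(req['ts2'] - req['ts1']) - req['waiting'] for req in requests]
        return sum(samples) / len(samples)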
20
Performance Model Validation (1)
• Experiment setup
− TPC-W, an industry-standard e-commerce application
− Apache 2.0, Tomcat 5.5 and MySQL 5.0
− 10,000 items, 288,000 customers in DB
− think time drawn from an exponential distribution with a mean of 3.5 ms
• The model predicts the response time and throughput very accurately.
• The model works well even when the system load is high.
[Charts: model vs. measurement for TPC-W; response time (secs) and throughput (reqs/sec) vs. number of sessions (1 to 200)]
21
Performance Model Validation (2)
• Experiment setup
− RUBiS, an eBay-like auction site application
− Apache 2, Tomcat 5.5, MySQL 5.0
− 1,000,000 users and 1,000,000 items in DB
− think time drawn from an exponential distribution with a mean of 3.5 s
− same set of model parameters, profiled with 200 users
• Using the same set of model input parameters, the model still predicts the performance for different workloads
• The model works for different applications with different performance characteristics
[Charts: model vs. measurement for RUBiS; average response time (secs) and average throughput (reqs/s) vs. number of sessions (50 to 300)]
22
Component Profiles Creation
• Change CPU assignments to VMs
− the management domain (dom0) uses one CPU and the VMs use the other CPU
− Simple Earliest Deadline First (SEDF) scheduling sets the CPU share
− capped mode enforces that a VM cannot use more than its share of the total CPU
• VMs hosting different tiers run on different servers.
• While profiling a component, fix the CPU assignment of the other components at 100%.
• Change the CPU assignment from 10% to 60% in 5% increments and collect the performance data (a sweep sketch follows the charts below).
• Derive the component service time (workload independent) from the measurements.
[Charts: Tomcat and MySQL profiles; component service time (secs) vs. CPU assignment (10% to 60% of total CPU)]
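A minimal sketch of the profiling sweep described above (set_cpu_share, run_workload and measure_service_time are hypothetical stand-ins for the Xen SEDF capped-share interface and the instrumentation from slide 19):

    def profile_component(component, shares=range(10, 65, 5)):
        """Sweep the CPU share from 10% to 60% in 5% steps and record
        the measured component service time at each setting."""
        profile = {}
        for share in shares:
            set_cpu_share(component, share, capped=True)      # hypothetical helper
            run_workload()                                    # hypothetical helper
            profile[share] = measure_service_time(component)  # hypothetical helper
        return profile   # share -> mean service time S; feed to fit_profile()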
23
Designing a 3-tier RUBiS
SLOs: Users = 300, Resp < 10 sec, Throughput > 20 reqs/sec
Design    CPU Assignment (Tomcat / MySQL / Total)   Resp (sec)   Throughput (reqs/sec)   CPU Utilization (Tomcat / MySQL)
optimal   40% / 25% / 65%                           9.79         21.41                   67% / 70%
System0   45% / 30% / 75%                           9.89         21.59                   61% / 67%
System1   20% / 20% / 40%                           15.93        11.1                    99% / 92%
System2   90% / 90% / 180%                          8.86         24.3                    23% / 21%

SLOs: Users = 100, Resp < 5 sec, Throughput > 10 reqs/sec
System0   15% / 15% / 30%                           4.83         13                      74% / 69%

[Callout: per-component performance used for SLA monitoring and assessment, e.g., Tomcat queue lengths of 139.5 and 6.5]
24
Designing a 2-tier RUBiS
• Meets the SLOs and optimizes the resource usage
• Applicable to multi-tier applications with different SLOs, different software architectures and different performance characteristics
SLOs: Users = 100, Resp < 5 sec, Throughput > 10 reqs/sec
Design    CPU Assignment (Apache / MySQL / Total)   Resp (sec)   Throughput (reqs/sec)   CPU Utilization (Apache / MySQL)
System0   10% / 15% / 25%                           4.83         13                      72% / 53%

SLOs: Users = 500, Resp < 10 sec, Throughput > 40 reqs/sec
System0   35% / 30% / 65%                           8.2          42                      76% / 61%
25
Outline
• Problem Statement and Challenges
• Our Approach
− Overview
− Analytical Model for Multi-tier Applications
− Component Profile Creation
− Decomposition
• Validation
• Related Work
• Summary and Future Work
26
Related Work
• Using queueing theory models for provisioning
− C. Stewart and K. Shen, NSDI 2005
− B. Urgaonkar et al., ICAC 2005
− A. Zhang, P. Santos, D. Beyer, and H. Tang, HPL-2002-301
• Performance models for multi-tier applications
− B. Urgaonkar et al., SIGMETRICS 2005
− T. Kelley, WORLDS 2005
− U. Herzog and J. Rolia, layered queueing model
• Classification-based decomposition
− Y. Udupi, A. Sahai and S. Singhal, IM 2007
• ACTS: Automated Control and Tuning of Systems
27
Summary
• Proposed a systematic approach that combines performance modeling and component profiling to derive low-level system thresholds from performance-oriented SLOs
− creates an effective design (e.g., resource selection and allocation, software configuration) to ensure SLAs
− supports SLA monitoring and assessment
• Presented an effective analytical performance model for multi-tier applications
− accurately predicts the performance
− works well for applications with different software architectures, workloads and performance characteristics
• Validated the proposed approach for multi-tier applications in virtualized environments
− designs the system to meet the given SLOs with reasonable resource usage
− works for common multi-tier applications with different SLO goals and software architectures
− adapts easily to changes in applications and environments
28
Open Issues
• Extensions to other parameters, like memory and configuration parameters
− a "nice" regression function?
• Non-stationary workloads
− multi-class queueing networks, layered queueing models
− combine regression models and queueing models
• Profiling and measurement
− tools and technologies from Mercury Interactive
− non-intrusive approach: derive model parameters via a regression model
• Long-running transactions
− e.g., HPC applications, complex composed services
• Non-performance-based SLOs
− e.g., availability goals
− tradeoff analysis
• Complex and large-scale systems
− advanced constraint-solving and optimization algorithms required
29
Future Work
• Extend profiling to other parameters
− other system resources in addition to CPU, e.g., memory, I/O, cache
− software configuration parameters
− apply regression analysis to the profiling results
− a general and practical measurement methodology
• Apply the approach to realistic applications and workloads
− non-stationary workloads: multi-class queueing network models
− non-intrusive profiling and measurement
− enterprise applications, HPC applications, and composed services
− non-performance SLOs, e.g., availability
− non-traditional SLOs, e.g., represented as utility functions
• Use advanced constraint-solving and optimization algorithms for complex and large-scale problems
• Integrate SLA decomposition into SLA life-cycle management
− integrate with tradeoff analysis, SLA monitoring and SLA assessment
30
Papers
• SLA Decomposition: Translating Service Level Objectives (SLOs) to Low-level System Thresholds. Yuan Chen, Subu Iyer, Xue Liu, Dejan Milojicic, and Akhil Sahai. To appear in Proceedings of the 4th IEEE International Conference on Autonomic Computing (ICAC 2007), June 2007.
• HP Technical Report: http://www.hpl.hp.com/techreports/2007/HPL-2007-17.html
31
Thank you!