
Autonomic Provisioning of Backend Databases in Dynamic Content Web Servers

Jin Chen
Department of Computer Science, University of Toronto
Toronto, Canada
Email: [email protected]

Gokul Soundararajan, Cristiana Amza
Department of Electrical and Computer Engineering, University of Toronto
Toronto, Canada
Email: {gokul, amza}@eecg.toronto.edu

Abstract— In autonomic provisioning, a resource manager allocates resources to an application, on-demand, e.g., during load spikes. Modelling-based approaches have proved very successful for provisioning the web and application server tiers in dynamic content servers. On the other hand, accurately modelling the behavior of the back-end database server tier is a daunting task. Hence, automated provisioning of database replicas has received comparatively less attention. This paper introduces a novel pro-active scheme based on the classic K-nearest-neighbors (KNN) machine learning approach for adding database replicas to application allocations in dynamic content web server clusters. Our KNN algorithm uses lightweight monitoring of essential system and application metrics in order to decide how many databases it should allocate to a given workload. Our pro-active algorithm also incorporates awareness of system stabilization periods after adaptation in order to improve prediction accuracy and avoid system oscillations. We compare this pro-active self-configuring scheme for scaling the database tier with a reactive scheme. Our experiments using the industry-standard TPC-W e-commerce benchmark demonstrate that the pro-active scheme is effective in reducing both the frequency and peak level of SLA violations compared to the reactive scheme. Furthermore, by augmenting the pro-active approach with awareness and tracking of system stabilization periods induced by adaptation in our replicated system, we effectively avoid oscillations in resource allocation.

I. INTRODUCTION

Autonomic management of large-scale dynamic content servers has recently received growing attention [1], [2], [3], [4] due to the excessive personnel costs involved in managing these complex systems. This paper introduces a new pro-active resource allocation technique for the database back-end of dynamic content web sites.

Dynamic content servers commonly use a three-tier architecture (see Figure 1) that consists of a front-end web server tier, an application server tier that implements the business logic, and a back-end database tier that stores the dynamic content of the site. Gross hardware over-provisioning for each workload's estimated peak load can become infeasible in the short to medium term, even for large sites. Hence, it is important to efficiently utilize available resources through dynamic resource allocation, i.e., on-demand provisioning for all active applications. One such approach, the Tivoli on-demand business solutions [3], implements dynamic provisioning of resources within the stateless web server and application server tiers. However, dynamic resource allocation among applications within the stateful database tier, which commonly becomes the bottleneck [5], [6], has received comparatively less attention.


Fig. 1. Architecture of Dynamic Content Sites

Recent work suggests that fully-transparent, tier-independent provisioning solutions can be used in complex systems that contain persistent state, such as the database tier, as well [1], [4]. These solutions, similar to Tivoli, treat the system as a set of black boxes and simply add boxes to a workload's allocation based on queuing models [1], utility models [4], [7] or marketplace approaches [8]. In contrast, our insight in this paper is that for a stateful system, such as a database tier, off-line system training coupled with on-line system monitoring and tracking of system stabilization after triggering an adaptation are key features for successful provisioning.

We build on our previous work [9] in the area of database provisioning. As in our previous work, our goal is to keep the average query latency for any particular workload under a predefined Service Level Agreement (SLA) value. Our previous work achieves this goal through a reactive solution [9], where a new database replica is allocated to a dynamic content workload in response to load or failure-induced SLA violations.

In this paper, we introduce a novel pro-active scheme that dynamically adds database replicas in advance of predicted need, while removing them in underload in order to optimize resource usage. Our pro-active scheme is based on a classic machine learning algorithm, K-nearest-neighbors (KNN), for predicting resource allocation needs for workloads. We use an adaptive filter to track load variations, and the KNN classifier to build a performance model of database clusters. The learning phase of KNN uses essential system and application metrics, such as the average throughput, the average number of active connections, the read to write query ratio, and the CPU, I/O and memory usage system-level statistics. We train the performance model on these metrics during a variety of stable system states using different client loads and different numbers of database replicas. Correspondingly, our pro-active dynamic resource allocation mechanism uses active monitoring of the same database system and application metrics at run-time. Based on the predicted load information and the trained KNN classifier, the resource manager adapts on-line and allocates the number of databases that the application needs in the next time slot under varying load situations.

While pro-active provisioning of database replicas is appealing, it faces two inter-related challenges: 1) the unpredictable delay of adding replicas and 2) the instability of the system after triggering an adaptation. Adding a new database replica is a time-consuming operation because the database state of the new replica may be stale and must be brought up-to-date via data migration. In addition, the buffer cache at the replica needs to be warm before the replica can be used effectively. Thus, when adding a replica, the system metrics might show abnormal values during system stabilization, e.g., due to the load imbalance between the old and newly added replicas. We show that a pro-active approach that disregards the needed period of system stabilization after adaptation induces system oscillations between rapidly adding and removing replicas. We incorporate awareness of system instability after adaptation into our allocation logic in order to avoid such oscillations. Some form of system instability detection based on simple on-line heuristics could be beneficial when incorporated even in a reactive provisioning technique [9]. On the other hand, our pro-active technique can detect and characterize periods of instability with high accuracy due to its system training approach. During training on a variety of system parameters, the system learns their normal ranges and the normal correlations between their values, resulting in more robust instability detection at run-time.

Our prototype implementation interposes an autonomic manager tier between the application server(s) and the database cluster. The autonomic manager tier consists of an autonomic manager component collaborating with a set of schedulers (one per application). Each scheduler is responsible for virtualizing the database cluster and for distributing the corresponding application's requests across the database servers within that workload's allocation.

We evaluate our pro-active versus reactive provisioning schemes with the shopping and browsing mix workloads of the TPC-W e-commerce benchmark, an industry-standard benchmark that models an online book store. Our results are as follows:

1) The pro-active scheme avoids most SLA violations under a variety of load scenarios.

2) By triggering adaptations earlier and by issuing several database additions in a batch, the pro-active scheme outperforms the reactive scheme, which adds databases incrementally and only upon SLA violations.

3) Our system is shown to be robust. First, our instability detection scheme based on learning avoids unnecessary oscillations in resource allocation. Second, our system adapts quite well to load variations when running a workload request mix different than the mix used during system training.

The rest of this paper is organized as follows. We first introduce the necessary background related to the KNN learning algorithm in Section II. We then introduce our system architecture and give a brief overview of our dynamic replication environment in Section III. Then, we discuss our reactive and pro-active approaches in Section IV and Section V, respectively. Section VI describes our experimental testbed and benchmark. Section VII illustrates our results for applying the two approaches in the adaptation process of database clusters under different workload patterns. We compare our work to related work in Section VIII. Finally, we conclude the paper and discuss future work in Section IX.

II. BACKGROUND

Classic analytic performance models [1] can predict whether or not a system will violate the SLA given information on future load. These modelling approaches are, however, not amenable to our problem, because deriving an analytic model of complex concurrency control mechanisms, such as the one in a replicated database system, is typically time-consuming. Furthermore, in our complex system, the average query latency is not only related to the query arrival rate but also to the semantics of the query and the particular query workload mix.

Instead, we use the k-nearest-neighbor (KNN) classifier, a machine learning approach which considers multiple features in the system. KNN is an instance-based learning algorithm and has been widely applied in many areas such as text classification [10]. In KNN, a classification decision is made by using a majority vote of k "nearest" neighbors based on a similarity measure, as follows:

• For each target data point to be predicted, the algorithm finds the k nearest neighbors in the training data set. The distance between two data points is regarded as a measure of their similarity. The Euclidean distance is often used for computing the distance between numerical attributes, also called features.

• The distance we use in this paper is the weighted Euclidean distance, given by the following formula:

Dist(X, Y) = sqrt( Σ_{i=1..N} weight_i × (x_i − y_i)² )

Here, X and Y represent two different data points, N denotes the number of features of the data, x_i and y_i denote the i-th feature of X and Y respectively, and weight_i denotes the weight of the i-th feature. Each weight reflects the importance of the corresponding feature.

• Find the majority vote of the k nearest neighbors. The similarities of the testing data to the k nearest neighbors are aggregated according to the class of the neighbors, and the testing data is assigned to the most similar class.


• Cross validation of the training data is an often-used criterion that we also use to select the weights of the features and the number of "nearest" neighbors, K.

One of the advantages of KNN is that it is well suited for problems with multi-modal classes (i.e., with objects whose independent variables have different characteristics for different subsets), and can lead to good accuracy for such problems. In KNN, trained models are implicitly defined by the stored training set and the observed attributes. Furthermore, KNN is robust to noisy training data and effective if the training data is sufficiently large. However, KNN's computation cost grows proportionately with the size of the training data, since we need to compute the distance of each target attribute to all training samples. Indexing techniques (e.g., K-D trees) can reduce this computational cost.
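To make the procedure concrete, the following is a minimal sketch of a weighted-KNN classifier in Python. It is illustrative only: the feature vectors, weights and labels are placeholders rather than our trained values, and the inverse-distance vote is one simple way to implement the similarity aggregation described above.

```python
import heapq
from collections import Counter

def weighted_distance(x, y, weights):
    # Weighted Euclidean distance between two feature vectors.
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)) ** 0.5

def knn_classify(sample, training_set, weights, k):
    # training_set: list of (feature_vector, label) pairs collected in
    # stable states; labels could be "SLA_violation" / "within_SLA".
    nearest = heapq.nsmallest(
        k, training_set,
        key=lambda item: weighted_distance(sample, item[0], weights))
    # Aggregate similarity (here, inverse distance) per class and
    # assign the sample to the most similar class.
    scores = Counter()
    for features, label in nearest:
        d = weighted_distance(sample, features, weights)
        scores[label] += 1.0 / (d + 1e-9)  # guard against zero distance
    return scores.most_common(1)[0][0]
```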

III. SYSTEM ARCHITECTURE

Fig. 2. Cluster Architecture

Figure 2 shows the architecture of our dynamic content server. In our system, a set of schedulers, one per application, is interposed between the application and the database tiers. The scheduler tier distributes incoming requests to a cluster of database replicas. Each scheduler¹, upon receiving a query from the application server, sends the query using a read-one, write-all replication scheme to the replica set allocated to the application. The replica set is chosen by a resource manager that makes the replica allocation and mapping decisions across the different applications.

The scheduler uses our Conflict-Aware replication scheme [12] for achieving one-copy serializability [13] and scalability. With this scheme, each transaction explicitly declares the tables it is going to access and their access type. This information is used to detect conflicts between transactions and to assign the correct serialization order to these conflicting transactions. The transaction serialization order is expressed by the scheduler in terms of version numbers. The scheduler tags queries with the version numbers of the tables they need to read and sends them to the replicas. Each database replica keeps track of the local table versions as tables are updated. A query is held at each replica until the table versions match the versions tagged with the query. As an optimization, the scheduler also keeps track of versions of tables as they become available at each database replica and sends read-only queries to a single replica that already has the required versions. The scheduler communicates with a database proxy at each replica to implement replication. As a result, our implementation does not require changes to the application or the database tier.

¹ Each scheduler may itself be replicated for availability [11], [12].
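As an illustration of this version-based routing, consider the following sketch. The class and function names are hypothetical, not our prototype's actual API; it shows only the core check that holds a read until a replica has applied the required table versions.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.table_versions = {}  # table name -> latest applied version

    def has_versions(self, required):
        # True once this replica has applied every table version the
        # query was tagged with by the scheduler.
        return all(self.table_versions.get(table, 0) >= version
                   for table, version in required.items())

def route_read(required_versions, replica_set):
    # Send a read-only query to a single replica that already has the
    # required versions; otherwise hold it until versions catch up.
    for replica in replica_set:
        if replica.has_versions(required_versions):
            return replica
    return None  # query is held for now
```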

Since database allocations to workloads can vary dynamically, each scheduler keeps track of the current database set allocated to its workload. The scheduler is also in charge of bringing a new replica up to date by a process we call data migration, during which all missing updates are applied on that replica.

Our goal for resource management is that the resource manager should be:

• Prompt. It should sense impending SLA violations accurately and quickly, and it should trigger resource allocation requests as soon as possible in order to deal with the expected load increase.

• Stable. It should avoid unnecessary oscillations between adding and removing database servers, because such oscillations waste resources.

A. Dynamic Replication

In this section, we provide an overview of the resource manager that implements dynamic replication and briefly introduce the replica addition, removal, and mapping, as well as the data migration mechanisms in our system.

The resource manager makes the replica allocation and mapping decisions for each application based on its requirements and the current system state. The requirements are expressed in terms of a service level agreement (SLA) that consists of a latency requirement on the application's queries. The current system state includes the current performance of this application and the system capacity. The allocation decisions are communicated to the respective schedulers, which then allocate or remove replicas from their replica sets.

1) Replica Addition and Removal: The resource manager adds or removes a replica to/from an application allocation if it determines that the application is in overload or underload, respectively. Database replica removal needs to be performed conservatively because adding a database to a workload has high overheads. The replica addition process consists of two phases: data migration and system stabilization (see Figure 3). Data migration involves applying logs of missing updates on the new replica to bring it up-to-date. System stabilization involves load balancing and warmup of the buffer pool on the new replica. While some of these stages may overlap, replica addition can introduce a long period over which query latencies are high.


Fig. 3. Latency instability during replica addition.

2) Potential for Oscillations in Allocation: Oscillations in database allocations to workloads may occur during system instability induced by adaptations or rapidly fluctuating load. Assume an adaptation is necessary due to a burst in client traffic. Since our database scheduler cannot directly measure the number of clients, it infers the load by monitoring various system metrics instead. In the simplest case, the scheduler infers the need to adapt due to an actual latency SLA violation. However, during the adaptation phases, i.e., data migration, buffer pool warmup and load stabilization, the latency will be high or may even temporarily continue to increase, as shown in Figure 3. Latency sampling during this potentially long time is thus not necessarily reflective of a continued increase in load, but of system instability after an adaptation is triggered. If the system takes further decisions based on latency sampled during the stabilization time, it may continue to add further replicas which are unnecessary, hence will need to be removed later. This is an oscillation in allocation, which carries performance penalties for other applications running on the system due to potential interference.

A similar argument may hold for other system metrics measured during adaptation. Their values will not be indicative of any steady-state system configuration even if the load presented to the system remains unchanged. While rapid load fluctuations may induce similar behavior, simple smoothing or filtering techniques can offer some protection against very brief load spikes. While all schemes presented in this paper use some form of smoothing or filtering, which can dampen brief load fluctuations, our emphasis is on avoiding tuning of any system parameter, including smoothing coefficients. Instead, we develop techniques for automatically avoiding all cases of allocation oscillation caused by system metric instability.

3) Replica Mapping: Dynamic replication presents an inherent trade-off between minimizing application interference by keeping replica sets disjoint versus speeding replica addition by allowing overlapping replica sets. In this paper, we use warm migration, where partial overlap between replica sets of different applications is allowed. Each application is assigned a disjoint primary replica set. However, write queries of an application are also periodically sent to a second set of replicas. This second set may overlap with the primary replica sets of other applications. The resource manager sends batched updates to the replicas in the secondary set to ensure that they are within a staleness bound, which is equal to the batch size, i.e., the number of queries in the batch. The secondary replicas are an overflow pool that allows adding replicas rapidly in response to temporary load spikes, since data migration onto these replicas is expected to be a fast operation.

4) Data Migration: In this section, we describe the implementation of data migration in our system. Our data migration algorithm is designed to bring the joining database replica up to date with minimal disruption of transaction processing on existing replicas in the workload's allocation.

Each scheduler maintains persistent logs of all write queries of past transactions in its serviced workload for the purposes of enabling dynamic data replication. The scheduler logs the queries corresponding to each update transaction and their version numbers at transaction commit. The write logs are maintained per table in order of the version numbers of the corresponding write queries.

During data migration for bringing a stale database replica up-to-date, the scheduler replays on it all missing updates from its on-disk update logs. The challenge for implementing an effective data migration is that new transactions continue to update the databases in the workload's allocation while data migration is taking place. Hence, the scheduler needs to add the new database replica to its workload's replica mapping before the end of data migration. Otherwise, the new replica would never catch up. Unfortunately, new updates cannot be directly applied to the (stale) replica before migration completes. Hence, new update queries are kept in the new replica's holding queues during migration. In order to control the size of these holding queues, the scheduler executes data migration in stages. In each stage, the scheduler reads a batch of old updates from its disk logs and transfers them to the new replica for replay without sending any new queries. Except for pathological cases, such as sudden write-intensive load spikes, this approach reduces the number of disk log updates to be sent after each stage, until the remaining log to be replayed falls below a threshold bound. During this last stage, the scheduler starts to send new updates to the replica being added, in parallel with the last batch of update queries from disk.
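The staged replay can be sketched as follows. This is a simplified model under stated assumptions: the disk log is a plain in-memory queue, and apply_batch and start_forwarding_new_updates are hypothetical stand-ins for the scheduler's actual mechanisms.

```python
from collections import deque

def staged_migration(disk_log: deque, apply_batch,
                     start_forwarding_new_updates,
                     batch_size: int, threshold: int):
    # Replay old updates in stages; each stage shrinks the backlog as
    # long as batches are replayed faster than new writes arrive.
    while len(disk_log) > threshold:
        batch = [disk_log.popleft()
                 for _ in range(min(batch_size, len(disk_log)))]
        apply_batch(batch)
    # Last stage: new updates start flowing to the replica's holding
    # queue in parallel with the final batch read from disk.
    start_forwarding_new_updates()
    apply_batch(list(disk_log))
    disk_log.clear()
```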

IV. REACTIVE REPLICA PROVISIONING

Figure 4 shows the replica allocation logic and the conditions for triggering the addition of a database replica to an application and for removing a replica from an application. In the following, we describe these adaptations in detail.

A. Reactive Replica Addition

Fig. 4. Reactive Replication Provisioning Logic

In the Steady State, the resource manager monitors the average latency received from each workload scheduler during each sampling period. The resource manager uses a smoothed latency average computed as an exponentially weighted mean of the form WL = α × L + (1 − α) × WL, where L is the current query latency. The larger the value of the α parameter, the more responsive the average is to the current latency.

If the average latency over the past sampling interval for a particular workload exceeds the HighSLAThreshold, meaning that an SLA violation is imminent, the resource manager places a request to add a database to that workload's allocation.
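This monitoring step amounts to a few lines; the sketch below uses the 600 ms HighSLAThreshold from our experimental setup (Section VI), while the α value is an illustrative choice rather than a value reported here.

```python
HIGH_SLA_THRESHOLD_MS = 600.0  # threshold used in our experiments

def update_smoothed_latency(wl, latency, alpha=0.3):
    # Exponentially weighted mean: WL = alpha * L + (1 - alpha) * WL.
    # A larger alpha makes the average track the current latency faster.
    return alpha * latency + (1.0 - alpha) * wl

def overloaded(wl):
    # Trigger a replica-addition request when the smoothed latency
    # exceeds the high threshold, i.e., an SLA violation is imminent.
    return wl > HIGH_SLA_THRESHOLD_MS
```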

B. Reactive Replica Removal

If the average latency is below a LowSLAThreshold, the resource manager triggers a replica removal. The right branch of Figure 4 shows that the removal path is conservative and involves a tentative remove state before the replica is finally removed from an application's allocation. The allocation algorithm enters the tentative remove state when the average latency is below the low threshold. In the tentative remove state, a replica continues to be updated, but is not used for load balancing read queries for that workload. If the application's average latency remains below the low threshold for a period of time, the replica is removed from the allocation for that workload. This two-step process avoids system instability by ensuring that the application is indeed in underload, since a mistake during removal would soon require a replica addition, which is expensive. For a forced remove during overload, we skip the tentative remove state and go directly to the remove state. In either case, the database replica is removed from an application's replica set only when ongoing transactions finish at that replica.

C. Enhancement of the Reactive Approach with System Instability Detection

Fig. 5. Pro-active Replication Provisioning Scheme

The resource manager makes several modifications to this basic allocation algorithm in order to account for replica addition delay and to protect against oscillations. First, it stops making allocation decisions based on sampled query latency until the completion of the replica addition process. Completion includes both the data migration and system stabilization phases. The scheduler uses a simple heuristic for determining when the system has stabilized after a replica addition. In particular, it waits, for a bounded period of time, for the average query latency at the new replica to become close (within a configurable ImbalanceThreshold value) to the average query latency at the old replicas. Finally, since this wait time can be long and can impact reaction to steep load bursts, the resource manager uses the query latency at the new replica in order to improve its responsiveness. Since this replica has little load when it is added, we use its average latency exceeding the high threshold as an early indication of a need for even more replicas for that application. The resource manager triggers an extra replica addition in this case.
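The stabilization heuristic reduces to a simple comparison, sketched below; the 10% ImbalanceThreshold matches the pre-tuned value used later in our experiments, though in general it is a configurable parameter.

```python
def replica_stabilized(new_replica_latency, old_replicas_latency,
                       imbalance_threshold=0.10):
    # The system is considered stable once the new replica's average
    # query latency is within ImbalanceThreshold of the average
    # latency at the old replicas (checked for a bounded time).
    if old_replicas_latency <= 0:
        return False
    gap = abs(new_replica_latency - old_replicas_latency)
    return gap / old_replicas_latency <= imbalance_threshold
```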

V. PRO-ACTIVE REPLICATION PROVISIONING

In our pro-active approach, the controller predicts performance metrics and takes action to add databases in advance of need, such that the SLA is not violated while resources are used close to optimally.

A. Overview

Our approach can be summarized as follows. We predict the status of the application in the next time interval and classify it into two categories: SLA violation or within SLA, for a given number of database servers. By iterating through all possible database set configurations, we decide the minimum size database set that is predicted to have no SLA violations.

In more detail, Figure 5 shows the main process of our pro-active provisioning scheme. The scheduler of each application works as the application performance monitor and is responsible for collecting various system load metrics and reporting these measured data to the global resource manager (controller). The controller consists of three main components. First, the adaptive filter predicts the future load based on the currently measured load information. Next, the classifier finds out whether the SLA is broken given the predicted load and the number of active databases. The classifier directs the resource allocation component to adjust the number of databases to the proper number for this application according to its prediction. The resource allocation component decides how to map this request onto the real database servers by considering the requests of all applications and the available system capacity.
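The controller's decision step can be sketched as a simple search over candidate configurations. Here, predict_violation stands in for the trained KNN classifier and is an assumed interface, not our prototype's actual one.

```python
def choose_allocation(predicted_load, max_replicas, predict_violation):
    # Iterate through all possible database set sizes and return the
    # minimum number of replicas predicted to avoid SLA violations.
    for n_db in range(1, max_replicas + 1):
        if not predict_violation(predicted_load, n_db):
            return n_db
    # Every configuration predicts a violation: request full capacity.
    return max_replicas
```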

B. Enhancement of the Pro-active Approach with System Instability Detection

Our classifier is trained under stable states; our training data is gathered for several constant loads. When the system triggers an adaptation to add a new database or remove a database from a workload's allocation, the system goes through a transitional (unstable) state. At this time, the system metric values are quite different from the ones measured during stable system states that we use as the training data. As a result, many wrong decisions could be made if the classifier uses the system metrics sampled during a period of instability. This could in turn lead to oscillations between rapidly adding and removing database replicas, hence unnecessary adaptation overheads and resource waste.

To overcome this unstable phenomenon, we enhance our pro-active provisioning algorithm to be aware of the instability after adaptation. The system automatically detects unstable states by using two indicators: the load imbalance ratio and the ratio of average query throughput versus the average number of active connections. During our training process, we record the normal range of variations of the load imbalance ratio among different databases. If the measured ratio is beyond the normal range, we decide that the system is in an unstable state. Second, we select the two features assigned the highest weights during our training phase, which, as we will show, are the throughput and the number of connections. If we observe that the ratio of the two metrics is beyond the normal range, we also decide that the system is in an unstable state.

In our approach, if the system is detected to be in an unstable state, we suppress taking any decisions until the system is stable.
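A sketch of the two-indicator check follows. The (lo, hi) ranges are assumed to have been recorded during the off-line training phase; the names are illustrative.

```python
def system_unstable(imbalance_ratio, throughput, active_connections,
                    imbalance_range, throughput_conn_range):
    # Indicator 1: load imbalance ratio across replicas outside the
    # normal range recorded during training.
    lo, hi = imbalance_range
    if not (lo <= imbalance_ratio <= hi):
        return True
    # Indicator 2: ratio of the two highest-weighted features,
    # throughput over active connections, outside its normal range.
    lo, hi = throughput_conn_range
    ratio = throughput / max(active_connections, 1)
    return not (lo <= ratio <= hi)

# If system_unstable(...) is True, the controller suppresses all
# allocation decisions until the metrics return to normal ranges.
```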

C. Implementation Details

There are various filter and classifier algorithms that can fit into our pro-active replication provisioning scheme. We take ease of implementation and promptness as our selection principles.

1) Filter: We use a critically damped g-h filter [14] to track the variations of our load metrics. This filter minimizes the sum of the weighted errors, with the weights decreasing as the data gets older; i.e., the older the error, the less it matters. It does this by weighting the most recent error by unity, the next most recent error by a factor Θ (where Θ < 1), the next error by Θ², the next error by Θ³, and so on.

The filtering equations are as follows:

ẋ*_{n+1,n} = ẋ*_{n,n−1} + (h_n / T) (y_n − x*_{n,n−1})

x*_{n+1,n} = x*_{n,n−1} + T ẋ*_{n+1,n} + g_n (y_n − x*_{n,n−1})

The equations provide updated estimates of the predicted gradient ẋ*_{n+1,n} and the predicted value x*_{n+1,n}. T denotes the sampling interval, x is the metric we are interested in, y is its corresponding measured value, and ẋ denotes the gradient of x. The first subscript indicates the estimation time and the second subscript indicates the last measurement time. Thus, x*_{n+1,n} denotes an estimate of x during the next time slot n+1 based on the measurements made at the current time n and before.

The parameters g and h are related to Θ by the formulas g = 1 − Θ², h = (1 − Θ)², where Θ is determined by the standard deviation of the measurement error and the desired prediction error.

This filter uses a simple recursive calculation with constant g and h. It is thus fast in tracking speed compared to more advanced adaptive filters, such as extended Kalman filters [14].
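A direct transcription of the filter equations into code looks as follows; the Θ value in the usage line is illustrative, since in our setting Θ is derived from the measurement and prediction error requirements.

```python
def make_gh_filter(theta, T):
    # Critically damped g-h filter: g = 1 - theta^2, h = (1 - theta)^2.
    g = 1.0 - theta ** 2
    h = (1.0 - theta) ** 2
    x, dx = 0.0, 0.0  # predicted value and predicted gradient

    def step(y):
        nonlocal x, dx
        r = y - x                 # residual against the prediction
        dx = dx + (h / T) * r     # update the predicted gradient
        x = x + T * dx + g * r    # predicted value for the next slot
        return x

    return step

# Example: track a load metric sampled every T = 10 seconds,
# with an illustrative theta = 0.5.
predict_next = make_gh_filter(theta=0.5, T=10.0)
```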

2) KNN Classifier: We select several application metrics and system metrics as the features used by our KNN classifier in order to enhance our confidence in the automatically inferred load information. These system metrics are readily available in our environment. They are as follows:

1) Average query throughput - denotes the number of queries completed during a measurement interval.

2) Average number of active connections - counts the average number of active connections as detected by our event-driven scheduler within its select loop. An active connection is a connection used to deliver one or more queries from application servers to the scheduler during a measurement interval.

3) Read write ratio - shows the ratio of read queries versus write queries during a measurement interval. This feature reflects the workload mix.

4) Lock ratio - shows the ratio of locks held versus total queries during a measurement interval.

5) CPU, memory and I/O usage reported by vmstat.

Metrics 1 to 4 are gathered by the scheduler of each application. The traditional system metrics in 5 are measured on each database server.

Although a single metric may reflect the load information to some degree, basing decisions on a single metric could be seriously skewed, e.g., if the composition of queries in the mix changes or if the system states are not fully reproducible. We use cross validation techniques in KNN to identify the importance (i.e., weight) of each metric. The weights are automatically determined and they reflect the usefulness of the features. Not all features are always useful during on-line load situations. Their usefulness depends on the characteristics of the current workload mix. For example, for a CPU-bound workload mix, I/O metrics will be irrelevant for the purposes of load estimation.

During our training phase, we run 10-fold cross validation for weight combinations varying within a finite range, and pick the weight setting whose accuracy is higher than our target accuracy (95%) or achieves the highest accuracy in the given range. We can continue to expand the search space by gradient methods until we achieve a good accuracy. This training process assigns weights to all features off-line. Less important features automatically get lower weights.
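The weight search can be sketched as a grid search driven by cross-validation accuracy. Here, candidate_values and the cross_validate callable are assumed stand-ins for our finite weight range and 10-fold cross-validation procedure.

```python
import itertools

def select_weights(features, candidate_values, cross_validate,
                   target_accuracy=0.95):
    # Try weight combinations over a finite range; stop at the first
    # setting whose cross-validation accuracy exceeds the target,
    # otherwise keep the best setting seen in the range.
    best_weights, best_accuracy = None, 0.0
    for combo in itertools.product(candidate_values, repeat=len(features)):
        weights = dict(zip(features, combo))
        accuracy = cross_validate(weights)  # 10-fold CV accuracy
        if accuracy >= target_accuracy:
            return weights
        if accuracy > best_accuracy:
            best_weights, best_accuracy = weights, accuracy
    return best_weights
```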

VI. EXPERIMENTAL SETUP

To evaluate our system, we use the same hardware for all machines running the client emulator, the web servers, the schedulers and the database engines. Each is a dual AMD Athlon MP 2600+ computer with 512MB of RAM and a 2.1GHz CPU. All machines run the RedHat Fedora Linux operating system. All nodes are connected through a 100Mbps Ethernet LAN.

We run the TPC-W benchmark, described in more detail below. It is implemented using three popular open source software packages: the Apache web server [15], the PHP web-scripting/application development language [16] and the MySQL database server [17]. We use Apache 1.3.31 web servers that run the PHP implementation of the business logic of the TPC-W benchmark. We use MySQL 4.0 with InnoDB tables as the database backend.

All experimental numbers are obtained by running an implementation of our dynamic content server on a cluster of 8 database server machines. We use a number of web server machines sufficient for the web server stage not to be the bottleneck. The largest number of web server machines used for any experiment is 3. We use one scheduler and one resource manager. The thresholds we use in the reactive experiments are a HighSLAThreshold of 600ms and a LowSLAThreshold of 200ms. The SLA threshold used in our pro-active approach is 600ms. The SLA threshold was chosen conservatively to guarantee an end-to-end latency at the client of at most 1 second for the TPC-W workload. We use a latency sampling interval of 10 seconds for the scheduler.

A. TPC-W E-Commerce Benchmark

The TPC-W benchmark from the Transaction Processing Council [18] is a transactional web benchmark designed for evaluating e-commerce systems. Several interactions are used to simulate the activity of a retail store such as Amazon. The database size is determined by the number of items in the inventory and the size of the customer population. We use 100K items and 2.8 million customers, which results in a database of about 4 GB.

The inventory images, totaling 1.8 GB, are resident on the web server. We implemented the 14 different interactions specified in the TPC-W benchmark specification. Of the 14 scripts, 6 are read-only, while 8 cause the database to be updated. Read-write interactions include user registration, updates of the shopping cart, two order-placement interactions, two involving order inquiry and display, and two involving administrative tasks. We use the same distribution of script execution as specified in TPC-W. In particular, we use the TPC-W shopping mix workload with 20% writes, which is considered the most representative e-commerce workload by the Transaction Processing Council. The complexity of the interactions varies widely, with interactions taking between 20 ms and 1 second on an unloaded machine. Read-only interactions consist mostly of complex read queries in auto-commit mode. These queries are up to 30 times more heavyweight than read-write transactions.

B. Client Emulator

To induce load on the system, we have implemented a session emulator for the TPC-W benchmark. A session is a sequence of interactions for the same customer. For each customer session, the client emulator opens a persistent HTTP connection to the web server and closes it at the end of the session. Each emulated client waits for a certain think time before initiating the next interaction. The next interaction is determined by a state transition matrix that specifies the probability of going from one interaction to another. The session time and think time are generated from a random distribution with a given mean. For each experiment, we use a load function according to which we vary the number of clients over time. However, the number of active clients at any given point in time may be different from the actual load function value at that time, due to the random distribution of per-client think time and session length. For ease of representing load functions, in our experiments, we plot the input load function normalized to a baseline load.
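A minimal sketch of the emulator's per-client step is shown below. The transition-matrix representation and the choice of an exponential think-time distribution are illustrative assumptions; the text above only specifies a random distribution with a given mean.

```python
import random

def next_interaction(current, interactions, transition_matrix):
    # Sample the next TPC-W interaction from the transition
    # probabilities of the current interaction (one matrix row).
    return random.choices(interactions,
                          weights=transition_matrix[current])[0]

def think_time(mean_seconds):
    # Illustrative choice: exponentially distributed think time.
    return random.expovariate(1.0 / mean_seconds)
```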

VII. EXPERIMENTAL RESULTS

A. System Training

In this section, we describe our training phase and its effect on the assigned weights for our pre-selected system features.

We train our system on the TPC-W shopping mix with database configurations of 1 through 8 replicas and client loads from 30 to 220 clients under stable states. The weights of the features for the TPC-W shopping mix obtained from the training phase on this mix are listed here in order of importance: the average number of active connections, the average query throughput, the read/write ratio, the CPU usage, the lock ratio, the memory usage and the I/O usage. The TPC-W shopping mix has significant locality of access; hence, it is not an I/O intense workload. This explains the low relevance of I/O usage. Furthermore, the MySQL database management system does not free the memory pages for TPC-W even if it is in underload, so memory usage also has low relevance for inferring the load level. On the other hand, contrary to our intuition, the lock ratio does not show a high association with the load level. The lock ratio could, however, show higher relevance for larger cluster configurations.

B. Pro-active Approach without Stability Awareness

In this section, we show the influence of system metric instability during adaptation. Figure 7(b) shows an example of such oscillations, which happen under the continuous load function shown in Figure 7(a). The oscillations happen due to (incorrect) adaptation decisions taken during system instability, when system metrics are varying wildly immediately after an adaptation.

Fig. 7. Scenario with sine load: (a) sine load function; (b) pro-active provisioning without stability awareness; (c) machine allocation, pro-active versus reactive; (d) average query latency for pro-active versus reactive, against the SLA.

Fig. 6. Latency variation while the system adapts from 2 to 3 databases (latency against the SLA, with Start Add and End Migration marked).

In order to explain the oscillations, Figure 6 zooms into a small time period of the previous adaptation graph (between times 220 and 260 seconds of the experiment) to illustrate the variation of latency during an adaptation.

TABLE I
AN ILLUSTRATION OF INSTABILITY

                         No. of db (time)   Throughput   Avg. Active Conn.
Just before addition     2 db (t1)          80.80        118
After addition           3 db (t2)          142.20       105
                         3 db (t3)          167.10       89
                         3 db (t4)          211.40       70
In stable state          3 db               170          70

It shows the latency pattern when the system adapts from a configuration of 2 databases to a configuration of 3 databases, i.e., one database addition. We can see that the latency initially increases during the data migration phase, until End Migration, then stays flat while the buffer pool warms up on the new replica, and finally decreases and stabilizes thereafter.


In addition, Table I shows the variation of our highest weighted system metrics, the number of active incoming connections and the throughput, just before and just after adding the new database (during 4 time steps). The table also shows the corresponding stable state values in the target configuration. During stable states, the average number of active connections at the scheduler is closely correlated with the total load on the system induced by active clients. In contrast, as we can see from the table, the number of active connections might register a sudden decrease immediately after adding a new database, even while the throughput increases. These effects are due to the various factors at play during system stabilization. For example, as the new replica gets more requests, the overload on existing replicas starts to normalize and many client requests that were delayed due to overload finally complete. As a result, a larger than usual number of clients may get their responses at this time. These clients will be in the thinking state during the next interval, explaining the lower number of active incoming connections after adaptation.

These abnormal system metric variations after an adaptation may induce wrong load and latency estimates or predictions. For example, our KNN predictor might interpret the low number of active incoming connections as underload. Wrong decisions, such as removing a database immediately after adding it, or vice versa, may result. In the rest of the experiments, our KNN-based prediction algorithm is enhanced to suppress taking decisions during intervals when system instability is detected.

C. Performance Comparison of the Pro-active and Reactive Approaches

In the following, we evaluate the two autonomic provisioning approaches: reactive and pro-active. We first consider a scenario with continuously changing load, where the load variation follows a sinusoid (sine) function. Then we consider a sudden change scenario with a large load spike.

1) Load with Continuous Change Scenario: We use our client emulator to emulate a sinusoid load function, shown in Figure 7(a). As we can see from Figure 7(c), the pro-active approach triggers replica additions earlier than the reactive approach, because it performs future load prediction. In contrast, pro-active removal is slightly slower than reactive database removal because we use a conservative decision regarding when the system is sufficiently stable to accurately decide on removal. Figure 7(d) shows the comparison of average query latency for these two approaches. As a result of the earlier resource adaptations of the pro-active provisioning, this approach registers fewer SLA violations than the reactive approach. Furthermore, the degree of SLA violations, reflected in the average latency peaks, is also reduced compared to the reactive approach.

2) Sudden Load Spike Scenario: We use our client emulator to emulate a load function with a sudden spike, shown in Figure 8(a).

From Figure 8(c), we see that neither scheme can avoid SLA violations when the load jump happens, since the change is too abrupt to predict. However, the pro-active provisioning has a lower SLA violation peak and duration than the reactive provisioning approach. Specifically, the pro-active approach reaches a query latency peak of 2 seconds, and the SLA violations last around 50 seconds, while the reactive approach reaches a query latency peak of 5 seconds, and its corresponding SLA violations last more than 2 minutes.

The reactive provisioning is much slower in its adaptation because it needs to obtain feedback from the system after each database addition. It does not know how many databases it should add, so it has to add the databases one at a time. In contrast, the pro-active approach can predict how many databases the system needs for the current and predicted load, and it is able to trigger several simultaneous additions in advance of need. Figure 8(b) shows that the pro-active scheme adds 3 databases in a batch by requesting 3 databases simultaneously, while the reactive approach needs to add the 3 databases sequentially.

D. Robustness of the Pro-active Approach

In this section, we show that our pro-active scheme is relatively robust to some degree of on-line variation in workload mix and to different load patterns, given a fixed training data set.

Figures 9(a) and 9(b) show how our pro-active approach adapts on-line under a workload request mix different than the one it has been trained with. In particular, we train the system on a data set corresponding to stable load scenarios for the TPC-W shopping mix as before. We then show on-line adaptations for running TPC-W with the browsing mix. The browsing mix contains a different query mix composition than the shopping mix (with 5% versus 20% write queries). We use the same sine load function as before. We can see that our pro-active scheme adapts quite well to load increases, minimizing the number of SLA violations. It adapts less well to load decreases, however, by retaining more databases than strictly necessary. This effect is most obvious towards the end of the run.

Figure 10 shows the robustness of our learning-based approach under a step load function. Figure 10(a) shows the step function and the evolution of the instantaneous number of active client connections (as opposed to thinking clients) as measured at the emulator, induced by this load function. Figure 10(b) shows that the allocation of the pro-active scheme is stable, while the reactive scheme may register some oscillations in allocation if the thresholds it uses are not tuned. We run the reactive scheme in two configurations, pre-tuned and untuned, by using two different values for the ImbalanceThreshold, which governs the scheme's heuristic instability detection. We use a threshold of 10% load imbalance with a time-out of 2 minutes for the pre-tuned reactive approach and a random value of these parameters for the other reactive graph shown in the Figure. We can see that the reactive scheme registers allocation oscillations, which also incur latency SLA violations, for both parameter settings, with more oscillations for the untuned configuration. More sensitivity analysis results and oscillations induced by different parameters for the reactive scheme, e.g., different LowSLAThreshold values, are shown elsewhere [9].

Fig. 8. A scenario with sudden load change: (a) load function; (b) machine allocation, pro-active versus reactive; (c) average query latency for pro-active versus reactive, against the SLA.

Since our pro-active scheme learns the normal imbalance range and uses highly relevant system metrics according to the learning phase to determine instability, it is inherently more robust and exhibits no allocation oscillations.

Finally, although the step function makes it slightly harder to predict the load trend compared to the sine load function, latency SLA violations are mostly avoided by the pro-active approach in this load scenario as well.

VIII. RELATED WORK

This paper addresses the hard problem of autonomic resource provisioning within the database tier, advancing the research area of autonomic computing [2]. Autonomic computing is the application of technology to manage technology, materialized in the development of automated self-regulating system mechanisms. This is a very promising approach to dealing with the management of large scale systems, hence reducing the need for costly human intervention.

Resource prediction is important for adjusting the parameters of utility functions or deciding the mapping between the SLA and the amount of resources in autonomic resource provisioning. A related paper [7] uses a table-driven approach that stores response time values from offline experiments with different workload densities and different numbers of servers. Interpolation is used to obtain values not in the table. This method is simple, but it may become infeasible for highly variable workloads, where it is hard to collect sufficient data if the number of available resources is large. Different queuing models [1], [19] are presented as analytic performance models of web servers and demonstrate good accuracy in simulations.

To the best of our knowledge, current performance prediction techniques [7], [1], [19] for large data centers assume a single database server as back-end, hence their performance is unknown for database clusters. The applicability of generic queuing models to database applications needs further investigation to understand modelling accuracy for the complex concurrency control situations in database clusters.

Various scheduling policies for proportional share resource allocation can be found in the literature, such as STFQ [20]. Steere et al. [21] describe a feedback-based real-time scheduler that provides reservations to applications based on dynamic feedback, eliminating the need to reserve resources a priori. In other related papers discussing controllers [22], [23], the algorithms use models obtained by selecting various parameters to fit a theoretical curve to experimental data. These approaches are not generic and need cumbersome profiling in systems running many workloads. An example is tuning various parameters in a PI controller [22]. The parameters are only valid for the tuned workload and not applicable for controlling other workloads. In addition, none of these controllers incorporate the fact that the effects of control actions may not be seen immediately and the fact that the system may be unstable immediately after adaptation.

Cohen et al. [24] propose using a tree-augmented Bayesian network (TAN) to discover correlations between system metrics and service level objectives (SLO). Through training, the TAN discovers the subset of system metrics that lead to SLO violations. While this approach predicts violations and compliances with good accuracy, it does not provide any information on how to adapt to avoid SLO violations. In contrast, our prediction scheme determines how many replicas must be added to maintain SLO compliance.

Our study builds on recently proposed transparent scaling through content-aware scheduling in replicated database clusters [25], [26], [27]. On the other hand, these systems do not investigate database replication in the context of dynamic provisioning. While Kemme et al. [25] propose algorithms for database cluster reconfiguration, the algorithms are not evaluated in practice. Our paper studies efficient methods for dynamically integrating a new database replica into a running system and provides a thorough system evaluation using realistic benchmarks.

Finally, our work is related but orthogonal to ongoing projects in the areas of self-managing databases that can self-optimize based on query statistics [28], and to recent work in automatically reconfigurable static content web servers [29] and application servers [30].

Fig. 9. A browsing workload scenario with sine load: (a) machine allocation; (b) average query latency against the SLA.

Fig. 10. A scenario with a step load function: (a) load function and the number of active connections at the client emulator; (b) machine allocation for pro-active, reactive with tuned setting, and reactive with random setting; (c) average query latency for pro-active versus reactive with random setting; (d) average query latency for pro-active versus reactive with tuned setting (latency plots against the SLA).


IX. CONCLUSIONS AND FUTURE WORK

In this paper, we address autonomic provisioning in the context of dynamic content database clusters. We introduce a novel pro-active scheme based on the classic K-nearest-neighbors (KNN) machine learning approach for adding database replicas to application allocations based on: i) load predictions, ii) extensive off-line measurements of system and application metrics for stable system states, and iii) lightweight on-line monitoring that does not interfere with system scaling.
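As a minimal sketch of the prediction step (with hypothetical metric vectors and labels, not our measured training set), a KNN classifier maps a vector of monitored metrics to the smallest replica count that kept latency under the SLA in the off-line experiments:

import numpy as np

# Hypothetical off-line training set gathered in stable system states:
# each row is (predicted load, throughput); the label is the smallest
# number of replicas that kept average query latency under the SLA.
X_train = np.array([[0.5, 120.0], [0.7, 150.0],   # 1 replica sufficed
                    [1.0, 240.0], [1.3, 300.0],   # 2 replicas
                    [1.8, 420.0], [2.2, 500.0],   # 3 replicas
                    [3.0, 700.0], [3.4, 780.0]])  # 4 replicas
y_train = np.array([1, 1, 2, 2, 3, 3, 4, 4])

def knn_replicas(x, k=3):
    # Classic KNN: majority vote among the k nearest training points.
    # A real implementation should normalize features to a common scale.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    return int(np.bincount(nearest).argmax())

print(knn_replicas(np.array([2.0, 460.0])))  # -> 3 replicas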

We use a full prototype implementation of both our pro-active approach and a previous reactive approach to dynamic provisioning of database replicas. Overall, our experimental results show that our KNN-based pro-active approach is a promising approach for autonomic resource provisioning in dynamic content servers. Compared to the previous reactive approach, the pro-active approach has the advantage of promptness in sensing the load trend and the ability to trigger several database additions in advance of SLA violations. By and large, our pro-active approach avoids SLA violations under load variations. For unpredictable situations of very sudden load spikes, the pro-active approach alleviates the SLA violations faster than the reactive approach, even though SLA violations do occur in this case. Finally, off-line training on system-level information is also useful for recognizing periods of instability after triggering adaptations. Detecting unstable system states reduces the prediction errors of the pro-active approach in such cases.
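As an illustration only (our detector is trained on system-level metrics, not a fixed timeout, so the window length below is purely a stand-in), the sketch shows the control-flow role that instability detection plays: predictions are simply not acted upon until the system has settled after the last adaptation.

import time

class StabilizationGuard:
    # Illustrative stand-in for the trained instability detector: after an
    # adaptation is triggered, new predictions are ignored until the system
    # settles, preventing actions based on transient post-adaptation metrics.
    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.last_adaptation = float("-inf")

    def note_adaptation(self):
        self.last_adaptation = time.monotonic()

    def stable(self):
        return time.monotonic() - self.last_adaptation >= self.window_s

# guard.stable() gates whether a fresh KNN prediction is acted upon.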

The assumption of our prediction algorithm (KNN) is that the distribution of the real data is similar to that of our training data. Hence, if the incoming traffic and our training set differ greatly, our scheme is unable to make informed decisions. In our future work, we will explore advanced machine learning algorithms that can perform effective outlier detection, and hence identify whether the current workload is similar to our training set. If no match is detected, our resource manager will disable the pro-active scheme and revert to the more conservative reactive scheme. We would also like to explore online training algorithms that automatically use the most recent workload features as training data. In addition, we will explore advanced adaptive filters to smooth measurement errors and transient load variations.
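A minimal sketch of the outlier-detection direction mentioned above (the threshold and the normalization scheme are assumptions, not part of our system): if the current metric vector is far from every training point in normalized feature space, the resource manager would fall back to the reactive scheme.

import numpy as np

def matches_training_set(x, X_train, threshold=1.5):
    # Hypothetical novelty check: z-normalize features using training-set
    # statistics, then test the distance to the nearest training point.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant features
    z_train = (X_train - mu) / sigma
    z_x = (x - mu) / sigma
    return float(np.min(np.linalg.norm(z_train - z_x, axis=1))) <= threshold

# if not matches_training_set(current_metrics, X_train):
#     disable the pro-active scheme and revert to the reactive scheme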

ACKNOWLEDGEMENTS

We thank the anonymous reviewers for their detailed comments and suggestions for improving the earlier version of this paper. We further acknowledge the generous support of the IBM Toronto Lab through a student and a faculty fellowship with the IBM Centre for Advanced Studies (CAS), IBM Research through a faculty award, the Natural Sciences and Engineering Research Council of Canada (NSERC), Communications and Information Technology Ontario (CITO), and the Canada Foundation for Innovation (CFI) through discovery and infrastructure grants.

REFERENCES

[1] M. N. Bennani and D. A. Menasce, “Resource allocation for autonomic data centers using analytic performance models,” in Proceedings of the 2nd International Conference on Autonomic Computing (ICAC), 2005.

[2] IBM, “Autonomic Computing Manifesto,” http://www.research.ibm.com/autonomic/manifesto, 2003.

[3] IBM Corporation, “Automated provisioning of resources for data center environments,” http://www-306.ibm.com/software/tivoli/solutions/provisioning/, 2003.

[4] G. Tesauro, R. Das, W. E. Walsh, and J. O. Kephart, “Utility-function-driven resource allocation in autonomic systems,” in ICAC, 2005, pp. 70–77.

[5] C. Amza, E. Cecchet, A. Chanda, A. Cox, S. Elnikety, R. Gil, J. Marguerite, K. Rajamani, and W. Zwaenepoel, “Specification and implementation of dynamic web site benchmarks,” in 5th IEEE Workshop on Workload Characterization, Nov. 2002.

[6] “The ‘Slashdot effect’: Handling the Loads on 9/11,” http://slashdot.org/articles/01/09/13/154222.shtml.

[7] W. E. Walsh, G. Tesauro, J. O. Kephart, and R. Das, “Utility functions in autonomic systems,” in Proceedings of the 1st International Conference on Autonomic Computing (ICAC), 2004.

[8] K. Coleman, J. Norris, G. Candea, and A. Fox, “OnCall: Defeating spikes with a free-market server cluster,” in Proceedings of the 1st International Conference on Autonomic Computing (ICAC), 2004.

[9] G. Soundararajan, C. Amza, and A. Goel, “Database replication policies for dynamic content applications,” in Proceedings of the First ACM SIGOPS EuroSys, 2006.

[10] E.-H. S. Han, G. Karypis, and V. Kumar, “Text categorization using weight adjusted k-nearest neighbor classification,” in Lecture Notes in Computer Science, vol. 2035, 2001.

[11] C. Amza, A. Cox, and W. Zwaenepoel, “Conflict-aware scheduling for dynamic content applications,” in Proceedings of the Fifth USENIX Symposium on Internet Technologies and Systems, Mar. 2003, pp. 71–84.

[12] C. Amza, A. L. Cox, and W. Zwaenepoel, “Distributed versioning: Consistent replication for scaling back-end databases of dynamic content web sites,” in Middleware, ser. Lecture Notes in Computer Science, M. Endler and D. C. Schmidt, Eds., vol. 2672. Springer, 2003, pp. 282–304.

[13] P. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.

[14] E. Brookner, Tracking and Kalman Filtering Made Easy. Wiley-Interscience.

[15] “The Apache Software Foundation,” http://www.apache.org/.

[16] “PHP Hypertext Preprocessor,” http://www.php.net.

[17] “MySQL,” http://www.mysql.com.

[18] “Transaction Processing Council,” http://www.tpc.org/.

[19] T. Zheng, J. Yang, M. Woodside, M. Litoiu, and G. Iszlai, “Tracking time-varying parameters in software systems with extended Kalman filters,” in Proceedings of CASCON, 2005.

[20] P. Goyal, X. Guo, and H. M. Vin, “A Hierarchical CPU Scheduler for Multimedia Operating Systems,” in Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, Oct. 1996.

[21] D. C. Steere, A. Goel, J. Gruenberg, D. McNamee, C. Pu, and J. Walpole, “A Feedback-driven Proportion Allocator for Real-Rate Scheduling,” in Proceedings of the 3rd USENIX Symposium on Operating Systems Design and Implementation, Feb. 1999.

[22] Y. Diao, J. L. Hellerstein, and S. Parekh, “Optimizing quality of service using fuzzy control,” in DSOM ’02: Proceedings of the 13th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management. Springer-Verlag, 2002, pp. 42–53.

[23] B. Li and K. Nahrstedt, “A control-based middleware framework for quality of service adaptations,” IEEE JSAC, 1999.

[24] I. Cohen, J. S. Chase, M. Goldszmidt, T. Kelly, and J. Symons, “Correlating instrumentation data to system states: A building block for automated diagnosis and control,” in OSDI, 2004, pp. 231–244.

[25] B. Kemme, A. Bartoli, and Ö. Babaoğlu, “Online reconfiguration in replicated databases based on group communication,” in DSN. IEEE Computer Society, 2001, pp. 117–130.

[26] E. Cecchet, J. Marguerite, and W. Zwaenepoel, “C-JDBC: Flexible database clustering middleware,” in Proceedings of the USENIX 2004 Annual Technical Conference, Jun. 2004.

[27] J. M. Milan-Franco, R. Jimenez-Peris, M. Patiño-Martínez, and B. Kemme, “Adaptive middleware for data replication,” in Proceedings of the 5th ACM/IFIP/USENIX International Middleware Conference, Oct. 2004.

[28] P. Martin, W. Powley, H. Li, and K. Romanufa, “Managing database server performance to meet QoS requirements in electronic commerce systems,” International Journal on Digital Libraries, vol. 3, pp. 316–324, 2002.

[29] S. Ranjan, J. Rolia, H. Fu, and E. Knightly, “QoS-Driven Server Migration for Internet Data Centers,” in 10th International Workshop on Quality of Service, May 2002.

[30] E. Lassettre, D. W. Coleman, Y. Diao, S. Froehlich, J. L. Hellerstein, L. Hsiung, T. Mummert, M. Raghavachari, G. Parker, L. Russell, M. Surendra, V. Tseng, N. Wadia, and P. Ye, “Dynamic surge protection: An approach to handling unexpected workload surges with resource actions that have lead times,” in DSOM, ser. Lecture Notes in Computer Science, M. Brunner and A. Keller, Eds., vol. 2867. Springer, 2003, pp. 82–92.