Response Time Service Level Agreements for Cloud-hosted … · Response Time Service Level...

Response Time Service Level Agreementsfor Cloud-hosted Web Applications

Hiranya Jayathilaka Chandra Krintz Rich WolskiComputer Science Dept Univ of California Santa Barbara

hiranyackrintzrichcsucsbedu

AbstractCloud computing is a successful model for hosting web-facing applications that are accessed by their users as ser-vices While clouds currently offer Service Level Agree-ments (SLAs) containing guarantees of availability they donot make performance guarantees for deployed applications

In this work we present Cerebro ndash a system for estab-lishing statistical guarantees of application response time incloud settings Cerebro combines off-line static analysis ofapplication control structure with on-line cloud performancemonitoring and statistical forecasting to predict bounds onthe response time of web-facing application programminginterfaces (APIs) Because Cerebro does not require applica-tion instrumentation or per-application cloud benchmarkingit does not impose any runtime overhead and is suitable foruse at cloud scales Also because the bounds are statisticalthey are appropriate for use as the basis for SLAs betweencloud-hosted applications and their users

We investigate the correctness of Cerebro predictions thetightness of their bounds and the duration over which thebounds persist in both Google App Engine and AppScale(public and private cloud platforms respectively) We alsodetail the effectiveness of our SLA prediction methodologycompared to other performance bound estimation methodsbased on simple statistical analysis

Keywords Cloud computing Web APIs SLA

1 IntroductionWeb services service oriented architectures and cloud plat-forms have revolutionized the way developers engineer anddeploy software Using the web service model developerscreate new applications by ldquomashing uprdquo content and func-

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page Copyrights for components of this work owned by others than ACMmust be honored Abstracting with credit is permitted To copy otherwise or republishto post on servers or to redistribute to lists requires prior specific permission andor afee Request permissions from permissionsacmorgUCSB Technical Report 2015-004 April 21 2014

tionality from existing services exposed via web-accessibleapplication programming interfaces (web APIs) This ap-proach both expedites and simplifies implementation sincedevelopers can leverage the software engineering and main-tenance efforts of others Moreover platform-as-a-service(PaaS) clouds for hosting web applications have emerged asa key technology for managing applications at scale in bothpublic (managed) and private settings

Consequently web APIs are rapidly proliferating At thetime of this writing ProgrammableWeb [35] indexes over13000 publicly available web APIs These APIs increas-ingly employ the REST (Representational State Transfer)architectural style [13] and target both commercial (eg ad-vertising shopping travel etc) and non-commercial (egIEEE [22] UC Berkeley [5] US White House [44]) applica-tion domains

Despite the many benefits reusing existing services alsohas its costs In particular new applications become depen-dent on the services they compose These dependencies im-pact correctness performance and availability of the com-posite applications ndash for which the ldquotop levelrdquo developer isoften held accountable Compounding the situation the un-derlying services can and do change over time while theirAPIs remain stable unbeknownst to the developers that pro-grammatically access them Unfortunately there is a dearthof tools that help developers reason about these dependen-cies throughout an applicationrsquos lifecycle (ie developmentdeployment and runtime) Without such tools programmersmust resort to extensive continuous and costly testing andprofiling to understand the performance impact on their ap-plications that results from the increasingly complex collec-tion of services that they depend on

To address this issue we present Cerebro a new approachthat predicts bounds on the response time performance ofweb APIs exported by applications that are hosted in a PaaScloud The goal of Cerebro is to allow a PaaS administratorto determine what response time service level agreement(SLA) can be fulfilled by each web API operation exportedby the applications hosted in the PaaS

SLAs typically specify a minimum service level and aprobability (usually large) that the minimum service level

1

will be maintained Cerebro uses a combination of staticanalysis of the hosted web APIs and runtime monitoring ofthe PaaS cloud (not the web APIs themselves) to determinewhat minimum response time guarantee can be made witha target probability specified by a PaaS administrator Thesecalculated SLAs enable developers to reason about the per-formance of the applications that consume the cloud-hostedweb APIs

Currently cloud computing systems such as AmazonWeb Services (AWS) [3] and Google App Engine (GAE) [15]offer reliability SLAs specifying the fraction of availabilityover a fixed time period for their services that users can ex-pect when they contract to use a service However they donot provide SLAs guaranteeing minimum levels of perfor-mance In contrast Cerebro predictions make it possible todetermine response time SLAs with probabilities specifiedby the cloud provider in a way that is scalable

Cerebro is designed to improve the use of both publicand private PaaS clouds which have emerged as popular ap-plication hosting venues [38] For example there are overfour million active GAE applications that can execute onGooglersquos public cloud or over AppScale an open sourceprivate cloud version of App Engine A PaaS cloud providesdevelopers with a collection of commonly used scalable ser-vices that the platform exports via APIs defined within asoftware development kit (cloud SDK) These services arefully managed and covered under the availability SLAs ofthe cloud platform For example services in GAE and App-Scale cloud SDK include a distributed NoSQL datastoretask management and data caching among others

Cerebro generates response time SLAs for API calls ex-ported by a web application developed using the servicesavailable within the PaaS For brevity in this work we willuse the term web API to refer to a web-accessible API ex-ported by an application hosted on a PaaS platform Furtherwe will use the term cloud SDK to refer to the APIs that aremaintained as part of the PaaS and available to all hosted ap-plications This enables us to differentiate the internal APIsof the PaaS from the APIs exported by the deployed appli-cations For example an application hosted in Google AppEngine might export one or more web APIs to its users whileleveraging the internal cloud SDK for the Google datastorethat is available as part of the Google App Engine PaaS

Cerebro uses static analysis to identify the cloud SDKinvocations that dominate the response time of web APIsIndependently Cerebro also maintains a running history ofcloud SDK response time performance It uses QBETS [33]ndash a forecasting methodology we have developed in priorwork for predicting bounds on ldquoill behavedrdquo univariate timeseries ndash to predict response time bounds on each cloud SDKinvocation made by the application It combines these pre-dictions dynamically for each static program path througha web API operation and returns the ldquoworst-caserdquo upperbound on the time necessary to complete the operation

Because service implementations and platform behaviorunder load change over time Cerebrorsquos predictions necessar-ily have a lifetime That is the predicted SLAs may becomeinvalid after some time As part of this paper we developa model for detecting such SLA invalidations We use thismodel to investigate the effective lifetime of Cerebro predic-tions When such changes occur Cerebro can be reinvokedto establish new SLAs for any deployed web API

We have implemented Cerebro for both the Google AppEngine public PaaS and the AppScale private PaaS Givenits modular design and this experience we believe that Cere-bro can be easily integrated into any PaaS system We useour prototype implementation to evaluate the accuracy ofCerebro as well as the tightness of the bounds it predicts (iethe difference between the predictions and the actual APIexecution times) To this end we carry out a range of exper-iments using App Engine applications that are available asopen source

We also detail the duration over which these predictionshold in both GAE and AppScale We find that Cerebro gen-erates correct SLAs (predictions that meet or exceed theirprobabilistic guarantees) and that these SLAs are valid overtime periods ranging from 14 hours to more than 24 hoursWe also find that the high variability of performance in pub-lic PaaS clouds due to their multi-tenancy and massive scalerequires that Cerebro be more conservative in its predictionsto achieve the desired level of correctness In comparisonCerebro is able to make much tighter SLA predictions forweb APIs hosted in private single tenant clouds

Because Cerebro provides this analysis statically it im-poses no run-time overhead on the applications themselvesIt requires no run-time instrumentation of application codeand it does not require any performance testing of the webAPIs Furthermore because the PaaS is scalable and SDKmonitoring data is shared across all Cerebro executions thecontinuous monitoring of the cloud SDK generates no dis-cernible load on the cloud platform Thus we believe Cere-bro is suitable for highly scalable cloud settings

Finally we have developed Cerebro for use with EA-GER (Enforced API Governance Engine for REST) [23]ndash an API governance system for PaaS clouds EAGER at-tempts to enforce governance policies at the deployment-time of cloud applications These governance policies arespecified by cloud administrators to ensure the reliable op-eration of the cloud and the deployed applications PaaSplatforms include an application deployment phase duringwhich the platform provisions resources for the applicationinstalls the application components and configures them touse the cloud SDKs EAGER injects a policy checking andenforcement step into this deployment workflow so that onlyapplications that are compliant with respect to site-specificpolicies are successfully deployed Cerebro allows PaaS ad-ministrators to define EAGER policies that allow an appli-cation to be deployed only when its web APIs meet a pre-

2

determined (or pre-negotiated) SLA target and to be notifiedby the platform when such SLAs require renegotiation

We structure the rest of this paper as follows We firstcharacterize the domain of PaaS-hosted web APIs for GAEand AppScale in Section 2 We then present the design ofCerebro in Section 3 and overview our software architectureand prototype implementation Next we present our empir-ical evaluation of Cerebro in Section 4 Finally we discussrelated work (Section 5) and conclude (Section 6)

2 Domain Characteristics and AssumptionsThe goal of our work is to analyze a web API staticallyand from this analysis without deploying or running the webAPI accurately predict an upper bound on its response timeWith such a prediction developers and cloud administratorscan provide performance SLAs to the API consumers (hu-man or programmatic) to help them reason about the per-formance implications of using APIs ndash something that is notpossible today For general purpose applications such worst-case execution time analysis has been shown by numerousresearchers to be challenging to achieve for all but simpleprograms or specific application domains

To overcome these challenges we take inspiration fromthe latter and exploit the application domain of PaaS-hostedweb APIs to achieve our goal In this paper we focus onthe popular Google App Engine (GAE) public PaaS andAppScale private PaaS which support the same applicationsdevelopment and deployment model and platform services

The first characteristic of PaaS systems that we exploit tofacilitate our analysis is their predefined programming inter-faces through which they export various platform servicesHerein we refer to these programming interfaces as the cloudsoftware development kit or the cloud SDK We refer to theindividual member interfaces of the cloud SDK as cloudSDK interfaces and to their constituent operations as cloudSDK operations These interfaces export scalable function-ality that is commonly used to implement web APIs key-value datastores databases data caching task and user man-agement security and authentication etc The App Engineand AppScale cloud SDK is detailed in httpscloud

googlecomappenginedocsjavajavadocFigure 1 illustrates the PaaS development and deploy-

ment model Developers implement their application code asa combination of calls to the cloud SDK and their own codeThe service implementations for the cloud SDK are highlyscalable highly available (have SLAs associated with them)and automatically managed by the platform Developers thenupload their applications to the cloud for deployment Oncedeployed the applications and any web APIs exported bythem can be accessed via HTTPS requests by external orco-located clients

Typically PaaS-hosted web APIs perform one or morecloud SDK calls The reason for this is two-fold First cloudSDKs provide web APIs with much of the functionality thatthey require Second PaaS clouds ldquosandboxrdquo web APIs to

Figure 1 PaaS-hosted Web APIs (a) An external clientmaking requests to the API (b) A PaaS-hosted web APIinvoking another in the same cloud

enforce quotas to enable billing and to restrict certain func-tionality that can lead to security holes platform instabil-ity or scaling issues [16] For example GAE and AppScalecloud platforms restrict the application code from accessingthe local file system accessing shared memory using certainlibraries and arbitrarily spawning threads Therefore devel-opers must use the provided cloud SDK operations to im-plement program logic equivalent to the restricted featuresFor example the datastore interface can be used to read andwrite persistent data instead of using the local file systemand the memcache interface can be used in lieu of globalshared memory

Furthermore the only way for a web API to execute isin response to an HTTPS request or as a background taskTherefore execution of all web API operations start and endat well defined program points and we are able to infer thisstructure from common software patterns Also concurrencyis restricted by capping the number of threads and requir-ing that a thread cannot outlive the request that creates itFinally PaaS clouds enforce quotas and limits on service(cloud SDK) use [16 17 28] App Engine for examplerequires that all web API requests complete under 60 sec-onds Otherwise they are terminated by the platform Suchenforcement places a strict upper bound on the execution ofa web API operation

To understand the specific characteristics of PaaS-hostedweb APIs and the potential of this restricted domain to fa-cilitate efficient static analysis and response time predictionwe next summarize results from static analysis (using theSoot framework [43]) of 35 real world App Engine webAPIs These web APIs are open source written in Java andrun over Google App Engine or AppScale without modifica-tion We plan to make these applications publicly availableupon publication

3

Figure 2 CDF of the number of static paths through meth-ods in the surveyed web APIs

Our analysis detected a total of 1458 Java methods in theanalyzed codes Figure 2 shows the cumulative distributionof static program paths in these methods Approximately97 of the methods considered in the analysis have 10or fewer static program paths through them 99 of themethods have 36 or fewer paths However the CDF is heavytailed and grows to 34992 We truncate the graph at 100paths for clarity As such only a very small number ofmethods each contains a large number of paths Fortunatelyover 65 of the methods have exactly 1 path (ie there areno branches)

Next we consider the looping behavior of web APIs1286 of the methods (88) considered in the study do nothave any loops 172 methods (12) contain loops We be-lieve that this characteristic is due to the fact that the PaaSSDK and the platform restrictions like quotas and responsetime limits discourage looping

Approximately 29 of all the loops in the analyzed pro-grams do not contain any cloud SDK invocations A major-ity of the loops (61) however are used to iterate over adataset that is returned from the datastore cloud SDK inter-face of App Engine (ie iterating on the result set returned bya datastore query) We refer to this particular type of loopsas iterative datastore reads

Table 1 lists the number of times each cloud SDK inter-face is called across all paths and methods in the analyzedprograms The Datastore API is the most commonly usedinterface This is because data management is fundamentalto most web APIs and the PaaS disallows using the localfilesystem to do so for scalability and portability reasons

Next we explore the number of cloud SDK calls madealong different paths of execution in the web APIs For thisstudy we consider all paths of execution through the meth-ods (64780 total paths) Figure 3 shows the cumulative dis-tribution of the number of SDK calls within paths Approx-imately 98 of the paths have 1 cloud SDK call or fewerThe probability of finding an execution path with more than5 cloud SDK calls is smaller than 1

Table 1 Static cloud SDK calls in surveyed web APIsCloud SDK Interface No of Invocations

blobstore 7channel 1

datastore 735files 4

images 3memcache 12

search 6taskqueue 24

tools 2urlfetch 8

users 44xmpp 3

Figure 3 CDF of cloud SDK call counts in paths of execu-tion

Finally our experience with App Engine web APIs indi-cates that a significant portion of the total time of a method(web API operation) is spent in cloud SDK calls Confirmingthis hypothesis requires careful instrumentation (ie difficultto automate) of the web API codes We performed such a testby hand on two representative applications and found thatthe time spent in code other than cloud SDK calls accountsfor 0-6 of the total time (0-3ms for a 30-50ms web APIoperation)

This study of various characteristics typical of PaaS-hosted web APIs indicates that there may be opportunitiesto exploit the specific aspects of this application domain tosimplify analysis and to facilitate performance prediction Inparticular operations in these applications are short have asmall number of paths to analyze implement few loops andinvoke a small number of cloud SDK calls Moreover mostof the time spent executing these operations results fromcloud SDK invocations In the next section we describeour design and implementation of Cerebro that takes advan-tage of these characteristics and assumptions We then usea Cerebro prototype to experimentally evaluate its efficacyfor estimating the worst-case response time for applicationsfrom this domain

4

3 CerebroGiven the restricted application domain of PaaS-hosted webAPIs we believe that it is possible to design a system thatpredicts response time SLAs for them using only static in-formation from the web API code itself To enable this wedesign Cerebro with three primary components

bull A static analysis tool that extracts sequences of cloudSDK operations for each path through a method (webAPI operation)

bull A monitoring agent that runs in the target PaaS andefficiently monitors the performance of the underlyingcloud SDK operations and

bull An SLA predictor that uses the outputs of these twocomponents to accurately predict an upper bound on theresponse time of the web API

We overview each of these components in the subsectionsthat follow and then discuss the Cerebro workflow with anexample

31 Static AnalysisThis component analyzes the source code of the web API(or some intermediate representation of it) and extracts a se-quence of cloud SDK operations We implement our analysisfor Java bytecode programs using the Soot framework [43]Currently our prototype analyzer considers the followingJava codes as exposed web APIs

bull classes that extend the javaxservletHttpServlet class (ieJava servlet implementations)

bull classes that contain JAX-RS Path annotations andbull any other classes explicitly specified by the developer in

a special configuration file

Cerebro performs a simple construction and interproce-dural static analysis of control flow graph (CFG) [1 2 2930] for each web API operation The algorithm extracts allcloud SDK operations along each path through the methodsCerebro analyzes other functions that the method calls re-cursively Cerebro caches cloud SDK details for each func-tion once analyzed so that it can be reused efficiently forother call sites to the same function Cerebro does not ana-lyze third-party library calls if any (which in our experiencetypically do not contain cloud SDK calls) Cerebro encodeseach cloud SDK call sequence for each path in a lookup ta-ble We identify cloud SDK calls by their Java package name(eg comgoogleappengineapis)

To handle loops we first extract them from the CFG andannotate all cloud SDK invocations that occur within themWe annotate each such SDK invocation with an estimate onthe number of times the loop is likely to execute in the worstcase We estimate loop bounds using a loop bound predictionalgorithm based on abstract interpretation [8]

As shown in the previous section loops in these programsare rare and when they do occur they are used to iterate overa dataset returned from a database For such data-dependentloops we estimate the bounds if specified in the cloud SDKcall (eg the maximum number of entities to return [18]) Ifour analysis is unable to estimate the bounds for these loopsCerebro prompts the developer for an estimate of the likelydataset size andor loop bounds

32 PaaS Monitoring AgentCerebro monitors and records the response time of individ-ual cloud SDK operations within a running PaaS systemSuch support can be implemented as a PaaS-native featureor as a PaaS application (web API) we use the latter inour prototype The monitoring agent runs in the backgroundwith but separate from other PaaS-hosted web APIs Theagent invokes cloud SDK operations periodically on syn-thetic datasets and records timestamped response times inthe PaaS datastore for each cloud SDK operation Finallythe agent periodically reclaims old measurement data toeliminate unnecessary storage The Cerebro monitoring andreclamation rates are configurable and monitoring bench-marks can be added and customized easily to capture com-mon PaaS-hosted web API coding patterns

In our prototype the agent monitors the datastore andmemcache SDK interfaces every 60 seconds In addition itbenchmarks loop iteration over datastore entities to capturethe performance of iterative datastore reads for datastoreresult set sizes of 10 100 and 1000 We limit ourselves tothese values because the PaaS requires that all operationscomplete (respond) within 60 seconds ndash so the data sizesreturned are typically small

33 Making SLA PredictionsTo make SLA predictions Cerebro uses Queue Bounds Es-timation from Time Series (QBETS) [33] a non-parametrictime series analysis method that we developed in prior workWe originally designed QBETS for predicting the schedul-ing delays for the batch queue systems used in high perfor-mance computing environments but it has proved effectivein other settings where forecasts from arbitrary times seriesare needed [7 32 46] In particular it is both non-parametricand it automatically adapts to changes in the underlying timeseries dynamics making it useful in settings where forecastsare required from arbitrary data with widely varying charac-teristics

A QBETS analysis requires three inputs

1 A time series of data generated by a continuous experi-ment

2 The percentile for which an upper bound should be pre-dicted (p isin [199])

3 The upper confidence level of the prediction (c isin (01))

5

QBETS uses this information to predict an upper boundfor the pth percentile of the time series It does so by treat-ing each observation in the time series as a Bernoulli trialwith probability 001p of success Let q = 001p If thereare n observations the probability of there being exactly ksuccesses is described by a Binomial distribution (assumingobservation independence) having parameters n and q If Qis the pth percentile of the distribution from which the obser-vations have been drawn the equation

1minusk

sumj=0

(nj

)middot (1minusq) j middotqnminus j (1)

gives the probability that more than k observations aregreater than Q As a result the kth largest value in a sortedlist of n observations gives an upper c confidence bound onQ when k is the smallest integer value for which Equation 1is larger than c

More succinctly QBETS sorts observations in a historyof observations and computes the value of k that constitutesan index into this sorted list that is the upper c confidencebound on the pth percentile The methodology assumes thatthe time series of observations is ergodic so that in the longrun the confidence bounds are accurate

QBETS also attempts to detect change points in the timeseries of observations so that it can apply this inferencetechnique to only the most recent segment of the series thatappears to be stationary To do so it compares percentilebounds predictions with observations throughout the seriesand determines where the series is likely to have undergonea change It then discards observations from the series priorto this change point and continues As a result when QBETSstarts it must ldquolearnrdquo the series by scanning it in time seriesorder to determine the change points We report Cerebrolearning time in our empirical evaluation in Subsection 44

Note that c is an upper confidence level on pth percentilewhich makes the QBETS bound estimates conservativeThat is the value returned by QBETS as a bound predic-tion is larger than the true pth percentile with probability1minus c under the assumptions of the QBETS model In thisstudy we use the 95th percentile and c = 001

Note that the algorithm itself can be implemented effi-ciently so that it is suitable for on-line use Details of thisimplementation as well as a fuller accounting of the statisti-cal properties and assumptions are available in [7 31ndash33]

QBETS requires a sufficiently large number of datapoints in the input time series before it can make an ac-curate prediction Specifically the largest value in a sortedlist of n observations is greater than the pth percentile withconfidence c when n gt= log(c)log(001p)

For example predicting the 95th percentile of the APIexecution time with an upper confidence of 001 requiresat least 90 observations We use this information to controlreclamation of monitoring data by PaaS agent

Figure 4 Cerebro architecture and component interactions

34 Example Cerebro WorkflowFigure 4 illustrates how the Cerebro components interactwith each other during the prediction making process Cere-bro can be invoked when a web API is deployed to a PaaScloud or at any time during the development process to givedevelopers insight into the worst-case response time of theirapplications

Upon invoking Cerebro with a web API code Cerebroperforms its static analysis on all operations in the API Foreach analyzed operation it produces a list of annotated cloudSDK invocation sequences ndash one sequence per program pathCerebro then prunes this list to eliminate duplicates Dupli-cates occur when a web API operation has multiple programpaths with the same sequence of cloud SDK invocationsNext for each pruned list Cerebro performs the followingoperations

1 Retrieve (possibly compressed) benchmarking data fromthe monitoring agent for all SDK operations in each se-quence The agent returns ordered time series data (onetime series per cloud SDK operation)

2 Align retrieved time series across operations in time andsum the aligned values to form a single joint time seriesof the summed values for the sequence of cloud SDKoperations

3 Run QBETS on the joint time series with the desired pand c values to predict an upper bound

Cerebro uses largest predicted value (across path sequences)as its SLA prediction for a web API operation This process(SLA prediction) can be implemented as a co-located servicein the PaaS cloud or as a standalone utility We do the latterin our prototype

As an example suppose that the static analysis results inthe cloud SDK invocation sequence lt op1op2op3 gt forsome operation in a web API Assume that the monitoring

6

agent has collected the following time series for the threeSDK operations

bull op1 [t0 5 t1 4 t2 6 tn 5]bull op2 [t0 22 t1 20 t2 21 tn 21]bull op3 [t0 7 t1 7 t2 8 tn 7]

Here tm is the time at which the mth measurement is takenCerebro aligns the three time series according to timestampsand sums the values to obtain the following joint time series[t0 34 t1 31 t2 35 tn 33]

If any operation is tagged as being inside a loop wherethe loop bounds have been estimated Cerebro multipliesthe time series data corresponding to that operation by theloop bound estimate before aggregating In cases where theoperation is inside a data-dependent loop we request thetime series data from the monitoring agent for its iterativedatastore read benchmark for a number of entities that isequal to or larger than the annotation and include it in thejoint time series

Cerebro passes the final joint time series for each se-quence of operations to QBETS which returns the worst-case upper bound response time it predicts If the QBETSpredicted value is Q milliseconds Cerebro forms the SLA asldquothe web API will respond in under Q milliseconds p ofthe timerdquo When the web API has multiple operations Cere-bro estimates multiple SLAs for the API If a single valueis needed for the entire API regardless of operation Cerebroreturns the largest predicted value as the final SLA (ie theworst-case SLA for the API)

4 Experimental ResultsTo empirically evaluate Cerebro we conduct experimentsusing five open source Google App Engine applications

StudentInfo RESTful (JAX-RS) application for managingstudents of a class (adding removing and listing studentinformation)

ServerHealth Monitors computes and reports statistics forserver uptime for a given web URL

SocialMapper A simple social networking application withAPIs for adding users and comments

StockTrader A stock trading application that provides APIsfor adding users registering companies buying and sell-ing stocks among users

Rooms A hotel booking application with APIs for register-ing hotels and querying available rooms

These web APIs use the datastore cloud SDK interfaceextensively The Rooms web API also uses the memcacheinterface We focus on these two interfaces exclusively inthis study We execute these applications in the Google AppEngine public cloud (SDK v1917) and in an AppScale(v20) private cloud We instrument the programs to collectexecution time statistics for verification purposes only (the

instrumentation data is not used to predict the SLAs) TheAppScale private cloud used for testing was hosted usingfour ldquom32xlargerdquo virtual machines running on a privateEucalyptus [34] cloud

We first report the time required for Cerebro to performits analysis and SLA prediction Across web APIs Cerebrotakes 1000 seconds on average with a maximum time of1425 seconds for StudentInfo application These times in-clude the time taken by the static analyzer to analyze all theweb API operations and the time taken by QBETS to makepredictions For these results the length of the time seriescollected by PaaS monitoring agent is 1528 data points (255hours of monitoring data) Since the QBETS analysis timedepends on the length of the input time series we also mea-sured the time for 2 weeks of monitoring data (19322 datapoints) to provide some insight into the overhead of SLAprediction Even in this case Cerebro requires only 57405seconds (96 minutes) on average

41 Correctness of PredictionsWe first evaluate the correctness of Cerebro predictions Aset of predictions is correct if the fraction of measured re-sponse time values that fall below the Cerebro prediction isgreater than or equal to the SLA target probability For ex-ample if the SLA probability is 095 (ie p = 95 in QBETS)for a specific web API then the Cerebro predictions are cor-rect if at least 95 of the response times measured for theweb API are smaller than their corresponding Cerebro pre-dictions

We benchmark each web API for a period of 15 to 20hours During this time we run a remote HTTP client thatmakes requests to the web APIs once every minute The ap-plication instrumentation measures and records the responsetime of the API operation for each request (ie within theapplication) Concurrently and within the same PaaS sys-tem we execute the Cerebro PaaS monitoring agent whichis an independently hosted application within the cloud thatbenchmarks each SDK operation once every minute

Cerebro predicts the web API execution times using onlythe cloud SDK benchmarking data collected by CerebrorsquosPaaS monitoring agent We configure Cerebro to predict anupper bound for the 95th percentile of the web API responsetime with an upper confidence of 001

QBETS generates a prediction for every value in the inputtime series (one per minute) Cerebro reports the last one asthe SLA prediction to the user or PaaS administrator in pro-duction However having per-minute predictions enables usto compare these predictions against actual web API execu-tion times measured during the same time period to eval-uate Cerebro correctness More specifically we associatewith each measurement the prediction from the predictiontime series that most nearly precedes it in time The correct-ness fraction is computed from a sample of 1000 prediction-measurement pairs

7

Figure 5 Cerebro correctness percentage in Google AppEngine and AppScale cloud platforms

Figure 5 shows the final results of this experiment Eachof the columns in Figure 5 corresponds to a single web APIoperation in one of the sample applications The columns arelabeled in the form of ApplicationNameOperationName aconvention we will continue to use in the rest of the paper

Since we are using Cerebro to predict the 95th percentileof the API response times Cerebrorsquos predictions are correctwhen at least 95 of the measured response times are lessthan their corresponding predicted upper bounds Accordingto Figure 5 Cerebro achieves this goal for all the applica-tions in both cloud environments

The web API operations illustrated in Figure 5 cover awide spectrum of scenarios that may be encountered in realworld StudentInfogetStudent and StudentInfoaddStudentare by far the simplest operations in the mix They invokea single cloud SDK operation each and perform a simpledatastore read and a simple datastore write respectively Asper our survey results these alone cover a significant por-tion of the web APIs developed for the App Engine andAppScale cloud platforms (1 path through the code and1 cloud SDK call) The StudentInfodeleteStudent opera-tion makes two cloud SDK operations in sequence whereasStudentInfogetAllStudents performs an iterative datastoreread In our experiment with StudentInfogetAllStudentswe had the datastore preloaded with 1000 student recordsand Cerebro was configured to use a maximum entity countof 1000 when making predictions

ServerHealthinfo invokes the same cloud SDK opera-tion three times in sequence Both StockTraderbuy andStockTradersell have multiple paths through the applica-tion (due to branching) thus causing Cerebro to make mul-tiple sequences of predictions ndash one sequence per path Theresults shown in Figure 5 are for the longest paths whichconsist of seven cloud SDK invocations each According toour survey 998 of the execution paths found in GoogleApp Engine applications have seven or fewer cloud SDKcalls in them Therefore we believe that the StockTrader webAPI represents an important upper bound case

Figure 6 Average difference between predictions and ac-tual response times in Google App Engine and AppScaleThe y-axis is in log scale

RoomsgetRoomByName invokes two different cloudSDK interfaces namely datastore and memcache Roomsget-AllRooms is another operation that consists of an iterativedatastore read In this case we had the datastore preloadedwith 10 entities and Cerebro was configured to use a maxi-mum entity count of 10

42 Tightness of PredictionsIn this section we discuss the tightness of the predictionsgenerated by Cerebro Tightness is a measure of how closelythe predictions bound the actual response times of the webAPIs Note that it is possible to perfectly achieve the correct-ness goal by simply predicting overly large values for webAPI response times For example if Cerebro were to pre-dict a response time of several years for exactly 95 of theweb API invocations and zero for the others it would likelyachieve a correctness percentage of 95 From a practicalperspective however such an extreme upper bound is notuseful as the basis for an SLA

Figure 6 depicts the average difference between predictedresponse time bounds and actual response times for our sam-ple web APIs when running in the App Engine and AppScaleclouds These results were obtained considering a sequenceof 1000 consecutive predictions (of 95th percentile) and theaverages are computed only for correct predictions (ie onesabove their corresponding measurements)

According to Figure 6 Cerebro generates fairly tight SLApredictions for most web API operations considered in theexperiments In fact 14 out of the 20 cases illustrated inthe figure show average difference values less than 65msIn a few cases however the bounds differ from the averagemeasurement substantiallybull StudentInfogetAllStudents on both cloud platformsbull ServerHealthinfo SocialMapperaddComment Stock-

Traderbuy and StockTradersell on App EngineFigure 7 shows the empirical cumulative distribution

function (CDF) of measured execution times for the Stu-dentInfogetAllStudents on Google App Engine (one of the

8

Figure 7 CDF of measured executions times of the Stu-dentInfogetAllStudents operation on App Engine

extreme cases) This distribution was obtained by consider-ing the applicationrsquos instrumentation results gathered withina window of 1000 minutes The average of this sample is343179ms and the 95th percentile from the CDF is 4739msThus taken as a distribution the ldquospreadrdquo between the aver-age and the 95th percentile is more than 1300ms

From this it becomes evident that StudentInfogetAll-Students records very high execution times frequently Inorder to incorporate such high outliers Cerebro must beconservative and predict large values for the 95th percentileThis is a required feature to ensure that 95 or more APIinvocations have execution times under the predicted SLABut as a consequence the average distance between themeasurements and the predictions increases significantly

We omit a similar analysis of the other cases in the in-terest of brevity but summarize the tightness results as in-dicating that Cerebro achieves a bound that is ldquotightrdquo withrespect to the percentiles observed by sampling the seriesfor long periods

Another interesting observation we can make regardingthe tightness of predictions is that the predictions made in theAppScale cloud platform are significantly tighter than theones made in Google App Engine (Figure 6) For nine outof the ten operations tested Cerebro has generated tighterpredictions in the AppScale environment This is becauseweb API performance on AppScale is far more stable andpredictable thus resulting in fewer measurements that occurfar from the average

The reason why AppScalersquos performance is more stableover time is because it is deployed on a set of closely con-trolled and monitored cluster of virtual machines (VMs) thatuse a private Infrastructure-as-a-Service (IaaS) cloud to im-plement isolation In particular the VMs assigned to App-Scale do not share nodes with ldquonoisy neighborsrdquo in our testenvironment In contrast Google App Engine does not ex-pose the performance characteristics of its multi-tenancyWhile it operates at vastly greater scale our test applicationsalso exhibit wider variance of web API response time whenusing it Cerebro however is able to predict a correct and

tight SLAs for applications running in either platform thelower variance private AppScale PaaS and the extreme scalebut more varying Google App Engine PaaS

43 Duration of Prediction ValidityTo be of practical value to PaaS administration the dura-tion over which a Cerebro prediction remains valid must belong enough to allow appropriate remedial action when loadconditions change and the SLA is in danger of being vio-lated In particular SLAs must remain correct for at least thetime necessary to allow human responses to changing condi-tions such as the commitment of more resources to web APIsthat are in violation or alerts to support staff that customersmay be calling to claim SLA breach Ideally each predictionshould persist as correct for several hours or more to matchstaff response time to potential SLA violations

However determining when a Cerebro-predicted SLAbecomes invalid is potentially complex For example giventhe definition of correctness described in Subsection 41 itis possible to report an SLA violation when the running tab-ulation of correctness percentage falls below the target prob-ability (when expressed as a percentage) However if thismetric is used and Cerebro is correct for many consecutivemeasurements a sudden change in conditions that causesthe response time to persist at a higher level will not im-mediately trigger a violation For example Cerebro mightbe correct for several consecutive months and then incorrectfor several consecutive weeks before the overall correctnesspercentage drops below 95 and a violation is detected Ifthe SLA is measured over a year such time scales may beacceptable but we believe that PaaS administrators wouldconsider such a long period of time where the SLAs werecontinuously in violation unacceptable Thus we propose amore conservative approach to measuring the duration overwhich a prediction remains valid than simply measuring thetime until the correctness percentage drops below the SLA-specified value

Suppose at time t Cerebro predicts value Q as the p-thpercentile of some APIrsquos execution time If Q is a correctand tight prediction the probability of APIrsquos next measuredresponse time being greater than Q is 1minus (001p) If thetime series consists of independent measurements then theprobability of seeing n consecutive values greater than Q(due to random chance) is (1minus001p)n For example usingthe 95th percentile the probability of seeing 3 values in arow larger than the actual percentile (purely due to randomchance) is (005)3 = 000012 or about 1 in 8333

This calculation is conservative with respect to autocor-relation That is if the time series is stationary but auto-correlated then the number of consecutive values above the95th percentile that correspond to a probability of 000012is larger than 3 For example in previous work [33] usingan artificially generated AR(1) series we observed that 5consecutive values above the 95th percentile occurred withprobability 000012 when the first autocorrelation was 05

9

Table 2 Prediction validity period distributions of differ-ent operations in App Engine Validity durations were com-puted by observing 3 consecutive SLA violations 5th and95th columns represent the 5th and 95th percentiles of thedistributions respectively All values are in hours

Operation 5th Average 95th

StudentInfogetStudent 715 7072 13443StudentInfodeleteStudent 255 3797 9437StudentInfoaddStudent 145 268 6478

ServerHealthinfo 141 3922 11771RoomsgetRoomByName 724 7047 13336RoomsgetRoomsInCity 208 3012 8258

Table 3 Prediction validity period distributions of differ-ent operations in AppScale Validity periods were computedby observing 3 consecutive SLA violations 5th and 95th

columns represent the 5th and 95th percentiles of the dis-tributions respectively All values are in hours




and 14 when the first autocorrelation was 085 QBETS usesa look-up table of these values to determine the number ofconsecutive measurements above Q that constitute a ldquorareeventrdquo indicating a possible change in conditions

Each time Cerebro makes a new prediction it computesthe current autocorrelation and uses the QBETS rare-eventlook-up table to determine n the number of consecutivevalues that constitute a rare event We measure the timefrom when Cerebro makes the prediction until we observen consecutive measurement values above that prediction asbeing the time duration over which the prediction is validWe refer to this duration as the validity duration Tables 2and 3 present these durations for Cerebro predictions inGoogle App Engine and AppScale respectively

From Table 2 the average validity duration for all 6 op-erations considered in App Engine is longer than 24 hoursThe lowest average value observed is 268 hours and thatis for the StudentInfoaddStudent operation If we just con-sider the 5th percentiles of the distributions they are alsolonger than 1 hour The smallest 5th percentile value of 141hours is given by the ServerHealthinfo operation This re-sult implies that based on our conservative model for de-tecting SLA violations Cerebro predictions made on Google

App Engine would be valid for at least 141 hours or moreat least 95 of the time

By comparing the distributions for different operationswe can conclude that API operations that perform a sin-gle basic datastore or memcache read tend to have longervalidity durations In other words those cloud SDK op-erations have fairly stable performance characteristics inGoogle App Engine This is reflected in the 5th percentilesof StudentInfogetStudent and RoomsgetRoomByNameAlternatively operations that execute writes iterative readsor long sequences of cloud SDK operations have shorterprediction validity durations

For AppScale the smallest average validity period of3377 hours is observed from the RoomsgetRoomsInCityoperation All other operations tested in AppScale have av-erage prediction validity periods greater than 54 hours Thelowest 5th percentile value in the distributions which is195 hours is also shown by RoomsgetRoomsInCity Thismeans the SLAs predicted for AppScale would hold correctfor at least 195 hours or more at least 95 of the timeThe relatively smaller validity period values computed forthe RoomsgetRoomsInCity operation indicates that the per-formance of iterative datastore reads is subject to some vari-ability in AppScale

44 Effectiveness of QBETSIn order to gauge the effectiveness of QBETS we compare itto a ldquonaıverdquo approach that simply uses the running empiricalpercentile tabulation of a given joint time series as a predic-tion This simple predictor retains a sorted list of previousobservations and predicts the p-th percentile to be the valuethat is larger than p of the values in the observation his-tory Whenever a new observation is available it is added tothe history and each prediction uses the full history

Figure 8 shows the correctness measurements for the sim-ple predictor using the same cloud SDK monitoring dataand application benchmarking data that was used in Sub-section 41 That is we keep the rest of Cerebro unchangedswap QBETS out for the simple predictor and run the sameset of experiments using the logged observations Thus theresults in Figure 8 are directly comparable to Figure 5 whereCerebro uses QBETS as a forecaster

For the simple predictor Figure 8 shows lower correct-ness percentages compared to Figure 5 for QBETS (ie thesimple predictor is less conservative) However in severalcases the simple predictor falls well short of the target cor-rectness of 95 necessary for the SLA That is it is unableto furnish a prediction correctness that can be used as thebasis of an SLA in all of the test cases

To illustrate why the simple predictor fails to meet thedesired correctness level Figure 9 shows the time series ofobservations simple predictor forecasts and QBETS fore-casts for the RoomsgetRoomsInCity operation on GoogleApp Engine (the case in Figure 8 that shows lowest correct-ness percentage)

10

Figure 8 Cerebro correctness percentage resulting from thesimple predictor (without QBETS)

Figure 9 Comparison of predicted and actual responsetimes of RoomsgetRoomsInCity on Google App Engine

In this experiment there are a significant number ofresponse time measurements that violate the SLA givenby simple predictor (ie are larger than the predicted per-centile) but are below the corresponding QBETS predic-tion made for the same observation Notice also that whileQBETS is more conservative (its predictions are generallylarger than those made by the simple predictor) in this casethe predictions are typically only 10 larger That is whilethe simple predictor shows the 95th percentile to be ap-proximately 40ms the QBETS predictions vary between42ms and 48ms except at the beginning where QBETS isldquolearningrdquo the series This difference in prediction how-ever results in a large difference in correctness percentageFor QBETS the correctness percentage is 974 (Figure 5)compared to 755 for the simple predictor (Figure 8)

5 Related WorkOur research leverages a number of mature research areasin computer science and mathematics These areas includestatic program analysis cloud computing time series analy-sis and SOA governance

The problem of predicting execution time SLAs of webAPIs is similar to worst-case execution time (WCET) anal-ysis [12 14 30 37 45] The objective of WCET analysis

is to determine the maximum execution time of a softwarecomponent in a given hardware platform It is typically dis-cussed in the context of real-time systems where the de-velopers should be able to document and enforce precisehard real-time constraints on the execution time of programsIn order to save time manpower and hardware resourcesWCET analysis solutions are generally designed favoringstatic analysis methods over software testing We share sim-ilar concerns with regard to cloud platforms and strive toeliminate software testing in the favor of static analysis

Ermedahl et al designed and implemented SWEET [12]a WCET analysis tool that make use of program slicing [37]abstract interpretation [10] and invariant analysis [30] to de-termine the loop bounds and worst-case execution time of aprogram Program slicing helps to reduce the amount of codeand program states that need to be analyzed by SWEET Thisis similar to our idea of extracting just the cloud SDK invo-cations form a given web API code SWEET uses abstractinterpretation in interval and congruence domains to identifythe set of values that can be assigned to key control variablesof a program These sets are then used to calculate exact loopbounds for most data-independent loops in the code Invari-ant analysis is used to detect variables that do not changeduring the course of a loop iteration and remove them fromthe analysis thus further simplifying the loop bound estima-tion Lokuceijewski et al propose a similar WCET analysisusing program slicing and abstract interpretation [25] Theyadditionally use a technique called polytope models to speedup the analysis

The corpus of research that covers the use of static analy-sis methods to estimate the execution time of software appli-cations is extensive Gulwani Jain and Koskinen used twotechniques named control-flow refinement and progress in-variants to estimate the bounds for procedures with nestedand multi-path loops [19] Gulwani Mehra and Chilimbiproposed SPEED [20] a system that computes symbolicbounds for programs This system makes use of user-definedquantitative functions to predict the bounds for loops iterat-ing over data structures like lists trees and vectors Our ideaof using user-defined values to bound data-dependent loops(eg iterative datastore reads) is partly inspired by this con-cept Bygde [8] proposed a set of algorithms for predictingdata-independent loops using abstract interpretation and el-ement counting (a technique that was partly used in [12])Cerebro incorporates minor variations of these algorithmssuccessfully due to their simplicity

Cerebro makes use of and is similar to many of the execu-tion time analysis systems discussed above However thereare also several key differences For instance Cerebro is fo-cused on solving the execution time prediction problem forPaaS-hosted web APIs As we show in our characterizationsurvey such applications have a set of unique properties thatcan be used to greatly simplify static analysis Also Cerebrois designed to only work with web API codes This makes

11

designing a solution much more simpler but less general Tohandle the highly variable and evolving nature of cloud plat-forms Cerebro combines static analysis with runtime moni-toring of cloud platforms at the level of SDK operations Noother system provides such a hybrid approach to the best ofour knowledge Finally we use time series analysis [33] topredict API execution time upper bounds with specific con-fidence levels

SLA management on service-oriented systems and cloudsystems has been throughly researched over the years How-ever a lot of the existing work has focused on issues suchas SLA monitoring [6 27 36 42] SLA negotiation [26 4748] and SLA modeling [9 39 40] Some work has lookedat incorporating a given SLA to the design of a system andthen monitoring it at the runtime to ensure SLA compliantbehavior [21] Our research takes a different approach fromsuch works whereby it attempts to predict the performanceSLAs for a given web API To the best of our knowledgeCerebro is the first system to predict performance SLAs forweb APIs developed for PaaS clouds

A work that is similar to ours has been proposed byArdagna Damiani and Sagbo in [4] The authors developa system for early estimation of service performance basedon simulations Given a STS model (Symbolic TransitionsSystem) of a service their system is able to generate a simu-lation script which can be used to assess the performance ofthe service STS models are a type of finite state automataFurther they use probabilistic distributions with fixed pa-rameters to represent the delays incurred by various opera-tions in the service Cerebro is easier to use than this systembecause we do not require API developers to construct anymodels of the web APIs Also instead of using probabilisticdistributions with fixed parameters Cerebro uses actual his-torical performance metrics of cloud SDK operations Thisenables Cerebro to generate more accurate results that re-flect the dynamic nature of the cloud platform

There has also been prior work in the area of predictingSLA violations [11 24 41] These systems take an existingSLA and historical performance data of a service and pre-dict when the service might violate the given SLA in the fu-ture Cerebrorsquos notion of prediction validity period has somerelation to this line of research However Cerebrorsquos maingoal is to make SLA predictions for web APIs before theyare deployed and executed We believe that some of theseexisting SLA violation predictors can complement our workby providing API developers and cloud administrators in-sights on when a Cerebro-predicted SLA will be violated

6 Conclusions and Future WorkWeb services and service oriented architecture encouragedevelopers to create new applications by composing existingservices via their web APIs But such integration of servicesmakes it difficult to reason about the performance and othernon-functional properties of composite applications To fa-

cilitate such reasoning web APIs should be exported for useby applications with strong performance guarantees (SLAs)

To this end we propose Cerebro a system that predictsresponse time bounds for web APIs deployed in PaaS cloudsWe choose PaaS clouds as the target environment of our re-search due to their rapidly growing popularity as a technol-ogy for hosting scalable web APIs and their SDK-based re-stricted application development model which makes it eas-ier to analyze PaaS applications statically

Cerebro uses static analysis to extract the sequence ofcloud SDK calls made by a given web API code combinedwith historical performance measurements of cloud SDKcalls to predict the response time of the web API It employsQBETS a non-parametric time series analysis and forecast-ing method to analyze cloud SDK performance data andpredict bounds on response time that can be used as sta-tistical ldquoguaranteesrdquo with associated guarantee probabilitiesCerebro is intended for use both during development and de-ployment phases of a web API and precludes the need forcontinuous performance testing of the API code Further itdoes not interfere with the run-time operation (ie it requiresno application instrumentation at runtime) which makes itscalable

We have implemented a prototype of Cerebro for GoogleApp Engine public PaaS and AppScale private PaaS andevaluate it using a set of representative and open sourceweb applications developed by others Our findings indicatethat the prototype can determine response time levels thatcorrespond to specific target SLAs These predictions arealso durable with average validity times varying betweenone and three days

In the current design Cerebrorsquos cloud SDK monitoringagent only monitors a predefined set of cloud SDK opera-tions In our future work we wish to explore the possibil-ity of making this component more dynamic so that it au-tomatically learns what operations to benchmark from theweb APIs deployed in the cloud We also plan to investi-gate further how to better handle data-dependent loops (it-erative datastore reads) for different workloads Further weplan to integrate Cerebro with EAGER our API governancesystem and policy engine for PaaS clouds so that PaaS ad-ministrators can enforce SLA-related policies on web APIsat deployment-time

AcknowledgmentsThis work is funded in part by NSF (CNS-0905237 CNS-1218808 ACI-0751315) amp NIH (1R01EB014877-01)

References[1] A V Aho R Sethi and J D Ullman Compilers Principles

Techniques and Tools Addison-Wesley Longman PublishingCo Inc Boston MA USA 1986 ISBN 0-201-10088-6

[2] F E Allen Control Flow Analysis In Symposium on Com-piler Optimization 1970

12

[3] Amazon AWS Amazon Web Services home page 2015httpawsamazoncom [Accessed March 2015]

[4] C Ardagna E Damiani and K Sagbo Early Assessment ofService Performance Based on Simulation In IEEE Interna-tional Conference on Services Computing (SCC) 2013

[5] Berkeley API Central Berkeley API Central 2015 https

developerberkeleyedu [Accessed March 2015]

[6] A Bertolino G De Angelis A Sabetta and S ElbaumScaling Up SLA Monitoring in Pervasive Environments InInternational Workshop on Engineering of Software Servicesfor Pervasive Environments 2007

[7] J Brevik D Nurmi and R Wolski Quantifying MachineAvailability in Networked and Desktop Grid Systems InProceedings of CCGrid04 April 2004

[8] S Bygde Static WCET analysis based on abstract interpre-tation and counting of elements PhD thesis Malardalen Uni-versity 2010

[9] T Chau V Muthusamy H-A Jacobsen E Litani A Chanand P Coulthard Automating SLA Modeling In Confer-ence of the Center for Advanced Studies on Collaborative Re-search Meeting of Minds 2008

[10] P Cousot and R Cousot Abstract Interpretation A Uni-fied Lattice Model for Static Analysis of Programs by Con-struction or Approximation of Fixpoints In ACM SIGACT-SIGPLAN Symposium on Principles of Programming Lan-guages 1977

[11] S Duan and S Babu Proactive Identification of PerformanceProblems In ACM SIGMOD International Conference onManagement of Data 2006

[12] A Ermedahl C Sandberg J Gustafsson S Bygde andB Lisper Loop Bound Analysis based on a Combination ofProgram Slicing Abstract Interpretation and Invariant Anal-ysis In WCET 2007

[13] R T Fielding Architectural Styles and the Design ofNetwork-based Software Architectures PhD thesis Univer-sity of California Irvine 2000 AAI9980887

[14] C Frost C S Jensen K S Luckow and B Thomsen WCETAnalysis of Java Bytecode Featuring Common Execution En-vironments In International Workshop on Java Technologiesfor Real-Time and Embedded Systems 2011

[15] Google App Engine App engine - run yourapplications on a fully managed paas 2015httpscloudgooglecomappenginerdquo [Accessed March2015]

[16] Google App Engine Java SandboxGoogle app engine java sandbox 2015httpscloudgooglecomappenginedocsjavaJava The sandboxrdquo[Accessed March 2015]

[17] Google Cloud SDK Service Quotas and Limits 2015httpscloudgooglecomappenginedocsquotas

[Accessed March 2015]

[18] Google Datastore Fetch Options 2015 httpscloud

googlecomappenginedocsjavajavadoccom

googleappengineapidatastoreFetchOptions


[19] S Gulwani S Jain and E Koskinen Control-flow Refine-ment and Progress Invariants for Bound Analysis In ACMSIGPLAN Conference on Programming Language Design andImplementation 2009

[20] S Gulwani K K Mehra and T Chilimbi SPEED Pre-cise and Efficient Static Estimation of Program ComputationalComplexity In ACM SIGPLAN-SIGACT Symposium on Prin-ciples of Programming Languages 2009

[21] H He Z Ma H Chen and W Shao Towards an SLA-DrivenCache Adjustment Approach for Applications on PaaS InAsia-Pacific Symposium on Internetware 2013

[22] IEEE APIs IEEE Xplore Search Gateway 2015 http

ieeexploreieeeorggateway [Accessed March 2015]

[23] H Jayathilaka C Krintz and R Wolski EAGERDeployment-time API Governance for Modern PaaS CloudsIn IC2E Workshop on the Future of PaaS 2015

[24] P Leitner B Wetzstein F Rosenberg A Michlmayr S Dust-dar and F Leymann Runtime Prediction of Service LevelAgreement Violations for Composite Services In A DanF Gittler and F Toumani editors Service-Oriented Comput-ing ICSOCServiceWave 2009 Workshops volume 6275 ofLecture Notes in Computer Science pages 176ndash186 SpringerBerlin Heidelberg 2010 ISBN 978-3-642-16131-5

[25] P Lokuciejewski D Cordes H Falk and P Marwedel A Fastand Precise Static Loop Analysis Based on Abstract Interpre-tation Program Slicing and Polytope Models In IEEEACMInternational Symposium on Code Generation and Optimiza-tion 2009

[26] K Mahbub and G Spanoudakis Proactive SLA Negotiationfor Service Based Systems Initial Implementation and Eval-uation Experience In IEEE International Conference on Ser-vices Computing 2011

[27] A Michlmayr F Rosenberg P Leitner and S Dustdar Com-prehensive QoS Monitoring of Web Services and Event-basedSLA Violation Detection In International Workshop on Mid-dleware for Service Oriented Computing 2009

[28] Microsoft Azure Cloud SDK Service Quotas andLimits 2015 httpazuremicrosoftcomen-

usdocumentationarticlesazure-subscription-

service-limitscloud-service-limits [AccessedMarch 2015]

[29] R Morgan Building an Optimizing Compiler Digital PressNewton MA USA 1998 ISBN 1-55558-179-X

[30] S S Muchnick Advanced Compiler Design and Implementa-tion Morgan Kaufmann Publishers Inc San Francisco CAUSA 1997 ISBN 1-55860-320-4

[31] D Nurmi R Wolski and J Brevik Model-Based CheckpointScheduling for Volatile Resource Environments In Proceed-ings of Cluster 2005 2004

[32] D Nurmi J Brevik and R Wolski Modeling Machine Avail-ability in Enterprise and Wide-area Distributed ComputingEnvironments In Proceedings of Europar 2005 2005

[33] D Nurmi J Brevik and R Wolski QBETS Queue BoundsEstimation from Time Series In International Conference onJob Scheduling Strategies for Parallel Processing 2008

13

[34] D Nurmi R Wolski C Grzegorczyk G Obertelli S SomanL Youseff and D Zagorodnov The Eucalyptus open-sourcecloud-computing system In IEEEACM International Sympo-sium on Cluster Computing and the Grid 2009

[35] ProgrammableWeb ProgrammableWeb 2015 httpwwwprogrammablewebcom [Accessed March 2015]

[36] F Raimondi J Skene and W Emmerich Efficient OnlineMonitoring of Web-service SLAs In ACM SIGSOFT Inter-national Symposium on Foundations of Software Engineering2008

[37] C Sandberg A Ermedahl J Gustafsson and B LisperFaster WCET Flow Analysis by Program Slicing In ACMSIGPLANSIGBED Conference on Language Compilers andTool Support for Embedded Systems 2006

[38] SearchCloudComputing 2015 http

searchcloudcomputingtechtargetcomfeature

Experts-forecast-the-2015-cloud-computing-

market [Accessed March 2015]

[39] J Skene D D Lamanna and W Emmerich Precise ServiceLevel Agreements In International Conference on SoftwareEngineering 2004

[40] K Stamou V Kantere J-H Morin and M Georgiou A SLAGraph Model for Data Services In International Workshop onCloud Data Management 2013

[41] B Tang and M Tang Bayesian Model-Based Predictionof Service Level Agreement Violations for Cloud ServicesIn Theoretical Aspects of Software Engineering Conference(TASE) 2014

[42] A K Tripathy and M R Patra Modeling and MonitoringSLA for Service Based Systems In International Conferenceon Intelligent Semantic Web-Services and Applications 2011

[43] R Vallee-Rai P Co E Gagnon L Hendren P Lam andV Sundaresan Soot A Java Bytecode Optimization Frame-work In CASCON First Decade High Impact Papers 2010

[44] White House APIs Agency Application Programming Inter-faces 2015 httpwwwwhitehousegovdigitalgovapis [Accessed March 2015]

[45] R Wilhelm J Engblom A Ermedahl N Holsti S ThesingD Whalley G Bernat C Ferdinand R HeckmannT Mitra F Mueller I Puaut P Puschner J Staschulatand P Stenstrom The Worst-case Execution-time Prob-lemampMdashOverview of Methods and Survey of Tools ACMTrans Embed Comput Syst 7(3) May 2008

[46] R Wolski and J Brevik QPRED Using Quantile Predic-tions to Improve Power Usage for Private Clouds TechnicalReport UCSB-CS-2014-06 Computer Science Department ofthe University of California Santa Barbara Santa BarbaraCA 93106 September 2014

[47] L Wu S Garg R Buyya C Chen and S Versteeg Auto-mated SLA Negotiation Framework for Cloud Computing InIEEEACM International Symposium on Cluster Cloud andGrid Computing 2013

[48] E Yaqub R Yahyapour P Wieder C Kotsokalis K Lu andA I Jehangiri Optimal negotiation of service level agree-ments for cloud-based services through autonomous agents

In IEEE International Conference on Services Computing2014

14

Introduction

Domain Characteristics and Assumptions

Cerebro

Static Analysis

PaaS Monitoring Agent

Making SLA Predictions

Example Cerebro Workflow

Experimental Results

Correctness of Predictions

Tightness of Predictions

Duration of Prediction Validity

Effectiveness of QBETS

Related Work

Conclusions and Future Work

will be maintained Cerebro uses a combination of staticanalysis of the hosted web APIs and runtime monitoring ofthe PaaS cloud (not the web APIs themselves) to determinewhat minimum response time guarantee can be made witha target probability specified by a PaaS administrator Thesecalculated SLAs enable developers to reason about the per-formance of the applications that consume the cloud-hostedweb APIs

Currently cloud computing systems such as AmazonWeb Services (AWS) [3] and Google App Engine (GAE) [15]offer reliability SLAs specifying the fraction of availabilityover a fixed time period for their services that users can ex-pect when they contract to use a service However they donot provide SLAs guaranteeing minimum levels of perfor-mance In contrast Cerebro predictions make it possible todetermine response time SLAs with probabilities specifiedby the cloud provider in a way that is scalable

Cerebro is designed to improve the use of both publicand private PaaS clouds which have emerged as popular ap-plication hosting venues [38] For example there are overfour million active GAE applications that can execute onGooglersquos public cloud or over AppScale an open sourceprivate cloud version of App Engine A PaaS cloud providesdevelopers with a collection of commonly used scalable ser-vices that the platform exports via APIs defined within asoftware development kit (cloud SDK) These services arefully managed and covered under the availability SLAs ofthe cloud platform For example services in GAE and App-Scale cloud SDK include a distributed NoSQL datastoretask management and data caching among others

Cerebro generates response time SLAs for API calls ex-ported by a web application developed using the servicesavailable within the PaaS For brevity in this work we willuse the term web API to refer to a web-accessible API ex-ported by an application hosted on a PaaS platform Furtherwe will use the term cloud SDK to refer to the APIs that aremaintained as part of the PaaS and available to all hosted ap-plications This enables us to differentiate the internal APIsof the PaaS from the APIs exported by the deployed appli-cations For example an application hosted in Google AppEngine might export one or more web APIs to its users whileleveraging the internal cloud SDK for the Google datastorethat is available as part of the Google App Engine PaaS

Cerebro uses static analysis to identify the cloud SDKinvocations that dominate the response time of web APIsIndependently Cerebro also maintains a running history ofcloud SDK response time performance It uses QBETS [33]ndash a forecasting methodology we have developed in priorwork for predicting bounds on ldquoill behavedrdquo univariate timeseries ndash to predict response time bounds on each cloud SDKinvocation made by the application It combines these pre-dictions dynamically for each static program path througha web API operation and returns the ldquoworst-caserdquo upperbound on the time necessary to complete the operation

Because service implementations and platform behaviorunder load change over time Cerebrorsquos predictions necessar-ily have a lifetime That is the predicted SLAs may becomeinvalid after some time As part of this paper we developa model for detecting such SLA invalidations We use thismodel to investigate the effective lifetime of Cerebro predic-tions When such changes occur Cerebro can be reinvokedto establish new SLAs for any deployed web API

We have implemented Cerebro for both the Google AppEngine public PaaS and the AppScale private PaaS Givenits modular design and this experience we believe that Cere-bro can be easily integrated into any PaaS system We useour prototype implementation to evaluate the accuracy ofCerebro as well as the tightness of the bounds it predicts (iethe difference between the predictions and the actual APIexecution times) To this end we carry out a range of exper-iments using App Engine applications that are available asopen source

We also detail the duration over which these predictionshold in both GAE and AppScale We find that Cerebro gen-erates correct SLAs (predictions that meet or exceed theirprobabilistic guarantees) and that these SLAs are valid overtime periods ranging from 14 hours to more than 24 hoursWe also find that the high variability of performance in pub-lic PaaS clouds due to their multi-tenancy and massive scalerequires that Cerebro be more conservative in its predictionsto achieve the desired level of correctness In comparisonCerebro is able to make much tighter SLA predictions forweb APIs hosted in private single tenant clouds

Because Cerebro provides this analysis statically it im-poses no run-time overhead on the applications themselvesIt requires no run-time instrumentation of application codeand it does not require any performance testing of the webAPIs Furthermore because the PaaS is scalable and SDKmonitoring data is shared across all Cerebro executions thecontinuous monitoring of the cloud SDK generates no dis-cernible load on the cloud platform Thus we believe Cere-bro is suitable for highly scalable cloud settings

Finally we have developed Cerebro for use with EA-GER (Enforced API Governance Engine for REST) [23]ndash an API governance system for PaaS clouds EAGER at-tempts to enforce governance policies at the deployment-time of cloud applications These governance policies arespecified by cloud administrators to ensure the reliable op-eration of the cloud and the deployed applications PaaSplatforms include an application deployment phase duringwhich the platform provisions resources for the applicationinstalls the application components and configures them touse the cloud SDKs EAGER injects a policy checking andenforcement step into this deployment workflow so that onlyapplications that are compliant with respect to site-specificpolicies are successfully deployed Cerebro allows PaaS ad-ministrators to define EAGER policies that allow an appli-cation to be deployed only when its web APIs meet a pre-

2













3










images 3memcache 12


tools 2urlfetch 8

users 44xmpp 3




4




















5


1minusk

sumj=0

(nj

















6



















7













8












9




















10










11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work














3










images 3memcache 12


tools 2urlfetch 8

users 44xmpp 3




4




















5


1minusk

sumj=0

(nj

















6



















7













8












9




















10










11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work











images 3memcache 12


tools 2urlfetch 8

users 44xmpp 3




4




















5


1minusk

sumj=0

(nj

















6



















7













8












9




















10










11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work





















5


1minusk

sumj=0

(nj

















6



















7













8












9




















10










11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work



1minusk

sumj=0

(nj

















6



















7













8












9




















10










11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work




















7













8












9




















10










11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work














8












9




















10










11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work













9




















10










11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work





















10










11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work











11















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work
















12








































13




















14

Introduction


Cerebro

Static Analysis









Related Work









































13




















14

Introduction


Cerebro

Static Analysis









Related Work





















14

Introduction


Cerebro

Static Analysis









Related Work


Response Time Service Level Agreements for Cloud-hosted … · Response Time Service Level...

Documents

Transcript of Response Time Service Level Agreements for Cloud-hosted … · Response Time Service Level...