Feedback Control Architecture and Design Methodology for Service Delay Guarantees in Web Servers *

Chenyang Lu Tarek F. Abdelzaher John A. Stankovic Sang H. Son

Abstract

This paper presents the design and implementation of an adaptive Web server architecture to

provide relative, absolute and hybrid connection delay guarantees for different service classes.

The first contribution of this paper is the architecture based on feedback control loops that en-

force desired connection delays via dynamic connection scheduling and process reallocation.

The second contribution is the use of control theoretic techniques to model and design the feed-

back loops with proven performance guarantees. In contrast with heuristics-based approaches

that rely on laborious hand-tuning and testing iteration, the control theoretic approach enables

systematic design of an adaptive Web server with established analytical methods. The design

methodology includes using system identification to establish dynamic models for a Web server,

and using the Root Locus method to design feedback controllers to satisfy performance specifi-

cations. The adaptive architecture has been implemented by modifying an Apache Web server.

Experimental results demonstrate that the adaptive server provides robust delay guarantees even

when workload varies significantly. Properties of the adaptive Web server also include guaran-

teed stability, and efficient convergence to desired performance set points in response to signifi-

cant fluctuation in user populations.

Keywords: Web server, Quality of Service, feedback control, proportional differentiated service, absolute delay guarantee, relative delay guarantee.

* Lu ([email protected]) is with Department of Computer Science and Engineering, Washington University in St. Louis. Abdelzaher, Stankovic, and Son are with Department of Computer Science, University of Virginia. This work was supported by NSF grants CCR-9901706 and EIA-9900895. This paper is an extension to a conference paper [23].


I. INTRODUCTION

The increasing diversity of applications supported by the World Wide Web and the increas-

ing popularity of time-critical Web-based applications (such as online trading) motivates build-

ing QoS-aware Web servers. Such servers customize their performance attributes depending on

the class of the served requests so that more important requests receive better service. From the

perspective of the requesting clients, the most visible service performance attribute is typically

the service delay. Different requests may have different tolerances to service delays. For exam-

ple, one can argue that stock trading requests should be served more promptly than information

requests. Similarly, interactive clients should be served more promptly than background software

agents such as Web crawlers and prefetching proxies. Some businesses may also want to provide

different service delays to different classes of customers (e.g., depending on their monthly fees).

Hence, in this paper, we provide a solution to support delay differentiation in Web servers.

Support for different classes of service on the Web (with special emphasis on server delay

differentiation) has been investigated in recent literature. In the simplest case, it is proposed that

differentiation should be made between two classes of clients; premium and basic. For example,

the authors of [18] proposed and evaluated an architecture in which restrictions are imposed on

the amount of server resources (such as threads or processes) which are available to basic clients.

In [4] admission control and scheduling algorithms are used to provide premium clients with bet-

ter service. A server architecture is proposed that maintains separate service queues for premium

and basic clients, thus facilitating their differential treatment [8].

While the above differentiation approach usually offers better service to premium clients, it

does not provide any guarantees on the service. Hence, we call this approach the best effort dif-

ferentiation model. In particular, the best effort differentiation model does not provide guaran-

tees on the extent of the difference between premium and basic performance levels. This

difference depends heavily on load conditions and may be difficult to quantify. In a situation

where clients pay to receive better service, any ambiguity regarding the expected performance

improvement may cause client concern, and is, therefore, perceived as a disadvantage. Compared

with the best effort differentiation model, the relative and absolute guarantee models provide

stronger service guarantees.

In the absolute guarantee model, a fixed maximum service delay (i.e., a soft deadline) for

each class needs to be enforced. A disadvantage of the absolute guarantee model is that it is usu-

ally difficult to determine appropriate deadlines for Web services. For example, the tolerable de-

lay threshold of a Web user may vary significantly depending on Web page design, length of

session, browsing purpose, and properties of the Web browser [11]. Since system load can grow

arbitrarily high in a Web server, it is impossible to satisfy the absolute delay guarantees of all

service classes under overload conditions. The absolute delay guarantee requires that all classes

receive satisfactory delay if the server is not overloaded; otherwise desired delays are violated in

the predefined priority order, i.e., low priority classes always suffer guarantee violation earlier

than high priority classes. In the absolute guarantee model, deadlines that are too loose may not

provide necessary service differentiation because the deadlines can be satisfied even when delays

of different classes are the same. On the other hand, deadlines that are too tight can cause ex-

tremely long latency for low priority classes in order to enforce high priority classes’ unnecessar-

ily tight deadlines.

In the relative delay guarantee (also called proportional differentiated service [16]), a fixed

ratio between the delays seen by the different service classes can be enforced. This architecture

provides a specification interface and an enforcement mechanism such that a desired "distance"

between the performance levels of different classes can be specified and maintained. This service

model is more precise in its performance differentiation semantics than the best effort differen-

tiation model. The relative delay guarantee is also more flexible than absolute guarantee because

it does not require fixed deadlines to be assigned to each service class.

Depending on the nature of the overload condition, either the relative delay guarantee or the

absolute guarantee may become more desirable. The relative delay guarantee may be less appropriate in severe overload conditions because even high-priority clients may get extremely long

delays. In nominal overload conditions, however, the relative delay guarantee may be more de-

sirable than absolute guarantee because the relative delay guarantee can provide adequate and

precise service differentiation without requiring artificial, fixed deadlines to be assigned to each

service class. Therefore, a hybrid guarantee is desirable in some systems. For example, a hybrid

policy can be that the server provides relative delay guarantee when the delay received by each

high-priority class is within its tolerable threshold. When the delay received by a high priority

class exceeds its threshold, the server automatically switches to the absolute guarantee model

that enforces desired delays for high priority classes at the cost of violating desired delays of low

priority classes. This policy can achieve the flexibility of the relative delay guarantee in nominal

overload and bound the delay of high priority class in severe overload conditions.
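To make the switching rule concrete, here is a minimal sketch of the hybrid policy; the function name and the list-based representation of thresholds are ours, not from the paper:

```python
def select_guarantee_mode(high_priority_delays, thresholds):
    """Hybrid policy sketch: use the relative delay guarantee while every
    high-priority class stays within its tolerable delay threshold; switch
    to the absolute guarantee as soon as any threshold is exceeded."""
    for delay, threshold in zip(high_priority_delays, thresholds):
        if delay > threshold:
            return "absolute"
    return "relative"
```

In nominal overload the function keeps the server in the more flexible relative mode; only severe overload of a high-priority class triggers the absolute mode.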

In this paper, we present a Web server architecture to support delay guarantees including the

absolute guarantee, relative delay guarantee, and the hybrid guarantee described above. A key

challenge in guaranteeing service delays in a Web server is that resource allocation that achieves

the desired delay or delay differentiation depends on load conditions that are unknown a priori.

A main contribution of this paper is the introduction of a feedback control architecture for adapt-

ing resource allocation such that the desired delay differentiation between classes is achieved.

We formulate the adaptive resource allocation problem as one of feedback control and apply

feedback control theory to develop the resource allocation algorithm. We target our architecture

specifically for the HTTP 1.1 protocol [19], the most recent version of HTTP, which has been

adopted by most Web servers and browsers. Hence, our contributions can be summa-

rized as follows:

• An adaptive architecture for achieving relative, absolute, and hybrid service delay guarantees

in Web servers under HTTP 1.1

• Designing an adaptive connection scheduler with proven performance guarantees using feed-

back control theory and methodology:

• Using system identification to model Web servers for purposes of performance control

• Specifying performance requirements of Web servers using control-based metrics

• Designing feedback controllers using the Root Locus method to satisfy the performance

specifications.

• Implementing and evaluating the adaptive architecture on a modified Apache Web server that

achieves robust performance guarantees even when the workload varies considerably

The rest of this paper is organized as follows. In Section II, we define the semantics of delay

differentiation guarantees on Web servers. The design of the adaptive server architecture to provide the

delay guarantees is described in Section III. In Section IV, we apply feedback control theory to

systematically design a controller to satisfy the desired performance of the Web server. The im-

plementation of the architecture on an Apache Web server and experimental results are presented

in Sections V and VI, respectively. The related work is summarized in Section VII. We then

conclude the paper in Section VIII.

II. SERVICE DELAY GUARANTEES ON WEB SERVERS

In this section, we first discuss various components of delays in Web services, and then for-

mally specify several models of service delay guarantees on Web servers.

A. Delays in Web Services

To analyze the delays in Web services, we now briefly introduce how Web servers and

HTTP protocols operate. Web server software usually adopts a multi-process or a multi-threaded

model. Processes or threads can be either created on demand or maintained in a pre-existing pool

that awaits incoming TCP connection requests to the server. The latter design alternative reduces

service overhead by reducing dynamic process creation and termination - a very costly operation

in an operating system such as UNIX. In this paper, we assume a multi-process model with a

pool of processes, which is the model of the Apache server, the most commonly used Web server

today. Each server process or thread can only handle one TCP connection at any time instant.

Since multi-process and multi-thread servers operate in a similar fashion, without loss of general-

ity our following description assumes that the server has multiple processes. Servicing a Web

request involves the following steps:

1. A client establishes a TCP connection with a server process. The TCP connection establish-

ment includes (1) queuing of the connection request (the SYN packet) in the TCP port of the

server and (2) the TCP three-way handshake.

2. The client sends an HTTP request to the process over the TCP connection.

3. The server processes the request and generates a response. This step includes processing the

network protocol stack, parsing the HTTP request, file system operations, and executing dy-

namic Web content (e.g., CGI scripts and database queries).

4. The server process transmits the response back to the client over the TCP connection.

Upon the completion of the response transmission, the server behaves differently depending

on the version of HTTP protocol. If the request is an HTTP 1.0 request, the server process im-

mediately closes the TCP connection. This results in a large number of TCP connections that

must be established and maintained by the server. To remedy this problem the current version of

HTTP, called HTTP 1.1, reduces the number of concurrent TCP connections with a mechanism

called persistent connections [19], which allows multiple web requests to reuse the same connec-

tion. Specifically, after a response has been transmitted, the TCP connection is left open in an-

ticipation that the same connection can be reused by a following HTTP request from the same

client; if a new HTTP request arrives within a specific interval, the connection is kept open; oth-

erwise it is closed.

From a client’s perspective, the end-to-end delay of a Web service includes the following:

• Communication delay on the Internet, including the time for the TCP three-way handshake

and the time for transmitting the HTTP request and the response.

• Connection delay on the server, i.e., the interval between the arrival of a TCP connection re-

quest and when a server process accepts the request.

• Processing delay on the server, i.e., the time for generating the response on the server.

To deliver satisfactory end-to-end services to the client, performance guarantees must be

provided on all the above delay components. The problem of providing communication delay

guarantees on the Internet has received significant attention – most notably in the context of the

IntServ [28] and DiffServ [10] models. While network communication is an important compo-

nent in Web services, network-layer solutions alone are not sufficient for achieving satisfactory

Web performance because servers often contribute a significant portion of the end-to-end delay

[8]. For example, a server may introduce considerable processing delay for generating dynamic

content and data encryption. Recent research on server resource management [7][17] and proces-

sor scheduling [21] aims to optimize the processing delay and throughput in server systems (see

Section VII for more discussion on related work).

Complementary to related research that focused on communication and processing delay, our

work is concerned with controlling the connection delay on Web servers. Significant connection

delay arises when all server processes are tied up with existing connections1. In this case, new

connection requests are queued in the TCP listen queue until they are accepted by a server proc-

ess. Controlling connection delay can be especially important in a server running the HTTP 1.1

protocol because a server process may be tied up with a persistent connection even when it is not

processing any request. One way to alleviate this problem is to increase the number of server

processes. However, too many processes can cause thrashing in virtual memory systems thus de-

grading server performance considerably. In practice, an operating system specific limit is im-

posed on the number of concurrent server processes.

Note that CPU scheduling is not necessarily an effective mechanism for controlling connec-

tion delays because CPU may not be the bottleneck in HTTP 1.1 servers [23]. In this work, we

develop a novel server process allocation mechanism to support connection delay control in

1 Connection delay is insignificant in event-driven Web servers [26], which have no limit on the number of connections that can be accepted besides limits on the maximum number of open descriptors imposed by the OS.

HTTP 1.1 servers. Note that it is not the objective of our current research to interfere with HTTP

1.1 semantics to improve CPU utilization.

B. Semantics of Service Delay Guarantees

Suppose every HTTP request belongs to a class k (0 ≤ k < N). The connection delay Ck(m) of

class k at the mth sampling instant is defined as the average connection delay of all established

connections of class k within the interval ((m-1)S, mS) seconds, where S is a constant sampling period. Con-

nection delay guarantees are defined as follows. For simplicity of presentation, we use delay to

refer to connection delay interchangeably in the rest of this paper.

• Relative Delay Guarantee: A desired relative delay Wk is assigned to each class k. A rela-

tive delay guarantee {Wk | 0 ≤ k < N} requires that Cj(m) / Cl(m) = Wj / Wl for ∀ classes j and

l (j ≠ l). For example, if class 0 has a desired relative delay of 1.0, and class 1 has a desired

relative delay of 2.0, it is required that the connection delay of class 0 should be half of that

of class 1.

• Absolute Delay Guarantee: A desired (absolute) delay Wk is assigned to each class k. An

absolute delay guarantee {Wk | 0 ≤ k < N} requires that Cj(m) ≤ Wj for every class j for which ∃ a class

l > j with Cl(m) ≤ Wl (a lower class number means a higher priority). Note that since system

load can grow arbitrarily high in a Web server, it is impossible to satisfy the desired delay of

all service classes under overload conditions. The absolute delay guarantee requires that all

classes receive satisfactory delay if the server is not overloaded; otherwise desired delays are

violated in the predefined priority order, i.e., low priority classes always suffer guarantee vio-

lation earlier than high priority classes.
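These two guarantee semantics can be stated as checks on the sampled delays. The sketch below is our illustration; the tolerance parameter in the relative check is an assumption, since sampled delays only approximate the exact ratios:

```python
def meets_relative_guarantee(C, W, tol=0.05):
    """Relative guarantee: C_j/C_l = W_j/W_l for all class pairs j != l,
    here tested to within a relative tolerance `tol` (our assumption)."""
    n = len(C)
    return all(abs(C[j] / C[l] - W[j] / W[l]) <= tol * (W[j] / W[l])
               for j in range(n) for l in range(n) if j != l)

def meets_absolute_guarantee(C, W):
    """Absolute guarantee: class j may miss its desired delay W_j only if
    every lower-priority class l > j also misses W_l, i.e. violations
    occur strictly in the predefined priority order."""
    n = len(C)
    for j in range(n):
        if C[j] > W[j]:
            if any(C[l] <= W[l] for l in range(j + 1, n)):
                return False
    return True
```

For instance, a server that meets classes 0 and 1 while class 2 is violated satisfies the absolute guarantee; a server that violates class 0 while class 2 is satisfied does not.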

III. A FEEDBACK CONTROL ARCHITECTURE FOR WEB SERVER QOS

In this section, we present an adaptive Web server architecture (as illustrated in Figure 1) to

provide the above delay guarantees. A key feature of this architecture is the use of feedback con-

trol loops to enforce desired relative/absolute delays via dynamic reallocation of server processes.

The architecture is composed of a Connection Scheduler, a Monitor, Controllers, and a fixed

pool of server processes. We describe the design of the components in the following subsections.

[Figure 1: TCP connection requests are queued in the TCP listen queue and accepted by the Connection Scheduler, which dispatches them to the pool of server processes that generate the HTTP responses; the Monitor samples the connection delays {Ck | 0 ≤ k < N}, and the Controllers use the references {Wk | 0 ≤ k < N} to compute the process budgets {Bk | 0 ≤ k < N} fed back to the scheduler.]

Figure 1. The Feedback-Control Architecture for Delay Guarantees

A. Connection Scheduler

The Connection Scheduler serves as an actuator to control the delays of different classes. It

listens to the well-known port and accepts every incoming TCP connection request. The Connec-

tion Scheduler uses an adaptive proportional share policy to allocate server processes to connec-

tions from different classes. At every sampling instant m, every class k (0 ≤ k < N) is assigned a

process budget, Bk(m), i.e., class k should be allocated at most Bk(m) server processes in the mth

sampling period. For a system with absolute delay guarantees, the total budgets of all classes can

exceed the total number of server processes in overload, which is a condition called control satu-

ration. In this case, the process budgets are satisfied in the priority order until every process has

been allocated to a class. This policy means that the process budgets of high priority classes are

always satisfied before those of low priority classes, and thus the correct order of guarantee vio-

lations can be achieved. For a server with relative delay guarantee, our Relative Delay Control-

lers always guarantee that the total budget equals the total number of processes.

For each class k, the Connection Scheduler maintains a (FIFO) connection queue Qk and a

process counter Rk. The connection queue Qk holds connections of class k before they are allo-

cated server processes. The counter Rk is the number of processes allocated to class k. After an

incoming connection is accepted, the Connection Scheduler classifies the new connection and

inserts the connection descriptor to the scheduling queue corresponding to its class. Whenever a

server process becomes available, a connection at the front of a scheduling queue Qk is dis-

patched if class k has the highest priority among all eligible classes {j | Rj < Bj(m)}.
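A simplified sketch of this dispatch rule follows; plain deques and lists stand in for the scheduler's internal queues and counters, and all names are ours:

```python
from collections import deque

def dispatch_next(queues, R, B):
    """Pick the connection to hand to a newly freed server process.

    queues[k]: FIFO connection queue Q_k of class k
    R[k]:      processes currently allocated to class k
    B[k]:      process budget of class k for this sampling period
    A class is eligible if it has a waiting connection and R_k < B_k;
    among eligible classes the highest priority (lowest index) wins.
    Returns (class, connection) or None if nothing is dispatchable."""
    for k in range(len(queues)):            # scan in priority order
        if queues[k] and R[k] < B[k]:       # non-empty and under budget
            conn = queues[k].popleft()
            R[k] += 1
            return (k, conn)
    return None
```

Note that a high-priority class that has exhausted its budget is skipped, so a lower-priority class can still be served within its own budget.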

For the above scheduling algorithm, a key issue is how to decide the process budgets {Bk | 0

≤ k < N} to achieve the desired relative or absolute delays {Wk | 0 ≤ k < N}. Note that static map-

pings from the desired relative or absolute delays to the process budgets (e.g., based on system

profiling) cannot work well when the workloads are unpredictable and vary at run time. This

problem motivates the use of feedback controllers to dynamically adjust the process budgets to

maintain desired delays.

Because the Controller can dynamically change the process budgets, a situation can occur

in which a class k’s new process budget Bk(m) (after the adjustment in saturation conditions de-

scribed above) exceeds the combined number of free server processes and processes already allocated

to class k. Such a class is called an under-budget class. Two different policies, preemptive vs.

non-preemptive scheduling, can be supported in this case. In the preemptive scheduling model,

the Connection Scheduler immediately forces server processes to close connections of over-

budget classes whose new process budgets are less than the number of processes currently allo-

cated to them. In the non-preemptive scheduling model, the Connection Scheduler waits for

server processes to voluntarily release connections of over-budget classes before it allocates

enough processes to under-budget classes. The advantage of the preemptive model is that it is

more responsive to the Controller’s input and load variations, but it can cause jittery delay in

preempted classes because they may have to re-establish connections with the server in the mid-

dle of loading a Web page. On the other hand, non-preemptive scheduling may be particularly

desirable in commercial Web servers where it is important to keep established connections alive

in order to conclude the commercial transactions between the customers and the seller [13].

Only the non-preemptive model is currently implemented in our Web server.

B. Server Processes

The second component of the architecture (Figure 1) is a fixed pool of server processes.

Every server process reads connection descriptors from the connection scheduler. Once a server

process closes a TCP connection it notifies the connection scheduler and becomes available to

process new connections.

C. Monitor

The Monitor is invoked at each sampling instant m. It computes the average connection de-

lays {Ck(m) | 0 ≤ k < N} of all classes during the last sampling period. The sampled connection

delays are used by the Controller to compute new process proportions.

D. Controllers

The architecture uses one Controller for each relative or absolute delay constraint. At each

sampling instant m, the Controllers compare the sampled connection delays {Ck(m) | 0 ≤ k < N}

with the desired relative or absolute delays {Wk | 0 ≤ k < N}, and compute new process budgets

{Bk(m) | 0 ≤ k < N}, which are used by the Connection Scheduler to reallocate server processes

during the following sampling period.

Absolute Delay Controllers

The absolute delay of every class k is controlled by a separate Absolute Delay Controller

CAk. Its key parameters and variables are shown in Table 1.

Reference VSk: desired delay of class k, VSk = Wk.
Output Vk(m): sampled delay of class k, Vk(m) = Ck(m).
Error Ek(m): difference between the reference and the output, Ek(m) = VSk – Vk(m).
Control input Uk(m): process budget of class k, Uk(m) = Bk(m).

Table 1: Variables and Parameters of the Absolute Delay Controller CAk

The goal of the Absolute Delay Controller CAk is to reduce the error Ek(m) to 0 and achieve

the desired delay for class k. Intuitively, when Ek(m) = VSk – Vk(m) < 0, the Controller should

increase the process budget Uk(m) = Bk(m) to allocate more processes to class k. At every sam-

pling instant m, the Absolute Delay Controller calls PI (Proportional-Integral) control [18] to

compute the control input. A digital form of the PI control function is

Uk(m) = Uk(m-1) + g(Ek(m) – rEk(m-1)) (1)

where g and r are design parameters called the controller gain and the controller zero, respectively. The

performance of the Web server depends on the values of the controller parameters. An ad hoc

approach to design the controller is to test different values of the parameters with extensive ex-

periments. In contrast, we apply control theory to tune the parameters analytically to guarantee

the desired performance in the Web server. The design and tuning methodology is presented in

Section IV.
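Equation (1) translates directly into code. The sketch below uses placeholder values for g and r rather than the analytically tuned values derived in Section IV:

```python
class PIController:
    """Digital PI controller implementing equation (1):
    U(m) = U(m-1) + g*(E(m) - r*E(m-1)),
    where g is the controller gain and r the controller zero."""

    def __init__(self, g, r, u0=0.0):
        self.g = g          # controller gain (placeholder value)
        self.r = r          # controller zero (placeholder value)
        self.u = u0         # previous control input U(m-1)
        self.e_prev = 0.0   # previous error E(m-1)

    def update(self, error):
        """Compute U(m) from the current error E(m) = VS - V(m)."""
        self.u = self.u + self.g * (error - self.r * self.e_prev)
        self.e_prev = error
        return self.u
```

At each sampling instant the Absolute Delay Controller would call update with Ek(m) and use the result as the new process budget Bk(m).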

For a system with N service classes, the Absolute Delay Guarantee is enforced by N Absolute

Delay Controllers CAk (0 ≤ k < N). At each sampling instant m, each Controller CAk computes

the process budget of class k. Note that in overload conditions, the process budgets (especially

those of low priority classes) computed by the Absolute Delay Controllers may not be feasible if

the sum of the computed process budgets of all classes exceeds the total number of server proc-

esses M, i.e., ∑k Bk(m) > M. This is a situation called control saturation. Because low priority

classes should suffer guarantee violation in overload conditions, the system always satisfies the

computed process budgets in the decreasing order of priorities until every server process has

been allocated to a class2.
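The priority-ordered allocation under control saturation can be sketched as follows (the function name and return convention are ours):

```python
def allocate_under_saturation(budgets, M):
    """Grant the computed process budgets in decreasing priority order
    (class 0 first) until all M server processes are allocated.
    budgets[k]: budget B_k(m) computed by controller CA_k
    Returns the feasible budgets actually granted."""
    granted = []
    free = M
    for b in budgets:
        give = min(b, free)   # high-priority budgets are satisfied first
        granted.append(give)
        free -= give
    return granted
```

When the computed budgets sum to more than M, only the lowest-priority classes see their budgets cut, which produces the required order of guarantee violations.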

Relative Delay Controllers

The relative delay of every two adjacent classes k and k-1 is controlled by a separate Relative

Delay Controller CRk. Each Relative Delay Controller CRk has the following key parameters and

variables. For simplicity of discussion, we use the same notations for the corresponding parame-

ters and variables of the Absolute Delay Controller and the Relative Delay Controllers.

Reference VSk: desired delay ratio between class k and k-1, VSk = Wk / Wk-1.
Output Vk(m): sampled delay ratio between class k and k-1, Vk(m) = Ck(m) / Ck-1(m).
Error Ek(m): difference between the reference and the output, Ek(m) = VSk – Vk(m).
Control input Uk(m): the ratio (called the process ratio) between the number of processes to be allocated to class k-1 and to class k, Uk(m) = Bk-1(m) / Bk(m).

Table 2: Variables and Parameters of the Relative Delay Controller CRk

Intuitively, when Ek(m) < 0, CRk should decrease the process ratio Uk(m) to allocate more

processes to class k relative to class k-1. The goal of the controller CRk is to reduce the error

Ek(m) to 0 and achieve the correct delay ratio between class k and k-1. Similar to the Absolute

Delay Controller, the Relative Delay Controller also uses PI (Proportional-Integral) control (1) to

compute the control input (note that the parameters and variables are interpreted differently in

the Absolute Delay Controller and the Relative Delay Controller).

For a system with N service classes and M server processes, the Relative Delay Guaran-

tee is enforced by N-1 Relative Delay Controllers CRk (1 ≤ k < N). At every sampling instant m,

the system calculates the process budget Bk(m) of each class k as follows.

control_relative_delay ({Wk | 0 ≤ k < N}, {Ck(m) | 0 ≤ k < N}) {
    Set class (N-1)'s process proportion PN-1(m) = 1;
    S = PN-1(m);
    for (k = N-2; k ≥ 0; k--) {
        Call CRk+1 to get the process ratio Uk+1(m) between classes k and k+1;
        Pk(m) = Pk+1(m) Uk+1(m);   /* the process proportion of class k */
        S = S + Pk(m);
    }
    for (k = N-1; k ≥ 0; k--)
        Bk(m) = M Pk(m) / S;
}
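A runnable version of this computation, under the interpretation that controller CRk+1 yields the ratio Uk+1(m) = Bk(m)/Bk+1(m) between classes k and k+1; the list-based interface and function name are ours:

```python
def relative_delay_budgets(ratios, M):
    """Compute process budgets B_k from the Relative Delay Controllers'
    process ratios.

    ratios[k] = U_k(m) = B_{k-1}(m)/B_k(m) for 1 <= k < N (ratios[0] unused)
    M = total number of server processes
    Returns real-valued budgets summing to M; rounding to whole
    processes is left to the caller."""
    N = len(ratios)
    P = [0.0] * N
    P[N - 1] = 1.0                      # proportion of class N-1
    for k in range(N - 2, -1, -1):      # propagate ratios upward
        P[k] = P[k + 1] * ratios[k + 1]
    S = sum(P)
    return [M * p / S for p in P]       # normalize so budgets sum to M
```

For example, with N = 3, both ratios equal to 2, and M = 7 processes, the proportions are 4 : 2 : 1 and the budgets are 4, 2, and 1.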

2 To avoid complete starvation of low priority classes, the system may reserve a minimum number of server proc-esses to each service class.

We note that our architecture can be easily extended to support hybrid delay guarantees. For

example, the hybrid delay guarantee described in Section II can be implemented via dynamic

switching between the Absolute and Relative Delay Controllers based on the delay of high-

priority classes. We will focus on absolute and relative delay guarantees in the rest of this paper.

In summary, we have presented a feedback control architecture to achieve absolute, relative

and hybrid delay guarantees on Web servers. A key component in this architecture is the Con-

trollers, which are responsible for dynamically computing correct process budgets in the face of un-

predictable workload and system variations. In the rest of the paper, we use the closed-loop server

to refer to the adaptive Web server with the Controllers (Figure 1), while the open-loop server

refers to a non-adaptive Web server without the Controllers. We present the design and tuning of

the Controllers in the next section.

IV. DESIGN OF THE CONTROLLER

In this Section, we apply a control theory framework [18][23] to design the Relative Delay Controller

CRk and the Absolute Delay Controller CAk. In Section A, we specify the performance requirement of the

Controllers. We then use system identification techniques to establish dynamic models for the Web server

in Section B. Based on the dynamic model, we use the Root Locus method to design the Controllers that

meet the performance specifications.

A. Performance Specifications

We adopt a set of metrics from control theory [20] to characterize the dynamic performance

of an adaptive server. The performance specifications include the following.

• Stability: a (BIBO) stable system should have bounded output in response to bounded input.

For the Relative Delay Controller (with a finite desired delay ratio), stability requires that the

delay ratio always be bounded at run-time. For the Absolute Delay Controller, stability

requires that the service delay should always be bounded at run-time. Stability is a necessary

condition for achieving desired relative or absolute delays.


• Settling time Ts is the time it takes the output to converge to the vicinity of the reference and

enter steady state. The settling time represents the efficiency of the Controller, i.e., how fast

the server can converge to the desired relative or absolute delay. As an example, we assume

that our Web server requires the settling time Ts < 5 min.

• Steady state error Es is the difference between the reference input and the average output in

steady state. The steady state error represents the accuracy of the Relative Delay Controller

or Absolute Delay Controller in achieving the desired relative or absolute delays. As an ex-

ample, we assume that our Web server requires a steady state error |Es| < 0.1VS. Note that sat-

isfying this steady state error requirement means that our Web server can achieve the desired

relative or absolute delays in steady state.
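These two metrics can be computed directly from a sampled output sequence. The sketch below is our own illustration (the function names and the tolerance band are assumptions, not from the paper): settling time is taken as the first sampling instant after which the output stays within a tolerance band around the reference, and steady state error as the reference minus the average of the output tail.

```python
def settling_time(output, reference, band):
    """First index i such that output[i:] stays within reference +/- band."""
    for i in range(len(output)):
        if all(abs(v - reference) <= band for v in output[i:]):
            return i
    return None  # never settles

def steady_state_error(output, reference, tail=5):
    """Reference minus the average of the last `tail` samples."""
    return reference - sum(output[-tail:]) / tail

# Hypothetical example: a delay ratio converging toward the reference 3.0
samples = [4.4, 3.8, 3.3, 3.1, 3.0, 2.9, 3.0, 3.1, 3.0]
print(settling_time(samples, 3.0, 0.3))   # index of the first sample inside the band
print(steady_state_error(samples, 3.0, 5))
```

Multiplying the settling index by the sampling period S gives the settling time in seconds.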

B. System Identification: Establishing Dynamic Models

A dynamic model describes the mathematical relationship between the control input and the

output of a system (usually with differential or difference equations). Modeling provides a foun-

dation for the analytical design of the Controller. As shown in Tables 1 and 2, an Absolute Delay

Controller and a Relative Delay Controller have different variables as their control inputs and

system outputs. However, the design methodology described below applies to both cases. Assuming the controlled system models for different classes are similar, we omit the class index k from Uk(m) and Vk(m) in the rest of this section.

Unlike traditional control theory applications such as mechanical and electronic systems, it is

often difficult to directly describe a computing system such as a Web server with differential or

difference equations. To solve the modeling problem, we adopt a practical approach by applying

system identification [6] to estimate the model of the Web server. The controlled system (includ-

ing the Connection scheduler, the server processes, and the Monitor) is modeled as a difference

equation with unknown parameters. We then stimulate the Web server with pseudo-random digi-

tal white-noise input and use a least squares estimator [6] to estimate the model parameters. Our

experimental results (Section VI) established that, for both relative and absolute delay control,


the controlled system can be modeled as a second order difference equation with adequate accu-

racy for the purpose of control design. The architecture used for system identification is illus-

trated in Figure 2. We describe the components of the architecture in the following subsections.

[Figure omitted: block diagram. HTTP service requests arrive as TCP connection requests in the TCP listen queue; the Connection Scheduler dispatches them to server processes, which return HTTP responses. A white-noise generator sets the process budgets {B0, B1}; the monitor samples the delays {C0, C1} and feeds them to a least squares estimator, which outputs the model parameters.]

Figure 2: Architecture for System Identification

Model Structure

The Web server is modeled as a difference equation with unknown parameters, i.e., an nth or-

der model can be described as follows,

V(m) = ∑1≤j≤n aj V(m-j) + ∑1≤j≤n bj U(m-j)    (2)

In an nth order model, there are 2n parameters {aj, bj | 1 ≤ j ≤ n} that need to be determined by

the least-squares estimator. The difference equation model is motivated by our observation that

the output of an open-loop server depends on previous inputs and outputs. Intuitively, the dy-

namics of a Web server is due to the queuing of connections and the non-preemptive scheduling

mechanism. The connection delay may depend on the number of server processes allocated to its

class in several previous sampling periods. Furthermore, after class k’s process budget is in-

creased, the non-preemptive Connection Scheduler may have to wait for server processes of

other classes to close their connections before it can reallocate enough processes to class k.
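To make the model concrete, the following sketch (our own illustration; the function name is an assumption) simulates a second order instance of the difference equation, using the parameter values later estimated for relative delay control in (6b). It shows how the current output is determined by the two previous outputs and the two previous inputs:

```python
def simulate(a1, a2, b1, b2, inputs):
    """Simulate V(m) = a1*V(m-1) + a2*V(m-2) + b1*U(m-1) + b2*U(m-2),
    with zero initial conditions."""
    V, U = [0.0, 0.0], [0.0, 0.0]
    for u in inputs:
        v = a1 * V[-1] + a2 * V[-2] + b1 * U[-1] + b2 * U[-2]
        V.append(v)   # V(m) depends only on past inputs/outputs
        U.append(u)   # the current input takes effect at the next instant
    return V[2:]

# Parameters from (6b); a constant input settles at the DC gain
# (b1 + b2) / (1 - a1 - a2) = 0.83 / 0.63, roughly 1.32
out = simulate(0.74, -0.37, 0.95, -0.12, [1.0] * 20)
print(out[-1])
```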


White Noise Input

To stimulate the dynamics of the open-loop server, we use a pseudo-random digital white

noise generator to randomly switch two classes’ process budgets between two configurations.

White noise input has been commonly used for system identification [6]. The white noise algo-

rithm is not presented due to space limitations. A standard algorithm for white noise can be

found in [6].
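A pseudo-random binary sequence of this kind is simple to sketch. The code below is our own illustration (not the algorithm from [6]): it switches the input between the two configurations used in our experiments with equal probability at each sampling instant.

```python
import random

def binary_white_noise(low, high, steps, seed=0):
    """Randomly pick one of two input levels at each sampling instant."""
    rng = random.Random(seed)
    return [rng.choice((low, high)) for _ in range(steps)]

# e.g., switch the process ratio between 1 and 3 for a 30 min run (S = 30 sec)
inputs = binary_white_noise(1, 3, steps=60)
```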

Least Squares Estimator

The least squares estimator is the key component of the system identification architecture. In

this section, we review its mathematical formulation and describe its use to estimate the model

parameters. The derivation of the estimator equations is given in [6]. The estimator is invoked periodically at every sampling instant. At the mth sampling instant, it takes as input the current

output V(m), n previous outputs V(m-j) (1 ≤ j ≤ n), and n previous inputs U(m-j) (1 ≤ j ≤ n). The

measured output V(m) is fit to the model described in (2). Define the vector q(m) = (V(m-1) …

V(m-n) U(m-1) … U(m-n))T, and the vector θ(m) = (a1(m) … an(m) b1(m) … bn(m))T, i.e., the estimates of the model parameters in (2). These estimates are initialized to 1 at the start of the es-

timation. Let R(m) be a square matrix whose initial value is set to a diagonal matrix with the

diagonal elements set to 10. The estimator’s equations at sampling instant m are [6]:

γ(m) = R(m-1)q(m) / (1 + qT(m)R(m-1)q(m))    (3)

θ(m) = θ(m-1) + γ(m)(V(m) - qT(m)θ(m-1))    (4)

R(m) = (I - γ(m)qT(m))R(m-1)    (5)

At any sampling instant, the estimator can “predict” a value V*(m) of the output by substitut-

ing the current estimates θ(m) into (2). The difference V(m)-V*(m) between the measured output

and the prediction is the estimation error. It was proved that the least squares estimator iteratively

updates the parameter estimates such that ∑0≤i≤m (V(i) - V*(i))² is minimized.
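Equations (3)-(5) translate directly into code. The sketch below is our own minimal implementation for the second order case (n = 2, so q and θ have four entries), using plain Python lists for the 4x4 matrix R. Driven by noise-free data from a known second order system, it recovers the true parameters:

```python
import random

def rls_identify(V, U, n=2):
    """Recursive least squares per equations (3)-(5); theta holds (a1..an, b1..bn)."""
    d = 2 * n
    theta = [1.0] * d                                   # estimates initialized to 1
    R = [[10.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # R(0) = 10*I
    for m in range(n, len(V)):
        q = [V[m - j] for j in range(1, n + 1)] + [U[m - j] for j in range(1, n + 1)]
        Rq = [sum(R[i][j] * q[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(q[i] * Rq[i] for i in range(d))
        gamma = [x / denom for x in Rq]                               # eq. (3)
        err = V[m] - sum(q[i] * theta[i] for i in range(d))
        theta = [theta[i] + gamma[i] * err for i in range(d)]         # eq. (4)
        qTR = [sum(q[i] * R[i][j] for i in range(d)) for j in range(d)]
        R = [[R[i][j] - gamma[i] * qTR[j] for j in range(d)] for i in range(d)]  # eq. (5)
    return theta

# Generate noise-free data from a known plant driven by binary white noise
true = (0.74, -0.37, 0.95, -0.12)
rng = random.Random(1)
U = [rng.choice((1.0, 3.0)) for _ in range(300)]
V = [0.0, 0.0]
for m in range(2, 300):
    V.append(true[0]*V[m-1] + true[1]*V[m-2] + true[2]*U[m-1] + true[3]*U[m-2])
theta = rls_identify(V, U)
print(theta)   # close to the true parameters
```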

Our system identification results established that the controlled system can be modeled as a

second order difference equation,


V(m) = a1V(m-1) + a2V(m-2) + b1U(m-1) + b2U(m-2) (6a)

As shown in Section VI, the estimated model parameters from our experiments on an Apache

Web server are as follows:

Relative Delay Control: (a1, a2, b1, b2) = (0.74, -0.37, 0.95, -0.12) (6b)

Absolute Delay Control: (a1, a2, b1, b2) = (-0.08, -0.2, -0.2, -0.05) (6c)

C. Root-Locus Design

Given a model described by (6a), we can apply control theory methods such as the Root Lo-

cus [20] to design the Relative Delay Controller and the Absolute Delay Controller. The con-

trolled system model in (6a) can be converted to a transfer function G(z) in z-domain (7). The

transfer function of the PI controller (1) in the z-domain is (8). Given the controlled system

model and the Controller model, the transfer function of the close-loop system is (9).

G(z) = V(z)/U(z) = (b1z + b2) / (z² - a1z - a2)    (7)

D(z) = g(z - r) / (z - 1)    (8)

Gc(z) = D(z)G(z) / (1 + D(z)G(z))    (9)

According to control theory, the performance of a system depends on the poles of its transfer

function. The Root Locus is a graphical technique that plots the traces of poles of a close-loop

system on the z-plane (or s-plane) as its controller parameters change. We use the Root Locus

tool of MATLAB to tune the controller gain g and the controller zero r so that the performance

specifications can be satisfied. Due to space limitations, we only summarize the results of the design in this

paper. The details of the design process can be found in control textbooks such as [20].

To design the Relative Delay Controller, we use the Root Locus tool to place the close-loop

poles for relative delay control at (0.70, 0.38±0.62i) by setting the Relative Delay Controller’s

parameters to g = 0.3 and r = 0.05. Similarly, the close-loop poles for absolute delay control are

placed at (0.607, -0.30±0.59i) by setting the Absolute Delay Controller's parameters to


g = -4.6, r = 0.3. Control analysis shows that the close-loop server with the above pole placements has the following properties.

Stability: The close-loop systems with the Relative and the Absolute Delay Controllers guaran-

tee stability because all the close-loop poles are inside the unit circle, i.e., |pj| < 1 (0 ≤ j ≤ 2).
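The pole locations can be checked numerically. Combining (7) and (8), the characteristic equation of the close-loop system (9) is (z - 1)(z² - a1z - a2) + g(z - r)(b1z + b2) = 0. The sketch below is our own verification (using numpy, not part of the paper's toolchain): it computes the roots for the relative delay design and confirms they lie inside the unit circle:

```python
import numpy as np

def closed_loop_poles(a1, a2, b1, b2, g, r):
    """Roots of (z - 1)(z^2 - a1*z - a2) + g*(z - r)*(b1*z + b2) = 0."""
    plant_den = np.array([1.0, -a1, -a2])                   # z^2 - a1 z - a2
    char = np.polymul(np.array([1.0, -1.0]), plant_den)     # (z - 1)(...)
    char = np.polyadd(char, g * np.polymul([1.0, -r], [b1, b2]))
    return np.roots(char)

# Relative delay design: model (6b) with controller g = 0.3, r = 0.05
poles = closed_loop_poles(0.74, -0.37, 0.95, -0.12, 0.3, 0.05)
print(poles)   # approximately one real pole near 0.70 and a complex pair near 0.38 +/- 0.62i
```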

Settling time: According to control theory, decreasing the radius (i.e., the distance to the origin

in the z-plane) of the close-loop poles usually results in shorter settling time. The Relative Delay

Controller achieves a settling time of 270 sec, and the Absolute Delay Controller achieves a set-

tling time of 210 sec, both lower than the required settling time (300 sec) defined in Section A.

Steady state error: Both the Relative Delay Controller and the Absolute Delay Controller

achieve zero steady state error, i.e., Es = 0. This result can be easily proved using the Final Value

Theorem in digital control theory [18]. This result means that, in steady state, the close-loop sys-

tem with the Relative Delay Controller or the Absolute Delay Controller guarantees the desired

relative delays or the desired absolute delays, respectively.
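The Final Value Theorem argument can be sketched briefly (our own reconstruction of the standard proof, which the paper omits). For a step reference of magnitude W, the z-transform of the error is E(z) = Wz / ((z - 1)(1 + D(z)G(z))), so

```latex
E_s \;=\; \lim_{z \to 1} (z-1)E(z)
    \;=\; \lim_{z \to 1} \frac{Wz}{1 + D(z)G(z)} \;=\; 0,
\qquad \text{since } D(z) = \frac{g(z-r)}{z-1} \to \infty \text{ as } z \to 1 .
```

The limit is valid because the close-loop system is stable and G(1) = (b1 + b2)/(1 - a1 - a2) is finite and nonzero for both estimated models (6b) and (6c).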

In summary, using feedback control theory techniques, we systematically design the Relative

Delay Controller and the Absolute Delay Controller that analytically provide the desired relative

or absolute delay guarantee and meet the transient and steady state performance specifications.

This result shows the efficacy of the control-theoretic design framework for software systems.
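As a sanity check of the design, the complete loop can be simulated. The sketch below is our own simulation (not the paper's implementation): it couples the estimated relative delay model (6b) with the PI control law U(m) = U(m-1) + g(E(m) - rE(m-1)) implied by (8), where E(m) = W - V(m), and shows the output converging to a reference of 3:

```python
# Closed-loop simulation: plant (6b) with PI controller D(z) = g(z - r)/(z - 1)
a1, a2, b1, b2 = 0.74, -0.37, 0.95, -0.12   # relative delay model (6b)
g, r, W = 0.3, 0.05, 3.0                    # controller parameters and reference

V, U = [0.0, 0.0], [0.0, 0.0]               # past outputs and control inputs
e_prev = 0.0
for m in range(100):
    v = a1*V[-1] + a2*V[-2] + b1*U[-1] + b2*U[-2]   # plant output V(m)
    e = W - v                                        # error E(m)
    u = U[-1] + g * (e - r * e_prev)                 # PI law from (8)
    V.append(v); U.append(u)
    e_prev = e

print(round(V[-1], 3))   # settles at the reference: 3.0
```

With the integrator in D(z), the steady state error is zero regardless of the plant's DC gain, which matches the Es = 0 result above.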

V. IMPLEMENTATION

We now describe the implementation of the Web server. We modified the source code of

Apache 1.3.9 [5] and added a new library that implemented a Connection Manager (including the

Connection Scheduler, the Monitor and the Controllers). The server was written in C and tested

on a Linux platform. The server is composed of a Connection Manager process and a fixed pool

of server processes (modified from Apache). The Connection Manager process communicates

with each server process through a separate UNIX domain socket.

The Connection Manager runs a loop that listens to the Web server’s TCP socket and accepts

incoming connection requests. Each connection request is classified based on its sender's IP address³ and scheduled by a Connection Scheduler function. The Connection Scheduler dispatches

a connection by sending its descriptor to a free server process through the corresponding UNIX

domain socket. The Connection Manager time stamps the acceptance and dispatching of each

connection. The difference between the acceptance and the dispatching time is recorded as the

connection delay of the connection. Strictly speaking, the connection delay should also include

the queuing time in the TCP listen queue in the kernel. However, the kernel delay is negligible in

this case because the Connection Manager always greedily accepts (dequeues) all incoming TCP

connection requests in a tight loop. The Monitor and the Controllers are invoked periodically at

every sampling instant. For each invocation, the Monitor computes the average delay for each

class. This information is passed to the Controllers, which then compute new process budgets.
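The Monitor's computation is straightforward. A sketch of the per-class averaging over one sampling period (our own illustration with hypothetical timestamps, not the actual C code of the server):

```python
def average_delays(records):
    """records: (class_id, accept_time, dispatch_time) tuples from one sampling period.
    Returns {class_id: average connection delay}."""
    totals, counts = {}, {}
    for cls, accepted, dispatched in records:
        totals[cls] = totals.get(cls, 0.0) + (dispatched - accepted)
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: totals[cls] / counts[cls] for cls in totals}

# Hypothetical sample: class 0 delays of 2 s and 4 s, class 1 delay of 9 s
print(average_delays([(0, 10.0, 12.0), (0, 20.0, 24.0), (1, 15.0, 24.0)]))
```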

We modified the code of the server processes so that they accept connection descriptors from

UNIX domain sockets (instead of the common TCP listen socket as in Apache). When a server process closes a connection, it notifies the Connection Manager of its new status through the UNIX

domain socket.

The server can be configured as a close-loop or an open-loop server by turning the Controllers on or off. An open-loop server can be configured for either system identification or performance

evaluation as a baseline.

VI. EXPERIMENTATION

All experiments were conducted on a testbed of five PCs connected with 100 Mbps

Ethernet⁴. Each machine had a 450 MHz AMD K6-2 processor and 256 MB RAM. One PC was

used to run the Web server with HTTP 1.1, and up to four other PCs were used to simulate cli-

ents that stress the server with a synthetic workload. The experimental setup was as follows.

³ Other criteria for connection classification include HTTP cookies, browser plug-ins, URL request type or filename path, and the destination IP address of virtual servers [8].
⁴ Transmission delays on the Internet are not simulated because the focus of this work is on controlling the connection delays on the server.


• Client: We used SURGE [9] to generate realistic Web workloads in our experiments.

SURGE uses a number of user equivalents (also called users for simplicity) to emulate the

behavior of real-world clients. The load on the server can be adjusted by changing the num-

ber of users on the client machines. Up to 500 concurrent users were used in our experiments.

• Server: The total number of server processes was configured to 128. Since service differen-

tiation is most necessary when the server is overloaded, we set up the experiment such that

the ratio between the number of users and the number of server processes could drive the

server to overload. Note that although large Web servers such as on-line trading servers usu-

ally have more server processes, they also tend to have many more users than the workload

we generated. Therefore, our configuration can be viewed as an emulation of real-world over-

load scenarios at a smaller scale. The sampling period S was set to 30 sec in all the experi-

ments. The connection TIMEOUT of HTTP 1.1 was set to 15 sec (the default value in

Apache 1.3.9).

A. System Identification

We now present the results of system identification experiments for both relative delay and

absolute delay to establish a dynamic model for the open-loop system. Four client machines are

divided into two classes 0 and 1, and each class has 200 users.

Relative Delay Control: We begin with the relative delay experiments. The input, process ratio

U(m) = B0(m) / B1(m), is initialized to 1. At each sampling instant, the white noise randomly sets

the process ratio to 3 or 1. The sampled output, the relative delay V(m) = C1(m) / C0(m) is fed to

the least squares estimator to estimate the model parameters in (2). Figure 3(a) shows the estimated

parameters of a second order model (6) at successive sampling instants in a 30 min run. The es-

timator and the white noise generator are turned on 2 min after SURGE started in order to avoid

its start-up phase. We can see that the estimations of (a1, a2, b1, b2) converge to (0.74, -0.37,

0.95, -0.12). Substituting the estimations into (6), we established an estimated second-order

model for the open-loop server. To verify the accuracy of the model, we re-run the experiment


with a different white noise input (i.e., with a different random seed) to the open-loop server and

compare the actual delay ratio and that predicted by the estimated model. The result is illustrated

in Figure 3(b). We can see that prediction of the estimated model is consistent with the actual

relative delay throughout the 30 min run. This result shows that the estimated second order

model is adequate for designing the Relative Delay Controller. We also re-ran the system identi-

fication experiments to estimate a first order model and a third order model. The results demon-

strate that the estimated first order model had larger prediction error than the second order model

(see Figure 3(c)), while an estimated third order model does not tangibly improve the modeling

accuracy (see Figure 3(d)). Hence the second order model is chosen as the best compromise be-

tween accuracy and complexity.

Absolute Delay Control: The experiments are repeated with the same workload and configura-

tions for absolute delay control. The input of the open loop system is the process budget U(m) =

B0(m) of class 0, which is initialized to 64. At each sampling instant, the white noise randomly

sets the process budget to 96 or 64. The output is the sampled delay V(m) = C0(m) of class 0. To

linearize the model, we feed the difference between two consecutive inputs (B0(m) - B0(m-1))

and the difference between two consecutive outputs (C0(m) - C0(m-1)) to the least squares esti-

mator to estimate the model parameters in (2). Figure 4(a) shows the estimated parameters of

the second order model (6) at successive sampling instants in a 30 min run. The estimations of

the parameters (a1, a2, b1, b2) converge to (-0.08, -0.20, -0.20, -0.05). To verify the accuracy of

the model, we re-run the experiment with a different white noise input to the open-loop server

and compare the actual difference between two consecutive delay samples with that predicted by

the estimated model (Figure 4(b)). Similar to the relative delay case, the prediction of the esti-

mated model is consistent with the actual delay throughout the 30 min run. This result shows that

the estimated second order model is adequate for designing the Absolute Delay Controller.
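The differencing step used for linearization can be sketched as follows (our own helper; the function name is an assumption):

```python
def first_difference(series):
    """Feed consecutive differences, e.g. B0(m) - B0(m-1), to the estimator."""
    return [series[i] - series[i - 1] for i in range(1, len(series))]

# e.g., process budgets switched by white noise between 64 and 96
print(first_difference([64, 96, 96, 64]))   # [32, 0, -32]
```

The same transformation is applied to the sampled delays C0(m) before estimation.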


[Figure omitted: four panels plotting 30 min (1800 sec) runs. (a) Estimated model parameters a1, a2, b1, b2 of the second-order model vs. time; (b) modeling error of the second-order model (actual vs. estimated delay ratio); (c) modeling error of the first-order model; (d) modeling error of the third-order model.]

Figure 3: System Identification for Relative Delay Control

[Figure omitted: two panels plotting a 30 min run. (a) Estimated model parameters a1, a2, b1, b2 of the second-order model vs. time; (b) modeling error of the second-order model (actual vs. estimated C0(m) - C0(m-1)).]

Figure 4: System Identification for Absolute Delay Control


B. Performance Evaluation

We perform three sets of experiments to evaluate the performance of our Web server under

different configurations. In the first two sets of experiments, the server is configured to control

the relative delays of two and three classes, respectively. In the final set of experiments, the

server is configured to control the absolute delays of two service classes.

Relative Delay Control for Two Classes

To evaluate the relative delay control for two classes, we set up the experiments as follows.

• Workload: Four client machines are evenly divided into two classes. Each client machine

simulates 100 users. In the first half of each run, only one client machine from class 0 and

two client machines from class 1 generate HTTP requests to the server. The second machine

for class 0 starts generating HTTP requests 870 sec later than the other three machines.

Therefore, the user population of class 0 changes from 100 to 200 in the middle of each run,

while the user population of class 1 remains 200 throughout each run.

• Close-loop server: The reference input to the Controller is W1 / W0 = 3. The process ratio

B0(m) / B1(m) is initialized to 1 at the beginning of the experiments. Note that the close-loop

server does not require hand-tuning of the process ratio. To avoid the starting phase of

SURGE, the Controller is turned on 150 sec after SURGE started.

• Open-loop server: An open-loop server is also tested as a baseline. The process allocation in

the open-loop server is hand-tuned through profiling experiments running the original work-

load (100 users from class 0 and 200 users from class 1).

Figures 5(a) and 5(b) show the performance of the close-loop server in a typical run. When the

Controller is turned on at 150 sec, the delay ratio C1(m) / C0(m) = 28.5 / 6.5 = 4.4 due to the in-

correct initial process allocation. The feedback control loop automatically reallocates processes

so that the delay ratio converges to the vicinity of the reference W1 / W0 = 3. After the number of

users of class 0 suddenly increases from 100 to 200 at 870 sec, the delay ratio transiently drops

to 1.2 (at 900 sec). The feedback control reacts to the load variation by allocating more processes


to class 0 while de-allocating processes from class 1. The delay ratio re-converges to the vicinity

of the set point within the designed settling time of 270 sec. As predicted by our control analysis,

the server remains stable throughout the run. Although the delay ratio oscillates slightly around

the set point in steady state due to noise in the workload, the average delay ratio is close to the reference in steady state.

The results on the close-loop server demonstrate that the close-loop server can (1) self-tune

its process allocation to achieve desired delay ratios, and (2) maintain robust relative delay guar-

antees despite significant variation in user population. These capabilities are highly desirable in

Web servers that often face bursty workload [14]. Furthermore, the experimental results also

validate our control design and analysis.

[Figure omitted: four panels. (a) Close-loop: connection delays of classes 0 and 1; (b) close-loop: delay ratio C1(m)/C0(m) and process ratio P0(m)/P1(m) vs. the reference; (c) open-loop: connection delays of classes 0 and 1; (d) open-loop: delay ratio C1(m)/C0(m) and process ratio P0(m)/P1(m) vs. the reference.]

Figure 5: Experimental Results of Relative Delay Control for Two Classes


In contrast, as shown in Figures 5(c) and 5(d), while the hand-tuned open-loop server achieves satisfactory relative delays when the workload conforms to its expectation (from 150 sec to 900

sec), it severely violates the relative delay guarantee after the workload changes. From 960 sec to

the end of the run, class 0 receives longer delays than class 1. Therefore, an open-loop server

cannot maintain desired relative delay guarantees in the face of a varying workload.

[Figure omitted: connection delays (sec) of classes 0, 1, and 2 over the run.]

Figure 6: Experimental Results of Relative Delay Control for Three Classes

Relative Delay Control for Three Classes

The second experiment evaluates the performance of relative delay control for three classes.

Each class has 100 users that are simulated by a separate client machine. The desired delay ratios

are W0 : W1 : W2 = 1 : 2 : 4. The process ratios are initialized to B0 : B1 : B2 = 1 : 1 : 1. As shown

in Figure 6, the delay ratios C0 : C1 : C2 converge from 14.6 : 17.3 : 17.5 to 9.3 : 16.2 : 33.9 = 1.0

: 1.7 : 3.6 within the designed settling time (240 sec) after the Controller is turned on at 150 sec,

and remain close to the desired delay ratios in the rest of the run. The server remains stable

throughout the experiment. This experiment demonstrates that our close-loop server can effec-

tively control the relative delays of three classes by combining the inputs from two Controllers.

Absolute Delay Control for Two Classes

The final set of experiments evaluates the performance of absolute delay control for two

classes. The workload is the same as the one used in the first set of experiments (on relative de-

lay control for two classes). The desired delays for classes 0 and 1 are (W0, W1) = (10, 30) sec.

The process budget for each class is initialized to 64 in the close-loop server. The process budg-


ets in the open-loop server are hand-tuned to achieve the desired absolute delays for the original

workload (comprised of 100 users from class 0 and 200 users from class 1) through profiling.

[Figure omitted: two panels plotting latency (sec) vs. time, with the set points 10 and 30 sec marked. (a) Connection delays of the close-loop server (classes 0 and 1); (b) connection delays of the open-loop server (classes 0 and 1).]

Figure 7: Experimental Results of Absolute Delay Control for Two Classes

The performance of the close-loop server in a typical run is shown in Figure 7(a). In the first

half of the run, the delays of both classes remain close to their desired delays. However, after the

number of users from class 0 increases from 100 to 200 at 870 sec, the delay of class 0 jumps

from 8.4 sec to 20.0 sec at 900 sec. The Controller for class 0 reacts to the violation of its delay

guarantee by increasing its process budget. Since all the server processes have been allocated and

class 0 has a higher priority, the server fulfills its process budget by reclaiming processes from

class 1. As expected based on our analysis, the delay of class 0 settles down to the vicinity of its

set point (10 sec) within the designed settling time (210 sec), and remain close to the set point

throughout the rest of the run. Meanwhile, class 1 suffers long delays as the server allocates most

processes to class 0. Note that even though the Controller for class 1 increases its process budget

in response to the excessive delay, its input is not fulfilled due to the lack of available processes.
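The saturation behavior described above can be sketched as a priority-ordered allocation (our own illustration of the policy implied by the text; the function name and structure are assumptions): budgets requested by the Controllers are granted in priority order until the pool of server processes is exhausted.

```python
def grant_budgets(requested, total_processes):
    """Grant requested process budgets in priority order (index 0 = highest)."""
    granted, remaining = [], total_processes
    for b in requested:
        give = min(b, remaining)    # a lower class gets only what is left
        granted.append(give)
        remaining -= give
    return granted

# Controllers request 96 + 64 processes but only 128 exist:
print(grant_budgets([96, 64], 128))   # class 0 fulfilled, class 1 gets the remainder
```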


The performance of the open-loop server in a typical run is shown in Figure 7(b). The open-

loop server achieves the desired delays for both classes when the workload matches expectations (from 150 sec to 900 sec). However, after the user population of class 0 doubles at 870 sec,

the delay of class 0 increases significantly and violates the desired delay. In fact, class 0 suffers a longer delay than class 1 after the workload variation. Note that while both the open-loop

server and the close-loop server violate the delay guarantee of one of the service classes, the

close-loop server conforms to the requirement of an absolute delay guarantee, i.e., the desired

delay of a low-priority class should be violated earlier than that of a high-priority class.

VII. RELATED WORK

Support for different classes of service on the Web (with special emphasis on server delay

differentiation) has been investigated in recent literature. For example, Eggert and Heidemann

proposed and evaluated an architecture in which restrictions are imposed on the amount of server

resources (such as threads or processes) which are available to basic clients [18]. The authors of

[4][13] presented admission control and scheduling algorithms to provide premium clients with

better service. Bhatti and Friedrich developed a Web server architecture that maintains separate

service queues for premium and basic clients, thus facilitating their differential treatment [8].

While the above approaches usually offer better service to premium clients, they do not provide any guarantees on the service and hence follow a best-effort differentiation model.

Our work on connection delay differentiation is complementary to earlier research that focuses on providing relative delay guarantees on processing delays on end systems [21] and net-

work delays on the Internet [16]. Several other projects such as [7][17] developed kernel level

mechanisms to achieve overload protection and resource isolation in server systems. None of

those projects provided connection delay differentiation in Web servers. The integration of our

work with those techniques will enable Web services to provide end-to-end delay guarantees.

The control-theoretic approach has been adopted in a number of software system projects

(e.g., [3][12][22][24]). A detailed survey on feedback performance control in software services is


presented in [2]. Recent work on applying control-theoretic techniques in Web servers is directly related to this work. The authors of [1] designed a feedback control loop to enforce desired

CPU utilization through content adaptation on a Web server. In parallel with this work, Diao et

al. [15] used system identification and multi-input-multi-output control to achieve desired CPU

and memory utilization on a Web server. Neither of those projects handled connection delays or

relative guarantees in Web servers. To our knowledge, this work presents the first control-

theoretic solution for both absolute and relative guarantees on connection delays in a Web server.

VIII. CONCLUSION AND FUTURE WORK

In this paper, we present the design and implementation of an adaptive architecture to pro-

vide relative, absolute and hybrid connection delay guarantees for different service classes on

Web servers. The first contribution of this paper is a feedback control architecture that enforces

desired connection delay guarantees via dynamic connection scheduling and process realloca-

tion. The second contribution is our use of control theory to design the feedback loop with

proven performance guarantees. In contrast with heuristics-based approaches that often rely on

hand-tuning and testing iterations, our control theory approach enables us to systematically de-

sign an adaptive Web server with established analytical methods including system identification

and Root Locus design. The adaptive architecture has been implemented by modifying an

Apache Web server. Experimental results demonstrate that our adaptive server provides robust

delay guarantees even when workload varies significantly. Properties of our adaptive Web server

also include guaranteed stability, and efficient convergence to desired performance set points. In

the future, we will extend our architecture to provide QoS guarantees in Web server farms.

IX. REFERENCES

[1] T. F. Abdelzaher and N. Bhatti, "Web Server QoS Management by Adaptive Content Delivery," International Workshop on Quality of Service, 1999.
[2] T. F. Abdelzaher, J. A. Stankovic, C. Lu, R. Zhang, and Y. Lu, "Feedback Performance Control in Software Services," IEEE Control Systems, 23(3), June 2003.
[3] L. Abeni, L. Palopoli, G. Lipari, and J. Walpole, "Analysis of a Reservation-based Feedback Scheduler," IEEE Real-Time Systems Symposium (RTSS), 2002.
[4] J. Almedia, M. Dabu, A. Manikntty, and P. Cao, "Providing Differentiated Levels of Service in Web Content Hosting," First Workshop on Internet Server Performance, Madison, WI, June 1998.
[5] Apache Software Foundation, http://www.apache.org.
[6] K. J. Astrom and B. Wittenmark, Adaptive Control (2nd Ed.), Addison-Wesley, 1995.
[7] G. Banga, P. Druschel, and J. C. Mogul, "Resource Containers: A New Facility for Resource Management in Server Systems," Operating Systems Design and Implementation (OSDI '99), 1999.
[8] N. Bhatti and R. Friedrich, "Web Server Support for Tiered Services," IEEE Network, 13(5), Sept.-Oct. 1999.
[9] P. Barford and M. E. Crovella, "Generating Representative Web Workloads for Network and Server Performance Evaluation," ACM SIGMETRICS '98, Madison, WI, 1998.
[10] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, "An Architecture for Differentiated Services," IETF RFC 2475, 1998.
[11] A. Bouch, N. Bhatti, and A. J. Kuchinsky, "Quality is in the Eye of the Beholder: Meeting Users' Requirements for Internet Quality of Service," ACM CHI 2000, The Hague, Netherlands, April 2000.
[12] A. Cervin, J. Eker, B. Bernhardsson, and K.-E. Årzén, "Feedback-Feedforward Scheduling of LQG-Control Tasks," Real-Time Systems Journal, 23, 2002.
[13] L. Cherkasova and P. Phaal, "Peak Load Management for Commercial Web Servers Using Adaptive Session-Based Admission Control," Proc. 34th Hawaii International Conference on System Sciences, January 2001.
[14] M. E. Crovella and A. Bestavros, "Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes," IEEE/ACM Transactions on Networking, 5(6):835-846, December 1997.
[15] Y. Diao, N. Gandhi, J. L. Hellerstein, S. Parekh, and D. M. Tilbury, "MIMO Control of an Apache Web Server: Modeling and Controller Design," American Control Conference, 2002.
[16] C. Dovrolis, D. Stiliadis, and P. Ramanathan, "Proportional Differentiated Services: Delay Differentiation and Packet Scheduling," SIGCOMM '99, Cambridge, MA, August 1999.
[17] P. Druschel and G. Banga, "Lazy Receiver Processing (LRP): A Network Subsystem Architecture for Server Systems," Operating Systems Design and Implementation (OSDI '96), Seattle, WA, October 1996.
[18] L. Eggert and J. Heidemann, "Application-Level Differentiated Services for Web Servers," World Wide Web Journal, 2(3):133-142, March 1999.
[19] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1," IETF RFC 2616, June 1999.
[20] G. F. Franklin, J. D. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems (3rd Ed.), 1994.
[21] K. Jeffay, F. D. Smith, A. Moorthy, and J. H. Anderson, "Proportional Share Scheduling of Operating System Services for Real-Time Applications," IEEE Real-Time Systems Symposium, Madrid, Spain, December 1998.
[22] B. Li and K. Nahrstedt, "A Control-based Middleware Framework for Quality of Service Adaptations," IEEE Journal on Selected Areas in Communications, Special Issue on Service Enabling Platforms, 17(9), Sept. 1999.
[23] C. Lu, T. F. Abdelzaher, J. A. Stankovic, and S. H. Son, "A Feedback Control Approach for Guaranteeing Relative Delays in Web Servers," IEEE Real-Time Technology and Applications Symposium, June 2001.
[24] C. Lu, J. A. Stankovic, G. Tao, and S. H. Son, "Feedback Control Real-Time Scheduling: Framework, Modeling, and Algorithms," Real-Time Systems Journal, 23(1/2):85-126, 2002.
[25] J. C. Mogul, "The Case for Persistent-Connection HTTP," SIGCOMM, pp. 299-313, Cambridge, MA, 1995.
[26] V. Pai, P. Druschel, and W. Zwaenepoel, "Flash: An Efficient and Portable Web Server," USENIX Annual

Technical Conference, Monterey, CA, June 1999. [27] D. C. Steere, A. Goel, J. Gruenberg, D. McNamee, C. Pu, and J. Walpole, "A Feedback-driven Proportion

Allocator for Real-Rate Scheduling," OSDI, Feb 1999. [28] L. Zhang, S. Deering, D. Estrin, S. Shenker, and D. Zappala, "RSVP: A New Resource ReSerVation Proto-

col," IEEE Network, September 1993.