Scalable Web Servers


  • 8/13/2019 Scalable Web Servers

    1/49

    The State of the Art in Locally Distributed Web-Server Systems

    VALERIA CARDELLINI AND EMILIANO CASALICCHIO

    University of Roma Tor Vergata

    MICHELE COLAJANNI

    University of Modena

    AND

    PHILIP S. YU

IBM T. J. Watson Research Center

The overall increase in traffic on the World Wide Web is augmenting user-perceived response times from popular Web sites, especially in conjunction with special events. System platforms that do not replicate information content cannot provide the needed scalability to handle large traffic volumes and to match rapid and dramatic changes in the number of clients. The need to improve the performance of Web-based services has produced a variety of novel content delivery architectures. This article will focus on Web system architectures that consist of multiple server nodes distributed on a local area, with one or more mechanisms to spread client requests among the nodes. After years of continual proposals of new system solutions, routing mechanisms, and policies (the first dates back to 1994, when the NCSA Web site had to face its first million requests per day), many problems concerning multiple server architectures for Web sites have been

Categories and Subject Descriptors: C.2.4 [Computer Communication Networks]: Distributed Systems; C.4 [Performance of Systems]: Design studies; H.3.5 [Information Storage and Retrieval]: Online Information Services (Web-based services)

General Terms: Algorithms, Design, Performance

Additional Key Words and Phrases: Client/server, cluster-based architectures, dispatching algorithms, distributed systems, load balancing, routing mechanisms, World Wide Web

    The first three authors acknowledge the support of MIUR-Cofin2001 in the framework of the project High-quality Web systems.

Authors' addresses: V. Cardellini and E. Casalicchio, Dept. of Computer Engineering, University of Roma Tor Vergata, via del Politecnico 1, Roma, I-00133, Italy, e-mail: {cardellini,ecasalicchio}@ing.uniroma2.it; M. Colajanni, Dept. of Information Engineering, University of Modena, Via Vignolese 905, Modena, I-41100, Italy, e-mail: [email protected]; P. S. Yu, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, e-mail: [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax +1 (212) 869-0481, or [email protected].
ACM 0360-0300/02/0600-0263 $5.00

ACM Computing Surveys, Vol. 34, No. 2, June 2002, pp. 263–311.


solved. Other issues remain to be addressed, especially at the network application layer, but the main techniques and methodologies for building scalable Web content delivery architectures placed in a single location are settled now. This article classifies

and describes the main mechanisms to split the traffic load among the server nodes, discussing both the alternative architectures and the load sharing policies. To this purpose, it focuses on architectures, internal routing mechanisms, and request dispatching algorithms for designing and implementing scalable Web-server systems under the control of one content provider. It also identifies some of the open research issues associated with the use of distributed systems for highly accessed Web sites.

    1. INTRODUCTION

The Web is becoming the standard interface for accessing remote services of information systems, hosting data centers, and application service providers. Demands placed on Web-based services continue to grow, and Web-server systems are becoming more stressed than ever. The performance problems of Web-based architectures will even worsen because of the proliferation of heterogeneous client devices, the need for client authentication and system security, the increased complexity of middleware and application software, and the high availability requirements of corporate data centers and e-commerce Web sites.

Because of the complexity of the Web infrastructure, performance problems may arise at many points during an interaction with Web-based systems. For instance, they may occur in the network because of congested Internet routers, as well as at the Web and database server, either because of under-provisioned capacity or an unexpected surge of requests. Although, in recent years, both network and server capacity have improved and new architectural solutions have been deployed, response time continues to challenge Web-system-related research. In this article, we focus on architectural solutions that aim to reduce the delays due to the Web server site, assuming that network issues are examined elsewhere. Immediate applicability of the considered methodologies and increasing demand for more complex services are the main motivations and guidelines for this survey focusing on the server side. Let us briefly outline these two points.

In an Internet-based world with no centralized administration, the Web site is the only component that can be under the direct control of the content provider. Any other component, such as Internet backbones, Web clients, routers and peering points, the DNS system, and proxy servers, is beyond the control of any Web operator. Hence, we prefer not to consider proposals requiring some intervention on these components, which are really hard to apply in the short-medium term because an agreement among multiple organizations is needed.

The other main motivation for focusing on Web system architecture is the growing complexity of Web applications and services. Web performance perceived by end users is already increasingly dominated by server delays, especially when contacting busy servers [Barford and Crovella 2001]. Recent measures suggest that the Web servers contribute about 40% of the delay in a Web transaction [Huitema 2000], and it is likely that this percentage will increase in the near future because of a double effect.

A prediction made in 1995 regarding the network bandwidth estimated that it would triple every year for the next 25 years [Gilder 1997]. This prediction seems approximately correct to some authors [Gray and Shenoy 2000], while others are more prudent [Coffman and Odlyzko 2001]. Independently of this debate, the network capacity is improving faster than the server capacity, which Moore's law estimates to double every 18 months. Other improvements in the network infrastructure, such as private peering agreements between backbone providers, the deployment of Gigabit wide-area networks, the rapid adoption of ISDN networks, xDSL



lines, and cable modems contribute to reducing network latency.

The relevance of dynamic and encrypted content is increasing. Indeed, the Web is changing from a simple communication and browsing infrastructure for getting static information to a complex medium for conducting personal and commercial transactions that require dynamic computation and secure communications with multiple servers through middleware and application software. A Web server that provides dynamic or secure content may incur a significant performance penalty. Indeed, the generation of dynamic content can consume significant CPU cycles with respect to the service of static content (e.g., Challenger et al. [1999]), while the management of data encryption, which characterizes e-commerce applications, can be orders of magnitude more expensive than the provisioning of insecure content [Apostolopoulos et al. 2000b]. The proliferation of heterogeneous client devices and the need for data personalization, client authentication, and system security of corporate data centers and e-commerce sites place additional computational load on Web servers. Indeed, it is often necessary to establish a direct communication between clients and content providers that caching infrastructures and content delivery networks cannot easily bypass. Caching is a very effective solution to reduce the burden on Web sites providing mainly static content, such as text, graphic, and video files, while it is less effective for applications that generate dynamic and personalized information, although there are some attempts to address this issue [Akamai Tech. 2002; Challenger et al. 2001; Oracle 2002; Persistence Software 2002; Zhu and Tang 2001].

With the network bandwidth increasing about twice as fast as the server capacity, the increased percentage of dynamic content in Web-based systems, and the need for a direct communication channel between clients and content providers, the server side is likely to be the main future bottleneck.

    1.1. Scalable Web-Server Systems

Web-site administrators constantly face the need to increase server capacity. In this article, Web system scalability is defined as the ability to support large numbers of accesses and resources while still providing adequate performance.

The first option used to scale Web-server systems is to upgrade the Web server to a larger, faster machine. This strategy, referred to as hardware scale-up [Devlin et al. 1999], simply consists in expanding a system by incrementally adding more resources (e.g., CPUs, disks, network interfaces) to an existing node. While hardware scale-up relieves short-term pressure, it is neither a long-term nor a cost-effective solution, considering the steep growth in the client demand curve that characterizes the Web (the number of online users is growing at about 90% per annum).

Many efforts have also been directed at improving the performance of a Web server node at the software level, namely software scale-up. This includes improving the server's operating system (e.g., Banga et al. [1998], Hu et al. [1999], Nahum et al. [2002], and Pai et al. [2000]), building more efficient Web servers (e.g., Pai et al. [1999] and Zeus Tech. [2002]), and implementing different scheduling policies for requests (e.g., Bansal and Harchol-Balter [2001] and Crovella et al. [1999]). Pai et al. [2000] have proposed a unified I/O buffering and caching system for general operating systems that supports zero-copy I/O. To improve the performance of the Apache Web server, Hu et al. [1999] have proposed some techniques that reduce the number of system calls in the typical I/O path. Nahum et al. [2002] have analyzed how a general-purpose operating system and the network protocol stack can be improved to provide support for high-performing Web servers. As for proposals for more efficient Web server software, the Flash Web server [Pai et al. 1999] ensures that its threads and processes are never



    Fig. 1. Architecture solutions for scalable Web-server systems.

blocked by using an asymmetric multiprocess event-driven architecture, while the Zeus Web server [Zeus Tech. 2002] uses a small number of single-threaded I/O processes, where each one is capable of handling thousands of simultaneous connections. Some literature has focused on new scheduling policies for client requests; for example, Bansal and Harchol-Balter [2001] have proposed the use of the Shortest Remaining Processing Time first policy to reduce the queuing time at the server.

Improving the power of a single server

will not solve the Web scalability problem in the foreseeable future. Another solution to keep up with ever increasing request load and provide scalable Web-based services is to deploy a distributed Web system composed of multiple server nodes that can also exploit scale-up advancements. The load reaching this Web site must be evenly distributed among the server nodes, so as to improve system performance. Therefore, any distributed Web system must include some component (under the control of the content provider) that routes client requests among the servers with the goal of maximizing load sharing. The approach in which the system capabilities are expanded by adding more nodes, complete with processors, storage, and bandwidth, is typically referred to as scale-out [Devlin et al. 1999]. We further distinguish between local scale-out, when the set of server nodes resides at a single network location, and global scale-out, when the nodes are located at different geographical locations. Figure 1 summarizes the different approaches to achieve Web-server system scalability.

This survey presents and discusses the various approaches for managing locally distributed Web systems (the focused topics are written in bold in Figure 1). We describe a series of architectures, routing mechanisms, and dispatching algorithms to design local Web-server systems and identify some of the issues associated with setting up and managing such systems for highly accessed Web sites. We examine how locally distributed architectures and related management algorithms satisfy the scalability and performance requirements of Web-based services. We also analyze the efficiency and the limitations of the different solutions and the trade-offs among the alternatives, with the aim of identifying the characteristics of each approach and their impact on performance.

    1.2. Basic Architecture and Operations

In this survey, we refer to either client or Web browser as the software entity that acts as a user agent and is responsible for implementing all the interactions with the Web server, including generating the



    Fig. 2. Model architecture for a locally distributed Web system.

requests, transmitting them to the server, receiving the results from the server, and presenting them to the user behind the client application.

A scalable Web-server system needs to appear as a single host to the outside world, so that users need not be concerned about the names or locations of the replicated servers and can interact with the Web-server system as if it were a single high-performance server. Hence, the basic system model must adhere to the architecture transparency requirement by providing a single virtual interface to the outside world, at least at the site name level. (Throughout this survey, we will use www.site.org as a name for the Web site.) This choice excludes from our analysis widely adopted solutions, such as the mirrored-server system that lets users manually select alternative names for a Web site, thereby violating the transparency requirement.

Unlike the users, the client application may be aware of the effects of some mechanism used to dispatch requests among the multiple servers. However, in our survey, we assume that the clients do not need any modification to interact with the scalable Web system.

From these premises, the basic Web-system architecture considered in this survey consists of multiple server nodes, grouped in a local area with one or more mechanisms to spread client requests among the nodes and, if necessary, one or more internal routing devices. Each Web server can access all site information, independently of the degree of content replication. The Web system also requires one authoritative Domain Name System (DNS) server for translating the Web-site name into one or more IP address(es). This name server typically belongs to a network different from the LAN segment used for the Web servers. A high-level view of the basic architecture is shown in Figure 2. A router and other network components belonging to the Web system could exist on the way between the system and the Internet. Moreover, it has to be noted that the system architecture for a modern Web site also consists of back-end nodes that typically act as data servers for dynamically generated information. The main focus of this survey is on the Web server layer, while the techniques concerning content distribution in the back-end layer are outlined in Section 9.



We now analyze the main phases to serve a user request to a Web site. A user issues one request at a time for a Web page, which is intended to be rendered to the user as a single unit. Nevertheless, each user request (click) causes multiple client-server interactions. There are many types of Web pages, but typically a Web page is a multipart document consisting of a collection of objects, each of which may be a static or dynamically generated file. An object could be provided by the first contacted Web site or even by another site (e.g., banners). Since in this brief description we cannot cover all possible instances, we focus on the most common example of a Web page. It consists of a base HTML file describing the page layout and a number of objects referenced by the base HTML file and hosted on the same Web site. A static Web object is a file in a specific format (e.g., an HTML file, a JPEG image, a Java applet, an audio clip), which is addressable by a single URL (e.g., http://www.site.org/pub/index.html). A dynamic Web object requires some computation on the server side and, possibly, some additional information from the user.

A URL has two main components: the symbolic name of the server that houses the object (e.g., www.site.org) and the object's path name (e.g., /pub/index.html). To retrieve all the Web objects composing a Web page, a browser issues multiple requests to the Web server. We refer to a Web transaction as the complete interaction that starts when the user sends a request to a Web site and ends when his client receives the last object related to the requested URL. A session is a sequence of Web transactions issued by the same user during an entire visit to a Web site.

Summing up, a Web transaction starts when the user makes a request to a Web server, by either typing a URL or clicking on a link, for example, http://www.site.org/pub/index.html. First, the client (browser) extracts the symbolic site name (www.site.org) from the requested URL and asks, through the resolver, its local-domain name server to find out the IP address corresponding to that name. The local name server obtains the IP address by possibly contacting some intermediate name servers or a well-known root of the domain name server hierarchy, and ultimately querying the site's authoritative name server. The resolver returns the obtained IP address to the client, which can then establish a TCP connection with the server or device corresponding to that IP address. After the completion of the TCP connection, the client sends the HTTP request (using the GET method) for /pub/index.html to the Web server, which sends the requested object back. Once it has obtained the base HTML page from the Web server, the client parses it. If the client finds embedded objects related to the base page, it sends a separate request for each object. Depending on the HTTP protocol used for the client/server interactions, the subsequent requests can use the same TCP connection (HTTP/1.1) or different TCP connections (HTTP/1.0) to the same server.
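The request sequence above can be sketched as a small simulation. This is only an illustrative model, not anything from the survey: the function `plan_transaction` and its connection-numbering scheme are assumptions introduced here to show how HTTP/1.1 persistent connections collapse the per-object TCP setup that HTTP/1.0 incurs.

```python
from urllib.parse import urlsplit

def plan_transaction(url, embedded_paths, http11=True):
    """Model the client-server interactions of one Web transaction.

    Returns a list of (connection_id, host, path) steps: first the base
    HTML file, then one request per embedded object. Under HTTP/1.1 all
    requests to the same server reuse one persistent TCP connection;
    under HTTP/1.0 each request opens a new connection.
    """
    parts = urlsplit(url)
    host, path = parts.hostname, parts.path or "/"
    conn = 0
    steps = [(conn, host, path)]          # request for the base HTML file
    for p in embedded_paths:              # objects referenced by the base file
        if not http11:
            conn += 1                     # HTTP/1.0: a fresh TCP connection
        steps.append((conn, host, p))
    return steps
```

For the survey's running example, `plan_transaction("http://www.site.org/pub/index.html", ["/img/a.jpg", "/img/b.jpg"])` uses a single connection for all three requests, while passing `http11=False` opens three.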

1.3. Request Routing Mechanisms and Scope of This Article

We have seen in the previous section that each client request for a Web page involves several operations, even when the Web site is hosted on a single server. Moreover, the basic sequence for a Web transaction seen in the previous section can be altered by several factors, such as caching of the IP address corresponding to the site name at the client browser or at some intermediate name servers, the version of the HTTP protocol being used in the client/server interaction (i.e., support or not for persistent TCP connections), and the presence of the requested Web object either in the client local cache or in intermediate cache servers located on the path between the client and the Web server.

When the Web site is hosted on multiple servers, a further decision has to be made: which server of the distributed Web system has to serve a client request. This decision might occur in several places along the request path from the client to the Web site. We identify four possible levels for deciding how to route a client request to one server of the locally distributed



Web-server system. All the cited components are shown in Figure 2.

At the Web client level, the entity responsible for the request assignment can be any client that originates a request.

At the DNS level, during the address resolution phase, the entity in charge of the request routing is primarily the authoritative DNS server for the Web site.

At the network level, the client request can be directed by router devices and through multicast/anycast protocols.

At the Web system level, the entity in charge of the request assignment can be any Web server or other dispatching device(s) typically placed in front of the Web site architecture.

Only a subset of the presented alternatives is considered in this article. Indeed, the basic premise of this survey is the compatibility of all proposed solutions with existing Web standards and protocols, so that any considered architecture, algorithm, and mechanism could be immediately adopted without any limiting assumption and without requiring modifications to existing Internet protocols, Web standards, and client code. Therefore, we will focus on dispatching solutions that occur at system components that are under the direct control of the content provider, that is, the authoritative DNS, the Web servers, and some internal devices of the Web system. On the other hand, we do not consider client-based routing mechanisms [Baentsch et al. 1997; Mosedale et al. 1997; Vingralek et al. 2000; Yoshikawa et al. 1997] because they may require some modifications to the client software [Cardellini et al. 1999]. In addition, we do not investigate dispatching solutions at the network level, which are meaningful when the multiple nodes of the Web site architecture are distributed over a geographical area.

It is worth observing that the design and implementation of a Web-site architecture consisting of multiple servers is not the only way to improve response time as perceived by the user. Basically, there are two important categories for which we provide just some references for additional reading. They are external caching and outsourcing solutions.

Probably, the most popular approach for reducing latency time and Web traffic is based on caching of the Web objects, which might occur at different levels. An accurate examination of caching solutions would require a dedicated survey, because they are the first techniques proposed in the literature and can be deployed at different scales: from server disk caching to proxy caching to browser caching, with all possible combinations [Luotonen 1997; Wessels 2001]. In this paper, we consider some internal caching techniques occurring inside the Web-server system, such as Web-server accelerators [Challenger et al. 2001; Song et al. 2000, 2002] and other techniques for improving cache hit rate [Aron et al. 1999; Pai et al. 1998], while we exclude external caching solutions, such as proxy servers and cooperative proxies [Barish and Obraczka 2000; Wang 1999; Wessels 2001], virtual servers (or reverse proxies) [Luotonen 1997], and Web proxy accelerators [Rosu et al. 2001].

In this survey, we also exclude solutions where the content provider delegates scalability for its Web-based services to other organizations. For example, many Web sites contract with third-party Web hosting and colocation providers. Interesting research and implementation issues come from Web-server systems that store and provide access to multiple Web sites [Almeida et al. 1998; Aron et al. 2000; Cherkasova and Ponnekanti 2000; Luo and Yang 2001b; Wolf and Yu 2001]. More recently, Content Delivery Network (CDN) organizations undertake to serve request traffic for Web sites from caching sites at various Internet borders [Akamai Tech. 2002; Digital Island 2002; Mirror Image Internet 2002; Gadde et al. 2001].

    1.4. Organization of This Article

The rest of this article is organized as follows.

    Section 2 discusses and classifies locallydistributed architectures for Web sites,



by distinguishing cluster-based Web systems (or simply Web clusters), virtual Web clusters, and distributed Web systems.

Sections 3, 4, and 5 cover the mechanisms that can be used to route requests in a Web cluster, in a virtual Web cluster, and in a distributed Web system, respectively.

Section 6 describes the policies for load sharing and dispatching requests in the class of Web cluster systems. We present a taxonomy of the policies that have been developed in recent years by focusing on the issues that each policy addresses.

Section 7 classifies various academic prototypes and commercial products according to the routing mechanism that is used to distribute the client requests among the servers in a Web cluster.

Section 8 presents some extensions of the basic system architecture to improve scalability.

Section 9 outlines the problem of Web content placement among multiple front-end and back-end servers, which is orthogonal to this article.

Section 10 concludes the article and presents some open issues for future research.

2. A TAXONOMY OF LOCALLY DISTRIBUTED ARCHITECTURES

The basic premise of the proposed taxonomy is the compatibility of all analyzed solutions with existing Web standards and protocols, so that any considered architecture, algorithm, and mechanism could be immediately adopted without any limiting assumption.

Distributed architectures can be differentiated depending on whether the name virtualization is extended to the IP layer or not. Given a set of server nodes that host a Web site at a single location, we can identify three main classes of architectures:

Cluster-based Web system (or Web cluster), where the server nodes mask their IP addresses to clients. The only client-visible address is a Virtual IP (VIP) address corresponding to one device which is located in front of the Web server layer.

Virtual Web cluster, where the VIP address is the only IP address visible to the clients, like in a cluster-based Web system. Unlike the previous architecture, this VIP address is not assigned to a single front-end device but shared by all the server nodes.

Distributed Web system, where the IP addresses of the Web-server nodes are visible to client applications.

Making visible or masking server IP addresses is a key feature, because each solution implies quite different mechanisms and algorithms for distributing the client requests among the server nodes. In particular, the distributed Web system architecture is the oldest solution, where the request routing is decided by the DNS system with the possible integration of other naming components. The cluster-based Web system architecture is a more recent solution, where request routing is entirely carried out by the internal entities of the Web cluster. We can anticipate that, for a Web site where the server nodes are in the same location, a cluster-based system is preferable to a distributed system, because the former architecture can provide fine-grained control on request assignment, and better availability and security, which are key requirements for present and future Web-based services [Hennessy 1999]. On the other hand, distributed Web systems, with their visible IP architecture, are suitable to implement Web sites where the servers are geographically dispersed. These observations motivate the position of the survey, which focuses on cluster-based Web systems and considers distributed Web systems mainly for historical reasons.
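The DNS-based routing mentioned above for distributed Web systems can be sketched as follows. The class name, the round-robin policy, and the addresses (from the 192.0.2.0/24 documentation range) are illustrative assumptions, not details from the survey; the point is that the assignment decision happens at name-resolution time, and cached resolutions (governed by the TTL) bypass it entirely.

```python
import itertools

class AuthoritativeDNS:
    """Toy authoritative name server for a distributed Web system.

    The node IP addresses are visible to clients; request routing is
    performed at address-resolution time by rotating which address is
    returned (round-robin DNS). Entries cached at clients or intermediate
    name servers for `ttl` seconds skip this server, which coarsens the
    control the content provider has over load sharing.
    """

    def __init__(self, site_name, node_ips, ttl=300):
        self.site_name = site_name
        self._rotation = itertools.cycle(node_ips)
        self.ttl = ttl

    def resolve(self, name):
        if name != self.site_name:
            raise KeyError(name)
        return next(self._rotation), self.ttl
```

Successive resolutions of www.site.org then cycle through the node addresses, one resolution per uncached client request.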

The main components of a typical multinode Web system include a request routing mechanism to direct the client request to a target server node, a dispatching algorithm to select the Web server node that is considered best suited to respond, and an executor to carry out the dispatching algorithm and support



Fig. 3. Architecture of a cluster-based Web system.

the routing mechanism. In this survey, we distinguish routing mechanisms from dispatching policies for the three main classes of locally distributed Web-system architectures.

    2.1. Cluster-Based Web Systems

A cluster-based Web system (briefly, Web cluster) refers to a collection of server machines that are housed together in a single location, are interconnected through a high-speed network, and present a single system image to the outside. Each server node of the cluster usually contains its own disk and a complete operating system. Cluster nodes work collectively as a single computing resource. Massively parallel processing systems (e.g., SP-2) where each node satisfies all previous characteristics can be assimilated to a cluster-based Web system. In the literature, some alternative terminology is used to refer to a Web-cluster architecture. One common term is Web farm, meaning the collection of all the servers, applications, and data at a particular site [Devlin et al. 1999]. We prefer the term Web cluster, as Web farm is often used to denote an architecture for Web-site hosting or colocation.

Although a Web cluster may consist of tens of nodes, it is publicized with one site name (e.g., www.site.org) and one virtual IP (VIP) address (e.g., 144.55.62.18). Thus, the authoritative DNS server for the Web site always performs a one-to-one mapping by translating the site name into the VIP address, which corresponds to the IP address of a dedicated front-end node(s). This node interfaces the rest of the Web-cluster nodes with the Internet, thus making the distributed nature of the site architecture completely transparent to both the user and the client application. The front-end node, hereafter called Web switch, receives all inbound packets that clients send to the VIP address and routes them to some Web-server node. In such a way, it acts as the centralized dispatcher of a distributed system with fine-grained control on client request assignments.

A high-level view of a basic Web cluster comprising the Web switch and N servers is shown in Figure 3. It is to be noted that the response line does not appear here, because the two main alternatives will be described in Section 3. The authoritative DNS is not included in the system box because it does not have any role in client request routing.
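The Web switch's role as centralized dispatcher can be sketched in a few lines. This is an assumed toy model, not an implementation from the survey: the class name and methods are hypothetical, the VIP is the survey's example address, and plain round-robin stands in for the richer dispatching policies surveyed in Section 6.

```python
import itertools

class WebSwitch:
    """Toy model of the Web switch in a cluster-based Web system.

    Clients see only the VIP address; each new inbound connection is
    assigned to one internal server node. Round-robin is used here for
    simplicity, which is what gives the switch its fine-grained,
    per-connection control over request assignment.
    """

    def __init__(self, vip, nodes):
        self.vip = vip                        # the only client-visible address
        self._next_node = itertools.cycle(nodes)
        self.assignments = {}                 # connection id -> internal node

    def accept_connection(self, conn_id):
        node = next(self._next_node)
        self.assignments[conn_id] = node      # remember the binding so all
        return node                           # packets of conn_id go to node
```

A usage sketch: a switch publicized as 144.55.62.18 in front of three internal nodes assigns four successive connections to nodes 1, 2, 3, and 1 again.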



Fig. 4. Architecture of a virtual Web cluster.

The Web switch can be implemented as either a special-purpose hardware device plugged into the network (also called an Application Specific Integrated Circuit chip) or a software module running on a special-purpose or general-purpose operating system. In this article, we use the term Web switch to refer to the dispatching entity in general. This definition does not imply that the Web switch is a hardware device that forwards frames based on link-layer addresses, or packets based on layer-3 and layer-4 information. Moreover, we prefer not to name the Web switch after the functionality it implements, for example network/server load balancer, as some literature does.

    2.2. Virtual Web Clusters

The architecture of a virtual Web cluster is based on a fully distributed system design that does not use a front-end Web switch. Similarly to the previously described Web-cluster architecture, a virtual Web cluster presents a single system image to the outside through the use of a single VIP address. The main difference is that this address is not assigned to a centralized front-end node that receives all incoming client requests and distributes them to a target server. In a virtual Web cluster, the VIP address is shared by all the server nodes in the cluster, so that each node receives all inbound packets and filters them to decide whether to accept or discard them. The mechanisms are described in Section 4. A high-level view of a virtual Web cluster architecture is shown in Figure 4. It is worth noting that this system removes the single point of failure and the potential bottleneck represented by the Web switch.

    2.3. Distributed Web Systems

A distributed Web system consists of locally distributed server nodes, whose multiple IP addresses may be visible to client applications. A high-level view of a locally distributed architecture is shown in Figure 5. (The authoritative DNS is included in the system box because this component plays a key role in request routing for distributed Web systems.) Unlike the cluster-based Web system, and similarly to the virtual Web cluster architecture, a distributed Web system does


Fig. 5. Architecture of a distributed Web system.

not rely on a front-end Web switch. The assignment of a client request to a target Web server is typically carried out during the address resolution of the Web-site name (look-up phase) by the DNS mechanism. In some systems, there is also a second-level routing, typically carried out through some re-routing mechanism activated by a Web server that cannot fulfill a received request. The routing mechanisms in a distributed Web system are examined in Section 5.

3. REQUEST ROUTING MECHANISMS FOR CLUSTER-BASED WEB SYSTEMS

The Web switch is able to identify uniquely each node in the system through a private address that can be at different protocol levels, depending on the architecture. More specifically, the server private address may correspond to either an IP address or a lower-layer (MAC) address. There are various techniques to deploy Web clusters; however, the key role is always played by the Web switch. For that reason, we first classify the Web-cluster architecture alternatives according to the OSI protocol stack layer at which the Web switch routes inbound packets to the target server, that is, layer-4 or layer-7 Web switches. The choice of the routing mechanism also has a big impact on dispatching policies, because the kind of information available at the Web switch is quite different.

Layer-4 Web switches perform content-blind routing (also referred to as immediate binding), because they determine the target server when the client asks for establishing a TCP/IP connection, upon the arrival of the first TCP SYN packet at the Web switch. As the client packets do not reach the application level, the routing mechanism is efficient, but the dispatching policies are unaware of the content of the client request.

Layer-7 Web switches can execute content-aware routing (also referred to as delayed binding). The switch first establishes a complete TCP connection with the client, examines the HTTP request at the application level, and then relays it to the target server. This routing mechanism is much less efficient, but it can support more sophisticated


    Fig. 6. Taxonomy of cluster-based architectures.

dispatching policies. (We refer to layer-7 Web switches according to the ISO/OSI protocol layers, where the application layer is the seventh. Other authors refer to switches that perform content-aware routing as layer-5 or application-layer switches.)

Web cluster architectures based on layer-4 and layer-7 Web switches can be further classified on the basis of the data flow between the client and the target server, the main difference being in the server-to-client return path. Indeed, all client requests necessarily have to flow through the Web switch. On the other hand, the target server either responds directly to the client (namely, one-way architectures) or returns its response to the Web switch, which in turn sends the response back to the client (referred to as two-way architectures). Figure 6 summarizes the taxonomy for Web clusters that we have examined so far. Typically, one-way architectures are more complex and more efficient because the Web switch processes only inbound packets, while the opposite is true for two-way architectures because the Web switch has to process both inbound and outbound packets.

    3.1. Solutions Based on Layer-4 Switches

Layer-4 Web switches work at the TCP/IP level. Since packets pertaining to the same TCP connection must be assigned to the same Web-server node, the client assignment is managed at the TCP session level. The Web switch maintains a binding table to associate each client TCP session with the target server.

Upon receiving an inbound packet, the Web switch examines its header and determines, on the basis of the bits in the flag field, whether the packet pertains to a new connection, a currently established one, or none of them. If the inbound packet is for a new connection (i.e., the SYN flag bit is set), the Web switch selects a target server through the dispatching policy, records the connection-to-server mapping in an entry of the binding table, and routes the packet to the selected server. If the inbound packet is not for a new connection, the Web switch looks up the binding table to verify whether the packet belongs to an existing connection. If it does, the Web switch routes the packet to the server that is in charge of that connection. Otherwise, the Web switch drops the packet.

To improve Web-switch performance, the binding table is typically kept in memory and accessed through a hash function. Each entry contains the 4-tuple (client IP address, client port, server IP address, server port), along with other information (e.g., beginning time) that may be relevant for some dispatching algorithm. The client information in the 4-tuple is dynamic, whereas the server address/port is static information that is known by the Web switch at cluster initialization time.
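As an illustration, the binding-table logic just described can be sketched as follows. This is a simplified model, not any of the switch implementations surveyed here: the class name and table layout are invented, the table is keyed by the client side of the 4-tuple only, and a random server choice stands in for a real dispatching policy.

```python
import random

SYN = 0x02  # TCP SYN flag bit

class BindingTable:
    """Toy model of a layer-4 Web switch's connection-to-server map."""

    def __init__(self, servers):
        self.servers = servers    # static (IP, port) pairs, known at init time
        self.table = {}           # (client IP, client port) -> (server IP, port)

    def route(self, client_ip, client_port, tcp_flags):
        key = (client_ip, client_port)
        if tcp_flags & SYN:
            # New connection: pick a target and record the binding.
            target = random.choice(self.servers)  # stand-in dispatching policy
            self.table[key] = target
            return target
        if key in self.table:
            # Established connection: stick to the chosen server.
            return self.table[key]
        return None  # neither new nor known: drop the packet
```

A SYN packet creates an entry, later packets of the same connection follow that entry, and anything else is dropped (a `None` return here).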

Layer-4 Web clusters can be classified on the basis of the mechanism used to route inbound packets to the target server and outbound packets to the client. The main difference is in the server-to-client path. In two-way architectures, both inbound and outbound packets pass through the Web switch, whereas, in one-way architectures, only inbound packets flow through the Web switch.

3.1.1. Two-Way Architectures. In two-way architectures, each server in the cluster is configured with a unique IP address; that is, the private address is at the IP level. Both inbound and outbound packets are rewritten at the TCP/IP level by the Web switch. Figure 7 shows the packet flows.

Packet rewriting is based on the IP Network Address Translation approach


    Fig. 7. Layer-4 two-way architecture.

[Srisuresh and Egevang 2001]. The Web switch rewrites inbound packets by changing the VIP address to the IP address of the target server in the destination address field of the packet header. Outbound packets from the servers to the clients must also pass back through the switch. Since the source address in the outbound packets is the address of the server that has served the request, the Web switch needs to rewrite the server IP address with the VIP address, so as not to confuse the client. Furthermore, the Web switch has to recalculate the IP and TCP header checksums for both packet flows. To differentiate packet-rewriting operations, we call double-rewriting the operations performed by two-way architectures, and single-rewriting those carried out by one-way architectures.
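The double-rewriting just described can be made concrete with a toy sketch. Packets are modeled as dictionaries, the VIP and function names are invented, and the checksum step is reduced to a placeholder (a real switch recomputes the IP and TCP header checksums):

```python
VIP = "144.55.62.18"  # the cluster's advertised address (example from the text)

def recompute_checksum(pkt):
    # Placeholder: a real implementation redoes the IP/TCP header checksums.
    return hash((pkt["src"], pkt["dst"])) & 0xFFFF

def rewrite_inbound(pkt, server_ip):
    """Client -> server direction: the VIP becomes the target server's address."""
    pkt["dst"] = server_ip
    pkt["checksum"] = recompute_checksum(pkt)
    return pkt

def rewrite_outbound(pkt):
    """Server -> client direction: the server's address becomes the VIP again."""
    pkt["src"] = VIP
    pkt["checksum"] = recompute_checksum(pkt)
    return pkt
```

Both directions touch the header, which is why the switch must see (and re-checksum) every packet of every connection in this architecture.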

3.1.2. One-Way Architectures. In one-way architectures, inbound packets pass through the Web switch, while outbound packets flow directly from the servers to the clients. This requires a separate high-bandwidth network connection for outbound packets. Figure 8 shows an example where the outbound packets flow back to the client from Web server 1. In this figure, the layer of the server private address is intentionally not specified, as it can be either at the IP layer (layer-3) or the MAC layer (layer-2).

Routing to the target server can be done by means of several mechanisms, such as rewriting the IP destination address and recalculating the TCP/IP checksum of the inbound packet (packet rewriting), encapsulating each packet within another packet (packet tunneling), or forwarding the packet at the MAC layer (packet forwarding). Let us describe how each mechanism works.

3.1.2.1. Packet single-rewriting. The routing to the target server is achieved by rewriting the destination IP address of each inbound packet: the Web switch replaces its VIP address with the IP address of the selected server and recalculates the IP and TCP header checksums. Thus, the private addresses of the server nodes are at the IP layer.

The difference from two-way architectures is in the modification of the source address of outbound packets. The Web server, before sending the response packets to the client, replaces its IP address with the VIP address and recalculates the


    Fig. 8. Layer-4 one-way architecture.

IP and TCP header checksums [Dias et al. 1996].

3.1.2.2. Packet tunneling. IP tunneling (or IP encapsulation) is a technique to encapsulate IP datagrams within IP datagrams, thus allowing datagrams destined to one IP address to be wrapped and redirected to another IP address [Perkins 1996]. The effect of IP tunneling is to transform the old header and data into the payload of the new packet. The Web switch tunnels the inbound packet to the target server by encapsulating it within an IP datagram. The header of this datagram contains the VIP address and the server IP address as source and destination address, respectively. The server private addresses are at the IP layer, as in packet rewriting.

This mechanism requires that all servers support IP tunneling and have one of their tunnel devices configured with the VIP address. When the target server receives the encapsulated packet, it strips the outer IP header off and finds that the inside packet is destined to the VIP address configured on its tunnel device. Then, the server processes the request and returns the response directly to the client.
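Conceptually, the tunneling step wraps the untouched client datagram inside a new one (IP-in-IP encapsulation, as in [Perkins 1996], uses protocol number 4). The dictionary-based sketch below is illustrative only; the addresses and field names are invented:

```python
VIP = "144.55.62.18"   # VIP, also configured on each server's tunnel device
IPPROTO_IPIP = 4       # protocol number for IP-in-IP encapsulation

def encapsulate(original_pkt, server_ip):
    """Web switch side: wrap the client's datagram for the target server."""
    return {
        "src": VIP,               # outer source: the Web switch
        "dst": server_ip,         # outer destination: the chosen server
        "proto": IPPROTO_IPIP,
        "payload": original_pkt,  # old header + data become the payload
    }

def decapsulate(tunnel_pkt):
    """Server side: strip the outer header; the inner packet targets the VIP."""
    return tunnel_pkt["payload"]
```

The inner packet is never modified, so no checksum recalculation is needed on it; the server accepts it because its tunnel device carries the VIP address.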

3.1.2.3. Packet forwarding. This approach assumes that the Web switch and the server nodes have one of their network interfaces physically linked by an uninterrupted LAN segment. Unlike the packet single-rewriting and packet tunneling mechanisms, the server private addresses are now at the MAC layer. (For this reason, packet forwarding is also referred to as MAC address translation.) The same virtual IP address is shared by the Web switch and the servers in the cluster through the use of primary and secondary IP addresses. In particular, each server is configured with the VIP address as its secondary address through loopback interface aliasing, for example, by using the ifconfig Unix command.

Even if all nodes share the VIP address, the inbound packets reach the Web switch because, to avoid collisions, the server nodes have disabled the Address Resolution Protocol (ARP) mechanism. The Web switch forwards an inbound packet to the target server by writing the server MAC address in the layer-2 destination address and retransmitting the frame on the common LAN segment. This operation does not require any modification of


    Fig. 9. Operations of layer-4 routing (left) and layer-7 routing (right).

the TCP/IP header of the forwarded packet. Since all nodes share the same VIP address, the server receiving this packet can recognize itself as a destination and can respond directly to the client. (Packet forwarding is also known as direct server return [Bourke 2001].) Rewriting and encapsulation operations at the TCP/IP level, which are typical of the packet rewriting and tunneling mechanisms, are unnecessary.

    3.2. Solutions Based on Layer-7 Switches

Layer-7 Web switches work at the application layer, thus allowing content-aware request distribution. The mechanisms for layer-7 routing are more complex than those for content-blind routing, because the HTTP request is inspected before any dispatching decision. To this purpose, the Web switch must first establish a TCP connection with the client (i.e., the three-way handshake for the TCP connection setup phase must be completed between the client and the Web switch) and then receive the HTTP request at the application layer. On the other hand, a layer-4 Web switch determines the target server as soon as it receives the initial TCP SYN packet, before the client sends out the HTTP request. Figure 9 shows the different instants in which the routing decision is made by a Web switch that routes new connections at layer-4 or layer-7.

Similarly to layer-4 solutions, Web cluster architectures based on a layer-7 Web switch can be further classified on the basis of the mechanism used to send outbound packets from the server to the client. If we consider the data flow through the Web switch, we distinguish between one-way architectures and two-way architectures.

3.2.1. Two-Way Architectures. In two-way architectures, outbound traffic must pass back through the Web switch, as shown in Figure 10. The proposed approaches basically differ in the mechanism the Web switch uses to route requests to the target server.

3.2.1.1. TCP gateway. In this architecture, a proxy running on the Web switch at the application layer mediates the communication between the client and the server. This proxy maintains an open persistent TCP connection with each Web server. When a request arrives, the proxy on the Web switch accepts the client connection and forwards the client request to the target server through the persistent TCP connection. When the response arrives from the server, the proxy forwards it to the client through the client-switch connection.
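The relay at the heart of the TCP gateway can be sketched in a few lines of socket code. This is an illustrative single-request relay under simplifying assumptions (one small request and one small response, both connections already established); a real proxy parses HTTP and multiplexes many clients over its persistent server connections.

```python
import socket

def tcp_gateway(client_sock, server_sock, bufsize=4096):
    """Relay one request/response pair at the application layer."""
    request = client_sock.recv(bufsize)    # read the client's HTTP request
    server_sock.sendall(request)           # forward it on the server connection
    response = server_sock.recv(bufsize)   # wait for the server's response
    client_sock.sendall(response)          # relay the response to the client
```

Every byte in both directions crosses the proxy's user space, which is exactly the overhead that motivates the splicing approach.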

3.2.1.2. TCP splicing. This mechanism aims to improve on the TCP gateway approach, which is computationally expensive because each request/response packet must flow up to the application layer. Similarly to TCP gateway, the Web switch maintains a persistent TCP connection with each Web server. Unlike the previous technique, TCP splicing forwards packets at the network layer, between the network interface card and the TCP/IP stack. It is carried out directly by the operating


    Fig. 10. Layer-7 two-way architecture.

system, which must be modified at the kernel level [Cohen et al. 1999; Maltz and Bhagwat 1998; Spatscheck et al. 2000]. Once the TCP connection between the client and the Web switch has been established and the persistent TCP connection between the switch and the target server has been chosen, the two connections are spliced (or patched) together. In such a way, IP packets are forwarded from one endpoint to the other without traversing the transport layer up to the application layer on the Web switch. Once the client-to-server binding has been determined, the Web switch handles the subsequent packets by changing the IP and TCP packet headers (i.e., it modifies the IP address and recalculates the checksums) so that both the client and the target server can recognize these packets as destined to them.

The TCP splicing mechanism can also be implemented by hardware-based switches (e.g., Apostolopoulos et al. [2000a]) or by a library of functions defined at the socket level [Rosu and Rosu 2002].

3.2.2. One-Way Architectures. In one-way architectures, the server nodes return outbound packets directly to the clients, without passing through the Web switch, as illustrated in Figure 11.

3.2.2.1. TCP hand-off. Once the Web switch has established the TCP connection with the client and selected the target server, it hands off its endpoint of the TCP connection to the server, which can communicate directly with the client [Pai et al. 1998].

The TCP hand-off mechanism remains transparent to the client, as packets sent by the servers appear to be coming from the Web switch. Incoming traffic on already established connections (i.e., any acknowledgment packet sent by the client to the switch) is forwarded to the target server by an efficient module running at the bottom of the Web switch's protocol stack.

The TCP hand-off mechanism requires modifications to the operating systems of both the Web switch and the servers. However, this is not a serious limitation because Web clusters are special solutions anyway. Moreover, TCP hand-off makes it possible to handle HTTP/1.1 persistent connections by letting the Web switch assign HTTP requests in the same connection to different target servers [Aron et al. 1999]. There


    Fig. 11. Layer-7 one-way architecture.

are two mechanisms to support persistent connections: the hand-off protocol can be extended by allowing the Web switch to migrate a connection between servers (multiple hand-off); or, the first target server forwards the request it cannot serve to a second server, which sends the response back to the client (back-end forwarding). The integration of the TCP hand-off mechanism with the SSL protocol to provide the reuse of SSL session identifiers is an active research area, but no result is yet available.

3.2.2.2. TCP connection hop. This is a software-based proprietary solution implemented by Resonate as a TCP-based encapsulation protocol [Resonate 2002]. Once the Web switch has established the TCP connection with the client and selected the target server, it hops the TCP connection to the server. This is achieved by encapsulating the IP packet in a Resonate Exchange Protocol (RPX) packet and sending it to the server [Resonate 2002]. The connection hop operates at the network layer, between the network interface card and the TCP/IP stack, thus minimizing the latency of incoming packets.

The server can reply directly to the client because it shares the same VIP address as the Web switch. Acknowledgment packets and persistent session information from clients are managed by the Web switch.

3.3. A Comparison of Routing Mechanisms for Web Clusters

Figure 12 summarizes the classification for Web clusters by further detailing the taxonomy previously shown in Figure 6. We first discuss the different solutions for layer-4 and layer-7 routing, and then we compare the two classes of routing mechanisms.

3.3.1. Layer-4 Routing Mechanisms. An advantage of two-way solutions is that the server nodes may be in different LANs. The main constraint is that both inbound and outbound traffic must flow through the Web switch. The consequence is that the Web switch must rewrite inbound as well as outbound packets, and outbound packets typically outnumber inbound packets. Thus, the scalability (in terms of throughput) of Web clusters that use a two-way architecture is limited by the


    Fig. 12. Detailed taxonomy of Web cluster architectures.

Web switch's ability to rewrite packets and recalculate their checksums, even if dedicated hardware support can be provided for the checksum operations. It is worth noting that TCP header rewriting does not necessarily mean a recomputation of the entire packet checksum, because the new checksum can be obtained incrementally from the old one [Rijsinghani 1994].
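The incremental update described by [Rijsinghani 1994] works one 16-bit word at a time: if a header word changes from m to m', the new checksum is HC' = ~(~HC + ~m + m') in ones'-complement arithmetic, so the switch never re-sums the whole header. The sketch below (illustrative helper names) checks the shortcut against a full recomputation:

```python
def full_checksum(words):
    """Ones'-complement sum over 16-bit words, then complement."""
    s = sum(words)
    while s >> 16:                       # fold carries back into 16 bits
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def incr_checksum(hc, old_word, new_word):
    """Update checksum hc after one word changes, without re-summing."""
    s = (~hc & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    s = (s & 0xFFFF) + (s >> 16)
    s = (s & 0xFFFF) + (s >> 16)         # second fold catches a late carry
    return ~s & 0xFFFF
```

For a NAT-style rewrite, only the address words that actually change enter `incr_checksum`, one at a time.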

A layer-4 Web switch that uses a one-way solution can sustain a larger throughput than a two-way architecture before becoming the system bottleneck. Thus, the system performance is only constrained by the ability of the switch to set up, look up, and delete entries in the binding table. If we consider the different mechanisms for one-way architectures, we see that packet single-rewriting causes the same per-packet overhead as packet double-rewriting, but it reduces switch operations because the more numerous outbound packets are rewritten by the Web servers and not by the Web switch. This requires modifications to the kernel of the server operating system, but this does not seem to be a serious drawback because Web clusters are proprietary solutions anyway.

Packet forwarding mechanisms aim to limit the overhead of packet rewriting thanks to two mechanisms: the Web switch processes only inbound packets, and it works at the MAC layer, which avoids expensive checksum recalculations. A limitation of packet forwarding is that it requires the same network segment to connect the Web switch and all the Web server nodes. However, this restriction has a very limited practical impact, since the Web switch and the servers are likely to be connected through the same LAN. Solutions that employ packet tunneling have good scalability, although lower than packet forwarding. They require the servers to support IP tunneling, which is not yet a standard feature of current operating systems. However, as for the other solutions based on special-purpose systems, this is not considered a serious limit in practice.

Thanks to hardware improvements and one-way mechanisms, the scalability of Web clusters based on layer-4 solutions is primarily limited by the capacity of the network link to the Internet.

    3.3.2. Layer-7 Routing Mechanisms.

Layer-7 Web switches that use one-way solutions enable the server nodes to respond directly to the clients, thus offering higher scalability than two-way solutions. In particular, the TCP hand-off approach scales better than TCP splicing [Aron et al. 2000]. However, one-way solutions operating at layer-7 require modifications to the operating system of both the Web switch and the servers. Typically, two-way architectures also work on commercial off-the-shelf (COTS)


Table I. A Summary of Local Routing Mechanisms for Web Clusters

                            Layer-4 two-way   Layer-4 one-way   Layer-7 two-way       Layer-7 one-way
Dispatching                 content-blind     content-blind     content-aware         content-aware
Dispatching granularity     TCP connection    TCP connection    HTTP request          HTTP request
Web switch data flow        in/outbound       inbound           in/outbound           inbound
Routing characteristic(s)   fast              fast              slowest (gateway),    slow, complex
                                                                slow (splicing)
System scalability          high              highest           lowest                medium
Supported net applications  HTTP, FTP, ...    HTTP, FTP, ...    HTTP                  HTTP

systems when combined with layer-4 operations, while they may or may not require special-purpose operating systems when combined with layer-7 solutions (see, e.g., TCP splicing and TCP gateway, respectively). However, as already observed, the research community and the market consider it acceptable to use special-purpose solutions for Web clusters, especially if they guarantee better performance. An advantage of two-way architectures combined with a layer-7 Web switch is that caching solutions can be implemented on the Web switch. This allows such a device to reply directly to a request if it can be satisfied from the cache, with a consequent load decrease on the server nodes.

The main advantage of the TCP gateway approach is the simplicity of its implementation, which can be done on any operating system. The main drawback is the overhead that this solution adds on both request and response packets flowing through the Web switch up to the application layer. TCP splicing reduces the TCP gateway overhead, because it eliminates the expensive copying and context-switching operations that result from the use of an application-layer proxy. However, even in this instance, the Web switch can easily become the bottleneck of the cluster, as it needs to modify the TCP and IP packet headers.

3.3.3. Layer-4 vs. Layer-7 Routing. The main advantage of layer-7 routing mechanisms over layer-4 solutions is the possibility of using content-aware dispatching algorithms at the Web switch. We see in Section 6.3 that through these policies it is possible to achieve high disk-cache hit rates, to partition the Web content among the servers, to employ specialized server nodes, to assign subsequent SSL sessions to the same server, and to achieve a fine-grained request distribution even with HTTP/1.1 persistent connections.

On the other hand, layer-7 routing mechanisms introduce severe processing overhead at the Web switch, to the extent that the dispatcher may severely limit the Web cluster scalability [Aron et al. 2000; Song et al. 2000]. As an example, Aron et al. [2000] show that the peak throughput achieved by a layer-7 switch that employs TCP hand-off is limited to 3,500 conn/sec, while a software-based layer-4 switch implemented on the same hardware is able to sustain a throughput of up to 20,000 conn/sec. To improve the scalability of layer-7 architectures, alternative solutions for scalable Web-server systems, which combine content-blind and content-aware request distribution, have been proposed. They are described in Section 8.

Table I outlines the main features and tradeoffs of the various mechanisms we have discussed for Web cluster architectures.

4. REQUEST ROUTING MECHANISMS FOR VIRTUAL WEB CLUSTERS

In this section, we analyze the mechanisms for routing requests in a virtual Web cluster, where the architecture does not use a front-end device with a single VIP address. As the name of the Web site is mapped by the DNS into the virtual IP address that is shared by all Web servers, the decision on client request routing is not delegated to one entity but is fully


distributed among the nodes of the cluster. All inbound packets reach each Web server, which has to filter the packets to determine whether it is the target or not. Only one server must accept the packets from a given client, and the others must refuse them. The packet filtering occurs between the data link layer and the IP layer. It is carried out either by a specific driver interposed between these two layers or by the modified device driver of the server.

The request routing is content-blind, because the target server identifies itself only by examining information at the TCP/IP level, such as the client IP address and port. Typically, the filter implemented on each server node computes a hash function on the source IP address and sometimes also on the port number. If the hash value matches the server's own value, the packet is accepted; otherwise, it is discarded.
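The filter can be sketched as a deterministic hash that every node computes independently on the same packet; exactly one node's index matches, so exactly one node accepts. The hash and the function name below are illustrative, not those used by any particular product:

```python
import hashlib

def accepts(node_index, num_nodes, client_ip, client_port):
    """True iff this node is the one responsible for the client's packets."""
    key = f"{client_ip}:{client_port}".encode()
    digest = hashlib.md5(key).digest()
    # Map the client deterministically onto one of the num_nodes servers.
    return int.from_bytes(digest[:4], "big") % num_nodes == node_index
```

Because every node evaluates the same function on the same TCP/IP fields, no per-packet coordination is needed; the flip side is that such a static mapping cannot react to uneven load.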

Since the same VIP address is shared by all the server nodes, request routing occurs at the MAC layer. Two MAC address assignments to the server nodes are feasible: unicast MAC address and multicast MAC address, which are described below.

Request routing at the MAC layer allows a virtual Web cluster to host any IP-based service. Another advantage of this architecture is that it avoids the single point of failure and the potential system bottleneck represented by the Web switch. However, it has its own disadvantages too. The MAC routing mechanism, similarly to the packet forwarding mechanism (see Section 3.1), requires all the server nodes to be on the same subnet. But the most serious limitation of the virtual Web-cluster architecture concerns request dispatching. Not only can the request routing in a virtual Web cluster not take advantage of content-aware dispatching, but the packet filtering based on a hash function is also unable to adapt itself to dynamic conditions when the client requests unevenly load the servers. In this survey, we will not further investigate the dispatching policies for virtual Web clusters, because they are limited to the application of a hash function.

    4.1. Unicast MAC Address

The unicast MAC address approach requires the assignment of a unique MAC address (namely, the cluster MAC address) to all the Web server nodes in the cluster [Microsoft 2002; Vaidya and Christensen 2001]. To let this mechanism operate, the unique burned-in MAC address of the network adapter of each server must be changed into the cluster MAC address. The frames addressed to the cluster MAC address are received by each server node, which filters the incoming packets to determine whether it is the intended recipient.

Since the servers share both the same IP address and the same MAC address, intracluster network communications are impossible unless a second network adapter is used for each node. In this case, the server's dedicated IP address is mapped to the MAC address of the second adapter.

The unicast MAC address approach is the default mode of operation of Network Load Balancing [Microsoft 2002], and is also used in the prototype realized by Vaidya and Christensen [2001]. Network Load Balancing works as a filter between the network adapter and the TCP/IP protocol stack. It applies a hash function to the source IP address and the port number, and passes to the upper network layer only the packets that are destined to the local node. The cluster nodes can periodically exchange messages to maintain a coherent view of the mapping and for reliability monitoring, even if this requires adding a second network adapter to each server.

    4.2. Multicast MAC Address

The alternative mode of operation of Network Load Balancing [Microsoft 2002] relies on the use of a multicast MAC address. In this architecture, all the server nodes are configured to receive Ethernet frames via an identical multicast MAC address [Microsoft 2002]. The unicast VIP address of the Web site is resolved to the multicast MAC address. When the access router sends an ARP request to find out


the MAC address of the VIP address, the system returns the multicast MAC address, and the frames addressed to this address are received by all the servers. Each node inspects the client packet, and one of the servers chooses to reply to the request based on some filtering mechanism. Typically, the filtering mechanism is based on a hash function applied to the client IP address and the port in the TCP/IP packet. Unlike the solution adopted for the unique MAC address mechanism, here each server retains its unique built-in MAC address, which is resolved to the specific IP address of the server that is used for intracluster traffic.

The multicast MAC address mechanism allows the use of a single network adapter to handle both client-to-cluster traffic and server-dedicated traffic. However, the use of a multicast MAC address as a response to an ARP query for a unicast IP address may not be accepted by some routers (e.g., Cisco routers). This problem can be solved by a manual configuration of a static ARP entry within the router.
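On routers that refuse such an ARP reply, the mapping can be pinned manually. For example, on a Cisco IOS router a static ARP entry might look like the following sketch (the VIP and the multicast MAC value are illustrative, not taken from the article):

```
! Statically map the cluster VIP to the multicast cluster MAC address,
! since the router will not learn this binding from the ARP reply itself.
! (Cisco IOS global configuration; addresses are illustrative.)
arp 192.0.2.10 03bf.c000.020a ARPA
```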

5. REQUEST ROUTING MECHANISMS FOR DISTRIBUTED WEB SYSTEMS

In this section, we analyze the mechanisms for routing requests in distributed Web systems where server IP addresses are visible to the clients because a front-end Web switch is not used. The decision on client request routing occurs at the DNS and/or server level. We analyze in Section 5.1 DNS-based mechanisms, where the routing takes place during the address resolution phase, and in Section 5.2 (re)routing mechanisms that are carried out by the Web servers. A DNS-based mechanism is used only for the first-level routing and is typically centralized at a single entity, while mechanisms deployed at the Web system are distributed and used for the second-level routing or rerouting.

    5.1. DNS Routing Mechanisms

DNS-based routing was the first solution to handle multiple Web servers hosting a Web site. Proposed around 1994, it was originally conceived for locally distributed Web systems, even if it is now commonly used in geographically distributed Web systems [Kwan et al. 1995]. DNS-based routing intervenes during the address look-up phase at the beginning of the Web transaction, when the name of the Web site must be mapped to one IP address of a component of the Web system. Through this simple mechanism, the authoritative DNS server (A-DNS) for the Web site can select a different server for every address resolution request reaching it [Brisco 1995]. The A-DNS replies to address requests with a tuple, where the first entry is the IP address of one of the nodes in the distributed Web-server system, and the second entry is the Time-To-Live (TTL) denoting the period of validity of the hostname-address mapping, which is typically cached in the name servers along the path from the A-DNS to the local name server of the client. In fact, address caching limits the A-DNS control on dispatching of the client requests, as it reduces to a small percentage the address requests that actually need the A-DNS server to handle address resolution [Colajanni et al. 1998; Dias et al. 1996]. (Measures on real traces indicate that the requests under direct control of the A-DNS responsible for a highly accessed Web site are less than 5% of the total requests reaching the system.) Indeed, along the resolution chain between the client and the A-DNS, there are several name servers that may hold a valid mapping for the site name. When this mapping is found in one of the name servers on this path and the TTL is not expired, the address request is resolved, thus bypassing the name resolution decision provided by the A-DNS. Only address mapping requests made after the expiration of the TTL in all the name server caches on the path reach the A-DNS. The number of address look-ups resolved by the A-DNS is further reduced because of browser caching. As a consequence of both network- and client-level address caching, the DNS-based approach permits only a very coarse-grain request dispatching.
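The A-DNS behavior described above can be sketched as a resolver that cycles through the replica addresses and attaches a TTL to each answer (the addresses and the TTL value below are illustrative, not taken from the article):

```python
import itertools

class RoundRobinADNS:
    """Toy authoritative DNS: each address request that actually reaches
    the A-DNS gets the next server IP in circular order, paired with a
    TTL bounding how long downstream name servers may cache the mapping."""

    def __init__(self, server_ips, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._cycle = itertools.cycle(server_ips)

    def resolve(self, hostname):
        # The hostname is ignored here: one site, many replica nodes.
        return next(self._cycle), self.ttl

adns = RoundRobinADNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"],
                      ttl_seconds=60)
```

Note that this sketch only models the A-DNS side; in practice most resolutions are answered from intermediate caches and never reach this code path, which is exactly the coarse-grain limitation discussed above.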


To limit the effects of address caching at the network level and allow for a more fine-grained request distribution, the A-DNS can specify a very low value for the TTL. However, this approach has its own drawbacks and limits. First of all, it increases the Internet traffic for address resolutions, which might overwhelm the A-DNS to the extent that, if not replicated among multiple name servers, the A-DNS becomes the system bottleneck. Moreover, if every user request needs an address resolution, the response time perceived by users is likely to increase [Shaikh et al. 2001]. Finally, the TTL period chosen by the A-DNS does not work on browser caching and, besides that, low TTL values might be overridden by noncooperative intermediate name servers that impose their minimum TTL when the suggested value is considered too low.

    5.2. Web Server Routing Mechanisms

Some routing mechanisms can also be implemented by the Web servers, which can decide to (re)direct a client request to another node. Specifically, we consider the triangulation mechanism implemented at the TCP/IP layer, and the HTTP redirection and URL rewriting mechanisms that work at the application layer. Unlike the previous routing mechanisms, which typically have a centralized activation, Web server mechanisms are distributed. Indeed, when a new request arrives from the Web switch, each Web server can decide to serve it locally or to redirect it, although some centralized coordination can be used.

5.2.1. Triangulation. The triangulation mechanism is based on packet tunneling [Aversa and Bestavros 2000], which has been described in Section 3.1. When this routing is activated, the client continues to send packets to the first contacted server even if the request is actually serviced by a different node. The first node routes client packets to the second server at the TCP/IP layer, by encapsulating the original datagram into a new datagram. The target node recognizes that the datagram has been rerouted and responds directly to the client (see one-way mechanisms in Section 3). Subsequent packets from the client pertaining to the same TCP connection continue to reach the first contacted node, which reroutes them to the target server until the connection is closed.

Fig. 13. HTTP redirection.
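The encapsulation step of triangulation can be sketched as follows. Real implementations perform IP-in-IP tunneling inside the kernel; this toy version merely wraps one datagram record inside another, and all addresses are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Datagram:
    src: str       # source IP address
    dst: str       # destination IP address
    payload: object

def encapsulate(client_pkt: Datagram, first_node_ip: str,
                target_ip: str) -> Datagram:
    """The first contacted node wraps the original client datagram
    into a new datagram addressed to the chosen target server."""
    return Datagram(src=first_node_ip, dst=target_ip, payload=client_pkt)

def decapsulate(tunneled: Datagram) -> Datagram:
    """The target server strips the outer header and recovers the
    original datagram, so it can reply directly to the client."""
    return tunneled.payload

pkt = Datagram("203.0.113.9", "192.0.2.10", b"GET / HTTP/1.0")
inner = decapsulate(encapsulate(pkt, "192.0.2.10", "10.0.0.2"))
```

The recovered inner datagram still carries the client's address as source, which is what enables the one-way reply path from the target server back to the client.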

5.2.2. HTTP Redirection. The HTTP protocol standard, starting from version 1.0, allows a Web server to respond to a client request with a 301 or 302 status code in the response header that instructs the client to resubmit its request to another node [Berners-Lee et al. 1996; Fielding et al. 1999]. The built-in HTTP redirection mechanism supports only per-URI redirection. The status code 301 (Moved Permanently) specifies that the requested resource has been assigned a new permanent URI and any future reference to this resource should use the returned URI. The status code 302 (corresponding to Moved Temporarily in HTTP/1.0 and to Found in HTTP/1.1 protocol specifications) notifies the client that the requested resource resides temporarily under a different URI. Figure 13 shows the flow of requests and responses when HTTP redirection is activated.
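A minimal sketch of a redirecting node using Python's standard http.server module; the replica names and the peer-selection rule are placeholders for whatever dispatching policy the first contacted server applies:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical replica list; a real policy would pick a lightly loaded node.
PEERS = ["http://web2.example.com", "http://web3.example.com"]

def redirect_location(path: str, peers=PEERS) -> str:
    """Choose a peer for this URI; here a trivial deterministic rule."""
    return peers[len(path) % len(peers)] + path

class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 302 Found: the resource resides temporarily under another URI,
        # so the client opens a new TCP connection to the chosen peer.
        self.send_response(302)
        self.send_header("Location", redirect_location(self.path))
        self.end_headers()

# To run the sketch:
# HTTPServer(("", 8080), RedirectingHandler).serve_forever()
```

The extra TCP connection that the redirected client must open is visible here: the handler only returns a Location header, and the actual content is served by the peer.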


An advantage of HTTP redirection is that replication can be managed at a medium granularity level, down to individual Web pages. Furthermore, HTTP redirection allows content-aware routing, because the first server receiving the HTTP request can take into account the content of the request when it selects another appropriate node.

The main drawback is that this mechanism consumes resources of the first contacted server and adds an extra round-trip time to the request processing, as every HTTP redirection requires the client to initiate a new TCP connection with the destination node. This extra round-trip time increases the network component of the response time. Nevertheless, a redirected page does not automatically experience a slower response time, because the increased network time could be compensated by a significant reduction in the server response time, especially when the first contacted Web node is highly loaded.

A minor drawback of HTTP redirection is that most Web browsers do not yet handle redirection as specified by the HTTP standard, for example, when the standard requires the browser to display the originally requested URL instead of the redirected URL. Most browsers still display and bookmark the name of the new server to which the client has been redirected, thus defeating the routing mechanism when HTTP redirection is implemented by system entities different from the Web servers.

5.2.3. URL Rewriting. URL rewriting is a more recent mechanism to implement Web server rerouting. When redirection is activated, the first contacted node dynamically changes the links for the embedded objects within the requested Web page so that they point to another node [Li and Moon 2001]. Such a redirection mechanism, integrated with a multiple-level DNS routing technique, is also used by some Content Delivery Networks, such as Akamai Tech. [2002], Mirror Image Internet [2002], and Digital Island [2002].

The drawback of URL rewriting is that it introduces additional load on the redirecting Web node, because each Web page has to be dynamically generated in order to contain the modified object references. Furthermore, it may cause a considerable DNS overhead, as an additional address resolution is necessary to map the new URL into the IP address of the second target node. It has been demonstrated that an address lookup might take substantially longer than a network round-trip time [Cohen and Kaplan 2001]. On the other hand, URL rewriting also has some advantages over the other solutions. For example, this approach can be handled completely at the user level, and the redirection mechanism can redirect further communications of a client to a specific Web server with one request redirection only.
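The rewriting step can be sketched with a regular expression over the page body; the replica host name is illustrative, and a production rewriter would of course use a proper HTML parser:

```python
import re

def rewrite_embedded_urls(html: str, replica_host: str) -> str:
    """Rewrite relative src/href references of embedded objects so
    that subsequent object requests go to the chosen replica server."""
    pattern = re.compile(r'(src|href)="/([^"]*)"')
    return pattern.sub(
        lambda m: f'{m.group(1)}="http://{replica_host}/{m.group(2)}"',
        html)

page = '<img src="/logo.gif"><a href="/docs/">docs</a>'
rewritten = rewrite_embedded_urls(page, "web2.example.com")
```

The per-page generation cost mentioned above is visible here: every served page must pass through the rewriter, and each new host name in the output may trigger an extra DNS lookup on the client side.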

5.3. Comparison of Routing Mechanisms for Distributed Web Systems

DNS-based routing determines the server destination of client requests during the address resolution phase, which is typically activated at most once for each Web session. The request distribution among the Web nodes is very coarse because all client requests in a session will reach the same server. Moreover, address caching at intermediate name servers and client browsers further limits the necessity of contacting the A-DNS. Setting the TTL to low values allows for a more fine-grained request distribution, but it could make the A-DNS a potential system bottleneck and increase the latency time perceived by users. Because of the presence of noncooperative name servers that use their own lower bounds for the TTL, the range of applicability of this solution is limited or not well predictable.

Although initially conceived for locally distributed architectures (e.g., NCSA's Web site), DNS-based routing can scale well geographically. The popularity of this approach for wide-area Web systems and for Content Delivery Networks is increasing due to the seamless integration with the standard DNS and the generality of the name resolution process, which works across any IP-based application.

Table II. A Summary of Routing Mechanisms for Distributed Web Systems

                             DNS             Triangulation            HTTP redirection   URL rewriting
Dispatching                  content-blind   content-blind            content-aware      content-aware
Dispatching granularity      session         TCP connection           page/object        page/object
Client/server data flow      direct          triangular               redirection        redirection
Overhead(s)                  none            operations at the        round-trip times   server operations,
                                             first contacted server                      address lookups
Supported net applications   HTTP, FTP, ...  HTTP, FTP, ...           HTTP               HTTP

A solution to DNS problems is to add a second-level routing carried out by the Web servers through a rerouting mechanism operating at the TCP or HTTP layer. The main disadvantage of triangulation is the overhead imposed on the first contacted server, as it must continue to forward client packets to the destination node. That is to say, the triangulation mechanism does not allow the first server to completely get rid of the redirected requests. Moreover, as triangulation is a content-blind routing mechanism, it requires full content replication, and does not allow fine-grain dispatching when the Web transaction is carried out through an HTTP/1.1 persistent connection.

Unlike triangulation-based solutions, application-layer mechanisms, such as HTTP redirection and URL rewriting, do not require the modification of packets reaching or leaving the Web-server system. This allows the server to take into account the requested content in the dispatching decision, thus also providing fine-grain rerouting. HTTP redirection is fully compatible with any client software; however, its use limits the Web-based service to HTTP requests only and increases network traffic, because any redirected request needs two TCP connections before being serviced. There is another trade-off when comparing URL rewriting and HTTP redirection for multiple requests to the same Web site. The former mechanism requires additional DNS look-ups but no request redirection after the first one, while the latter solution might save address lookup overheads at the price of multiple request redirections.

Application-layer mechanisms can avoid ping-pong effects, which occur when an already rerouted request is further selected for reassignment. A cookie can be set when the request is redirected for the first time, so before deciding about reassignment the server inspects whether a cookie is present. Besides the use of cookies, other means for session identification can be used (e.g., session encoding in URLs), either to identify browsers that do not support cookies or to avoid the misuse of long-living cookies. The triangulation mechanism does not suffer from these side effects, because the destination node can deduce whether the request has already been rerouted by simply inspecting the source packet address.

Table II outlines and compares the main characteristics of the routing mechanisms for distributed Web systems.
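The cookie-based guard against ping-pong redirections described above can be sketched as follows (the marker cookie name is a hypothetical choice, not from the article):

```python
REDIRECT_COOKIE = "redirected"   # hypothetical marker cookie name

def may_reassign(request_cookies: dict) -> bool:
    """A request already carrying the marker cookie was redirected once
    before and must not be reassigned again (no ping-pong effect)."""
    return REDIRECT_COOKIE not in request_cookies

def mark_redirected(response_cookies: dict) -> dict:
    """Set the marker when redirecting a request for the first time."""
    response_cookies[REDIRECT_COOKIE] = "1"
    return response_cookies
```

A server consults may_reassign before issuing a redirect and calls mark_redirected on the response that performs the first redirection; any node later receiving the marked request serves it locally.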

6. DISPATCHING ALGORITHMS FOR CLUSTER-BASED WEB SYSTEMS

In this section, we describe the policies that can be used to select the target server node in a cluster-based Web system. The dispatching policy influences both the performance experienced by the users and the system scalability.

In a Web cluster, the dispatching policy is carried out by the Web switch, which acts as a global scheduler for the system.1 Global scheduling policies have been classified in several ways, following different criteria [Casavant and Kuhl 1988; Shirazi et al. 1995; Shivaratri et al. 1992; Wang and Morris 1985]. There are several alternatives, such as load balancing vs. load sharing, centralized vs. distributed, and static vs. dynamic algorithms, but only a subset of them can actually be applied to Web-cluster dispatching. The architecture with a single Web switch that receives all incoming requests drives the choice to centralized dispatching policies. Moreover, if we consider that the main objective of load sharing is to smooth out transient peak overload periods, while load balancing algorithms strive to equalize the loads of the servers [Kremier and Kramer 1992; Shivaratri et al. 1992], a dispatching algorithm at the Web switch should aim to share rather than to balance the cluster workload. Absolute stability is not always necessary, and sometimes impossible to achieve, in a highly dynamic system such as a Web cluster hosting a popular Web site.

1 We use the definition of global scheduling given in Casavant and Kuhl [1988] and treat dispatching as a synonym.

The real alternative is therefore between static and dynamic algorithms, although the Web switch cannot use highly sophisticated dispatching algorithms because it has to take immediate decisions for hundreds or thousands of requests per second. Static algorithms are the fastest solution to prevent the Web switch from becoming the primary bottleneck of the Web cluster, because they do not rely on the current state of the system at the time of decision making. However, these algorithms can potentially make poor assignment decisions. Dynamic algorithms have the potential to outperform static algorithms by using some state information to help dispatching decisions. On the other hand, dynamic algorithms require mechanisms that collect and analyze state information, thereby incurring potentially expensive overheads. The requirements listed below summarize the constraints for the dispatching algorithms that we will analyze in this section.

(1) Low computational complexity, because dispatching decisions are required in real time.

(2) Full compatibility with existing Web standards and protocols.

(3) All state information needed by a dispatching policy has to be accessible to the Web switch. In particular, the switch and the servers of the Web cluster are the only entities that can collect and exchange load information. We do not consider any state information that needs active cooperation from other components that do not belong to the content provider.

    6.1. A Taxonomy of Dispatching Algorithms

We have seen that in Web clusters the only practical choice among all global scheduling policies lies in static vs. dynamic algorithms. A third class of load sharing policies that has been investigated in the literature on distributed systems is the class of adaptive algorithms, where the load sharing policy as well as the policy parameters can change on the basis of system and workload conditions [Shivaratri et al. 1992]. However, to the best of our knowledge, no existing Web switch uses adaptive algorithms.

Static dispatching algorithms do not consider any state information while making assignment decisions. Instead, dynamic algorithms can take into account a variety of system state information, which also depends on the protocol stack layer at which the Web switch operates. Because of the importance of this factor, we prefer to first classify the dispatching algorithms into content-blind dispatching, if the Web switch works at the TCP/IP layer, and content-aware dispatching, if the switch works at the application layer.

We then use the literature classification by distinguishing static and dynamic algorithms. It is to be noted that we assume that static algorithms are deployed only by Web switches that operate at the TCP/IP layer. Indeed, the use of a sophisticated architecture such as a layer-7 switch is motivated only if its benefits are fully exploited by the dispatching algorithm. Dynamic algorithms can be further classified according to the level of system state information being used by the Web switch. We consider the following three classes.

Fig. 14. Taxonomy of dispatching policies in Web clusters.

6.1.1. Client State Aware Policies. The Web switch routes requests on the basis of some client information. Layer-4 Web switches can use only network-layer client information, such as the client IP address and TCP port. On the other hand, layer-7 Web switches can examine the entire HTTP request and make decisions on the basis of more detailed information about the client.

6.1.2. Server State Aware Policies. The Web switch assigns requests on the basis of some server state information, such as current and past load conditions, latency time, and availability. Furthermore, in content-aware dispatching, the switch can also consider information about the content of the server disk caches.

6.1.3. Client and Server State Aware Policies. The Web switch routes requests by combining client and server state information. Actually, most of the existing client state aware policies belong to this class, because they always use some more or less precise information about the server loads (at least server availability).

Figure 14 summarizes the taxonomy of dispatching algorithms that we have examined so far. We recall that static algorithms as well as server state aware policies are meaningful only for content-blind Web switches operating at the TCP/IP layer.

    6.2. Content-Blind Dispatching Policies

In this section, we describe the main content-blind dispatching policies according to the taxonomy shown in Figure 14, and detailed in Figure 15 with some representative algorithms for each category at the bottom level.

6.2.1. Static Algorithms. Static policies do not consider any system state information. Typical examples are the Random and Round-Robin algorithms. Random distributes the incoming requests uniformly across the server nodes, with equal probability of reaching any server. Round-Robin uses a circular list and a pointer to the last selected server to make dispatching decisions; that is, if Si was the last chosen node, the new request is assigned to Si+1, where the index i + 1 is computed modulo N and N is the number of server nodes. Therefore, Round-Robin utilizes only information on past assignment decisions.

Both Random and Round-Robin policies can be easily extended to treat servers with different processing capacities by making the assignment probabilistic on the basis of the server capacity [Colajanni et al. 1998]. To this purpose, if Ci indicates the capacity of server Si, the relative server capacity αi (0 ≤ αi ≤ 1) is defined as αi = Ci/max(C), where max(C) is the maximum server capacity among all the server nodes. It is to be noted that the server capacity is a configuration parameter, thus a static information. For the Random policy, heterogeneous capacities can be taken into account by assigning different probabilities to the servers according to their capacity. The Round-Robin policy can treat heterogeneous server nodes in the following way. A random number α (0 ≤ α ≤ 1) is generated and, assuming Si was the last chosen node, the request is assigned to Si+1 only if α ≤ αi+1. Otherwise, Si+2 becomes the next candidate and the process recurs.
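One reading of this probabilistic Round-Robin extension can be sketched as follows, assuming a fresh random number is drawn at each trial (the rng parameter is only there to make the sketch testable):

```python
import random

def capacity_aware_round_robin(last, capacities, rng=random.random):
    """Return the next server index: starting from the successor of the
    last chosen node, accept server j with probability
    alpha_j = C_j / max(C); otherwise move on to the next candidate."""
    max_c = max(capacities)
    alphas = [c / max_c for c in capacities]
    j = (last + 1) % len(capacities)
    while rng() > alphas[j]:
        j = (j + 1) % len(capacities)
    return j
```

With homogeneous capacities every alpha equals 1, the successor is always accepted, and the policy degenerates to plain Round-Robin, as expected.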


    Fig. 15. Content-blind dispatching algorithms.

Different processing capacities can also be treated by using the so-called static Weighted Round-Robin, which comes as a variation of the Round-Robin policy. Each server is assigned an integer weight wi that indicates its capacity. Specifically, wi = Ci/min(C), where min(C) is the minimum server capacity among all the server nodes. The dispatching sequence is then generated according to the server weights [Linux Virtual Server 2002]. As an example, let us assume that S1, S2, and S3 have the weights 3, 2, and 1, respectively. Then, a dispatching sequence can be S1 S1 S2 S1 S2 S3.
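A sketch of a weight-driven sequence generator in the style of the Linux Virtual Server scheduler, which interleaves servers by repeatedly lowering a current-weight threshold; with weights 3, 2, 1 it reproduces the sequence given in the text:

```python
from functools import reduce
from math import gcd

def weighted_round_robin(weights, n):
    """Generate n server indices using a current-weight threshold:
    in each pass only servers whose weight reaches the threshold are
    selected, so higher-weight servers appear proportionally more often."""
    g = reduce(gcd, weights)
    max_w = max(weights)
    i, cw = -1, 0          # last inspected index, current weight threshold
    out = []
    while len(out) < n:
        i = (i + 1) % len(weights)
        if i == 0:
            cw -= g
            if cw <= 0:
                cw = max_w
        if weights[i] >= cw:
            out.append(i)
    return out

# Weights 3, 2, 1 yield the dispatching sequence S1 S1 S2 S1 S2 S3.
seq = [f"S{i + 1}" for i in weighted_round_robin([3, 2, 1], 6)]
```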

6.2.2. Client State Aware Algorithms. Since layer-4 Web switches are content information blind, the type of information regarding the client is limited to that contained in TCP/IP packets, that is, the IP source address and TCP port numbers. This coarse client information can be used by dispatching algorithms that statically partition the server nodes and assign the same clients to the same servers, typically through a hash function applied to the client IP address.
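A minimal sketch of such static partitioning at the switch, assuming a generic hash over the client IP address (the choice of MD5 here is illustrative; any stable hash would do):

```python
import hashlib

def assign_server(client_ip: str, num_servers: int) -> int:
    """Statically partition clients across servers by hashing the client
    IP address; the same client always maps to the same server node."""
    digest = hashlib.md5(client_ip.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_servers
```

Determinism is the point of this scheme: it preserves client-to-server affinity with no state at the switch, at the cost of possibly uneven load when client populations behind a few addresses (e.g., proxies) are large.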

6.2.3. Server State Aware Algorithms. Client information is immediately available at the Web switch because it receives all requests for connection. On the other hand, when we consider dispatching algorithms that use some server state information, we have to address several issues: which server load index? How and when to compute it? How and when to transmit it to the Web switch? These are well-known problems in networked systems [Dahlin 2000; Ferrari and Zhou 1987]. The note in Section 6.4.2 is devoted to the analysis of some solutions in Web clusters.

Once a server load index is selected, the Web switch can apply different dispatching algorithms. A common scheme is to have the new connection assigned to the server with the lowest load index. The Least Loaded approach denotes a class of policies that depend on the chosen server load index. For example, in the Least Connections algorithm, which is usually adopted in commercial products (e.g., LocalDirector [Cisco Systems 2002], BIG-IP [F5 Networks 2002]), the Web switch assigns the new connection to the server with the fewest active connections.
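The selection rule can be sketched as follows; the optional capacity weighting is a common variant found in such products, not a detail taken from the article:

```python
def least_connections(active_conns, capacities=None):
    """Pick the index of the server with the fewest active connections.
    If per-server capacities are given, compare connection counts
    normalized by capacity (a weighted least-connections variant)."""
    if capacities is None:
        capacities = [1] * len(active_conns)
    loads = [c / w for c, w in zip(active_conns, capacities)]
    return min(range(len(loads)), key=loads.__getitem__)
```

The load index here, the active connection count, is attractive because the switch can maintain it locally by observing connection setup and teardown, avoiding the server-to-switch reporting issues raised above.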