Ebooksclub.org a Practical Guide to Content Delivery Networks Second Edition


A Practical Guide to Content Delivery Networks

Second Edition


Gilbert Held


CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2011 by Taylor and Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-4398-3588-3 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Held, Gilbert, 1943-
A practical guide to content delivery networks / Gilbert Held. -- 2nd ed.

p. cm.
Includes index.
ISBN 978-1-4398-3588-3 (hardcover : alk. paper)
1. Computer networks. 2. Internetworking (Telecommunication) 3. Internet. I. Title.

TK5105.5.H444 2011
004.6--dc22    2010030232

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com


Contents

Preface

Acknowledgments

Chapter 1  Introduction to Content Delivery Networking
  1.1  The Modern Content Delivery Network
    1.1.1  Advantages
    1.1.2  Disadvantages
  1.2  Evolution
    1.2.1  Client-Server Computing
      1.2.1.1  Client-to-Mainframe Data Flow
      1.2.1.2  Modern Client-Server Operations
    1.2.2  Use of Video Servers
      1.2.2.1  Video Length
      1.2.2.2  Video Resolution
      1.2.2.3  Frame Rate
      1.2.2.4  Color Depth
      1.2.2.5  Data Compression
    1.2.3  Server Network Architecture
      1.2.3.1  Two-Tier Architecture
      1.2.3.2  Three-Tier Architecture
    1.2.4  The Road to Push Technology
      1.2.4.1  Teletext Systems
      1.2.4.2  Videotext
    1.2.5  Pull Technology
      1.2.5.1  Role of Caching
      1.2.5.2  Pull Limitations
    1.2.6  Multicast
      1.2.6.1  Advantages
      1.2.6.2  Addresses
      1.2.6.3  Limitations
    1.2.7  Push Technology
      1.2.7.1  Evolution
      1.2.7.2  Crawling
      1.2.7.3  Feeds
      1.2.7.4  Advantages
      1.2.7.5  Disadvantages
  1.3  Content Delivery Networking
    1.3.1  Client-Server Operations on the Internet
    1.3.2  Client Server Operating on the Same Network
    1.3.3  Client-Server Operations on Different Networks
    1.3.4  Peering Point
    1.3.5  Video Considerations

Chapter 2  Client-Server Models
  2.1  Overview
  2.2  Client Operations
    2.2.1  URLs
      2.2.1.1  Absolute and Relative
      2.2.1.2  Shortening URLs
    2.2.2  HTML
      2.2.2.1  Versions
      2.2.2.2  HTML Documents
      2.2.2.3  Font Control
      2.2.2.4  Hypertext Links
      2.2.2.5  Adding Images
      2.2.2.6  Adding Video
    2.2.3  HTTP
      2.2.3.1  Versions
      2.2.3.2  Operation
      2.2.3.3  HTTP 1.1
      2.2.3.4  State Maintenance
    2.2.4  Browser Programs
      2.2.4.1  Helpers
      2.2.4.2  Plug-Ins
      2.2.4.3  Java
      2.2.4.4  VBScript
      2.2.4.5  ActiveX
  2.3  Server Operations
    2.3.1  Evolution
    2.3.2  Common Web Server Programs
      2.3.2.1  Server Characteristics
    2.3.3  Application Servers
      2.3.3.1  Access
      2.3.3.2  Java Application Servers
      2.3.3.3  General Server Tools
      2.3.3.4  Microsoft’s .NET Framework
  2.4  Distance Relationship
    2.4.1  Using Ping
    2.4.2  Using Traceroute

Chapter 3  Understanding TCP/IP
  3.1  The TCP/IP Protocol Suite
    3.1.1  Protocol Suite Components
    3.1.2  Physical and Data-Link Layers
      3.1.2.1  MAC Addressing
      3.1.2.2  Layer 3 Addressing
      3.1.2.3  ARP
    3.1.3  The Network Layer
      3.1.3.1  IP Header
    3.1.4  The Transport Layer
      3.1.4.1  TCP
      3.1.4.2  UDP
      3.1.4.3  Port Meanings
  3.2  The Domain Name System
    3.2.1  Need for Address Resolution
    3.2.2  Domain Name Servers
    3.2.3  Top-Level Domain
    3.2.4  DNS Operation
    3.2.5  Configuring Your Computer
    3.2.6  Root Name Servers
    3.2.7  The NSLookup Tool
    3.2.8  Expediting the Name Resolution Process
    3.2.9  DNS Resource Records
      3.2.9.1  SOA Resource Record
      3.2.9.2  Name Server (NS) Records
      3.2.9.3  Address (A) Records
      3.2.9.4  Host Information (HINFO) Record
      3.2.9.5  Mail Exchange (MX) Records
      3.2.9.6  Canonical Name (CNAME) Records
      3.2.9.7  Other Records

Chapter 4  The CDN Model
  4.1  Why Performance Matters
    4.1.1  Economics of Poor Performance
    4.1.2  Predictability
    4.1.3  Customer Loyalty
    4.1.4  Scalability
    4.1.5  Flexibility
    4.1.6  Company Perception
    4.1.7  Summary
  4.2  Examining Internet Bottlenecks
    4.2.1  Entry and Egress Considerations
    4.2.2  Access Delays
    4.2.3  Egress Delays
    4.2.4  Benefits of Edge Servers
    4.2.5  Peering Points
      4.2.5.1  Rationale
      4.2.5.2  Peering and Transit Operations
      4.2.5.3  Transit and Peering Operations
      4.2.5.4  Global Structure of Peering Points
      4.2.5.5  Representative Peering Points
      4.2.5.6  Peering Point Delays
  4.3  Edge Operations
    4.3.1  CDN Operation
    4.3.2  The Akamai Network
      4.3.2.1  Type of Content Support
      4.3.2.2  Centralized Web Site Access
      4.3.2.3  Edge Server Model
      4.3.2.4  Limitations
    4.3.3  Edge Side Includes
      4.3.3.1  ESI Support
      4.3.3.2  Inclusion and Conditional Inclusion
      4.3.3.3  Environmental Variables
      4.3.3.4  Exception and Error Handling
      4.3.3.5  Language Tags
      4.3.3.6  The ESI Template
    4.3.4  Edge Side Includes for Java
    4.3.5  Statistics
    4.3.6  Summary
  4.4  The Akamai HD Network
    4.4.1  Using the HD Network with Flash
      4.4.1.1  Selecting the Client Population
      4.4.1.2  Selecting Bit Rates
      4.4.1.3  Selecting Frame Sizes
      4.4.1.4  Profiles
      4.4.1.5  Levels
      4.4.1.6  Keyframes

Chapter 5  Caching and Load Balancing
  5.1  Caching
    5.1.1  Browser Cache
    5.1.2  Other Types of Web Caches
      5.1.2.1  Proxy Caches
      5.1.2.2  Gateway Caches
      5.1.2.3  Server Caches
    5.1.3  Application Caching
    5.1.4  Cache Operation
    5.1.5  Cache Control Methods
      5.1.5.1  META Tags
      5.1.5.2  HTTP Headers
      5.1.5.3  Cache-Control Header
      5.1.5.4  Directive Application
      5.1.5.5  Cache-Request Directives
      5.1.5.6  Cache-Response Directives
    5.1.6  Windows DNS Caching Problems
    5.1.7  Viewing HTTP Headers
    5.1.8  Considering Authentication
    5.1.9  Enhancing Cacheability
  5.2  Load Balancing
    5.2.1  Types of Load Balancing
    5.2.2  Rationale
    5.2.3  Load Balancing Techniques
      5.2.3.1  DNS Load Balancing
      5.2.3.2  Load Balancing Methods
    5.2.4  Hardware versus Software
    5.2.5  DNS Load Balancing
    5.2.6  DNS Load-Sharing Methods
      5.2.6.1  Using CNAMES
      5.2.6.2  Using A Records
    5.2.7  Managing User Requests
      5.2.7.1  Hidden Fields
      5.2.7.2  Settings
      5.2.7.3  URL Rewriting

Chapter 6  The CDN Enterprise Model
  6.1  Overview
    6.1.1  Rationale
      6.1.1.1  Concentrated Customer Base
      6.1.1.2  Distributed Locations Available for Use
      6.1.1.3  Knowledgeable Staff
      6.1.1.4  Control
      6.1.1.5  Economics
    6.1.2  Summary
  6.2  Traffic Analysis
    6.2.1  Using Web Logs
      6.2.1.1  Apache Access Logs
      6.2.1.2  Access Records
      6.2.1.3  HTTP Response Codes
    6.2.2  Using Logging Strings
    6.2.3  Web-Log Analysis
    6.2.4  Top Referring Domains
    6.2.5  Considering Status Codes
    6.2.6  Web-Log Statistics
    6.2.7  Reverse Mapping
    6.2.8  SOA Record Components
    6.2.9  Origination Country
    6.2.10  Originating Time Zone
    6.2.11  Other Statistics
    6.2.12  Other Analysis Tools
    6.2.13  Cookies
      6.2.13.1  Cookie Basics
      6.2.13.2  Writing Cookies
      6.2.13.3  How a Cookie Moves Data
      6.2.13.4  How Web Sites Use Cookies
      6.2.13.5  Problems with Cookies
    6.2.14  Other Logging Information
    6.2.15  Microsoft’s Performance Monitor
      6.2.15.1  Activating Performance Monitor
      6.2.15.2  Adding Counters and Instances
      6.2.15.3  Working with Performance Monitor
      6.2.15.4  Summary
    6.2.16  Using a Network Analyzer
    6.2.17  Other Tools to Consider
  6.3  Content Delivery Models
    6.3.1  Single-Site, Single-Server Model
      6.3.1.1  Advantages
      6.3.1.2  Disadvantages
      6.3.1.3  Considering Server Options
      6.3.1.4  Considering Network Operations
    6.3.2  Single-Site, Multiple-Server Model
      6.3.2.1  Advantages
      6.3.2.2  Disadvantages
    6.3.3  Multiple-Sites, Single-Server per Site Model
      6.3.3.1  Advantages
      6.3.3.2  Disadvantages
    6.3.4  Multiple-Site, Multiple-Server per Site Model
      6.3.4.1  Advantages
      6.3.4.2  Disadvantages
    6.3.5  An In-Between Model

Chapter 7  Web-Hosting Options
  7.1  Rationale
    7.1.1  Cost Elements and Total Cost
    7.1.2  Performance Elements
    7.1.3  Server-Side Language Support
    7.1.4  Web-Service Tools
    7.1.5  The Importance of Images
    7.1.6  Back-End Database Support
    7.1.7  Facility Location(s)
  7.2  Types of Web-Hosting Facilities
    7.2.1  Dedicated Hosting
    7.2.2  Shared Server Hosting
    7.2.3  Colocated Hosting
  7.3  Evaluation Factors

Index


Preface

The development of the World Wide Web has had a considerable effect upon how we purchase items, review financial information from the comfort of our homes and offices, and perform numerous work-related activities. Today, it is common for people to check their stock and bond portfolios online, use the facilities of a price-comparison Web site prior to initiating a work-related purchase, check the latest results of their favorite sports team, examine weather predictions for a possible vacation getaway, and perhaps even book airfare, hotel, and a car rental online. Although many people consider such activities to be confined to the use of personal computers (PCs), within the past few years the Apple iPhone, Blackberry Storm2, and other “smart” cell phones incorporating browser software have allowed these activities to be performed while on the go. In addition, a new generation of small Wi-Fi devices, ranging in size from mini-notebooks that weigh a few pounds to devices that can fit in your shirt pocket, enables these activities to be performed from coffee shops, sandwich shops, airports, and hotel rooms and lobbies without incurring the cost of monthly data plans. While we now consider such activities as a normal part of our day, what many readers may not realize is that our ability to perform such activities in a timely and efficient manner results from a hidden network within the Internet. That network and its operation and utilization are the focus of this book.


Since the publication of the first edition of this book, the growth in the use of video over the Internet has been nothing short of astounding. Today you can view the highlights of Dancing with the Stars, American Idol, and other reality types of programs; watch your favorite TV program; download a movie; post videos to many social-networking sites; and, of course, view hundreds of thousands of videos on YouTube and similar sites. Given the importance of video, this author would be remiss if he did not include information in this new edition showing how organizations can incorporate so-called new media into Web pages as well as how they can use content delivery networks to assist in the delivery of these pages.

In this book, we will examine the role of content delivery networking to facilitate the distribution of various types of Web traffic, ranging from standard Web pages to streaming video and audio as well as other types of traffic. Because content delivery networking operations are normally performed by independent organizations, the appropriate use of facilities operated by different vendors requires knowledge of how content delivery networks operate. Thus, in this book, we will describe and discuss how such networks operate as well as the advantages and disadvantages associated with their utilization. In addition, because an appreciation of content delivery networking requires an understanding of Web architecture and the TCP/IP (Transmission Control Protocol/Internet Protocol) suite, we will also examine both topics. Understanding Web architecture, including the relationship between Web clients, servers, application servers, and back-end databases, will provide us with the knowledge required to ensure that the use of a content delivery network satisfies all of our organizational requirements. By understanding the TCP/IP protocol suite and the manner by which different applications are both transported and identified, we can obtain a base of knowledge that enables us to appreciate how content delivery networks operate.

Because we can learn from the past, this author believes that the evolution of technology represents an important aspect of any technology-related book. Thus, we will examine the development of a variety of technologies that have evolved over the past decade as mechanisms to distribute various types of Web content. By understanding the advantages and disadvantages associated with different types of content distribution technologies, we will obtain an appreciation for how we can use the networks within the Internet operated by various Internet service providers (ISPs) as a mechanism to deliver our organization’s Web-server-based information in a timely and efficient manner to both actual and potential users located around the globe. Last, but certainly not least, we will examine how we can code Web pages to facilitate the delivery of various types of media in an effective manner. In doing so, we will periodically refresh our knowledge of HTML coding as well as turn our attention to the number of bytes required to deliver certain types of media. Many cell phone operators as well as Wi-Fi hotspot operators are placing limits on the amount of data that can be delivered for a basic monthly fee. With ISPs now billing companies based upon traffic generated, additional usage is costing considerably more. Thus, it is becoming beneficial to optimize the delivery of content for both the recipient and the generator of such information.

As a professional author who writes on technology-related topics, I am interested in and highly value reader feedback. You can write me either via my publisher, whose mailing address is in this book, or by e-mail at [email protected]. Let me know if I spent too much or too little effort covering a particular topic, if I should have included another aspect of content delivery networking in this book, or share any other comments you wish with me. Because I frequently travel, I may not be able to respond to you overnight, but I will make every effort to respond to your comments within a reasonable period of time. Many previous comments and suggestions concerning other books I have written have made their way into subsequent editions; it’s quite possible that your comments will have a role in shaping the scope of coverage of a future edition of this book.

Gilbert Held
Macon, Georgia


Acknowledgments

The creation of the book you are reading represents a team effort, even though there is only the name of this author on its binding and cover. From the acceptance of a proposal to the creation of a manuscript, from the proofing of the manuscript to the printing of galley pages, and from the correction of galley page typos and errors to the creation of cover art and the printing of this book, many individuals have contributed a considerable amount of time and effort. I would be remiss if I did not acknowledge the effort of several persons as well as the Taylor & Francis Group’s publication team that resulted in the book you are reading. Once again, I am indebted to Rich O’Hanley, publisher of Taylor & Francis’ Information Technology Division, for agreeing to back another one of this author’s research and writing projects.

Due to a considerable amount of travel, this author many years ago realized that it was easier to write a book the old-fashioned way, using pen and paper, rather than attempt to use a laptop or notebook. Pen and paper were more reliable in the face of circular, rectangular, square, and other oddball electrical receptacles that could vary from one side of a city to another. The use of a few pens and a couple of writing pads was preferable to the uncertainty of the availability of suitable electrical plugs to power a portable computer. Once again, this author is indebted to his wife Beverly for her fine effort in converting his handwritten chapters into a professional manuscript.


In concluding this series of acknowledgments, I would like to take the opportunity to thank all of the behind-the-scenes workers at Auerbach Publishers and Taylor & Francis Group. From the creation of galley pages to printing, binding, and cover art, I truly appreciate all of your efforts.


1   Introduction to Content Delivery Networking

The purpose of any introductory chapter is to acquaint readers with the topic or topics covered by a book, and this chapter is no exception. Commencing with a definition of a content delivery network (CDN), we will describe and discuss its evolution. In doing so, we will examine several types of networking technologies that were developed to deliver specific types of content as well as the rationale for the development of the modern content delivery network.

1.1   The Modern Content Delivery Network

The modern content delivery network can be defined very simply as follows: A content delivery network represents a group of geographically dispersed servers deployed to facilitate the distribution of information generated by Web publishers in a timely and efficient manner. Of course, this definition does not mention the capacity of dispersed servers. Less than a decade ago, the transfer of digitized video was in its infancy. Today, YouTube and other sites are making the use of video commonplace. When you add in the growth of smartphones and their data delivery, the capacity of servers becomes extremely important. According to an article entitled “Unraveling In-Building Wireless Networks,” which appeared in the Technology section of the October 30, 2009, edition of the Wall Street Journal, AT&T experienced a growth of data on its network of 4,932% between the third quarter of 2006 and the second quarter of 2009. This is a subject we will revisit numerous times in this book.
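
To put the 4,932% figure in perspective, it helps to convert the percentage into a multiplier. The short sketch below does that arithmetic; it assumes the article's figure is an increase over the 2006 baseline, and the function name is purely illustrative.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into the factor applied to the starting value."""
    # A 100% increase doubles the starting value, so add 1 to the fraction.
    return 1 + percent_increase / 100


# The growth figure quoted above, treated as an increase over the 2006 baseline.
multiplier = growth_multiplier(4932)
print(f"A 4,932% increase is roughly {multiplier:.2f} times the starting traffic")
```

Under that assumption, traffic in the second quarter of 2009 was about fifty times the traffic in the third quarter of 2006.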

Although the prior definition of a content delivery network is simplistic, it tells us a significant amount of information about what a CDN represents. That is, a CDN is a group of servers that facilitate the distribution of information generated by Web publishers in a timely and efficient manner. To accomplish this task, the servers must be located closer to the ultimate consumer or potential client operator than the server operated by the Web publisher.

Although there are several third-party organizations that offer content delivery services, it’s also possible for organizations to build their own network. In doing so, they can either position servers at multiple sites or use third-party data centers to create their own content delivery network. Thus, there are certain trade-offs associated with the development and use of a content delivery network that must be considered, and we will examine these in this book.

1.1.1 Advantages

There are two key advantages associated with the modern content delivery network. Both advantages result from the fact that making multiple copies of the content of a Web publisher and distributing that content across the Internet removes the necessity of customer requests traversing a large number of routers to directly access the facilities of the Web publisher. This in turn reduces traffic routed through the Internet as well as the delays associated with the routing of traffic. Thus, a content delivery network can be expected to reduce latency between client and server. This can become a significant issue for the delivery of real-time video to clients; however, it’s important to realize that most video we watch on the Internet is presently being buffered, and the delay resulting from buffering permits a relatively smooth delivery to occur. Because it has been over a dozen years since organizations began experimenting with pay-for-use models that have been less than successful, most content is distributed freely, which means that many organizations will continue to use buffering, as it represents an economical method to deliver video.

Figure 1.1 illustrates an example of how pervasive video has become on the Internet. In this example, this author went to Yahoo News the day after the 2009 election results occurred, that is, on November 3, 2009. In the original window under the heading POLITICO, you will note a picture of the governor-elect of New Jersey as well as a button labeled Play Video. Clicking on that button results in the display of an advertisement for Crest toothpaste before the video of the news occurs. If you carefully examine the background window shown in Figure 1.1 you will note that there are three additional videos you can view from a common Web page. Thus, this single Yahoo page allows viewers to select from four videos concerning a recent election.

Figure 1.1 Viewing a Crest advertisement prior to the video about the off-year election being displayed.

Turning our attention to the foreground window where the Crest toothpaste advertisement is shown, note that the version of Yahoo News at the time the screen was captured shows the label “Advertisement” above the Play button on the left of the window as well as the time into the video and its length in the right corner of the display. From this screen, you can view other videos, which can result in a large amount of traffic being directed toward your computer. If you were accessing a site that did not use a content delivery network (CDN) or did not have distributed servers, then the video would arrive as a data stream from a server that could be at the opposite end of the globe from your computer. This in turn would significantly affect the flow of data through the Internet. In comparison, if you were accessing a video from an organization that either used a content delivery network or maintained distributed servers, your request for viewing the video might only travel to a nearby city or server farm, where the video would then flow directly to your computer without having to traverse a significant number of routers on the Internet. Thus, this hypothetical example illustrates two key advantages associated with the use of a content delivery network or the distribution of servers. First, it may significantly reduce the response-time delay associated with client-server requests. Second, and most important for video, it can reduce the latency or delay in the delivery of packets containing portions of the video requested.

1.1.2 Disadvantages

Because the Internet is, in effect, a worldwide network, a Web publisher can have hits that originate from almost anywhere on the globe. This means that it would be impractical for almost all organizations to maintain duplicate servers strategically located around the world. Thus most, if not all, organizations with a large Web presence need to rely upon third-party content delivery network (CDN) operators that have globally distributed in-place networking equipment whose facilities are connected to the Internet. Because most organizations need to use a third party, this results in those disadvantages associated with the use of any third-party technology provider, including cost and support as well as the need to ensure that the CDN provider can update the Web publisher’s server changes in a timely manner throughout the content delivery network operated by the third party. Last, but far from the least, you need to verify that the CDN provider has locations that, as best as possible, correspond to the locations where potential groups or clusters of Web clients that will access the organization’s server reside. For example, persons accessing the New York Times might be expected to primarily reside in the eastern United States, while the online readership of the Seattle Times might be primarily in the northwestern United States. Thus, one newspaper would more than likely have a significantly different requirement for the characteristics of a CDN than the other newspaper.

Now that we have a general level of knowledge concerning what a content delivery network actually represents and a few of the advantages and disadvantages associated with its use, let’s probe more deeply into this topic. In doing so, we can consider the “history” factor, whereby we can learn from the past and examine the evolution of different content delivery methods.

1.2   Evolution

Although there were many technical problems associated with the development of the World Wide Web (www) series of servers created by academia, businesses, and government agencies, two problems were directly related to content delivery. Both problems resulted from the growth in the number of Web servers, with the total number of Web pages increasing at an exponential rate.

The first problem resulted from the virtually limitless availability of information, making it time consuming for individuals to locate information on the Internet. While search engines such as Yahoo! and Google facilitated the location of Web-based information, if a person required access to a series of data from different locations, he or she would need to spend an inordinate amount of time repeating the Web-page location processes on a daily basis.

The second problem resulted from the global location of Web servers. Because a client in Chicago could require access to a server located in London, Web queries would need to flow thousands of miles through a series of routers to reach their destination. As each query reached the destination server in London, the computer there would process the request and return an applicable Web page. That Web page would flow in a reverse manner to the query, through a series of routers back to the client. Because routers need a small period of time to examine the header of each data packet as a mechanism to determine where to send the packet, network delays are primarily a result of the number of router hops between source and destination computers as well as the traffic flowing through the router. Thus, as the number of hops between client and server increases, so do the expected delays between the client query and server response.
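
The relationship between hop count and delay can be sketched with a deliberately simple model. The example below assumes every router adds the same fixed delay in each direction and ignores propagation distance, queuing caused by other traffic, and server processing time. The hop counts and the 2 ms per-hop figure are hypothetical values chosen for illustration, not measurements.

```python
def round_trip_delay_ms(hops: int, per_hop_delay_ms: float) -> float:
    """Estimate round-trip delay when each router adds the same delay per direction."""
    if hops < 0 or per_hop_delay_ms < 0:
        raise ValueError("hops and per_hop_delay_ms must be non-negative")
    # The query crosses each router once, and the response crosses it again.
    return 2 * hops * per_hop_delay_ms


# Hypothetical values: 5 hops to a nearby server, 20 hops to a distant one,
# and 2 ms of delay per router. None of these numbers are measured data.
for label, hops in [("nearby server", 5), ("distant server", 20)]:
    delay = round_trip_delay_ms(hops, 2.0)
    print(f"{label}: {hops} hops -> {delay:.1f} ms round trip")
```

Even in this simplified form, the model shows why placing servers fewer hops from the client shortens the interval between query and response.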

In Chapter 2 we will obtain an appreciation for the in-depth details of client-server computing. Until that chapter, we can simply note that the additional flow of traffic through the Internet and the delays due to packets having to traverse a series of routers are impediments that adversely affect the flow of data. Of the two problems mentioned, the first resulted in the development of “push” technology. Because push technology contributed to the flow of data across the Internet and predated the establishment of content delivery networks, we will discuss its basic operation as well as the different types of content delivery. However, prior to doing so, let’s refresh our knowledge by discussing the basics of client-server computing and note that it represents a “pull” technology.

1.2.1 Client-Server Computing

Although client-server computing represents a term we normally associate with the introduction of PCs during the 1980s, in actuality its origins date to the mainframe computers that were manufactured beginning in the 1950s and 1960s. Thus, in this section, we will briefly review data flow in the client-to-mainframe environment prior to examining modern client-server operations.

1.2.1.1 Client-to-Mainframe Data Flow

By the mid-1960s, many mainframe computers had a hierarchical communications structure, using control units that were also referred to as cluster controllers to group the flow of data to and from a number of terminal devices cabled to each control unit. Control units were in turn either directly cabled to a channel on the mainframe computer when they were located in the same building as the mainframe, or they communicated via a leased line to a communications controller when the mainframe was located in a different building or city. The communications controller, which was also referred to as a front-end processor, specialized in performing serial-to-parallel and parallel-to-serial data conversion as well as other communications-related tasks, in effect off-loading a majority of the communications functions previously performed by the mainframe.

In fact, when this author commenced his long career associated with computers, he initially worked at a location where cluster controllers were connected to terminals in a suburban Washington, D.C., location, whereas the mainframe was located in Minneapolis, Minnesota. As you might expect, the room where the terminals were clustered was called the terminal room. All was fine until one day someone answered the ringing of the telephone by saying “terminal room” into the headpiece, receiving the response “Oh my God” and a hang-up. Upon a brief investigation, it was determined that the emergency room of the local hospital was assigned a telephone number one digit away from the telephone number assigned to the telephone in our terminal room. To ensure that this situation was not repeated, we had the local phone company change the telephone number of the phone in the terminal room. While this story has nothing to do with cluster controllers or directly with terminals, it does illustrate the fact that you need to consider that the unusual may occur and build some slack into your timeline for contingencies.

As previously noted, the use of cluster controllers connected to a front-end processor or communications controller enabled the architecture of the mainframe computer to be better designed for moving bytes and performing calculations. This in turn allowed the mainframe to process business and scientific applications more efficiently, since the communications controller was designed to process bits more efficiently. Bit processing represents a major portion of the effort required when taking parallel-formed characters and transferring them bit-by-bit onto a serial line or, conversely, receiving a serial data stream and converting the data stream bit-by-bit into a parallel-formed character that the mainframe would operate upon.
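
The conversion work handled by a communications controller can be illustrated in software. The sketch below is a simplified model rather than a description of any actual front-end processor: it assumes 8-bit characters, sends the most significant bit first, and omits the start, stop, and parity bits that real serial lines typically add. The function names are illustrative.

```python
from typing import List


def parallel_to_serial(data: bytes) -> List[int]:
    """Break each 8-bit character into single bits, most significant bit first."""
    bits = []
    for byte in data:
        for position in range(7, -1, -1):
            bits.append((byte >> position) & 1)
    return bits


def serial_to_parallel(bits: List[int]) -> bytes:
    """Reassemble a serial bit stream into 8-bit characters, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("bit stream length must be a multiple of 8")
    characters = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit  # shift earlier bits left, append the new one
        characters.append(value)
    return bytes(characters)


stream = parallel_to_serial(b"HI")
print(stream)                      # the serial bit stream for two characters
print(serial_to_parallel(stream))  # the characters recovered from that stream
```

Each character costs eight shift-and-mask steps in each direction, which suggests why moving this work to a dedicated processor freed the mainframe for application processing.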

Figure 1.2 illustrates the hierarchical structure of a mainframe computer-based network. If you think of the terminal operators as clients and the mainframe as a server, since it provides access to programs in a manner similar to a modern-day server, then the terminal-to-mainframe connection could be considered to represent an elementary form of client-server computing.

In the terminal-to-mainframe environment, the terminal functions as a client requesting a particular service. The mainframe, which functions as a server, represents the provider of the service. Because the first generation of terminal devices had a limited amount of intelligence, it wasn’t until the second and third generation of terminals appeared on the market during the later portion of the 1970s that terminals were designed to perform what we would now refer to as limited functions, but which at that time were in effect hard-wired electronics. Those hard-wired electronics represented an early version of firmware and included a sequence of operations that could be considered to represent a program. Thus, the later generations of terminal devices were more representative of client-server computing than the first generation of terminals that simply displayed data and provided a limited data-entry capability.

Figure 1.2 Terminal access to mainframes can be considered to represent an elementary form of client-server computing. (Diagram labels: mainframe, front-end processor, two control units, and terminals.)

One of the key limitations of the mainframe environment was its original hierarchical structure. That is, all communications had to flow upward through the mainframe, even if two terminal operators sat next to one another and desired to exchange information electronically. This could cause significant delays, especially if the terminal operators were located in, for instance, a city on the East Coast and the computer was located in the West. This was especially true during the 1970s, when a high-speed communications circuit provided a data transmission rate of 4800 or 9600 bps. Although IBM attempted to change the hierarchical architecture of mainframes through its Advanced Peer-to-Peer Networking (APPN) hardware and software introduced during the 1990s, by then local area networking, including modern client-server communications, was dominant.

1.2.1.2 Modern Client-Server Operations Although we previously noted that the relationship between terminals and mainframes could be considered to represent an elementary form of client-server computing, it wasn’t until the introduction of the personal computer during the early 1980s that this term gained acceptance. The programmability of personal computers formed the basis for the development of modern client-server computing to include Web-based applications whose operations are improved by the use of a content delivery network.

In a modern client-server computing environment, the client represents a process (program) that transmits a message to a server process (program) over a communications network. The client process requests the server to perform a particular task or service. The client operates a program that normally manages the user-interface portion of server programs, although it may also perform other functions. In a Web environment, the PC transmits Uniform Resource Locator (URL) addresses indicating the address of information the client wishes to receive. The server responds with Web pages that include codes that define how information on the pages should be displayed. The client program, which is the modern-day browser, deciphers the embedded codes to generate Web pages on the client’s display.
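The request-and-response exchange just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the handler name PageHandler, the helper fetch_once, the page content, and the path /index.html are all hypothetical, and the server runs on the local loopback address rather than on a real Web site.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content returned for every request.
PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Server process: answers every GET request with the same small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


def fetch_once():
    """Start a loopback server, act as a client for one request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: the OS picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client process: transmit the path portion of a URL, then read the response.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/index.html")
        response = connection.getresponse()
        status, body = response.status, response.read()
        connection.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_once()
    print(status, body.decode())
```

A browser performs the same two steps, then interprets the returned HTML instead of printing it.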

Figure 1.3 illustrates the relationship of several aspects of modern-day client-server computing in a Web environment. In the upper-right corner of the main window of the Microsoft Internet Explorer browser program, you will note the address http://www.yahoo.com. This address represents the URL transmitted to the Yahoo server from the author’s client PC. In response to this query, the Yahoo server transmitted its home page that corresponds to the URL address it received. In actuality, the client PC receives a sequence of HTML (HyperText Markup Language) and Java or VBScript statements, a portion of which are illustrated in the window in the foreground that is located in the upper-left portion of Figure 1.3. By selecting Source from the browser’s View menu, the source statements that were interpreted by the PC operating the browser are displayed in a Notepad window. Thus, in this single display, you can view the client request in the form of the URL transmitted to the server and the server’s response that, when interpreted by the browser operating on the client, generated the Web-page display. In Chapter 2 we will describe and discuss in considerable detail HTML, including embedded programs coded in Java and VBScript.

Figure 1.3 The relationship between a URL query, HTML code response, and client display of a requested Web page.

Because the Internet and its assorted facilities, such as Web servers, are recent additions to the history of client-server computing, it should be noted that servers perform other functions besides generating Web pages. Some servers execute database retrievals and update operations. Other servers provide access to printers, while a server-based process could operate on a machine that provides network users access to shared files, resulting in a device referred to as a file server. Over the years, a number of different types of servers were developed to support a variety of applications. While a database server represents one common type of server, when the database is incorporated into a reservation system, the result is a specific type of reservation-system server, such as a car rental, hotel, or airline reservation system server. Another popular type of server that has recently emerged is the video server.

Figure 1.4 illustrates a generic information-delivery system where the network could be the Internet, a corporate intranet, or simply a local area network (LAN) without a connection to another network.

1.2.2 Use of Video Servers

As its name implies, a video server is used to store video content. To illustrate why the storage capacity of a video server is extremely important, let’s assume that an organization is storing “How To” videos covering fixing a faucet, installing a light switch, and similar projects. If the video has a frame rate of 30 frames per second (fps) and if a resolution of 640 × 480 pixels is used with a 24-bit color depth, then a 1-second display requires the storage and transmission of 30 × 640 × 480 × 24 or approximately 221 million bits, which is almost 28 million bytes. Thus, a 1-minute video would require the storage and transmission of approximately 1.68 Gbytes of data. In a modern video environment such as the Web, there are several methods that can be used to reduce both the storage and transmission of video data. The major methods used to reduce the size of video files include reducing the length of the video; altering the resolution, frame rate, and color depth of the video; as well as the use of compression technology.
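The arithmetic above is easy to check with a short Python sketch. The function name raw_video_bits is a hypothetical helper introduced only for illustration. Note that the exact one-minute product is about 1.66 billion bytes; rounding the per-second figure up to 28 million bytes before multiplying by 60 gives the 1.68 Gbytes quoted above.

```python
def raw_video_bits(width, height, bits_per_pixel, fps, seconds):
    """Bits needed to store or transmit uncompressed video of the given shape."""
    return width * height * bits_per_pixel * fps * seconds


# One second of 640 x 480 video, 24-bit color, 30 frames per second.
bits_per_second = raw_video_bits(640, 480, 24, 30, 1)
bytes_per_second = bits_per_second // 8

# One minute of the same video, expressed in bytes.
bytes_per_minute = raw_video_bits(640, 480, 24, 30, 60) // 8

if __name__ == "__main__":
    print(f"{bits_per_second:,} bits per second")
    print(f"{bytes_per_second:,} bytes per second")
    print(f"{bytes_per_minute:,} bytes per minute")
```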

Figure 1.4 A generic information-delivery system. (Diagram labels: clients, network, server, information.)

1.2.2.1 Video Length Obviously, the longer a video, the more storage is required as well as additional time for its transmission. Thus, one obvious method that can be used to reduce the data storage and transmission requirements of a video is to reduce its length. This is why, in a fast-changing world, the on-line news videos are normally only a few minutes in length.

1.2.2.2 Video Resolution Full-resolution video dates to the VGA monitor, which had a resolution of 640 × 480 pixels, or 640 pixel dots across by 480 lines of pixels. Although there have been significant improvements in the capabilities of adapter cards and monitors with respect to resolution since the VGA standard was developed, full-resolution video is commonly used today. Other commonly encountered resolutions include 320 × 240, in which the width and height are reduced by a factor of two (which reduces data storage by a factor of four), and 160 × 120 (which is referred to as quarter-resolution and which reduces the total amount of data by a factor of 16).

1.2.2.3 Frame Rate In the movie theater, the frame rate of a film is typically 24 frames per second. On a TV, due to the refresh rate, images are displayed at a rate of 30 frames per second. A slower frame rate can result in images having a jumpy quality. In the distribution of images to computers, the initial use of video occurred at frame rates as low as 15 and even 10 fps to reduce data requirements by a factor of two or three. While this was almost a necessity a decade ago, advances in data storage capacity as well as the data rate of cable and DSL communications allow a 30-fps rate.

1.2.2.4 Color Depth Color depth refers to the number of bits used to represent the color of each pixel displayed. The use of black and white requires one bit per pixel, while what is referred to as “true color” and “full color” uses 8 bits each of red, blue, and green data; requires 24 bits per pixel; and represents the maximum amount of color the human eye can distinguish. The next reduction results in the use of 16 bits, which can represent thousands of colors and provides a one-third reduction in the quantity of data. Because video is typically shot using a 24-bit color-depth device, trimming the color depth from 24 to 16 bits, which is referred to as quantization, can distort transitions within frames.
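The reduction factors quoted for resolution, frame rate, and color depth all follow from the same product of width, height, frame rate, and bits per pixel. The Python sketch below compares each change against the 640 × 480, 30-fps, 24-bit baseline; the names BASELINE, bits_per_second, and reduction_factor are hypothetical and exist only for this illustration.

```python
# Baseline: full-resolution, 30 fps, 24-bit color, as used in the text.
BASELINE = {"width": 640, "height": 480, "fps": 30, "bits_per_pixel": 24}


def bits_per_second(width, height, fps, bits_per_pixel):
    """Uncompressed data rate of a video stream, in bits per second."""
    return width * height * fps * bits_per_pixel


def reduction_factor(**changes):
    """How many times smaller the stream becomes when the given settings change."""
    reduced = {**BASELINE, **changes}
    return bits_per_second(**BASELINE) / bits_per_second(**reduced)


if __name__ == "__main__":
    print("320 x 240:", reduction_factor(width=320, height=240))
    print("160 x 120:", reduction_factor(width=160, height=120))
    print("15 fps:", reduction_factor(fps=15))
    print("10 fps:", reduction_factor(fps=10))
    print("16-bit color:", reduction_factor(bits_per_pixel=16))
```

A factor of 1.5 for 16-bit color is the same statement as a one-third reduction in data, since 16 is two-thirds of 24. Because the factors multiply, combining several modest changes can shrink a stream considerably before any compression is applied.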

1.2.2.5 Data Compression Of all the techniques previously mentioned, data compression provides the largest reduction in the quantity of data used to represent a video. For full-motion video, there are several data-compression methods that are used. A few of those compression methods include Windows Media Video (WMV), which is based upon Microsoft’s implementation of the Moving Picture Experts Group (MPEG) 4, Part 2; and Adobe’s Flash Video, which in versions 6 and 7 used the Sorenson Spark, a codec (coder-decoder) for FLV files, while Flash Player 8 and newer revisions now support the use of On2 Technologies VP6 compression as well as various versions of MPEG. While each version of data compression performs differently with respect to its support of low-, medium-, and high-bit-rate transfers and compression efficiency, in common, they all considerably reduce the size of data storage requirements as well as data transmission time. However, to effectively view a video, the client must have either a compatible browser or a compatible plug-in in the browser, two topics we will discuss in more detail later in this book.

1.2.3 Server Network Architecture

Similar to network architectures developed to expedite the flow of data, the use of servers resulted in an architecture being developed to facilitate processing. Initially, PC networks were based upon the use of file servers that provided clients with the ability to access and share files. Because the file server would respond to a client request by downloading a complete file, the server’s ability to support many simultaneous users was limited. This limitation still exists today with Web servers and has been exploited through the use of denial of service (DoS) attacks, in which the nature of the Transmission Control Protocol (TCP) and its three-way handshake permits a DoS attack to occur. In addition to being subjected to potential attacks, network traffic occurring as a result of many file transfers could significantly affect communications.

The limitations associated with file sharing resulted in the use of database servers. Employing a relational database management system (DBMS), client queries were directly answered, considerably reducing network traffic in comparison to the use of networks when a total file-transfer activity occurred. Thus, the modern client-server architecture in many organizations results from the use of a database server instead of a file server.

There are two types of server architecture one must consider to expedite the flow of data. Those types of architecture are internal and external to the server. When we discuss internal server architecture, we primarily reference the arrangement and specifications of the components within the server, including processors, processor power, memory, channels to disk, disk capacity, and disk input/output (I/O) data transfer rates. When we discuss external server architecture, we primarily reference the subdivision of effort by having one server function as a preprocessor for another. Because this type of server relationship commonly occurs over a network, it’s common to refer to this architecture as a server-based network architecture. In this section, we will focus our attention upon external or server network architecture. The two most common forms of server architectures are two-tier and three-tier.

1.2.3.1 Two-Tier Architecture A two-tier architecture represents the direct communications between a client and server, with no intervening server being necessary. Here, the client represents one tier and the server represents the second tier. This architecture is commonly used by small- to medium-sized organizations, where the server needs to support up to approximately 100 users. It’s important to note that when we discuss user levels, we are referring to the number of simultaneous users and not the population of potential users, which is normally considerably larger.

1.2.3.2 Three-Tier Architecture In a three-tier server architecture, a server or series of servers function as agents between the client and the server where the data or application they require resides. The agents can perform a number of functions that off-load processing that otherwise would be required to be performed by the server. For example, agents could provide a translation service by placing client queries into a database retrieval language for execution on a database server. Other possible agent functions could range in scope from functioning as a metering device that limits the number of simultaneous requests allowed to flow to a server, to functioning as a preprocessor mapping agent, where requests are distributed to different servers based upon certain criteria. Still other examples of functions that could be performed by middle-tier devices include queuing of requests, request filtering, and a variety of application preprocessing that is only limited by one’s imagination.
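Several of the middle-tier functions just listed (filtering, metering, queuing, and mapping requests to servers) can be combined in one small Python sketch. Everything here is hypothetical: the class name Agent, the routing table, and the server names are assumptions made for illustration, and no real requests are sent anywhere.

```python
from collections import deque


class Agent:
    """Hypothetical middle tier that filters, meters, queues, and maps requests."""

    def __init__(self, routes, max_in_flight):
        self.routes = routes                # request kind -> name of a back-end server
        self.max_in_flight = max_in_flight  # metering limit on simultaneous requests
        self.in_flight = []                 # requests currently forwarded to a server
        self.waiting = deque()              # requests held back by the meter

    def submit(self, kind, query):
        """Handle one client request and report what the agent did with it."""
        if kind not in self.routes:         # request filtering
            return "rejected"
        request = (self.routes[kind], query)  # mapping to the chosen server
        if len(self.in_flight) >= self.max_in_flight:
            self.waiting.append(request)    # queuing while the servers are busy
            return "queued"
        self.in_flight.append(request)
        return "forwarded"

    def complete(self, request):
        """Called when a server finishes; admits the oldest waiting request, if any."""
        self.in_flight.remove(request)
        if self.waiting:
            self.in_flight.append(self.waiting.popleft())


if __name__ == "__main__":
    agent = Agent({"db": "database-server", "file": "file-server"}, max_in_flight=2)
    for kind, query in [("db", "q1"), ("file", "q2"), ("db", "q3"), ("video", "q4")]:
        print(kind, query, "->", agent.submit(kind, query))
```

The design choice worth noting is that the back-end servers never see more than max_in_flight requests at once; the cost of that protection is the extra hop through the agent.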

Figure 1.5 Two-tier versus three-tier architecture. (Panel (a) two-tier: clients and server; panel (b) three-tier: clients, agent, and server.)

Figure 1.5 illustrates two-tier and three-tier architectures on a common network for simplicity of explanation. In the two-tier architecture, note that client-server communications flow directly between devices. In a three-tier architecture, data flow occurs twice on the network, first from the client to the agent and then from the agent to the server. Thus, in a local area networking environment, the additional network utilization needs to be compared with the off-loading of processing functions on the server. For example, assume that the data flow averages 6000 packets per second (pps), with each packet having an average of 1200 bytes of data. Then, a two-tiered architecture would have a data flow of 57.6 Mbps (6000 × 1200 × 8 bits per second), while a three-tiered server architecture would result in a situation where the doubling of the data flow would result in a data rate of 115.2 Mbps. While this is insignificant if a Gigabit Ethernet network is installed, the doubled flow would overload the network if Fast Ethernet, which has a maximum data-transfer capability of 100 Mbps, was employed. Note that because the agent could perform preprocessing on data, it’s possible that the data flow might be less when flowing to the server. However, it’s also possible that the preprocessing could result in additional traffic. Regardless of the action of the agent, there will be more traffic on a three-tier network than a two-tier network, and you need to consider this when designing your network structure.
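The data-flow comparison can be worked through in Python. Taking 8 bits per byte, 6000 packets per second of 1200 bytes each comes to 57.6 Mbps for a single traversal of the network and 115.2 Mbps when every packet crosses the network twice, so the doubled flow exceeds the 100-Mbps capacity of Fast Ethernet. The function name flow_mbps is a hypothetical helper, and the sketch ignores protocol overhead and any preprocessing by the agent.

```python
FAST_ETHERNET_MBPS = 100
GIGABIT_ETHERNET_MBPS = 1000


def flow_mbps(packets_per_second, bytes_per_packet, traversals=1):
    """Offered load in Mbps when each packet crosses the network `traversals` times."""
    return packets_per_second * bytes_per_packet * 8 * traversals / 1_000_000


two_tier = flow_mbps(6000, 1200)                   # client <-> server
three_tier = flow_mbps(6000, 1200, traversals=2)   # client <-> agent <-> server

if __name__ == "__main__":
    for label, load in [("two-tier", two_tier), ("three-tier", three_tier)]:
        fits_fast = load <= FAST_ETHERNET_MBPS
        fits_gigabit = load <= GIGABIT_ETHERNET_MBPS
        print(f"{label}: {load} Mbps, Fast Ethernet ok={fits_fast}, Gigabit ok={fits_gigabit}")
```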

Now that we have an appreciation for the basics of client-server architecture, let’s return to our discussion of the evolution of content delivery and focus our attention upon push technology.

1.2.4 The Road to Push Technology

Push technology is normally thought of as a new model of information distribution and data retrieval. In actuality, early versions of push systems occurred during the 1970s and early 1980s in the form of teletext systems and broadcast delivery videotext.

1.2.4.1 Teletext Systems Teletext systems began to become popular in the late 1970s, especially in Europe. Information pages are transmitted in the vertical blanking interval (VBI) of television signals, with decoders built into TV sets becoming capable of capturing, decoding, and displaying “pages” selected by the consumer. As many persons who travel to Europe probably remember, most televisions in hotels include a teletext capability that enables a person holding a remote control to view the weather, TV schedule, and other information.

As illustrated in Figure 1.6, a teletext system transmits a sequence of information in the form of pages that are repeated at predefined intervals. The person with the remote control enters a request, which is transmitted to the television and results in the display of the desired information once the request is processed and the requested information is returned via the broadcast stream.

Figure 1.6 Teletext system operation. (The TV VBI data stream carries “pages” of information in a repeating sequence: 1, 2, 3, ..., n, 1, 2, ...)

Because teletext operators noted that certain pages of information were more desirable than other pages, they altered the sequence or cycle of page transmissions from the pure sequence illustrated in Figure 1.6. For example, suppose that there are only three pages of teletext information, and it was found that the information in page 1 could be requested 50% of the time, while the information on pages 2 and 3 might be desired 30% and 20% of the time, respectively. Then an optimum teletext cycle could appear, as illustrated in Figure 1.7.
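One way to build a cycle in which pages 1, 2, and 3 occupy 50%, 30%, and 20% of the transmission slots is a smooth weighted round-robin, sketched below in Python. This is a swapped-in scheduling technique chosen for illustration; it is an assumption, not the method teletext operators used, and the cycle it produces need not match the one drawn in Figure 1.7. The function name build_cycle is hypothetical.

```python
from collections import Counter


def build_cycle(slots_per_page):
    """Interleave pages so each appears its allotted number of times per cycle.

    slots_per_page maps a page number to its number of slots in one cycle.
    Uses smooth weighted round-robin: every step, each page gains credit equal
    to its weight, the page with the most credit is transmitted, and that page
    then gives back one full cycle's worth of credit.
    """
    total = sum(slots_per_page.values())
    credit = {page: 0 for page in slots_per_page}
    cycle = []
    for _ in range(total):
        for page, weight in slots_per_page.items():
            credit[page] += weight
        chosen = max(credit, key=credit.get)
        credit[chosen] -= total
        cycle.append(chosen)
    return cycle


if __name__ == "__main__":
    # 50%, 30%, and 20% of a ten-slot cycle.
    cycle = build_cycle({1: 5, 2: 3, 3: 2})
    print(cycle, Counter(cycle))
```

Spreading the popular page through the cycle, rather than sending its five copies back to back, shortens the average wait for the page most viewers request.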

1.2.4.2 Videotext While we can consider teletext as an elementary form of push and select technology, it was videotext that added block-formed pictures, such that the technology can actually be considered as the father of modern Web-based technology. Originally, videotext systems delivered information and transactional-oriented services for banking, insurance, and shopping services. Later, videotext technology was used by newspaper publishers, who made news and advertisements available through special terminals hooked up to television monitors. However, it wasn’t until teletext systems were developed for operation with the growing base of personal computers that teletext, with block characters that provided what we would consider to represent elementary graphics today, became widely available. By the mid-1990s, prior to the availability of the Web, videotext systems were operated by America Online, CompuServe, Prodigy, and Genie. In fact, this author remembers countless nights his wife spent on the PC connected to CompuServe and Prodigy, communicating with other patrons of the theater while navigating through various system menus displaying what would now be considered as crude graphics. To obtain a more worldly view of teletext, we can consider the use of terminals to replace the weighty phone directory as well as save numerous trees from being made into paper. In France, Minitel terminals were distributed by the French government in place of telephone directories, and videotext was and still is in popular use in that country.

1.2.5 Pull Technology

Until the introduction of push technology, information retrieval was based upon what is referred to as pull technology. That is, a person would either have previously located an item of interest or used a search engine, such as Google or Yahoo!, to locate an item of interest. Once an item of interest was located, the consumer, acting as a client operator, would use his or her browser to point to the URL on a server to retrieve the information of interest.

Figure 1.7 An example of a possible optimal teletext cycle where pages 1, 2, and 3 occur 50%, 30%, and 20% of the time, respectively.

1.2.5.1 Role of Caching One popular method developed to facilitate client-server pull operations is caching. In a browser environment, which most readers are familiar with, caching occurs through the temporary storage of Internet files in a predefined folder on your hard drive. The stored files represent previously visited Web pages and files, such as graphics displayed on a particular page.

In a Microsoft Internet Explorer browser environment, temporary Internet files are placed in a predefined folder located at: C:\Documents and Settings\Owner\Local Settings\Temporary Internet Files\. A browser user can go to Tools> General> Settings in Microsoft’s Internet Explorer to view cached files, adjust the amount of disk space to use for caching, change the location of the cache folder, as well as define how caching occurs.

Figure 1.8 illustrates the Settings window selected from Tools> General> Settings on this author’s computer when he used Microsoft’s Internet Explorer 5. Note that, when you used that browser, you would have several options concerning the manner by which caching updates occur. You can have the browser check for newer versions of stored pages on every visit to the page, every time you start Internet Explorer, automatically (browser default), or never. In addition, you can adjust the amount of space used for caching. While increasing the amount of disk space can increase how fast previously visited pages are displayed, it also results in additional processing, since additional pages need to be searched. In addition, an increased disk cache obviously decreases the amount of space available for other files on the computer.

Figure 1.8 In Microsoft’s Internet Explorer Version 5, you can use Tools> General> Settings to control the use of caching.

The newer versions of Internet Explorer are conspicuous by their absence of an option for directly controlling cache. For example, in Internet Explorer 8, if you go to the Tools menu and select Internet Options, the dialog box will considerably differ from that shown in Figure 1.8 and will not provide any options for the controlling of cache. However, if you go to the Tools menu and then select Developer Tools or press the F12 key, the resulting dialog box will provide you with the ability to clear the Browser cache or clear it for the domain. This is illustrated in Figure 1.9.

If you use a different browser, your ability to control cache will depend upon both the browser used and its version. For example, this author also uses the popular Mozilla Firefox browser, with the latest version being version 3.6.3 when this book revision was performed. This version of the Firefox browser provides the ability to directly control cache similar to earlier versions of Internet Explorer. For example, going to the Tools menu, selecting Options, and then selecting Advanced results in the display of a dialog box with a series of four tabs. Selecting the tab labeled Network results in the display of the current setting for cache. In addition, the dialog box provides you with the ability to reset the value shown to another value if desired. Figure 1.10 illustrates how you can control cache via the use of the Mozilla Firefox browser, assuming you are operating version 3.6.3 or another version that provides a similar capability.

While Web caching is probably the most popular form of caching in use, it has certain strengths and limitations. Web caching is effective if documents do not change often, but it becomes less effective as document changes increase in frequency. Thus, there are several deficiencies and limitations associated with pull technology and the use of caching.
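The trade-off between reusing a stored page and fetching a fresh copy can be illustrated with a toy cache in Python. This is a minimal sketch under stated assumptions, not a description of how Internet Explorer or Firefox implement caching: the class name PageCache, the max_age setting, and the fetch callable standing in for the server are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class PageCache:
    """Toy browser-style cache: reuse a stored copy until it is older than max_age."""

    def __init__(self, fetch, max_age):
        self.fetch = fetch        # callable(url) -> content; stands in for the server
        self.max_age = max_age    # seconds during which a stored copy is trusted
        self.store = {}           # url -> (content, time the copy was stored)
        self.hits = 0
        self.misses = 0

    def get(self, url, now):
        """Return the page for url, pulling from the server only when necessary."""
        entry = self.store.get(url)
        if entry is not None and now - entry[1] <= self.max_age:
            self.hits += 1
            return entry[0]       # served locally; may be stale if the page changed
        self.misses += 1
        content = self.fetch(url)
        self.store[url] = (content, now)
        return content


if __name__ == "__main__":
    versions = []

    def fake_server(url):
        versions.append(url)
        return f"{url} version {len(versions)}"

    cache = PageCache(fake_server, max_age=60)
    for moment in (0, 30, 100):
        print(moment, cache.get("/forecast", now=moment))
    print("hits:", cache.hits, "misses:", cache.misses)
```

A large max_age saves traffic but risks showing an out-of-date page, which is why caching loses its value as documents change more frequently.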

Figure 1.9 Using the Developer Tools entry from the Tools Menu provides two mechanisms for clearing the cache.

Figure 1.10 Controlling the cache using the Mozilla Firefox browser.

1.2.5.2 Pull Limitations As anyone who has used the Internet to obtain information on a topic that is evolving knows from experience, to obtain frequently updated information requires the periodic checking of a server. For example, during 2004, information about Hurricanes Charley, Frances, and Ivan was obviously of interest to persons living in the southeastern United States. Under the pull model, users interested in hurricane information would have to periodically query the United States National Hurricane Center Web site or a similar location to obtain a forecast of the current and projected movement of the storm of interest.

To illustrate the preceding, let’s assume that you were interested in the projection of the movement of Hurricane Ivan as it raced toward the United States during September 2004. Figure 1.11 illustrates the three-day projection for the movement of Hurricane Ivan as of 11 a.m. on September 13, 2004. Obviously, if you lived in the projected path of this hurricane, you would periodically have to “pull” a new projection to see if the projected path was altered.

Figure 1.11 Using pull technology, a client has to periodically retrieve a Web page to determine if an item of interest has changed.

From a network viewpoint, there can be hundreds to millions of persons accessing the same information over and over, since they are not sure when a new forecast will be available. This means that the Web server may have to periodically handle a volume of connections that can significantly vary. Depending upon the location of clients with respect to the server as well as the amount of traffic flowing to clients, latency can considerably increase. This may not only adversely affect persons accessing one or a few servers but, in addition, adversely affect other persons running different applications.

Because applications of interest to many persons have no efficient mechanism to avoid duplicated data, another limitation is the quantity of traffic in a pull environment. For example, suppose one user transmits a 256-byte packet representing a URL request to a server that elicits a response of 10,000 1024-byte packets. If 1000 users requested the same information, the server must respond 1000 times with 10,000 1024-byte packets! In addition to the server responding over and over again, the network traffic would be considerable. For example, if the server was located in Miami and clients located throughout the southeastern United States were accessing the server, network traffic would be directed throughout the Internet to clients located in each southeastern state. While Internet traffic would be perhaps reasonable for clients located in Florida, network traffic to Georgia residents would first flow through Florida and then through Georgia. Similarly, clients located in South Carolina would have traffic routed through Florida, then through Georgia to arrive in South Carolina. For clients located in North Carolina, the traffic would first flow through Florida, so the bandwidth capability of the Internet in Florida would be a major constraint to clients accessing weather information on a server located in Florida. Thus, a method to duplicate the content of the Miami server in other states could reduce the possibility of network bottlenecks and can be considered another advantage associated with content delivery networks.
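The volume of duplicated traffic in this example can be totaled with a short Python sketch. The function name pull_traffic_bytes is a hypothetical helper, and the figures count payload bytes only, ignoring acknowledgments and protocol headers.

```python
def pull_traffic_bytes(users, request_bytes, response_packets, packet_bytes):
    """Total bytes exchanged when every user pulls the same content independently."""
    per_user = request_bytes + response_packets * packet_bytes
    return users * per_user


one_user = pull_traffic_bytes(1, 256, 10_000, 1024)
thousand_users = pull_traffic_bytes(1000, 256, 10_000, 1024)

if __name__ == "__main__":
    print(f"one user:   {one_user:,} bytes")
    print(f"1000 users: {thousand_users:,} bytes")
```

A single request already moves about 10 Mbytes, so 1000 identical requests move roughly 10 Gbytes, all of it originating at one server.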

During the 1980s and early 1990s, several attempts occurred to minimize the bandwidth required for the distribution of popularly requested information. Perhaps the most popular method that is still in use today is multicast transmission.

1.2.6 Multicast

Multicast communication represents a method to conserve bandwidth in a TCP/IP protocol environment. Under multicast, communication occurs between a single data source and multiple receivers that have joined a multicast group. To avoid potential confusion, note that there are three types of addressing that can occur on a network: unicast, multicast, and broadcast. A unicast address represents the address of a distinct entity. In comparison, a broadcast address represents a method to send data so that it is received by all members on the network. Thus, we can view a multicast address as representing more than one entity and possibly all entities on a network if each entity joined the multicast group.

Although multicast represents a 1990s technology, it provides several important advantages for group communications in comparison to traditional unicast and broadcast transmission.

1.2.6.1 Advantages A primary advantage of the use of multicast is that it enables the conservation of network bandwidth. For example, if 10 clients residing on a network subscribe to a multicast video, only 1 sequence of packets flows onto the network instead of 10 packet sequences. Another important advantage of multicast transmission is the fact that it scales very well to large user groups. That is, if the number of clients on a network desiring to subscribe to the multicast video increases to 100 or even 1000, the same sequence of packets would flow on the network.
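The scaling behavior described above can be expressed as a small Python comparison. The function name network_load_mbps and the 2-Mbps stream rate are assumptions made for illustration; the point is only that the unicast load grows with the number of subscribers while the multicast load does not.

```python
def network_load_mbps(stream_mbps, subscribers, multicast):
    """Load one video places on a shared network for the given number of subscribers."""
    if subscribers == 0:
        return 0.0
    copies = 1 if multicast else subscribers  # multicast sends one packet sequence
    return stream_mbps * copies


if __name__ == "__main__":
    STREAM_MBPS = 2.0  # hypothetical video rate
    for subscribers in (10, 100, 1000):
        unicast = network_load_mbps(STREAM_MBPS, subscribers, multicast=False)
        multicast = network_load_mbps(STREAM_MBPS, subscribers, multicast=True)
        print(f"{subscribers} subscribers: unicast {unicast} Mbps, multicast {multicast} Mbps")
```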

1.2.6.2 Addresses In a TCP/IP environment, the range of IPv4 addresses from 224.0.0.0 through 239.255.255.255 is reserved for multicasting. Those addresses are also referred to as Class D addresses. Every IP datagram whose destination address commences with the first four bits set to 1110 represents an IP multicast datagram. The remaining 28 bits in the address identify the multicast “group” that the datagram is sent to. Because multicast addresses represent a group of IP devices, they can only be used as the destination of a datagram; hence, they are never the source.

In an IPv4 environment, the 28 bits after the leading “1110” in the IP address define the multicast group address. The size of the Class D multicast address space is therefore 2^28 or 268,435,456 multicast groups, of which certain portions of the address space are set aside for specific uses. Table 1.1 illustrates the general allocation of the Class D address space.
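The bit-level rule for Class D addresses can be verified with Python's standard ipaddress module. The function names is_class_d and group_bits are hypothetical helpers written for this sketch; IPv4Address and its is_multicast property are part of the standard library.

```python
import ipaddress


def is_class_d(address):
    """True when the first four bits of the 32-bit IPv4 address are 1110."""
    value = int(ipaddress.IPv4Address(address))
    return value >> 28 == 0b1110


def group_bits(address):
    """The remaining 28 bits, which identify the multicast group."""
    return int(ipaddress.IPv4Address(address)) & ((1 << 28) - 1)


CLASS_D_GROUPS = 2 ** 28  # number of distinct 28-bit group values

if __name__ == "__main__":
    for address in ("223.255.255.255", "224.0.0.0", "239.255.255.255", "240.0.0.0"):
        print(address, is_class_d(address), ipaddress.IPv4Address(address).is_multicast)
    print(f"{CLASS_D_GROUPS:,} multicast groups")
```

The hand-written bit test and the library's is_multicast property should agree for every address, since both describe the 224.0.0.0 through 239.255.255.255 block.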

Within the Class D address block, the range of addresses between 224.0.0.0 and 224.0.0.255, which are considered as “well-known” multicast addresses, is reserved for use by routing protocols, topology discovery protocols, and maintenance protocols. Within this address range are several well-known multicast groups. Two examples of such groups include:

224.0.0.1 is the all-hosts group: If you ping this Class D address, all multicast-capable hosts on the network will respond.

224.0.0.2 is the all-routers group: All multicast routers must join that group on all their multicast-capable interfaces.

The majority of the Class D address space is in the middle multicast range, which contains Internet-wide multicast addresses. They are similar to Class A, B, and C unicast addresses, which we will discuss in detail in Chapter 3 covering TCP/IP, and can be assigned to various groups.

The last address range shown in Table 1.1 is for local multicast addresses, more technically referred to as administratively scoped multicast groups. This address group represents 1/16th of the total multicast address space and is subdivided into site-local multicast addresses as well as organization-local addresses and other local multicast addresses.
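The three ranges of Table 1.1 can be turned into a small lookup in Python using the standard ipaddress module. The range boundaries come from the table; the names RANGES and classify, and the short labels, are hypothetical choices made for this sketch.

```python
import ipaddress

# (first address, last address, label), following the three rows of Table 1.1.
RANGES = [
    ("224.0.0.0", "224.0.0.255", "well-known"),
    ("224.0.1.0", "238.255.255.255", "globally scoped"),
    ("239.0.0.0", "239.255.255.255", "administratively scoped"),
]


def classify(address):
    """Return the Table 1.1 category of an IPv4 address, or 'not multicast'."""
    ip = ipaddress.IPv4Address(address)
    for first, last, label in RANGES:
        if ipaddress.IPv4Address(first) <= ip <= ipaddress.IPv4Address(last):
            return label
    return "not multicast"


# The 239.x.x.x block holds 2^24 addresses out of the 2^28 in Class D.
LOCAL_SHARE = 2 ** 24 / 2 ** 28

if __name__ == "__main__":
    for address in ("224.0.0.2", "230.1.2.3", "239.10.0.1", "192.0.2.1"):
        print(address, "->", classify(address))
    print("administratively scoped share:", LOCAL_SHARE)
```

The computed share of 0.0625 matches the 1/16th figure given for the administratively scoped range.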

Table 1.1 IP Multicast Address Ranges and Utilization

ADDRESS RANGE BEGINNING   ADDRESS RANGE ENDING   UTILIZATION
224.0.0.0                 224.0.0.255            Reserved for the use of special “well-known” multicast addresses
224.0.1.0                 238.255.255.255        Internet-wide multicast addresses referred to as “globally scoped” addresses
239.0.0.0                 239.255.255.255        Local multicast addresses referred to as “administratively scoped” addresses

1.2.6.3 Limitations One of the major disadvantages of multicast is the fact that users must register to receive a multicast data stream, which makes this technology more suitable for predefined events, such as video telecasts, rather than pulling different information off several Web servers. Another limitation of multicast is the fact that not all routers or hosts conform to the multicast specification. In fact, there are three levels of conformance with respect to the multicast specification, with level 0 indicating no support, level 1 indicating support for sending but not receiving multicast datagrams, while level 2 indicates full support for IP multicast.

1.2.7 Push Technology

The development of push technology was a response to the need to bring information of interest to consumers instead of having them retrieve data. Under push technology, a consumer signs up with a push provider to receive certain data that is transmitted over the Internet to the user’s desktop. Information of interest is displayed either as a stream on the 25th line of the screen, or perhaps in a separate window on the display. The actual data sent to a user’s desktop can come from a variety of Web sites, depending upon the information of interest to the user. When the user fills out a profile of data of interest with the push provider, that profile functions as a filter. The vendor’s server searches a variety of Web sites, collecting information of interest to all subscribers; however, it uses each subscriber profile as a mechanism to determine the data to push to individual subscribers. Because selected information of interest is broadcast similar to a television or radio broadcast, the terms streaming, channeling, and broadcasting are sometimes used as synonyms.
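The idea of a subscriber profile acting as a filter can be sketched in Python as follows. The function name push, the topic labels, and the subscriber names are all hypothetical; a real push provider would gather items from Web sites rather than from a hard-coded list.

```python
def push(items, profiles):
    """Decide which collected items are pushed to which subscriber.

    items    -- list of (topic, headline) pairs gathered by the provider
    profiles -- mapping of subscriber name to the set of topics of interest
    """
    deliveries = {name: [] for name in profiles}
    for topic, headline in items:
        for name, topics in profiles.items():
            if topic in topics:  # the profile functions as a filter
                deliveries[name].append(headline)
    return deliveries


if __name__ == "__main__":
    collected = [
        ("weather", "Storm update"),
        ("sports", "Final score"),
        ("weather", "Forecast revised"),
    ]
    subscribers = {"alice": {"weather"}, "bob": {"sports", "finance"}}
    for name, headlines in push(collected, subscribers).items():
        print(name, headlines)
```

The provider collects each item once on behalf of all subscribers, then uses the profiles to select what each individual receives.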

According to some persons, the term push also derives from the term push polling, which was used during the 1996 U.S. presidential election, when unscrupulous polling personnel pretended to conduct a telephone opinion poll but, in actuality, used questions that “pushed” their candidate’s strengths.

1.2.7.1 Evolution In a modern push-technology environment, PointCast represents the vanguard of a series of companies that developed push technology. Founded in 1992 to deliver news and other information over the Internet, during 1996 the company distributed approximately 1.6 million copies of its proprietary Webcasting software, which for its time period represented a tremendous achievement. During 1997, the hype associated with push technology resulted in Business Week noting PointCast in its cover story on Webcasting, including a quotation from its then-34-year-old CEO, Chris Hasset, that “we are defining a new medium.”


PointCast’s run at fame was momentary. Although PointCast provided customers with the ability to receive advertisements along with specific requested information, push technology was literally shoved off corporate networks by organizations that found that employee use of the PointCast system was clogging their Internet connections as well as the local area networks connected to the mother of all networks. As more and more companies shunned the technology, the fortunes of PointCast considerably diminished. In addition, competition from Yahoo! and other Internet-related commercial firms resulted in the use of push technology via nonproprietary Internet channels. Eventually, both Microsoft and Netscape incorporated Webcasting technology into their browsers, further diminishing the need for PointCast’s proprietary software. Within a few years, PointCast was acquired by another company, and the use of push technology from the pioneer ceased many years ago.

The adoption of push technology was valuable for organizations needing a mechanism to deliver information en masse, such as automatic updating of business price lists, manuals, inventories, and policies. However, the mass market never actively used the desktop channel bar that became available under Internet Explorer Version 4.0. In fact, the more modern versions of Windows, such as Windows XP, Vista, and Windows 7, do not support the desktop channel bar or only support it for users who upgraded from Internet Explorer 4.0 or Windows 98. Instead, most persons today use Microsoft’s Windows Media Player, RealNetworks’s RealOne, or a similar program to view predefined audio and video. You can use either program to retrieve news; to play, burn, and rip audio tracks; and to find a variety of content on the Internet. Nevertheless, these modern Webcasting tools are more pullers than pushers. In addition, these modern Webcasting tools lack one key feature, a crawl capability, that was built into many push products and was included in Internet Explorer Version 4.0. We will briefly discuss this crawl capability before moving on to focus our attention on modern content delivery methods.

1.2.7.2 Crawling Under Internet crawling, users specify either the general type of information or a specific type of information they desire. Then, the client program performs a site-crawl of applicable Web sites, examining data for relevance as well as checking for updated content, and then notifying the user of changes. Users of Internet Explorer 4.0 could specify Web crawls up to three levels deep from a subscribed Web page as well as retrieve pages from nonsubscription sites.

A crawl requires a specified starting Web page. Given that page, a browser crawls each link on that page and, if configured to do so, the browser can crawl multiple levels deep or restrict its operation to each specified page in a list. During the crawl operation, the browser will check to see if a page changed by comparing the file size and creation date of the stored and accessed files. Depending upon the configuration of the crawl, once a page is checked, either a notification of changes or actual page content is downloaded to the user. Today, Web crawling is primarily performed by search engines or users with older browsers, as newer browsers have replaced Web crawling with Atom and Really Simple Syndication (RSS) feeds, which we will shortly discuss.
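The crawl just described, a starting page, a depth limit, and a change check based on file size and date, can be sketched as follows. This is a minimal illustration and not how Internet Explorer implemented it: the SITE dictionary is a hypothetical in-memory stand-in for the Web, the crawl function and its parameter names are invented for the example, and a modification date is used as the date being compared.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (size in bytes, date, links on the page).
SITE = {
    "/home": (1200, "2009-11-01", ["/news", "/prices"]),
    "/news": (3400, "2009-11-03", ["/news/archive"]),
    "/prices": (800, "2009-11-02", []),
    "/news/archive": (9100, "2009-10-15", ["/news/archive/2008"]),
    "/news/archive/2008": (5000, "2008-12-31", []),
}


def crawl(start, max_depth, stored):
    """Breadth-first crawl from start, following links at most max_depth deep.

    Returns the pages whose (size, date) differ from the stored snapshot
    and updates the snapshot in place.
    """
    changed = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        size, date, links = SITE[url]
        if stored.get(url) != (size, date):
            changed.append(url)
            stored[url] = (size, date)
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return changed


snapshot = {}
first = crawl("/home", max_depth=2, stored=snapshot)   # every page is new
SITE["/prices"] = (950, "2009-11-05", [])               # simulate an update
second = crawl("/home", max_depth=2, stored=snapshot)  # only /prices differs
print(first)
print(second)
```

With a depth limit of 2, the page three links away from the starting page is never visited, and the second crawl reports only the page whose size and date changed.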

1.2.7.3 Feeds A Feed can be considered to represent frequently updated content that is typically published by a Web site. Some common examples of Feeds include news and blogger site updates; however, a Feed can also be used to distribute video and audio, the latter commonly in the MP3 format and referred to as podcasting. Thus, a Feed can be considered as a modern push technology.

The most common Feed is the Really Simple Syndication (RSS) feed; however, similar to most software, there are different versions of RSS, such as RSS 1.0 and RSS 2.0. Versions of RSS as well as other Feeds are based upon the Extensible Markup Language (XML), which is a text-based computer language employed to develop structured documents.
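Because an RSS feed is an XML document, any XML parser can read it. The following sketch uses the xml.etree.ElementTree module from the Python standard library to read a small, made-up RSS 2.0 document; the channel title, headlines, and example.com links are placeholders and not taken from any real feed.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document; titles and links are placeholders.
RSS_TEXT = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Frequently updated headlines</description>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/2</link>
    </item>
  </channel>
</rss>
"""


def read_feed(text):
    """Return (RSS version, channel title, list of (item title, item link))."""
    root = ET.fromstring(text)
    channel = root.find("channel")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return root.get("version"), channel.findtext("title"), items


version, title, items = read_feed(RSS_TEXT)
print(version, title)
for item_title, item_link in items:
    print("-", item_title, item_link)
```

A feed reader repeats this kind of parse periodically and shows the user any items it has not seen before.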

If you are using a modern browser with a Feeds capability, when you visit a Web page, you will note a Feeds button that, when clicked upon, allows you to select one or more Feeds. On some Web sites, the Feeds button is prominently displayed, while on other sites a bit of searching may be required. For example, on the New York Times Web site (www.nytimes.com), you have to scroll to the bottom of the home page to find the Feeds button. This is illustrated in Figure 1.12, which shows what this author observed after scrolling down the home page of the newspaper: the Feeds button and the acronym RSS to the right of the button.


Figure 1.12 To select the Feeds button on the home page of the New York Times, you need to scroll to the bottom of the page.


Once you select the Feeds button, the Web site shown in Figure 1.12 will provide you with a list of Feeds you can select from. For example, at the time this author viewed the feeds available for selection, they varied from the Home Page to World, U.S., Regional, Business, Technology, and many other potential selections. Figure 1.13 illustrates the potential feeds available for user selection when this author browsed the New York Times Web site. Just as a matter of conjecture, perhaps providing a wealth of free information has resulted in the financial plight of the print media.

1.2.7.4 Advantages Similar to most technologies, there are several advantages associated with push technology. From the viewpoint of the client, push technology permits predefined requests that enable items of interest to be received when such items have Web-page changes. From the viewpoint of the server, push technology enables common multiple requests to result in a single response to the push provider, which in turn becomes responsible for the transmission of Web pages to individual client subscribers. Due to this, push technology is potentially extremely scalable and can reduce the load on both distant servers and networks, which in turn can be expected to enhance response time.

1.2.7.5 Disadvantages Previously, we noted that the main reason early versions of push technology failed to live up to the hype was that their use was clogging corporate networks. Today, the replacement of 10-Mbps Ethernet LANs by 1-Gbps and even 10-Gbps LANs has enabled modern versions of push technology, in the form of RSS and other feeds, to be allowed onto many corporate networks.

Besides clogging corporate networks, two additional limitations associated with early versions of push technology and its more modern replacement by feeds are the common need for clients to subscribe to a service and the loss of user control. Concerning subscription fees, most Internet users rightly or wrongly view the Internet as a free service and always want cheap or no-cost information. Concerning control, many users prefer on-demand information retrieval (pull), where they control the information to be downloaded. Because push technology was developed in an era where high-speed LANs operated at 10 Mbps, the traffic resulting from push had a significant effect upon corporate networks. When combined with the subscription nature of most early versions of push operators as well as user preference for control of Web-page retrieval, it’s a wonder that push technology held the interest of Internet users for several years during the 1990s.

Figure 1.13 Viewing the feeds available from the New York Times Web site.

With the development of higher speed networking capability and the removal of subscription fees, push technology has been revitalized. Today, many persons subscribe to business-related feeds to find out information about favored investments, while other persons subscribe to certain types of news to be aware of the latest events.

Now that we have a general level of appreciation for the evolution of a variety of information-retrieval techniques that were developed over the past half century, let’s examine the role of modern content delivery networking and how it facilitates many types of client-server communications, including push, pull, and Web crawling.

1.3   Content Delivery Networking

In the first section of this chapter, we were briefly introduced to content delivery networking (CDN) through an abbreviated definition of what the term means. Thereafter, in the second section, we focused our attention on the evolution of client-server technology, including caching as well as pull, push, and crawling operations. In this section, we will use our prior base of knowledge to obtain an appreciation for the manner by which content delivery networking facilitates the various types of client-server operations. Recognition of the benefits obtained from the use of a content delivery network requires some knowledge of the limitations of client-server operations as well as the general structure of the Internet. Let’s begin our examination of CDN by discussing client-server operations on the Internet.

1.3.1 Client-Server Operations on the Internet

The Internet represents a collection of networks tied to one another through the use of routers that support the TCP/IP protocol suite. To illustrate some of the problems associated with content delivery as data flows across the Internet, let’s first assume a best-case scenario where both client and server are on the same network. Then, we can expand the distance between client and server in terms of both router hops and networks traversed, introducing the point of presence (POP) and peering point used to interconnect separate networks to one another on the Internet.

1.3.2 Client-Server Operations on the Same Network

When we speak in terms of the Internet, the term network represents a collection of subnets with connectivity provided by an Internet Service Provider (ISP). Thus, when we mention that both client and server reside on the same network on the Internet, the two computers can reside on the same segment or on different subnets that require one or more router hops to be traversed for one computer to communicate with the other device.

The delay associated with the delivery of server content primarily represents a function of network traffic, available network bandwidth, and the number of router hops from client to server. Because a “network” in terms of the Internet is controlled by an ISP, content delivery is more manageable than when data delivery has to flow between interconnected “networks.” That is, the ISP can upgrade bandwidth and routers as it signs up additional clients as a mechanism to minimize the effect of additional traffic.

1.3.3 Client-Server Operations on Different Networks

When the client and server are located on different Internet “networks,” traffic must flow through an access point where networks operated by different ISPs are interconnected. On the Internet, a point-of-presence (POP) is sometimes used as a term to reference an access point where one network connects to another. In actuality, the term POP has its roots in telephony and originally represented the physical location where the local telephone operator connected its network to one or more long-distance operators. While the term POP is still used to reference the location where two ISPs interconnect their networks, a more popular term used to reference this location is the Internet peering point.

1.3.4 Peering Point

An Internet peering point represents the physical location where two or more networks are interconnected. Such locations are based upon contractual agreements between ISPs and trace their origins to the original expansion of ARPANET, whose full name represented the Advanced Research Projects Agency Network. As the Internet evolved, ARPANET was considered to represent a backbone network, with other networks linked to one another via one or more connections to the backbone. As the Internet has expanded and evolved, there is no longer a single backbone network in the traditional meaning. Instead, various commercial ISPs as well as private network operators entered into agreements whereby two or more networks were interconnected at a peering point under a peering agreement. Today there are two main types of peering: private and public. A private peering point results from an agreement between two ISPs to permit traffic to flow between two networks. In comparison, a public peering point, also referred to as an Internet Exchange Point, represents a location independent of any single provider where networks can be interconnected. ISPs with large traffic volumes, such as MCI (formerly known as WorldCom), are often referred to as Tier 1 carriers and usually establish peering agreements with other Tier 1 carriers without charging one another for the interconnection. Smaller providers with lighter traffic loads tend to use Internet Exchange Points, where they pay a fee for interconnection services.

One example of a peering point is MAE-East. The term MAE stands for Metropolitan Area Ethernet and represented an interchange constructed by Metropolitan Fiber Systems (owned by MCI prior to its acquisition by Verizon) for PSI, UUNET, and SprintLink during 1993. MAE-East was established at approximately the same time that the National Science Foundation was exiting the Internet backbone business, which enabled this peering location to become so successful that a similar facility was opened in Silicon Valley, referred to as MAE-West. By 2005, MAE-East had expanded to four sites in the Washington, D.C., metropolitan area and one location in New York City, with 38 members ranging in size from AT&T WorldNet and BT, to Epoch Networks, Equant, Hurricane Electric, Infornet, SwissCom AG, UUNET, Verio, and Xspedius. By 2009, MAE-East offered Frame Relay encapsulation at Optical Carrier (OC) OC-3, OC-12, and OC-48 data rates using Packet over SONET as well as Gigabit Ethernet connections on fiber and ATM connections at DS-3, OC-3, and OC-12 data rates.


Today there are three major MAEs in the United States: MAE-East, MAE-West, and MAE-Central, with the latter located in Dallas, Texas. In addition, there are two central MAEs for frame encapsulation (FE) service that are located in Chicago and New York.

Figure 1.14 illustrates the Internet Traffic Report for MAE-East when the first edition of this book was written. Note that Figure 1.14 illustrates two graphs, each indicating activity for the past 24 hours. The top graph indicates a traffic index, which represents a score from 0 to 100, where 0 is slow and 100 is fast. The traffic index is computed by comparing the current response of a Ping echo to all previous responses from the same router over the past seven days, with a score of 0 to 100 assigned to the current response, depending on whether this response is better or worse than all previous responses from the router.
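The text does not give the exact formula behind the traffic index, so the following Python sketch is only one plausible reading of the description: it scores the current Ping round-trip time by the share of earlier round-trip times that it beats or ties. The traffic_index function, the scoring rule, and the sample history are all assumptions made for illustration and are not the Internet Traffic Report’s actual method.

```python
def traffic_index(current_ms, previous_ms):
    """Score 0..100: percentage of earlier round trips the current one beats or ties.

    Assumed scoring rule, for illustration only. A lower round-trip time is
    better, so 100 means the current response is at least as fast as every
    earlier response and 0 means it is slower than all of them.
    """
    if not previous_ms:
        raise ValueError("need at least one previous response")
    not_faster = sum(1 for earlier in previous_ms if earlier >= current_ms)
    return round(100 * not_faster / len(previous_ms))


# Hypothetical round-trip times, in milliseconds, from the past seven days.
history = [40, 55, 60, 80, 120]

print(traffic_index(35, history))   # faster than every earlier response
print(traffic_index(70, history))   # faster than two of the five
print(traffic_index(200, history))  # slower than every earlier response
```

Under this reading, the index is a relative measure: it tells you how a router is performing compared with its own recent past rather than against an absolute standard.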

The second graph indicates response time in milliseconds (ms).The response time represents a round trip computed by sendingtraffic from one location to another and back to its origination.

Figure 1.14 The top portion of a Web page that provides three metrics concerning the operation of MAE-East.


If you could scroll down Figure 1.14, you would be able to view a third graph, labeled packet loss. That graph indicates the percent of packets dropped by the router or otherwise lost. Typically, routers discard packets when they become overloaded. Thus, this represents a measurement of network reliability.

The three metrics displayed by the Internet Traffic Report partially shown in Figure 1.14 can be considered to reflect bottlenecks when information flows between ISP networks. That is, the peering point can be viewed as a funnel through which all traffic from one Internet “network” destined to a different “network” must flow. Because the flow of data between ISPs is usually not symmetrical, this means that, at a point in time, some ISPs may have more data to transfer through a peering point than the connection can handle. When this situation occurs, the connection becomes a bottleneck. Although the peering point could be upgraded, many times only a few customers operating servers experience a problem, and the ISP where the server resides may very well be reluctant to upgrade the peering point due to the cost involved. Similarly, the installation of additional peering points can represent a significant cost. Even if existing peering points are upgraded and additional peering points established to provide extra internetwork connectivity, doing so takes time and may not appreciably decrease the delays currently experienced by clients on one network attempting to access servers on another network. The latter situation results in a fixed number of router hops needing to be traversed when a client on one network needs to access information from a server residing on another network.

In developing this new book edition, this author returned toMAE-East. In doing so, he realized that, over the past five years,considerableprogresshadoccurredinupgradingthedatarateofcon-nections,butthetotaltrafficontheInternethadalsoincreased.Thus,itwouldbeinterestingtodetermineiftheincreaseincapacitycouldaccommodatetheincreasedtraffic.

Figure 1.15 can be considered as a return to MAE-East five years later, occurring in November 2009. In this illustration, note that while the Global Index remained at 83, significant strides had occurred at MAE-East due to its data transmission upgrades that apparently outpaced the increase in traffic. If you examine the traffic index, response time, and packet loss, you will note a line near zero for each.


Figure 1.15 Returning to MAE-East in November 2009.


Thus, MAE-East has made significant strides in processing data without losing data or having unacceptable levels of response time.

Because of the previously mentioned problems associated with packet loss and increased response time, another solution evolved during the 1990s. This solution was based upon moving server content from a central location to multiple distributed locations across the Internet. In doing so, Web pages were moved closer to the ultimate client requester, reducing both the number of router hops a request would have to traverse as well as the round-trip propagation time. In addition, since Web pages are now closer to the end user, traffic would not have to flow through peering-point bottlenecks, further enhancing traffic flow and minimizing traffic delays.

The effort involved in distributing Web pages so that they are available at different locations is known as content delivery networking, and that is the subject of this book. Thus, a portion of the increased performance at MAE-East can be attributed to the growth in content delivery networks. While we will probe much deeper into this topic in subsequent chapters, for now we can note that the role of the modern CDN is to facilitate the retrieval of Web-based information by distributing the information closer to the ultimate requester. In doing so, CDN facilitates a variety of client-server communications, including pull, push, and even Web crawling.

1.3.5 Video Considerations

In concluding this chapter, a few words concerning video are warranted. Although video was being incorporated onto Web sites when the first version of this book was written, both server storage and network bandwidth were constraints that severely limited its use. Over the past decade, data storage has grown by a factor of 10 to 100, depending upon the technology considered. In comparison, LAN transfers have increased from Fast Ethernet’s 100 Mbps to 10 Gigabit Ethernet, while wide-area networks that were primarily using T1 lines at 1.544 Mbps have, in many cases, been replaced by Optical Carrier (OC) data rates of 155.52 Mbps (OC-3), 622.08 Mbps (OC-12), and 2488.32 Mbps (OC-48).
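To put these data rates in perspective, the short calculation below estimates how long an idealized transfer of one video file would take at each wide-area rate mentioned above. It is a back-of-the-envelope sketch: the 100-megabyte file size is an arbitrary assumption, the transfer_seconds function is introduced only for this example, and protocol overhead, congestion, and latency are ignored.

```python
# Wide-area data rates from the text, in megabits per second.
RATES_MBPS = {"T1": 1.544, "OC-3": 155.52, "OC-12": 622.08, "OC-48": 2488.32}


def transfer_seconds(size_bytes, rate_mbps):
    """Idealized transfer time: bits to send divided by bits per second."""
    return size_bytes * 8 / (rate_mbps * 1_000_000)


VIDEO_BYTES = 100 * 1_000_000  # hypothetical 100-megabyte video file

for name, rate in RATES_MBPS.items():
    print(f"{name:>5}: {transfer_seconds(VIDEO_BYTES, rate):8.1f} s")
```

The same file that needs more than eight minutes on a T1 line needs only a few seconds at OC-3, which helps explain why video became practical only after these upgrades. Note also that each OC rate is an exact multiple of the others: OC-12 is four times OC-3, and OC-48 is four times OC-12.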

Due to the increased storage capacity of servers and the ability of both wide- and local-area networks to handle a significantly increased data flow, video has become a far more common option on Web pages for users to view. In addition, it’s now possible to download movies, view television episodes, and even connect your television to the Internet to view shows on the “big screen.” Furthermore, in an era of terrorism, many cities now have extensive video camera networks, with real-time feeds from many locations displayed on a single console that the operator can toggle and expand upon when he or she needs additional information available from a full-screen image. While many city feeds bypass the conventional Internet, the relatively low cost of video cams has resulted in many Web sites providing viewers with views of a resort, scenes of sunrise, and other actions where a picture is truly worth a thousand words.

However, for video to be successful, Web developers need to consider the fact that not all potential viewers are similar. Some potential viewers may reside on legacy LANs operating at 10 Mbps, while other viewers could reside on cable or DSL connections or on Gigabit LANs. In addition, the flow of data from the server to potential viewers can take divergent routes, with some flows going through congested peering points while other data flows do not. Thus, Web developers that incorporate video should, at a minimum, provide end users with the option to select the type of Internet connection they are using. Other options can include the frame rate and resolution to use to view a selected video. By incorporating these options, you can tailor the delivery of video to the different networking and computer capabilities of potential viewers.
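One way a Web developer might act on a viewer’s stated connection type is sketched below. Everything in it is an assumption made for illustration: the three renditions, their resolutions, frame rates, and bit rates, the usable throughput assigned to each connection type, and the rule of spending at most half of that throughput on video.

```python
# Hypothetical renditions: (label, width, height, frames per second, kbps).
RENDITIONS = [
    ("low", 320, 240, 15, 300),
    ("medium", 640, 480, 24, 1200),
    ("high", 1280, 720, 30, 4000),
]

# Assumed usable throughput, in kbps, for each connection type a viewer can select.
CONNECTION_KBPS = {
    "dsl": 1500,
    "cable": 6000,
    "legacy-lan": 10000,
    "gigabit-lan": 1000000,
}


def pick_rendition(connection, headroom=0.5):
    """Return the highest-bit-rate rendition that fits within the budget.

    The budget is the connection's assumed throughput times headroom, which
    leaves room for other traffic. Falls back to the lowest rendition.
    """
    budget = CONNECTION_KBPS[connection] * headroom
    best = RENDITIONS[0]
    for rendition in RENDITIONS:  # ordered from lowest to highest bit rate
        if rendition[4] <= budget:
            best = rendition
    return best


for connection in CONNECTION_KBPS:
    label, width, height, fps, kbps = pick_rendition(connection)
    print(f"{connection:>12}: {label} ({width}x{height} at {fps} fps, {kbps} kbps)")
```

A selection made this way covers only the viewer’s access link; it cannot account for a congested peering point along the route, which is one reason to let viewers override the choice.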


2 Client-Server Models

In the first chapter of this book, we considered client-server architecture to represent a single model that varied by the manner by which data flowed between each computer. If data flowed directly from client to server, the architecture could be represented as what is referred to as a two-tier architecture, with the client considered to represent the first tier, while the server is considered to represent the second tier. If data flowed from client to server and, depending upon the request, then flowed to another server, we could refer to the architecture as being a three-tier architecture. As a review, in a two-tier architecture, the user interface is typically located on the user’s desktop, while the database being accessed is located on a server that provides services to many clients. In a three-tier (also referred to as multitier) architecture, a middle layer is added between the client and the database to be accessed. The middle layer can queue requests, execute applications, provide scheduling, and even prioritize work in progress. When considering trade-offs between a two-tier and a multitier architecture, it is important to note that the latter will always increase the data flow on a LAN, and this increase needs to be considered, especially if the network is approaching congestion prior to implementing a multitier solution to a database retrieval problem. In addition, because a number of different types of software products can reside at each tier, this can result in a series of different client-server models, which is the topic of this chapter.

In this chapter, we will turn our attention to a core set of software products that operate on both client and server. The interaction of such products results in different client-server models, with each model having varying characteristics that are affected by latency as data moves across the Internet, by traffic on each Internet Service Provider (ISP) network, and by the traffic routed through any points of presence as data flows from a client located on one ISP network to a server located on a different ISP network, with a response that then flows back to the client.

To begin our examination of client-server models, we will follow a familiar tune and “begin at the beginning.” That is, we will examine the three tiers that can be employed in different client-server architectures. In doing so, we will note different types of popular software that can operate on each tier. Once this is accomplished, we will use the preceding information as a foundation to probe deeper into the characteristics of different software on each tier, including their relationship with other software as well as the effect upon software operations as the distance between client and server increases from residence on a common network to computers located on different networks. Because of the role of images and video in the modern Web-page environment, we will focus our discussion of certain types of software to include the use of images and video technology.

2.1   Overview

Figure 2.1 illustrates, in a block-diagram format, the three tiers associated with modern client-server architecture. In this block diagram, potential software programs are indicated with respect to the tier where they would normally reside. In addition, the common operating systems used at each tier are indicated to provide readers with additional information concerning platforms that are commonly used in the modern client-server environment.

[Block diagram. Tier 1, Client: Web browser (Microsoft Internet Explorer, Netscape, Firefox, Opera); software modules (HTML, Java applets, VBScript, ActiveX). Tier 2, Server Layers: Apache, CGI scripts, VB scripts, application server (J2EE), Web site development; platforms (Windows, Unix, Linux). Tier 3, Database Layer: JDBC, ODBC, relational database.]

Figure 2.1 The client-server architectural model.


2.2   Client Operations

In the wonderful world of the Internet, the browser represents the client. The purpose of the browser, besides surfing the Web, is to enable users to request documents from a server as well as to display the serviced request. While Netscape Corporation developed the first commercially available browser, Microsoft Corporation’s Internet Explorer now dominates the browser market, with approximately an 80% market share. Other browsers such as Netscape, the open-system Mozilla Firefox, Opera, and other products cumulatively hold the remainder of the market.

Browsers differ in the version of the HyperText Markup Language (HTML) they support as well as in their support of code modules, plug-ins, amount of customization users can perform, and caching capability. As we noted in the first chapter in this book, browsers are not static, and their capabilities can vary considerably based upon the version used. Because of the market share of Microsoft’s Internet Explorer, we will primarily focus our attention on this browser when discussing client operations in this book. However, due to the rise in the popularity of the Mozilla Firefox browser, this author will periodically use this browser to illustrate certain browser operations.

2.2.1 URLs

Uniform resource locators (URLs) are short strings that identify the location of various resources on the Web. URLs are defined in Request for Comments (RFC) 1738, which was published in December 1994. While RFC 1738 does a good job of defining how resources are addressed, one persistent problem is the fact that it limits the use of allowed characters to a subset of the U.S. version of the American Standard Code for Information Interchange (ASCII). Because modern browsers commonly support the HyperText Markup Language (HTML) version 4, which is not only compatible with the International Standards Organization (ISO) 8859 code but also all Unicode characters, this means that there are some characters, specifically those above Hex 255 in the Unicode character set, that should not be used in URLs. In addition, there are certain characters that are reserved for special use, such as the dollar sign ($), ampersand (&), and question mark (?), while other characters such as the less-than (<) and greater-than (>) symbols have the potential of being misinterpreted when used in a URL. Thus, when constructing URLs, it’s important to consider the fact that when you diverge from the use of alphanumeric and defined special characters, you may need to reference material that illustrates how to encode them for use.
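The usual way to encode such characters is percent-encoding, in which a character is replaced by a percent sign followed by the two hexadecimal digits of each of its bytes. The sketch below uses the quote and unquote functions from Python’s standard urllib.parse module; the sample path and strings are made up for the example, and the treatment of non-ASCII text as UTF-8 bytes is Python’s default rather than something stated in the text.

```python
from urllib.parse import quote, unquote

# A made-up path containing a space and the less-than and greater-than symbols.
path = "/price list/<summer>.dat"
encoded = quote(path)          # "/" is left alone by default
print(encoded)

# Decoding restores the original characters.
print(unquote(encoded) == path)

# Reserved characters are encoded too when they are not in the safe list.
print(quote("a&b?c$d", safe=""))

# Non-ASCII text is encoded byte by byte (UTF-8 by default in Python).
print(quote("café"))
```

Reserved characters such as the question mark and ampersand should be encoded only when they appear as data; when they act as delimiters inside a URL they must be left as they are.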

The resources defined by URLs can include documents, images,downloadablefiles,electronicmailboxesandevenservices.ThegeneralformatofaURLisasfollows:

Protocol://location

Note that the general format commences with a protocol, followed by a colon, which in turn is followed by two forward slash characters and a location. Two common examples of a protocol are http (HyperText Transfer Protocol) and ftp (File Transfer Protocol). The location can include a significant degree of variety, depending upon the actual location where information resides. For example, the home page of the Web server whose domain is popcorn.com would be accessed as follows:

http://www.popcorn.com

In the above example, the use of http tells the computer to use the HyperText Transfer Protocol, while www references the host name at the domain popcorn.com.

Not shown in the above URL example is the port number used by http. By default, the port number is 80. If, for some reason, the server being accessed uses a different port number, then the URL would become

http://www.popcorn.com:number

where number represents the port number.

The URL preceding this last URL would take you to the home page of the Web server at the domain popcorn.com, whose address is www.popcorn.com and which is configured to receive HTTP requests on port 80. We should note that we can also specify a path to a particular location on a computer, pass parameters to scripts via a query string, and even refer to a specific section within an identified resource through the use of a fragment. Thus, a more detailed URL format would be as follows:


Protocol://domainname[IPaddress]:[port][path][Query][fragment]
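The pieces of this format can be pulled apart with the urlsplit function in Python’s standard urllib.parse module, as the following sketch shows. The describe function, the DEFAULT_PORTS table, and the sample URL (which adds a made-up port, query string, and fragment to the popcorn.com example) are assumptions introduced for illustration.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def describe(url):
    """Split a URL into the components named in the detailed format."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,  # returned in lowercase
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


info = describe("http://www.popcorn.com:8080/taffy/bucketprice.dat?size=large#prices")
for name, value in info.items():
    print(f"{name:>8}: {value}")

# With no port in the URL, the protocol's default applies.
print(describe("HTTP://WWW.POPCORN.COM")["port"])
```

The parser lowercases the protocol and host name but leaves the path, query, and fragment exactly as written.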

It should be noted that you can use either a registered domain name or an IP address for the destination location in a URL. For example, the relatively new search engine from Microsoft called Bing can be accessed by the domain bing.com or its IP address of 64.4.8.147. It should also be noted that the host name and domain name of a URL are case-insensitive, as the Domain Name System (DNS) is programmed to ignore case. Thus, bing.com and BING.COM both reference the home page of the search engine. To add some confusion to addressing, the file path name used to specify the location of a document or program is case-sensitive, although many servers will treat such data as being case-insensitive, especially servers running Microsoft Windows. Thus,

http://64.4.8.147/search?q=bing&go=&form=QBLH&qs=n

and

http://64.4.8.147/SEARCH?q=BING&go=&form=QBLH&qs=n

would direct you to the same search page results on Microsoft’s Bing search engine, while modifying the two URLs for access to a different search engine might produce different results when different cases are used. One interesting Web site you might consider using is http://www.hcidata.info/host2ip.cgi, which converts host/domain names to IP addresses while also performing a reverse operation.

As an example of the use of an expanded URL, let’s assume that we want to log onto the home page of the discount brokerage firm TD Ameritrade. You could either point your browser to the TD Ameritrade home page and select a menu entry that will prompt you to enter your user information, or you could go directly to the access log-on page. For the latter, you would enter the URL as

https://wwws.ameritrade.com/cgi-bin/apps/Main

From the preceding URL, note that the protocol was changed to https, where the s stands for secure. Also note that the CGI (Common Gateway Interface) represents a set of rules that describes how a Web server communicates with other software that can be on the client and the server. In our example, the TD Ameritrade access page will display a small form requiring you to enter your UserID and password. The CGI program will process the entered data. Later in this chapter we will take a more detailed look at CGI.

Specifying a path to a particular document, file, or service on a server requires one or more additional forward slashes (/) following the server address and optional port number. When the Web was evolving, it was common for some organizations to hide their presence by using a port other than port 80. Unfortunately, the use of port-scanning technology resulted in port hiding as a security mechanism being rapidly replaced by the use of encryption. Within the URL, a variety of special characters can be used to effect different operations. For example, the question mark (?) can be used to delimit the boundary between the URL of a queryable object and a set of words used to express a query on that object. For example:

http://www.popcorn.com/?bucketprice.dat

2.2.1.1 Absolute and Relative In concluding our initial remarks about URLs, it should be noted that they can be absolute or relative. An absolute URL is one that directly points to the exact location of a file. Absolute URLs must be unique. Thus, if two absolute URLs are identical, this means that they must point to the same file. For example,

http://popcorn.com/taffy/bucketprice.dat

is an absolute URL. In comparison, a relative URL points to the location of a file from a point of reference. That point of reference is commonly the location of the document currently being viewed, and the notation can be considered as a reminder of the role of the old Disk Operating System (DOS), which used the double dot (..) to identify the parent directory, as in the Change Directory (CD) command followed by the double dot (CD ..), and a single dot (.) to indicate the current directory. Thus, if the initial absolute URL took us to http://popcorn.com, then a relative URL such as ../taffy/bucketprice.dat would reference the file bucketprice.dat in the directory taffy on the domain popcorn.com. If you were at that location, you could then use the URL ./salty/bucketprice.dat to reference the file bucketprice.dat in the directory salty, assuming that its location is presently under the current directory.
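How a relative URL is combined with its point of reference can be checked with the urljoin function from Python's standard library. This is only a sketch: the base URL is assumed to be the taffy file from the example above, and the handling of a leading double dot at the top of a site reflects the RFC 3986 behavior adopted in Python 3.5 and later.

```python
from urllib.parse import urljoin

# Assumed point of reference: the document currently being viewed
base = "http://popcorn.com/taffy/bucketprice.dat"

# A single dot keeps us in the directory of the current document
same_dir = urljoin(base, "./salty/bucketprice.dat")

# A double dot moves to the parent directory before descending again
parent = urljoin(base, "../salty/bucketprice.dat")

# At the top of the site there is no parent, so the double dot is dropped
from_root = urljoin("http://popcorn.com/", "../taffy/bucketprice.dat")

print(same_dir)
print(parent)
print(from_root)
```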


2.2.1.2 Shortening URLs With the need to specify relatively long URLs and the constraints of such services as Twitter, which limits users to 140 characters per post, a need arose for the significant shortening of URLs. Thus, this author would be remiss if he did not mention two services and briefly discuss how they can be used to shorten relatively long URLs as well as provide readers with the knowledge that, when viewing a URL, you might be viewing an abbreviated version of a rather lengthy actual uniform resource locator.

Two services that provide persons with the ability to significantly shorten URLs are Bit.ly and TinyURL.com. For example, using the latter, this author was able to shorten the URL

http://www.popcorn.com/files/html/special/dfg.mpeg

which has a length of 50 characters, into the URL

http://tinyurl.com/ye66ga7

which has a length of 26 characters. If you are using Bit.ly, the 50-character URL previously mentioned would be replaced by the 20-character URL

http://bit.ly/6lKkiE

Thus, if you’re viewing a URL and note the inclusion of “tinyurl.com” or “bit.ly,” you are actually viewing a shortened alias that redirects to the full URL.

2.2.2 HTML

The document typically requested by the use of a URL is more often than not encoded in HTML (HyperText Markup Language). HTML represents a markup language that defines a structure for a document without specifying the details of layout.

2.2.2.1 Versions The origin of HTML can be traced back to 1980, when Tim Berners-Lee, a physicist at CERN in Switzerland, developed a system for researchers to facilitate the sharing of documents. Berners-Lee used this effort, referred to as ENQUIRE, as a base to develop HTML, programming both a browser and server software to facilitate the surfing of what became known as the World Wide Web. In 1991, Berners-Lee made available to the public a description of HTML that covered 20 elements, of which a majority are still incorporated into the latest version of HTML, referred to as HTML version 5.

The development of HTML was considerably influenced by the Standard Generalized Markup Language (SGML), which is an International Organization for Standardization (ISO) standard technology for defining generalized markup languages for documents. The first HTML draft specification was published by the Internet Engineering Task Force (IETF) in 1993, followed by an HTML+ draft in 1994. During 1995, the IETF HTML Working Group completed its HTML 2.0 specification, which was published as RFC 1866. Other notable versions of HTML include HTML 3.2, which appeared in January 1997; HTML 4.0, which was first published in December 1997 and then reissued with some editing in April 1998; HTML 4.01, which was issued in December 1999; and HTML 5, which was published as a working draft in January 2008. Because browsers must consider the fact that servers around the world will operate a variety of HTML versions, it’s of critical importance that a modern browser be backward compatible to support various versions of the markup language.

2.2.2.2 HTML Documents HTML documents are plain-text ASCII files that can be created using a text editor or word processor that enables text to be saved in ASCII format. The fundamental component of the structure of a text document is referred to as an element. Examples of elements include heads, tables, paragraphs, and lists. Through the use of tags, you mark the elements of a file for a browser to display. Tags consist of a left angle bracket (<), a tag name, and a right angle bracket (>). Tags are usually paired, with the termination tag prefixed with a forward slash (/), such that <h1> and </h1> would be used to surround a header level 1 name. Similarly, <h2> and </h2> would be used to surround a header level 2 name, with up to six levels (<h1> through <h6>) capable of being defined.

Although most elements have a start tag and an end tag, some do not. For example, a line break <br> does not have any content and does not have a closing tag.

An elementary example of a hypertext-coded document is presented in Figure 2.2.


2.2.2.3 Font Control Within the set of elements of HTML are markups that enable one to alter the appearance of text. For example, the pair of tags <b>boldface</b> indicate that the visual display should be rendered as boldface, while <i>italic</i> results in text being displayed in italics. Table 2.1 lists seven font-control elements available in HTML.

In addition to the font-control elements listed in Table 2.1, you can specify the type of font via the use of the <Font Face=> tag, the color of the font by the use of the <Font Color=> tag, and the size of the font by the use of the <Font Size=> tag, each of which is delimited by the </Font> tag. You can also mix various font-control elements, generating, for example, bold italic blue text in a particular size. However, it’s important to note that the correct rendering of tags depends upon the browser used. While most modern browsers are fine, older browsers may represent a potential problem in displaying certain tags or combinations of tags.

<html>
<head>
<title>Title of document</title>
</head>
<body>
<h1>Header level 1</h1>
Some text goes here
After this text we add a line break<br>
<h2>Header level 2</h2>
Some text goes here
</body>
</html>

Figure 2.2 A brief HTML document.

Table 2.1 Font-Control Elements Available In HTML

<b>…</b>      Bold
<i>…</i>      Italics
<u>…</u>      Underline
<s>…</s>      Strike-through
<tt>…</tt>    Teletype
<sup>…</sup>  Superscript (E=MC²)
<sub>…</sub>  Subscript (H₂O)


2.2.2.4 Hypertext Links The Web can be considered to represent a set of network-accessible information resources constructed upon three basic ideas. Those ideas include URLs that provide a global location and naming scheme for accessing resources, protocols such as HTTP for accessing resources, and hypertext in the form of HTML that enables one document to contain links to other resources. Under HTML, a unidirectional pointer, referred to as an anchor, is used in a tag along with a URL to function as a hypertext link. The letter a is used in an anchor tag as an abbreviation for anchor, with the format for a hypertext link as follows:

<a href=destination>label</a>

where the <a> and </a> tags represent an anchor, “destination” represents a URL, and “label” represents highlighted information displayed by the browser. Selecting the label results in the browser retrieving and displaying information from the destination. That destination is hidden from view until you move the cursor over the label. At that time, the URL hidden in the anchor will be displayed on the bottom line of the browser. One example of the use of an anchor is as follows:

See: <a href="http://www.popcorn.com/tasty">for tasty popcorn</a> information.
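To see how a program, rather than a person, recovers the destination and label from an anchor, consider the following sketch built on the HTMLParser class in Python's standard library. The class name LinkCollector is hypothetical, and the href in the sample line is written with an explicit http:// scheme and quotation marks, which is an assumption made so that the destination is an absolute URL.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (destination, label) pairs from anchor tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None      # destination of the anchor being read, if any
        self._label = []       # text seen since the opening <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._label = []

    def handle_data(self, data):
        if self._href is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._label).strip()))
            self._href = None

page = 'See: <a href="http://www.popcorn.com/tasty">for tasty popcorn</a> information.'
collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)
```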

2.2.2.5 Adding Images When using HTML, an image is embedded into an HTML page through the use of the <img> tag. The general format of the img tag is shown as follows, where attributes shown in brackets are optional.

<img src="URL" alt="text displayed" [attribute] … [attribute] />

Although the <img> tag supports a wide range of attributes, only two are required: src and alt. The first specifies the URL of the image, while the second specifies an alternate text for an image. For example, assume that you wish to incorporate a picture labeled “homeview.jpg” onto a Web page and display the text “Our Factory” if clients have configured their browsers such that they do not display images. Assuming that the picture is located on the server at www.popcorn.com/files/html, you would use the following <img> tag:

<img src="http://www.popcorn.com/files/html/homeview.jpg" alt="Our Factory" />


Table 2.2 lists nine optional <img> tag attributes and a brief description of their use. Note that the align, border, hspace, and vspace attributes are referred to as “deprecated.” This means that these attributes have been superseded by what many consider more functional or flexible alternatives and were declared as deprecated in HTML 4 by the W3C, which is the consortium that sets the HTML standards. Most browsers can be expected to support deprecated tags and attributes, but eventually they are likely to become obsolete, and thus future support cannot be guaranteed. The four deprecated attributes are still commonly used, although style sheets provide more flexible methods of manipulating images. However, since the primary purpose of this book is to discuss Content Delivery Networks and networking, we can use our knowledge of <img> tag attributes without resorting to a discussion of style sheets.

Table 2.2 Optional <img> Tag Attributes

ATTRIBUTE  VALUES                            DESCRIPTION
align      top, bottom, middle, left, right  Specifies the alignment of an image according to surrounding elements (deprecated)
border     pixels                            Specifies the width of the border around an image (deprecated)
height     pixels                            Specifies the height of an image
hspace     pixels                            Specifies the white space on left and right side of an image (deprecated)
ismap      ismap                             Specifies an image as a server-side image-map
longdesc   URL                               Specifies the URL to a document that contains a long description of an image
usemap     #mapname                          Specifies an image as a client-side image-map
vspace     pixels                            Specifies the white space on top and bottom of an image (deprecated)
width      pixels                            Specifies the width of an image

2.2.2.5.1 Image Formats From a technical perspective, all images are stored as data files, including video, which can be considered as a series of images occurring at a particular frame rate. However, we can categorize images as being either in a raster or vector format. A raster format results in an image being broken down into a series of dots, with the dots referred to as pixels. The vast majority of image formats used on the Web are raster format images, such as the Graphics Interchange Format (GIF), JPEG (format developed by the Joint Photographic Experts Group), and bitmapped (BMP). In comparison, a vector image is created via a series of mathematical expressions, such as those used by CorelDraw (CDR), Microsoft’s Windows Metafiles (WMF), and the Hewlett-Packard Graphics Language (HPGL). In this book, we will focus our attention on raster images. Table 2.3 lists 15 examples of image file formats. Of the 15 listed, the most popular formats used on the Web are JPEG, followed by GIF and BMP, which are encountered upon occasion.

The use of JPEG images is very popular because they provide a lossy compression capability that can significantly reduce data storage and transmission requirements. Later in this book, we will discuss the key advantages associated with both image and video compression.

2.2.2.6 Adding Video There are three common methods associated with adding video to a Web page. One method involves using the <embed …> tag to display a media file. In actuality, the use of the <embed …> tag places a browser plug-in on the Web page. The plug-in represents a special program located on the client computer that supports viewing the file. The most common plug-ins are for sounds and movies.

Table 2.3 Popular Image File Formats

BMP           Bitmapped images stored under Windows
CLP           Windows Clipart
DCX           ZSoft Paintbrush
GIF           Graphics Interchange Format
JIF and JPEG  Joint Photographic Experts Group Related Image Format
MAC           MacPaint
MSP           MacPaint new version
PCT           Macintosh PICT format
PCX           ZSoft Paintbrush
PNG           Portable Network Graphics file format
PSP           Paint Shop Pro format
RAW           Unencoded image format
RLE           Run-Length Encoding
TIFF          Tagged Image File Format
WPG           WordPerfect image format

The embed tag does not require a closing tag and operates similar to the image tag. When you use the <embed …> tag you must include a src (source) attribute to define the location of the video. You do this by specifying an applicable URL that, similar to other URLs, can be local or global. The <embed …> tag supports a wide range of attributes, most of which are listed in Table 2.4.

The default value for the CONTROLS attribute (Table 2.4) is CONSOLE. This value results in most browsers displaying a full-sized set of controls, including a start button, a pause button, a stop button, and a volume control or lever.

The general format of the <embed …> tag is shown as follows:

<embed src="URL" [attribute 1] [… attribute n] />

The <embed …> tag uses the SRC attribute to indicate the location of the plug-in data file. Typically, it also includes a WIDTH and HEIGHT of the plug-in area. Unfortunately, different browsers render different media types differently, so sometimes selecting the height and width may require some trial and error as well as the use of different browsers. For example, the following code embeds a Motion Picture Experts Group (MPEG) video called Popcorn-Technology located on the server in the domain popcorn.com into the client Web page and immediately starts the video:

<embed src="http://www.popcorn.com/files/html/Popcorn-Technology.mpeg" autostart="true" />

Table 2.4 <embed…> Tag Attributes

ATTRIBUTE    MEANING
SRC          URL of resource to be embedded
WIDTH        Width of area in which to show resource
HEIGHT       Height of area in which to show resource
ALIGN        How text should flow around the picture
NAME         Name of the embedded object
PLUGINSPAGE  Where to get the plug-in software
HIDDEN       Specify if the play/stop/pause is visible or not; values are true and false
HREF         Make this object a link
TARGET       Frame to link to
AUTOSTART    Specify if the sound/movie should start automatically; values are true and false
LOOP         Specifies if the media is continuously played; values are true and false
PLAYCOUNT    Specify how many times to play the sound/movie; value is a number
VOLUME       Specify how loud to play the sound; value is a numeric from 0 to 100
CONTROLS     Specify which sound control to display
STARTTIME    Define how far into the sound to start and stop
ENDTIME      Define when to finish playing


Note that <embed …> is not a part of the HTML 4 specifications, but it is still widely supported by modern browsers. Unlike other tags, the attributes used by <embed …> depend on the type of plug-in being used. Perhaps this lax policy explains why <embed …> was rejected by the HTML standards makers.

A second method that can be used to place media files onto a Web page is by placing the URL of the media files into the HREF attribute of an anchor tag. HREF indicates the URL being linked to and makes the anchor into a link. For example, the following tag creates a link to view the Popcorn-Technology mpeg video and displays the message “View the Video,” which when clicked results in the video being displayed in the client browser.

<a href="http://www.popcorn.com/files/html/Popcorn-Technology.mpeg">View the Video</a>

Recognizing the role of video on modern Web pages, the developers of HTML 5 added a video element to the specification. The video element added under HTML 5 functions as a block-level element type, which means that it is within the body of an HTML document and makes up the structure of a document. Similar to the embed tag previously described, the video element defined in HTML 5 includes a series of attributes. Those attributes include autoplay, autobuffer, controls, height, loop, poster, src, and width.

Two of the previously mentioned attributes deserve a bit of elaboration. First, the autobuffer attribute (renamed preload in later drafts of the HTML 5 specification), when specified, results in the video automatically beginning buffering. This attribute should be used when it’s considered highly probable that a client browser will navigate to a Web page to view the specified video and not to a page that has video embedded along with other content, such as a news organization’s Web page with multiple videos available for selection. Secondly, the poster attribute results in specifying a URL of a poster frame that is displayed until the client plays the video, and its absence results in nothing being displayed until the first frame becomes available, which is then displayed as the poster frame.

2.2.2.6.1 Video Formats Similar to images, there are a number of video file formats in use on the Web. Some of the more popular file formats include Apple Computer’s QuickTime .mov and .qt files, Microsoft’s .avi (Audio Video Interleave), the SWF standard Adobe Flash file format, FLV, which represents a special type of Flash video embedded in an SWF file, MPEG-4 video files with the extension .mp4, Real Media .rm and .rmvb files, and Microsoft’s Windows Media Video files with the extension .wmv. Unfortunately, at the time this book revision was performed, Microsoft’s Windows Media Video support of MPEG-4 video was limited to a nonstandard MPEG-4 codec (coder-decoder) that is not compatible with a later version of MPEG-4 that was standardized. Similarly, there are various degrees of incompatibility between video files, which led to the popularity of video converters, some of which support over 30 file types.

2.2.2.6.2 Video Servers and Streaming Video A video server represents a computer typically dedicated to providing storage for video. One of the major applications provided by a video server is the support of streaming video, which represents a one-way video transmission over a data network. Streaming video is widely employed on the Web as well as on company networks to play video clips and video broadcasts. In the home, computers in a home network can be configured to stream video to digital media hubs connected to a home theater. Unlike movie files that are played after the entire file has been downloaded and stored, streaming video is played shortly after only a small amount is received and buffered, and the content downloaded is not stored at the destination computer.

When the streaming video is broadcast live, such as the annual Victoria’s Secret fashion show, it is commonly referred to as “real-time video.” However, technically, real time means no delays, and there is a slight built-in delay in streaming video.

The streaming-media data-storage requirements can be significant and usually result in the need to employ a separate server. You can easily calculate the uncompressed data storage requirements by using the following formula:

storage size (in mebibytes) = length (in seconds) × height (pixels) × width (pixels) × frame rate (frames/second) / (8 × 1024 × 1024)

since 1 mebibyte = 8 × 1024 × 1024 bits. If you have a 60-second uncompressed 420 × 320 video clip operating at a rate of 30 frames per second (fps), then the storage size becomes

MiB = 60 seconds × 420 pixels × 320 pixels × 30 fps / (8 × 1024 × 1024) = 28.839

for a single 1-minute video. Note that the formula as written counts only 1 bit per pixel; at the 24 bits per pixel commonly used for color, the same clip would occupy 24 times as much storage, or roughly 692 MiB. When we think of the evolution of the PC, there was a time when a 1-minute video of that size would have required most of the storage capacity of the computer’s hard drive. Even with the modern 1-Tbyte storage capacity available on many PCs, this would allow only about a day of such small-frame uncompressed color video to be stored, which is why data compression is critical when working with video images.
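The storage computation can be reproduced in a few lines of code. The sketch below is not taken from the text: the function name uncompressed_mib and the bits_per_pixel parameter are assumptions. The formula as printed carries no term for color depth, so the parameter defaults to 1 in order to reproduce the 28.839 figure, while passing 24 shows the effect of 24-bit color.

```python
BITS_PER_MEBIBYTE = 8 * 1024 * 1024

def uncompressed_mib(seconds, width, height, fps, bits_per_pixel=1):
    """Uncompressed video size in mebibytes (bits_per_pixel is an assumed extension)."""
    total_bits = seconds * width * height * fps * bits_per_pixel
    return total_bits / BITS_PER_MEBIBYTE

# The 60-second, 420 x 320, 30-fps clip from the text
one_bit = uncompressed_mib(60, 420, 320, 30)
true_color = uncompressed_mib(60, 420, 320, 30, bits_per_pixel=24)

print(f"1 bit per pixel:   {one_bit:.3f} MiB")
print(f"24 bits per pixel: {true_color:.3f} MiB")
```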

2.2.3 HTTP

The Hypertext Transfer Protocol (HTTP) represents a stateless, connectionless, reliable protocol. HTTP is used to transfer Web pages from a Web server to a client’s Web browser using TCP (Transmission Control Protocol), usually on port 80. Here the term stateless refers to the fact that each HTTP transmission occurs without needing information about what previously occurred. The protocol is described as connectionless in the sense that HTTP itself does not maintain a lasting connection between client and server: once a request has been answered, the exchange is complete, even though each message travels over an underlying TCP connection. Finally, HTTP is a reliable protocol, as it uses the TCP transport protocol, which provides a reliable error-detection and -correction facility. In Chapter 3 we will probe more deeply into TCP/IP; however, for now we will simply note that HTTP is transported by TCP within an IP datagram.

2.2.3.1 Versions The current version of HTTP is 1.1, with previous versions being noted as 0.9 and 1.0. The first line of any HTTP message should include the version number, such as HTTP/1.1.

2.2.3.2 Operation As previously noted, HTTP messages are stateless, connectionless, and reliable. HTTP messages fall into three broad categories: Request, Response, and Close.

2.2.3.2.1 Request Message Every HTTP interaction between client and server commences with a client request. The client operator enters a URL in the browser, either by clicking on a hyperlink, typing the URL into the browser address field, or selecting a bookmark. As a result of one of the preceding actions, the browser retrieves the selected resource. To accomplish this task, the browser creates an HTTP request as follows:

Request Line   GET /index.html HTTP/1.1
Header Fields  Host: www.popcorn.com
               User-Agent: Mozilla/4.0

In this HTTP request, note that the User-Agent represents software that retrieves and displays Web content. Netscape’s browser is identified by the “Mozilla” user agent, while Microsoft’s Internet Explorer would have the string “MSIE” and version number placed in the User-Agent field.
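The request shown above is simply lines of text separated by carriage-return line-feed pairs, with a blank line marking the end of the headers. As a rough sketch of that layout, the following code assembles the example request and then takes it apart again; the function names build_request and parse_request are hypothetical, and no network connection is made.

```python
def build_request(method, path, headers, version="HTTP/1.1"):
    """Assemble a request line plus header fields into one message string."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; an empty line terminates the header section
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_request(raw):
    """Recover the method, path, version, and header fields from a message."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers

raw = build_request(
    "GET", "/index.html",
    {"Host": "www.popcorn.com", "User-Agent": "Mozilla/4.0"},
)
method, path, version, headers = parse_request(raw)
print(repr(raw))
print(method, path, version, headers)
```

Header names are folded to lowercase when parsed because HTTP treats field names as case-insensitive.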

In addition to the GET request, a browser can issue several other types of requests. Thus, let’s turn our attention to the structure of HTTP Request methods, including the format of such requests.

As previously noted, a Request message is transmitted from a client to a server. The first line of the message includes the request method to be applied to the resource, the identifier of the resource, and the protocol version in use. To provide backward compatibility with prior versions of HTTP, there are two valid formats for an HTTP request, both of which are indicated in Figure 2.3.

In examining the HTTP Request message formats shown in Figure 2.3, several items warrant discussion. First, if an HTTP/1.0 server receives a Simple-Request, it must respond with an HTTP/0.9 Simple-Response. An HTTP/1.0 client capable of receiving a Full-Response should never generate a Simple-Request. Secondly, the Request-line begins with a request-method token, followed by the Request-URL and the protocol version, ending with a carriage-return line feed (CRLF). Thus, for a Full-Request,

Request-line = Method SP Request-URL SP HTTP-Version CRLF

Request = Simple-Request | Full-Request
Simple-Request = “GET” SP Request-URL CRLF
Full-Request = Request-Line *(General-Header | Request-Header | Entity-Header) CRLF [Entity-Body]

Figure 2.3 HTTP request formats.


The Method token identifies the method to be performed on the resource identified by the Request-URL, where, under HTTP 1.0,

Method = “GET” | “HEAD” | “POST” | extension-method

and

extension-method = token

The “GET” token is used to retrieve information identified by the Request-URL. The “HEAD” token functions similar to the “GET”; however, the server does not return any Entity-Body (information) in the response, i.e., only HTTP headers are returned. The “POST” token provides a mechanism to annotate existing resources, post a message, or provide a block of data to a server, such as the submission of a form.

When the client transmits a Request, it usually sends several header fields. As previously noted, those fields include a field name, a colon, one or more space (SP) characters, and a value. After the Request-line and General-Header, one or more optional HTTP Request-Headers can follow that are used to pass additional information about the client and its request, or to add certain conditions to the request. The format of a header field line is as follows:

Field-Name: value
Content-Type: text/html

Table 2.5 lists seven common HTTP request headers and provides a brief description of each.

Table 2.5 Common HTTP Request Headers

HEADER          DESCRIPTION
Host            Specifies the target hostname
Content-Length  Specifies the length (in bytes) of the request content
Content-Type    Specifies the media type of the request
Authorization   Specifies the credentials (such as username and password) of the user
Referer         Specifies the URL that referred the user to the current resource
User-Agent      Specifies the name, version, and platform of the client
Cookie          Returns a name/value pair set by the server on a previous response

One of the more interesting aspects of an HTTP Request is the Referer header field (the field name is spelled this way in the HTTP specification). If you were browsing a Web page and clicked on an anchor, the Referer header field informs the destination server of the URL (i.e., the page being viewed) from where you invoked the anchor. Thus, this information can be used to determine indirect traffic flow as well as the effect of advertising.

2.2.3.2.2 Response Message After receiving and interpreting a Request message, a Web server replies with an HTTP Response message. The format of an HTTP Response message is shown in Figure 2.4.

Similar to a Request message, a Simple-Response should only be returned in response to an HTTP/0.9 Simple-Request. The Status-line is the first line of a Full-Response message and includes the protocol version, followed by a numeric status code and its associated textual phrase, with each element separated by SP (space) characters. Thus, the format of the Status-line is

Status-line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

Table 2.6 lists many currently defined status codes and their associated reason phrases.
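Because the Status-line has a fixed three-part layout, a client can split it on the first two spaces and then use the leading digit of the status code to decide what kind of answer it has received. The sketch below illustrates the idea; the function names parse_status_line and status_class are hypothetical, while the five classes of codes (1xx through 5xx) follow the grouping used by the HTTP specification.

```python
def parse_status_line(line):
    """Split a Status-line into version, numeric code, and reason phrase."""
    # The reason phrase may itself contain spaces, so split at most twice
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason

def status_class(code):
    """Map a status code to its general class based on the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")

version, code, reason = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, reason, "->", status_class(code))
```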

The use of status codes can reflect multiple situations. For example, if a client went to a restricted Web server location, the server could reject the request outright with a “403” (Forbidden) message. However, if the server wishes the client to authenticate its request, it would do so by first rejecting the request with a “401” message and indicate in the “WWW-Authenticate” field information about the authentication requirements so that the client can determine if it has authorization to authenticate. If it does, it would then include its User-ID and password in the subsequent request.
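The text does not say how the User-ID and password are carried in the follow-up request. One common possibility, assumed here purely for illustration, is the Basic authentication scheme, in which the two values are joined by a colon, base64 encoded, and placed in an Authorization header. The function names and the realm value below are hypothetical, and the credentials are the familiar sample pair used in the Basic scheme's specification.

```python
import base64

def basic_authorization(user_id, password):
    """Build an Authorization header value under the (assumed) Basic scheme."""
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def retry_headers(status_code, challenge, user_id, password):
    """Return extra headers for a second attempt, or None if no retry applies."""
    if status_code == 401 and challenge.lower().startswith("basic"):
        return {"Authorization": basic_authorization(user_id, password)}
    return None

# First attempt was rejected with 401 and a WWW-Authenticate challenge
headers = retry_headers(401, 'Basic realm="popcorn"', "Aladdin", "open sesame")
print(headers)
```

Base64 is an encoding rather than encryption, so credentials sent this way are only protected when the request itself travels over https.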

2.2.3.3 HTTP 1.1 The most popular version of HTTP is 1.1, which includes several improvements over prior versions of the protocol.

Response = Simple-Response | Full-Response
Simple-Response = [Entity-Body]
Full-Response = Status-Line *(General-Header | Response-Header | Entity-Header) CRLF [Entity-Body]

Figure 2.4 HTTP response formats.


Some of those improvements include chunked data transfers; support for persistent connections, which reduces TCP overhead; byte ranges, which enable portions of a document to be requested; hostname identification, which allows virtual hosts; content negotiation, which permits multiple languages; and proxy support. While HTTP 1.1 is more efficient than prior versions of the protocol, it is also more complex. For example, the number of HTTP Request Methods is now increased to eight. Table 2.7 lists the expanded set of Request Methods supported by HTTP 1.1 along with a brief description of each method.

Table 2.6 Defined Response Status Codes and Reason Phrases

STATUS CODE  REASON PHRASE
100          Continue
101          Switching protocols
102          Processing
200          OK
201          Created
202          Accepted
204          No content
205          Reset content
206          Partial content
207          Multi-status
301          Moved permanently
302          Moved temporarily
304          Not modified
305          Use proxy
306          Switch proxy
307          Temporary redirect
400          Bad request
401          Unauthorized
403          Forbidden
404          Not found
406          Not acceptable
407          Proxy authentication required
408          Request timeout
409          Conflict
410          Gone
415          Unsupported media type
500          Internal server error
501          Not implemented
502          Bad gateway
503          Service unavailable
504          Gateway timeout
505          HTTP version not supported
507          Insufficient storage

2.2.3.4 State Maintenance As previously discussed, the HTTP protocol is stateless. This means that an HTTP session only lasts from a browser request to the server response, after which any subsequent request is independent of the prior request. The stateless nature of HTTP represents a problem if you use the browser to perform activities where information needs to be maintained, such as selecting items from an e-commerce Web site to fill a shopping cart.

The solution to the stateless nature of HTTP occurs in two ways. The most common method used to overcome the stateless nature of HTTP sessions occurs through the use of cookies. A second method occurs by the use of a hidden field in an HTML form whose value is set by the server.

2.2.3.4.1 Cookies A cookie is a short file that functions as an identifier. The cookie is created by a Web server accessed by the client as a mechanism for the server to store information about the client, such as its preferences when visiting the site or the items selected by the user for potential purchase.

Cookies are stored on the client for a predefined period of time. Each time a client transmits a request to a server, any cookie previously issued by the server is included in the client request and can be used by the server to restore the state of the client. From a security perspective, once a cookie is saved on a client, it can only be transmitted to the Web site that created the cookie.
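The round trip just described, in which the server issues a cookie, the client returns it, and the server restores state, can be imitated with the SimpleCookie class in Python's standard library. This is a sketch only: the cookie name cart, its value, and the one-hour lifetime are assumptions chosen for illustration, and no real browser or server is involved.

```python
from http.cookies import SimpleCookie

# Server side: issue a cookie describing the shopping cart (assumed values)
issued = SimpleCookie()
issued["cart"] = "caramel-3"
issued["cart"]["path"] = "/"
issued["cart"]["max-age"] = 3600          # keep for one hour
set_cookie_header = issued["cart"].OutputString()

# Client side: on a later request, only the name=value pair is sent back
cookie_header = f'{issued["cart"].key}={issued["cart"].value}'

# Server side: read the returned cookie to restore the client's state
received = SimpleCookie()
received.load(cookie_header)
restored = received["cart"].value

print("Set-Cookie:", set_cookie_header)
print("Cookie:", cookie_header)
print("restored state:", restored)
```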

Table 2.7 HTTP 1.1 Request Methods

METHOD   DESCRIPTION
GET      Asks the server for a given resource
HEAD     Similar to GET, but only returns HTTP headers and no content
POST     Asks the server to modify information stored on the server
PUT      Asks the server to create or replace a resource on the server
DELETE   Asks the server to delete a resource on the server
CONNECT  Used to allow SSL (secure socket layer) connections to tunnel through HTTP connections
OPTIONS  Asks the server to list the request methods available for a given resource
TRACE    Asks the server to echo back the request headers as it receives them


If you’re using Microsoft’s Internet Explorer, you can view your browser’s cookies by selecting Tools > Internet Options and then selecting the Settings button in the Temporary Internet Files area and the View Files button in the resulting Settings window. Figure 2.5 illustrates the three dialog boxes that are displayed. The initial box in the upper left is displayed once you select the Internet Options from the Tools menu. Selecting the Settings button results in the display of the dialog box labeled Temporary Internet Files and History Settings. Finally, selecting the View Files button results in the box in the foreground, showing some of the cookies stored on the author’s computer. Note that the left portion of the dialog box in the foreground provides a description of the cookie, including its Internet address, when it expires, as well as when it was last checked, accessed, and modified.

2.2.3.4.1.1 Types of Cookies There are two types of cookies, referred to as persistent and temporary. A persistent cookie is one stored as a file on your computer that remains stored when you close your browser. A persistent cookie can only be read by the Web site that created it when you subsequently access that site again. In comparison, a temporary cookie is only stored for your current browsing activity and is deleted when you close your browser. Through the browser settings, the user can choose to automatically accept or block all cookies, or be prompted to accept or block specific cookies; however, when cookies are blocked, certain activities, such as filling a shopping cart, may be difficult or impossible to achieve.

2.2.3.4.2 Hidden Fields A second method that can be used to overcome the stateless nature of HTTP sessions occurs through the use of hidden fields in an HTML form. The hidden field, as its name implies, is hidden from view. The server can set a value in the hidden field of a form, which when submitted by the client is returned to the server. By placing state information in hidden fields, the server can restore the state of the client.
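The hidden-field technique can be sketched in the same spirit. In the example below, the server writes its state into a hidden input when it generates the form, and later reads that value back out of the submitted form body. The function names, the field names state and item, the /cart action, and the sample values are all assumptions made for illustration.

```python
from html import escape
from urllib.parse import parse_qs, urlencode

def render_form(session_state):
    """Server side: embed the state in a hidden field of the form it sends."""
    safe_state = escape(session_state, quote=True)
    return (
        '<form method="post" action="/cart">\n'
        f'  <input type="hidden" name="state" value="{safe_state}">\n'
        '  <input type="text" name="item">\n'
        '  <input type="submit" value="Add">\n'
        '</form>'
    )

def read_submission(body):
    """Server side: recover the hidden state and the visible field."""
    fields = parse_qs(body)
    return fields["state"][0], fields["item"][0]

form_html = render_form("cart=caramel:3")

# Client side: a browser would encode both fields like this on submission
submitted_body = urlencode({"state": "cart=caramel:3", "item": "cheddar"})

state, item = read_submission(submitted_body)
print(form_html)
print(submitted_body)
print(state, item)
```

Because the hidden value passes through the client, a real application would need to treat it as untrusted input when it comes back.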

Figure 2.5 You can view cookies on your computer via Microsoft’s Internet Explorer Tools menu.

2.2.4 Browser Programs

The modern browser can be considered to represent a sophisticated mini-operating system that allows other programs, referred to as plug-ins, to operate under its control. The browser can also include the ability to run interpreters that execute JavaScript and VBScript, both of which are commonly used for field validation of forms. Another interpreter that can be run within a browser is Java, which represents a high-level programming language that provides much more capability than JavaScript or VBScript. One additional type of program that is unique in a Windows environment is an ActiveX control. An ActiveX control represents a dynamic-link library (DLL). In the wonderful world of computers, a DLL represents a collection of small programs, any of which can be invoked when needed by a larger program that is running in a computer. Some DLLs enable the computer to communicate with a specific type of hardware, such as a printer or scanner, and are referred to as device drivers.

Because a DLL file is not loaded into random access memory (RAM) together with the main program, its use saves space in RAM. All DLL files have the file name suffix .dll, and they are dynamically linked with the program that uses them during program execution. Only when a DLL file is needed is it loaded into RAM and run.

2.2.4.1 Helpers Collectively, plug-ins, ActiveX, and Java applets are referred to as helpers. This category name is assigned because they handle documents on the client browser that the browser by itself is not capable of handling. Figure 2.6 provides a general overview of browser components. Note that while HTML decoding is built into each browser, other components are optional.

Figure 2.6 Browser components.


2.2.4.2 Plug-Ins A plug-in is a computer program that functions as an extension of a browser, adding some specific functionality, such as displaying multimedia content inside a Web page. Some examples of plug-ins include Shockwave, RealPlayer, and QuickTime for multimedia as well as NetZip and Neptune, which can be considered utility plug-in programs. NetZip permits the compression of data, while Neptune supports ActiveX for Netscape so that it operates similarly to Microsoft’s Internet Explorer.

Originally, Microsoft went to great lengths to make Internet Explorer compatible with Netscape. From JavaScript to HTML to plug-ins, Internet Explorer functioned similarly to Netscape. In fact, early versions of Internet Explorer could even read the Netscape plug-in directory. Unfortunately, as Internet Explorer gained market share, its compatibility with Netscape diminished.

Figure 2.7 illustrates how you can view plug-ins on your Internet Explorer browser. In this example, you would first select Tools from the browser’s menu bar and then click on the tab labeled “Programs.” You would then select the “Manage add-ons” button, which would result in the display of a dialog box labeled “Manage add-ons.” Note that the term add-ons is now used by many browsers, including Internet Explorer, to reference all programs that can be controlled via the browser and its associated plug-ins. In fact, if you carefully examine Figure 2.7, you will note (under the Sun Microsystems heading) the entry of the Java plug-in.

2.2.4.3 Java Java is a high-level programming language that is unusual in that it is both compiled and interpreted. Using a compiler, a Java program is first translated into an intermediate language called Java bytecodes, which represents platform-independent coding that is subsequently interpreted by the interpreter on the Java platform. The interpreter parses or divides the code into small components, so that each Java bytecode instruction is executed on the computer. Although Java program compilation only occurs once, interpretation occurs each time the program is executed.

2.2.4.3.1 Java Bytecodes Java bytecodes can be viewed as machine-code instructions for what is referred to as a Java Virtual Machine (Java VM). Every Java interpreter included in a Web browser that


Figure 2.7 Viewing the programs controlled by Microsoft’s Internet Explorer.


can run applets as well as a specialized development tool can be considered as an implementation of a Java VM. Thus, Java bytecodes permit a programmer to develop software that can operate on any Java VM, permitting platform independence as long as the platform includes a Java interpreter. Now that we have an appreciation for the Java VM, let’s discuss another feature of Java that provides additional functionality. That feature is the Java API.

2.2.4.3.2 The Java API The Java Application Programming Interface (Java API) represents a collection of predefined software components, which, when invoked, perform predefined functions. The Java API is grouped into libraries of related classes and interfaces, with each library referred to as a package.

2.2.4.3.3 Java Programs The most common types of Java programs are applets and applications. Of the two, most readers are probably more familiar with Java applets, as they represent a program that executes within a Java-enabled browser. That is, an applet represents a program written in Java that can be included within an HTML Web page. When you use a Java-enabled browser to view a page that contains a Java applet, the applet’s code is transferred from the server to the client, where it is executed by the browser’s Java Virtual Machine. In comparison, an application represents a stand-alone program that executes directly on a Java-enabled computer platform.

There are several varieties of Java applications that warrant a brief discussion. First, a special type of Java application, known as a server, supports clients on a network. Examples of Java servers include Web servers, print servers, proxy servers, and other types of servers that transmit Java applications. Another type of Java program is a servlet, which can be viewed as an applet that runs on a server. Servlets are commonly used in constructing interactive Web applications in place of CGI scripts.

Another language that deserves mention is JavaScript. Despite its name, JavaScript is not a type of Java program; it represents a separate cross-platform, object-based scripting language. JavaScript is a small, concise language designed for embedding in other products and applications, including Web browsers. Inside a browser, JavaScript can be connected to the objects of its environment, in effect providing program control over its environment.


JavaScript contains a core set of objects, such as Array, Date, and Math, as well as a core set of operators, control structures, and statements referred to as language elements. This core JavaScript can be extended for client-side and server-side operations. On the client side, core JavaScript is extended by supplying objects that provide a degree of control over a browser, such as enabling a user to place elements into a form or navigate through a Web page. In comparison, server-side JavaScript extends the core scripting language by providing elements necessary for running JavaScript on a server. One example of a server-side JavaScript extension would be an application that communicates with a back-end database.

The integration of a JavaScript program into a Web page is obtained through the use of the HTML <SCRIPT> tag. The following HTML coding indicates an example of the integration of a JavaScript program into an HTML Web page:

<HTML>
<HEAD>
<TITLE>JavaScript Example</TITLE>
<SCRIPT LANGUAGE="JavaScript">
alert("Welcome to popcorn.com.");
</SCRIPT>
</HEAD>
</HTML>

2.2.4.4 VBScript Microsoft’s Visual Basic Scripting (VBScript) can be viewed as a more powerful and potentially more dangerous extension to HTML than Java applets: A VBScript communicates with host applications using Windows Script. When using Windows Script, Microsoft’s Internet Explorer and other host applications do not require special coding for each scripting component, enabling a computer to compile scripts, obtain and call entry points, and even create a standard language run time for scripting. In a client-server environment, a VBScript-enabled browser will receive scripts embedded within a Web page. The browser will parse and process the script.

Similar to the inclusion of JavaScript into a Web page, the integration of a VBScript is obtained through the use of the HTML <SCRIPT> tag. The following HTML code indicates an example of the integration of a VBScript into an HTML Web page:


<HTML>
<HEAD>
<TITLE>VBScript Example</TITLE>
<SCRIPT LANGUAGE="VBScript">
MsgBox "Welcome to popcorn.com."
</SCRIPT>
</HEAD>
</HTML>

Through the use of the LANGUAGE attribute of the SCRIPT tag, the Web browser is informed how to interpret the code. While Netscape browsers and some versions of Microsoft’s Internet Explorer support JavaScript, only the latter browser supports VBScript.

If a browser does not support a particular scripting language embedded into a Web page, it will normally either display the script as part of the Web page or hide the script from view. The latter situation occurs when the script is encased in comment tags (<!-- and -->), in which case the browser simply ignores the script. For example, returning to our prior script example, if we encase the script as follows, it will be ignored by a browser that does not support VBScript.

<HTML>
<HEAD>
<TITLE>VBScript Example</TITLE>
<SCRIPT LANGUAGE="VBScript">
<!--
MsgBox "Welcome to PopCorn.com!"
-->
</SCRIPT>
</HEAD>
</HTML>

2.2.4.5 ActiveX ActiveX represents a set of rules that defines how applications should share information. Developed by Microsoft, ActiveX has its roots in two Microsoft technologies referred to as Object Linking and Embedding (OLE) and Component Object Model (COM).

Programmers can develop ActiveX controls in a variety of programming languages, such as C, C++, Java, and Visual Basic. An ActiveX control can be viewed as being similar to a Java applet. However, unlike a Java applet that is limited in capability and cannot perform


disk operations, ActiveX controls have full access to the Windows operating system. While this capability provides ActiveX controls with more capability than Java applets, it also entails a degree of security risk. To control this risk, Microsoft developed a registration system that enables browsers to identify and authenticate an ActiveX control prior to its download.

2.3   Server Operations

In the client-server model illustrated in Figure 2.1, we noted that the server layer can consist of one or more computers. While you can always expect a Web server in the client-server model, it’s also possible that the Web server will communicate with an application server. The application server, in turn, can communicate with a database that either resides on the application server or on a separate device, such as a back-end database or a redundant array of independent disks (RAID). In this section, we will briefly examine Web-server operations, including the manner by which they interconnect to application servers and a database.

2.3.1 Evolution

The modern Web server can be viewed as a descendant of Web servers developed at the U.S. National Center for Supercomputing Applications (NCSA), where Mosaic, the browser that evolved into Netscape, was developed, and CERN, the European Organization for Nuclear Research. As work on client-browser and Web-server software progressed during the early 1990s at NCSA and CERN, commercial applications on the Internet rapidly expanded, resulting in software developers offering products for this rapidly expanding market. Although there are only a limited number of Web-server programs to choose from today, each operates in a similar manner. That is, all communications between Web clients and the server use HTTP, and the Web server commences operation by informing the operating system that it’s ready to accept communications through a specific port. That port is 80 for HTTP, which is transported via TCP, and 443 when secure HTTP (https) is employed.
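The port convention just described can be illustrated with a short Python sketch that determines which TCP port a client would contact for a given URL. The function name effective_port and the table name DEFAULT_PORTS are hypothetical; the sketch assumes only the two schemes discussed in the text.

```python
from urllib.parse import urlsplit

# Default ports named in the text: 80 for HTTP and 443 for secure HTTP.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the TCP port a client would contact for `url`."""
    parts = urlsplit(url)
    if parts.port is not None:
        # An explicit port in the URL overrides the default.
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


if __name__ == "__main__":
    for url in ("http://www.popcorn.com/",
                "https://www.popcorn.com/",
                "http://www.popcorn.com:8080/"):
        print(url, effective_port(url))
```

A server operator can bind a Web server to a different port, in which case the port must appear explicitly in the URL, as in the third example.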


2.3.2 Common Web Server Programs

There are a number of Web server application programs that operate under different operating systems. Some Web server programs are bundled with general server software designed to operate under several versions of Windows on Intel Pentium and AMD platforms. Other Web server programs operate under UNIX and Linux operating systems. Table 2.8 lists some of the more popular Web server programs and a brief description of the operating environment of the program.

2.3.2.1 Server Characteristics In general, Web servers can be considered to have two separate types of directories. One directory structure is created by the operating system and commences at the beginning of the disk drive, which is considered to represent the root directory. The second directory commences from a location on the disk drive under which Web documents are normally stored. This Web directory also has a root, referred to as either the document root or the Home Directory local path.

The left-hand portion of Figure 2.8 illustrates the Microsoft Internet Information Server’s Default Web Site Properties dialog box with its Home Directory tab selected. Note that the local path is shown as C:\inetpub\wwwroot, which represents the default home directory from which server content flows in response to a client query. By clicking on the Browse button, the Browse for Folder window is opened, which is shown in the right-hand portion of Figure 2.8.

Let’s assume that the computer URL or site name is www.popcorn.com and that you store a document named “welcome” in the wwwroot

Table 2.8 Common Web Server Application Programs

PROGRAM                          OPERATING ENVIRONMENT
Internet Information Server      Microsoft bundles this product with its Windows 2000/2003 Server
Apache                           An open source, no-charge-for-use HTTP server that operates under UNIX, Linux, and Windows
Sun ONE (renamed Java System)    Originally developed by Netscape jointly with Sun Microsystems; versions operate under Windows and UNIX
WebSTAR                          A server suite from 4D, Inc., that operates on a Macintosh platform
Red Hat Content Accelerator      A kernel-based Web server that is limited to supporting static Web pages in a Linux environment


directory. Then, C:\inetpub\wwwroot\welcome represents the welcome document directory address. Thus, the URL request of http://www.popcorn.com would elicit the return of the Welcome Web page from the path C:\inetpub\wwwroot\welcome.html. From the preceding, we can note that a Web server maps URL requests to a directory structure on the computer in terms of the Web home page or document root. In addition to being able to specify a local path or document root, the server operator can configure a Web server to generate content from a share on another computer or a redirection to a URL. The latter in effect permits redirection of requests from one directory to another location that could be on the same computer or on a different computer located thousands of miles away.
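The mapping of a URL request onto the document root can be sketched as follows. This is a simplified illustration rather than the logic of any particular Web server: the function name map_url_to_path is hypothetical, and treating welcome.html as the default document returned for a bare site request is an assumption drawn from the example above.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlsplit

# Document root taken from the Home Directory local path in the text.
DOCUMENT_ROOT = PureWindowsPath(r"C:\inetpub\wwwroot")


def map_url_to_path(url, default_document="welcome.html"):
    """Map a requested URL to a file path beneath the document root."""
    path = unquote(urlsplit(url).path)
    segments = [segment for segment in path.split("/") if segment]
    # Refuse segments that could step outside the document root.
    if any(s == ".." or ":" in s or "\\" in s for s in segments):
        raise ValueError("path escapes the document root")
    if not segments:
        # A request for the site itself returns the default document.
        segments = [default_document]
    return DOCUMENT_ROOT.joinpath(*segments)


if __name__ == "__main__":
    print(map_url_to_path("http://www.popcorn.com"))
    print(map_url_to_path("http://www.popcorn.com/images/logo.gif"))
```

The check on each path segment reflects the reason a document root exists at all: clients see only the Web directory tree, never the disk root created by the operating system.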

A modern Web server has several characteristics that deserve mentioning. Among those characteristics are the ability to support multiple sites on the same computer and the capability to provide clients with documents from the document roots or Internet directories on other servers. The first characteristic is referred to as virtual hosting, while the ability to provide documents from other servers turns the Web server into a proxy server. In addition to supporting Web-page delivery, modern Web servers include support for FTP, Gopher, News, e-mail, and database access. Web servers also commonly run CGI scripts as well as servlets, the latter representing a compiled Java class. Because

Figure 2.8 The root or home directory is specified with respect to the disk root.


CGI scripts and servlets can significantly enhance client-server operations, let’s focus our attention upon both.

2.3.2.1.1 CGI Scripts The Common Gateway Interface (CGI) represents a standard for interfacing external applications with servers. To understand the need for CGI, you need to realize that a plain HTML-coded document is static and does not change. This means that if you need to vary the document, such as by placing an order or entering a request for an item that could range from the price of a stock to the weather in a zip code, you need a tool to output dynamic information. A CGI program provides that tool, as it is executed in real time, which enables it to output dynamic information.

The term Common Gateway Interface dates to the early development of the Web, when companies desired to connect various databases to their Web servers. The connection process required the development of a program that, when executed on a Web server, would transmit applicable information to the database program, receive the results of the database query, and return the results to the client. Because the connection process between the server and the database functioned as a gateway, the resulting standard acquired the name Common Gateway Interface.

A CGI program can be written in any language that can be executed on a particular computer. Thus, C, C++, Fortran, PERL, and Visual Basic represent some of the languages that can be used in a CGI program. In a UNIX environment, CGI programs are commonly stored in the directory /cgi-bin. In a Microsoft Internet Information Server environment, CGI programs are located in C:\inetpub\wwwroot\scripts. That location holds CGI programs developed through the use of a scripting language such as PERL.

One of the most common uses for CGI scripts is for form submission, with a CGI script executing on the server processing the entries in the form transmitted by the client. CGI scripts can be invoked directly by specifying their URL directly within HTML or even embedded within another scripting language, such as JavaScript.
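Since a CGI program can be written in any language the server can execute, the form-processing case can be sketched in Python. This is a minimal illustration under stated assumptions: the form field name zip and the function name handle_request are hypothetical, and the form is assumed to be submitted with the GET method so that the Web server passes the entries in the QUERY_STRING environment variable.

```python
import os
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a complete CGI response: header, blank line, then the body."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    zip_code = query.get("zip", ["unknown"])[0]  # "zip" is a hypothetical field
    body = (
        "<HTML><BODY>"
        f"You asked about zip code {escape(zip_code)}."
        "</BODY></HTML>"
    )
    # The blank line separates the response header from the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server supplies request details through environment variables
    # and returns whatever the program writes to standard output.
    print(handle_request(os.environ), end="")
```

Because the program runs for each request, the page it returns can differ every time, which is the dynamic behavior a static HTML document cannot provide.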

2.3.2.1.2 Servlets A servlet represents a compiled Java class. Similar to CGI, servlets operate on a server and are called through HTML. When a Web server receives a request for a servlet, the


request is passed to the servlet container. The container then loads the servlet. Once the servlet is completed, the container reinitializes itself and returns control to the server.

Although similar to CGI, servlets can be faster, as they run as a server process and have direct access to Java APIs. Because servlets are written in Java, they are platform independent.

In concluding our discussion of servlets, let’s turn our attention to a related cousin, referred to as Java Server Pages (JSP). Java Server Pages are similar to servlets in that they provide processing and dynamic content to HTML documents. JSPs are normally used as a server-side scripting language and are translated by the JSP container into servlets.

Prior to moving on to the application server, a few words are in order concerning the Microsoft server standard referred to as ISAPI.

2.3.2.1.3 ISAPI Short for Internet Server API, ISAPI represents a Microsoft server-specific standard to load a DLL into the address space of a server to interpret a script. Although similar to CGI, the ISAPI is faster, as there is no need for a server to spawn a new executable as is the case when a CGI script is used. Instead, a DLL is loaded that provides the interpretation of the script.

2.3.3 Application Servers

If we briefly return our attention to Figure 2.1, we can note that the application server represents a tier-2 device along with a conventional Web server. In actuality, both a Web server and application server can reside on the same or on separate computers. The application server, also commonly referred to as an app server, can range in scope from a program that handles application operations between a client and an organization’s back-end databases to a computer that runs certain software applications. In this section, we will focus our attention upon several application server models and various popular software products that can be used between the application server and the tier-3 database layer.

2.3.3.1 Access For many clients, access to an application server is transparent. The client will use a browser to access a Web server. The Web server, depending upon its coding, can provide several different


ways to forward a request to an application server. Some of those ways include the use of CGI, Microsoft’s Active Server Page, and the Java Server Page.

An Active Server Page (ASP) is an HTML page that includes one or more scripts processed by the Web server prior to the Web page being transmitted to the user. An ASP file is created by including a VBScript or JavaScript statement in an HTML file or by using an ActiveX Data Objects program statement in a file. In comparison, a Java Server Page uses servlets in a Web page to control the execution of a Java program on the server.

2.3.3.2 Java Application Servers Java application servers are based upon the Java 2 Platform, Enterprise Edition (J2EE). J2EE employs a multitier model that normally includes a client tier, middle tier, and an Enterprise Information Systems (EIS) tier, with the latter storing applications, files, and databases. In comparison, the middle tier can consist of a Web server and an EJB (Enterprise JavaBean) server.

The use of J2EE requires a database that can be accessed. When using J2EE, you can access a database through the use of JDBC (Java Database Connectivity), APIs (Application Program Interfaces), SQLJ (Structured Query Language-Java), or JDO (Java Data Objects).

JDBC represents an industry-standard API for database-independent connectivity between the Java programming language and a wide range of databases. SQLJ, which stands for SQL Java, represents a specification for using SQL with Java, while JDO represents an API standard Java model that makes Plain Old Java Objects (POJOs) persistent in any tier of an enterprise architecture. Figure 2.9 illustrates the relationship of the three tiers to one another when a J2EE platform is employed. In examining Figure 2.9, note that EJB (Enterprise JavaBeans) represents a server-side component architecture for the Java 2 platform. EJB enables development of distributed, secure, and portable applications based upon Java.

There are two types of EJBs: Entity Beans and Session Beans, with the former representing an object with special properties. When a Java program terminates, any standard objects created by the program are lost, including any session beans. In comparison, an entity bean remains until it is explicitly deleted. Thus, an entity bean can be used by any program on a network as long as the program can locate it.


Because it’s easy to locate data on permanent storage, entity beans are commonly stored within a database. This allows the entity bean to run on a server machine. When a program calls an entity bean, control is passed to the server, and the program thread stops executing. Upon completion, control is restored to the calling program, and the program resumes execution.

2.3.3.3 General Server Tools In concluding our discussion concerning application servers, we need to briefly consider two general server tools (Cold Fusion and ODBC) as well as a specialized family of tools, Microsoft’s .NET Framework.

Cold Fusion is a popular development tool set originally from Macromedia that is now sold by Adobe Systems. The initial use of Cold Fusion was to enable databases to be integrated with HTML Web pages. Cold Fusion Web pages include tags written in Cold Fusion Markup Language (CFML), which simplifies the integration with databases, eliminating the need for more complex languages, such as C++. For example, using Cold Fusion, you could create a Web page that asks users for their sex and age, information that would be used by a Web server to query a database for an insurance premium that would be presented in HTML code for display on the user’s browser.

The second tool, ODBC (Open DataBase Connectivity), represents a standard database access method developed by the SQL Access

Figure 2.9 The three-tier relationship when a J2EE platform is used.


Group. By inserting a middle layer, called a database driver, between an application and the DBMS, it becomes possible to access data from any application, regardless of which DBMS is handling the data. The database driver in effect translates application data queries into commands that the DBMS understands, requiring that the application and the DBMS be ODBC compliant.

We conclude our discussion of the tools used in client-server operations by focusing on a specialized family of tools, Microsoft’s .NET Framework.

2.3.3.4 Microsoft’s .NET Framework The Microsoft .NET Framework represents a software infrastructure for the firm’s .NET platform, which provides a common environment for creating and operating Web services and applications. .NET can also be considered to represent an Internet and Web strategy that employs a server-centric computing model.

.NET is built upon a series of Internet standards, such as HTTP for communications between applications; XML (eXtensible Markup Language), which is a standard format for exchanging data between Internet applications; SOAP (Simple Object Access Protocol), which represents a standard for requesting Web services; and UDDI (Universal Description, Discovery and Integration), which is a standard for searching and discovering Web services.

As you might expect, the .NET framework can be installed on computers running Microsoft’s Windows operating system and functions as a mechanism for delivering software as Web services. .NET includes a large library of coded solutions to common programming problems and a software module that functions as a virtual machine to manage the execution of programs written specifically for the framework.

Included in the .NET framework are such common class libraries as (a) ADO.NET, a set of computer software components that can be used by programmers to access data and data services similar to Java Persistence APIs and (b) ASP.NET, which enables Web services to link to applications, services, and devices via HTTP, HTML, XML, and SOAP. Through the use of the class library, programmers can develop applications with a wide range of features and even combine it with their own code to produce tailored applications.

A key function of the .NET framework is its Common Language Runtime (CLR). The CLR represents the virtual-machine component of the .NET framework. To ensure that program operations occur within certain parameters with respect to memory management, security, and exception handling, all .NET programs execute under the supervision of the CLR.

The .NET framework can be considered as language neutral and theoretically platform independent. It currently supports C++, C#, Visual Basic, Microsoft’s version of JavaScript called JScript, COBOL, and many third-party languages.

Version 3.0 of the .NET Framework is included with Windows Server 2008 and Windows Vista. A 3.5 version of the framework is included with Windows 7, which was released during 2009. The .NET Framework family also includes versions for mobile- or embedded-device use. A version of the framework referred to as .NET Compact Framework is available on Windows CE platforms, including Windows Mobile devices such as smartphones.

Since its announcement during 2000, .NET has been incorporated into a variety of Microsoft products. Such products include Visual Studio .NET and Visual Basic .NET as well as different versions of Windows, providing a building block for creating and operating Internet services.

Now that we have an appreciation for the tools used in client-server operations, we will conclude this chapter by examining the efficiency of the architecture as the distance between client and server layers increases.

2.4   Distance Relationship

In the three-tier client-server model, the first two tiers are usually separated from one another, while the third tier is normally located with the second tier. Thus, for the purpose of examining the distance relationship between tiers, we can view the three-tier model as being a two-tier one. That is, we can examine the effect upon client-server architecture as the distance between client and server increases. In doing so, we can use two popular TCP/IP tools. Those tools are the Ping and Traceroute programs contained in modern operating systems.

2.4.1 Using Ping

The Ping utility program, by default, generates four data packets that are transmitted to a defined destination. That destination can be specified either as a host name or as a dotted decimal address. The recipient


of the packets echoes the packets back to the originator. Because the originator knows when it transmitted each packet, it becomes possible to compute the time in terms of round-trip delay. Ping is based upon the use of the Internet Control Message Protocol (ICMP). ICMP type 8 (Echo) messages are transmitted to an indicated destination, which responds with ICMP type 0 (Echo Reply) messages.
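To show what an ICMP type 8 message looks like on the wire, the following Python sketch assembles an Echo message by hand. It is an illustration only: the function names, the identifier and sequence values, and the 8-byte payload are assumptions, and actually transmitting the packet would require a raw socket, which normally needs administrative privileges and is not attempted here.

```python
import struct

ICMP_ECHO = 8        # ICMP type 8 (Echo)
ICMP_ECHO_REPLY = 0  # ICMP type 0 (Echo Reply)


def internet_checksum(data):
    """Compute the 16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def build_echo_message(identifier, sequence, payload=b"abcdefgh"):
    """Assemble an ICMP Echo message: type, code, checksum, id, sequence, data."""
    # The checksum is computed with the checksum field itself set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO, 0, checksum, identifier, sequence)
    return header + payload


if __name__ == "__main__":
    print(build_echo_message(identifier=1, sequence=1).hex())
```

The identifier and sequence fields are what allow the originator to match each Echo Reply to the Echo that triggered it and therefore to time each round trip.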

To illustrate the effect of distance upon client-server operations, let’s examine the round-trip delays between this author’s computer located in Macon, Georgia, and two servers, one located in the United States and the second located in Israel. Figure 2.10 illustrates the pinging of the Yahoo server. In examining the entries in the referenced illustration, you will note that four replies are listed from the IP address 216.109.112.135, which represents the IP address of the Yahoo server. The round-trip delay times are indicated by “time=” and have values of 20, 20, 11, and 20 milliseconds, respectively. Below the last reply line, the program generates a summary of statistics for pinging the destination. In the example shown for pinging Yahoo, the average round-trip delay was indicated to be 17 milliseconds.
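The summary statistics can be reproduced from the four reply times quoted above. In the following Python sketch, the function name summarize_round_trips is hypothetical, and the use of integer division is an assumption: the exact mean of the four values is 17.75 ms, so the reported figure of 17 ms suggests that the utility discards the fractional part.

```python
def summarize_round_trips(times_ms):
    """Return minimum, maximum, and whole-millisecond average delay."""
    return {
        "minimum": min(times_ms),
        "maximum": max(times_ms),
        # Integer division is assumed here because it reproduces the
        # whole-number average quoted in the text.
        "average": sum(times_ms) // len(times_ms),
    }


if __name__ == "__main__":
    # Round-trip times to the Yahoo server as quoted in the text.
    print(summarize_round_trips([20, 20, 11, 20]))
```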

For our second example, this author pinged the server logtel.com, a data-communications seminar organization located in Israel. Figure 2.11 illustrates the results associated with pinging that location. As you might expect, the round-trip delay has considerably increased due to the increased distance between client and server. However, what may not be apparent is the reason for the increase being approximately an order of magnitude, from an average of 17 ms for pinging Yahoo

Figure 2.10 Using Ping to ascertain the round-trip delay to Yahoo.com.


to an average of 223 ms when pinging the Logtel server. To obtain an appreciation of the key reason behind the increased round-trip delay, we need to turn to the second TCP/IP tool, Traceroute.

2.4.2 Using Traceroute

As its name implies, Traceroute is a program that traces the route from source to destination. Under Microsoft Windows, the name of the program was truncated to Tracert.

Tracert invokes a series of ICMP Echo messages that vary the time-to-live (TTL) field in the IP header. The first IP datagram that transports the ping has a TTL field value of 1. Thus, when the datagram reaches the first router along the path to the destination, the router decrements the TTL field value by 1. Because the resulting value is zero, the router throws the datagram into the great bit bucket in the sky and returns an ICMP message type 11 (Time Exceeded) to the originator. The ICMP message returned to the sender includes the router’s IP address and may additionally include information about the router. The originator increments the TTL value by 1 and retransmits the ping, allowing it to flow through the first router on the path to the destination. The second router then returns an ICMP Type 11 message, and the process continues until the Tracert program’s ping reaches the destination or the default number of hops used by the program is reached.
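The TTL mechanism just described can be simulated without touching the network. In the following Python sketch the path is a hypothetical list of router names ending with the destination, the function names are illustrative, and the default limit of 30 hops is an assumption about the program's setting; no packets are actually sent.

```python
ICMP_ECHO_REPLY = 0      # returned by the destination
ICMP_TIME_EXCEEDED = 11  # returned by a router that discards the datagram


def probe(path, ttl):
    """Send one simulated ping along `path`; return (responder, ICMP type)."""
    if not path:
        raise ValueError("path must contain at least the destination")
    for index, node in enumerate(path):
        if index == len(path) - 1:
            return node, ICMP_ECHO_REPLY  # the ping reached the destination
        ttl -= 1  # each router decrements the TTL field by 1
        if ttl == 0:
            return node, ICMP_TIME_EXCEEDED  # datagram discarded at this router


def trace_route(path, max_hops=30):
    """Raise the TTL one hop at a time until the destination answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, icmp_type = probe(path, ttl)
        hops.append((ttl, responder, icmp_type))
        if icmp_type == ICMP_ECHO_REPLY:
            break
    return hops


if __name__ == "__main__":
    for hop in trace_route(["router-1", "router-2", "destination"]):
        print(hop)
```

Each entry in the result corresponds to one line of Tracert output: the TTL used, the device that answered, and the type of ICMP message it returned.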

Figure 2.12 illustrates the use of the Microsoft Tracert program to trace the path from the author’s computer in Macon, Georgia, to the

Figure 2.11 Using Ping to determine the round-trip delay to a server located in Israel.


Yahoo server. In comparison, Figure 2.13 illustrates the use of the Tracert program to trace the path to the Logtel server located in Israel.

In comparing the two uses of the Tracert program, we can note that there are 8 router hops to Yahoo.com, while there are 17 hops to the Logtel server. Because each router hop requires some processing time, we can attribute a portion of the delay to the number of router hops traversed from source to destination. However, as the radio commentator Paul Harvey is fond of saying, “That’s only part of the story.”

If you carefully examine the Tracert to Logtel.com shown in Figure 2.13, you will note that for the first 10 hops the delay was under 20 ms. It wasn’t until hop 11 that the delay appreciably increased, going from 20 to either 90 or 100 ms, depending upon which of the three traces occurred. At hop 10, the router description indicates that it was in New York City, while at hop 11 the router appears to be a

Figure 2.12 Using Tracert to observe the path from the author’s computer to the Yahoo server.

Figure 2.13 Using Tracert to examine router hop delays to a server located in Israel.


gateway that funnels traffic to Israel. At hop 12, the router description indicates that it is in Israel. Thus, approximately 70 to 80 ms of delay can be attributed to the router acting as a funnel for traffic to Israel at hop 11, while another 110 to 120 ms of delay can be attributed to the propagation delay between the New York gateway and Israel. Once the packets arrive in Israel, the delay from hop 12 to hop 17 is approximately 10 ms. Thus, the primary delays are the gateway funneling traffic from the East Coast to Israel and propagation delay.
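As a check on this attribution, the delay contributions quoted above can be summed and compared with the 223-ms average reported by Ping. The following Python sketch uses only figures from the text; treating the first 10 hops as contributing about 20 ms is an approximation (the text says "under 20 ms"), and the names SEGMENTS_MS and total_delay_range are hypothetical.

```python
# (description, low estimate in ms, high estimate in ms), as quoted in the text.
SEGMENTS_MS = [
    ("hops 1-10, within the United States", 20, 20),  # approximation of "under 20 ms"
    ("hop 11, gateway funneling traffic to Israel", 70, 80),
    ("hop 11 to hop 12, propagation delay to Israel", 110, 120),
    ("hops 12-17, within Israel", 10, 10),
]


def total_delay_range(segments):
    """Sum the low and high estimates across all path segments."""
    low = sum(segment[1] for segment in segments)
    high = sum(segment[2] for segment in segments)
    return low, high


if __name__ == "__main__":
    low, high = total_delay_range(SEGMENTS_MS)
    print(f"Estimated round-trip delay: {low} to {high} ms")
```

The resulting range of 210 to 230 ms brackets the 223-ms average observed when pinging the Logtel server, which supports the conclusion that the gateway and the propagation delay account for most of the round-trip time.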

If your organization was located in the United States and had customers in Israel, the results of the previously discussed Tracert would be reversed. That is, Israeli users would experience bottlenecks due to propagation delay and traffic being funneled through a peering point located in New York City. We can expand this situation to users in Western Europe, South America, Japan, China, and other locations around the globe who, when accessing servers located in the United States, would also experience bottlenecks from traffic flowing through peering points as well as propagation delays. To alleviate these delays, your organization can move servers storing duplicate information closer to the ultimate user, which is the key rationale for content delivery networking. However, because only the largest companies may be able to afford placing servers at distributed locations around the globe, most content delivery methods depend upon the use of a third party to provide this service.


3 Understanding TCP/IP

The ability to understand technical details associated with content delivery requires an understanding of the TCP/IP protocol suite. In this chapter we will briefly review the TCP/IP protocol suite, focusing our attention upon the fields that govern the identification of applications and the delivery of IP datagrams.

3.1   The TCP/IP Protocol Suite

The TCP/IP protocol suite dates to the work of the Advanced Research Projects Agency (ARPA) during the 1970s and early 1980s. During that time period, the quest to interconnect computers resulted in the development of a series of protocols that evolved into the modern TCP/IP protocol suite.

3.1.1 Protocol Suite Components

Figure 3.1 illustrates the major components of the TCP/IP protocol suite and their relationship to the International Organization for Standardization (ISO) Open System Interconnection (OSI) reference model. In examining Figure 3.1, note that the TCP/IP protocol suite does not specify a physical layer, nor does it specify a data-link layer. Instead, the protocol suite uses its Address Resolution Protocol (ARP) as a mechanism to enable the protocol suite to operate above any data-link layer that is capable of transporting and responding to ARP messages. This enables the TCP/IP protocol suite to interoperate with Ethernet, Fast Ethernet, Gigabit Ethernet, and Token-Ring local area networks.

In examining the relationship of the TCP/IP protocol suite to the OSI Reference Model shown in Figure 3.1, several additional items warrant mention. First, although applications are transported at Layer 5 in the protocol suite, they correspond to Layers 5, 6, and 7 of the OSI Reference Model. Secondly, applications are commonly carried by one of two transport protocols, TCP or UDP. As we will note later in this chapter, TCP provides a connection-oriented, reliable transport facility, while UDP provides a best-effort, nonreliable transport facility.

Applications transported by TCP and UDP are identified by Destination Port fields within each transport layer header. When TCP is used, the TCP header plus application data is referred to as a TCP segment. In comparison, when UDP is employed as the transport layer, the UDP header plus application data is referred to as a UDP datagram. An IP datagram is formed by prefixing an IP header to either a TCP segment or UDP datagram. Because the IP header contains Source and Destination address fields, routing occurs through the examination of IP header fields. Different applications are identified via the use of port numbers within the TCP or UDP header, making it possible for a common destination, such as a corporate server, to provide Web, e-mail, and file transfer support. As we probe more deeply into the protocol suite, the use of IP addresses and TCP and UDP port numbers to define applications destined to specific devices will become more clear.

3.1.2 Physical and Data-Link Layers

As a review, the physical layer represents the electrical and mechanical components necessary to connect to the network. In comparison, the data-link layer uses a protocol to group information into data packets that flow on the network. Because the data link uses a Layer 2 protocol, source and destination addresses are indicated in terms of Media Access Control (MAC) addresses.

Figure 3.1 Major components of the TCP/IP protocol suite.

3.1.2.1 MAC Addressing MAC addresses are 48 bits or 6 bytes in length, subdivided into a vendor code and an identifier that corresponds to the vendor code. The IEEE assigns vendor codes, and the vendor or manufacturer burns into each LAN adapter’s Read Only Memory (ROM) a unique 48-bit address using the assigned vendor code but varying the identifier number in each adapter. If the vendor is successful at marketing its LAN adapters, it will request additional vendor codes from the IEEE and repeat the previously described process.
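The vendor code and identifier split can be illustrated with a short Python sketch. It assumes the common convention that the first three bytes (24 bits) form the IEEE-assigned vendor code and the last three bytes form the vendor-chosen identifier; the function name and the sample address are hypothetical and used only for illustration.

```python
def split_mac(mac):
    """Split a 48-bit MAC address into (vendor_code, identifier)."""
    # Accept either ':' or '-' as the separator between the six bytes.
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    for o in octets:
        int(o, 16)  # raises ValueError if a byte is not valid hexadecimal
    vendor_code = ":".join(octets[:3])  # first 24 bits, assigned by the IEEE
    identifier = ":".join(octets[3:])   # last 24 bits, varied per adapter
    return vendor_code, identifier


if __name__ == "__main__":
    # Hypothetical adapter address, not a real vendor assignment.
    print(split_mac("00-1A-2B-3C-4D-5E"))
```

Two adapters from the same vendor would share the first element of the returned pair and differ only in the second.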

Although the use of MAC addresses ensures that there will be no duplicate addresses on a LAN, the absence of any network identification made it difficult to interconnect local area networks. That is, without a LAN identifier, it becomes difficult to note the destination local area network for a Layer 2 frame. In fact, the method of routing data between LANs was originally based on bridging, a Layer 2 technique in which the 48-bit MAC addresses were used to determine if a frame should flow across a bridge. Because early bridges interconnected LANs located in close proximity to one another, it was difficult to interconnect LANs located in different cities or even at different locations within the same city. To overcome these limitations, network protocols, such as TCP/IP, that operate at the network layer include the ability to assign unique network addresses to each network, making it possible to route data between networks based upon a destination network address contained in each packet. Prior to discussing the network layer in the TCP/IP protocol suite, we need to cover the use of ARP, which functions as a “bridge” between the network layer and the data-link layer and thus explains its location in Figure 3.1. However, prior to discussing ARP, a few words are in order concerning Layer 3 addressing in the TCP/IP protocol suite.

3.1.2.2 Layer 3 Addressing Today there are two versions of the Internet Protocol (IP) in use, referred to as IPv4 and IPv6. IPv4 uses 32-bit addressing, while IPv6 employs the use of 128-bit addresses. Because approximately 99% of organizations currently use IPv4, we will focus our attention in this section upon the 32-bit addressing scheme used by IPv4.

Under IPv4, there are five address classes, referred to as Class A through Class E. The first three address classes are subdivided into network and host portions, as indicated in Figure 3.2.

Class A addresses were assigned to very large organizations. If you examine Figure 3.2, you will note that one byte of the four 8-bit bytes in the address is used to denote the network, while the remaining three bytes in the 32-bit address are used to identify the host on the network. Although an 8-bit byte normally provides 256 unique addresses, under IPv4 the first bit in the address field is set to a binary 0 to identify a Class A address, reducing the number of unique bits in the address to seven. Thus, there can only be a maximum of 2^7 or 128 Class A addresses. Because one Class A address (network 127) represents a loopback address, while the IP address of 0.0.0.0 is used for the default network, this reduces the number of available Class A addresses to 126 and explains why, many years ago, all Class A addresses were assigned. Because three bytes of the Class A network address are used to identify each host, this means that each Class A network can support 2^24 − 2, or 16,777,214, distinct hosts. The reason we subtract 2 from the total reflects the fact that a host address of all 0’s is used to identify the network (“this network”), while a host address of all 1’s represents the network broadcast address.

Returning our attention to Figure 3.2, we can note that a Class B address extends the number of bytes used to identify the network to two, resulting in two bytes being used to identify the host on the specified network. Because the first two bits in the 32-bit address are used to identify the address as a Class B address, this means that 14 bits are available to identify 2^14 (16,384) unique networks. Thus, there are considerably more Class B addresses available for use than Class A addresses. Because each Class B network uses two bytes to define the host address on the network, this means there are 2^16 − 2, or 65,534, possible hosts on each Class B network. Class B addresses were typically issued to large organizations.

Figure 3.2 IP address classes.

The third IPv4 address class that is subdivided into network and host portions is the Class C address. A Class C address uses three bytes to define the network portion of the address, while the remaining byte in the 32-bit address is used to identify the host on the network. A Class C address also uses the first three bits in the first byte in the address to identify the address as a Class C address. Thus, there are 2^21 (2,097,152) unique Class C network addresses.

A Class C network address is the most popularly used IP address, commonly issued to small to mid-sized organizations. However, because only one byte is available to define the hosts on a network, a Class C address supports the fewest network hosts. The number of unique hosts on a Class C network is limited to 2^8 − 2, or 254. The subtraction of two from the prior computation reflects the fact that, similar to Class A and Class B addresses, two Class C addresses have special meanings and are not used to identify a specific host on a network. Those addresses are 0 and 255. A host address of 0 is used to indicate “this network,” while a host address of 255 represents the broadcast address of the network. Thus, the inability to use those two addresses as host addresses results in 254 hosts being capable of having unique addresses on a Class C network.

Although Class A, B, and C are the most commonly used IPv4 addresses, two additional class addresses warrant a brief mention: Class D and Class E. Class D addresses are used for multicast operations, while Class E addresses are reserved for experimentation.
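The class arithmetic described above can be checked with a minimal Python sketch. The function names are hypothetical; the class boundaries follow from the leading bits of the first byte (0 for Class A, 10 for Class B, 110 for Class C, 1110 for Class D), and the sample addresses come from ranges reserved for documentation.

```python
import ipaddress

# Bits left for the network number after the class prefix, and bits for hosts.
CLASS_LAYOUT = {"A": (7, 24), "B": (14, 16), "C": (21, 8)}


def ipv4_class(address):
    """Return the classful address class (A through E) of a dotted decimal address."""
    first_byte = int(ipaddress.IPv4Address(address)) >> 24
    if first_byte < 128:   # leading bit 0
        return "A"
    if first_byte < 192:   # leading bits 10
        return "B"
    if first_byte < 224:   # leading bits 110
        return "C"
    if first_byte < 240:   # leading bits 1110
        return "D"
    return "E"


def class_capacity(address_class):
    """Return (number of networks, usable hosts per network) for Class A, B, or C."""
    network_bits, host_bits = CLASS_LAYOUT[address_class]
    # Two host values are reserved: all 0's (this network) and all 1's (broadcast).
    return 2 ** network_bits, 2 ** host_bits - 2


if __name__ == "__main__":
    for cls in "ABC":
        print(cls, class_capacity(cls))
```

Note that the Class A network count returned here is the raw 2^7 figure; the two reserved Class A networks discussed in the text are not subtracted.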

Now that we have an appreciation for the five types of IPv4 addresses, let’s turn our attention to the manner by which network addresses are translated into MAC addresses. That translation, as we previously noted, is accomplished through the use of the Address Resolution Protocol (ARP).


3.1.2.3 ARP When a router receives an IPv4 packet addressed to a specific network and host on that network, the destination address is specified as a 32-bit address. However, data delivery on the local area network is based upon Layer 2 MAC addresses. This means that the 32-bit Layer 3 IP address received by a router must be translated into a 48-bit MAC address in order for the packet to be delivered by the Layer 2 protocol.

When a router receives a Layer 3 packet, it first checks its cache memory to determine if a previous address translation occurred. If so, it forms a Layer 2 frame to deliver the Layer 3 packet, using the previously learned Layer 2 MAC address as the destination address in the frame. If no previous address translation occurred, the router will use ARP as a mechanism to determine the Layer 2 address associated with the Layer 3 destination address. In doing so, the router forms an ARP packet, indicating the IP address it needs to learn. The ARP packet is transported as a Layer 2 broadcast to all hosts on the network. The host that is configured with the indicated IP address responds to the broadcast with its MAC address. Thus, the router uses ARP to learn the MAC address required to deliver Layer 3 addressed packets to their correct destination on a Layer 2 network, where data delivery occurs using MAC addresses. Now that we have an appreciation for the use of ARP to enable Layer 3 addressed packets to be delivered on Layer 2 networks, we can turn our attention to the higher layers of the TCP/IP protocol suite.
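The cache-then-broadcast behavior described above can be sketched in Python as a simulation. Nothing here sends real ARP packets: the class name, the `HOSTS` table, and the `simulated_broadcast` function are hypothetical stand-ins for a router's cache and for the hosts that would answer a Layer 2 broadcast.

```python
# Hypothetical LAN: IP address -> MAC address of the host configured with it.
HOSTS = {"192.0.2.10": "00:1a:2b:3c:4d:5e"}


def simulated_broadcast(ip):
    """Stand-in for an ARP broadcast; returns the owner's MAC address or None."""
    return HOSTS.get(ip)


class ArpCache:
    """Resolve IP addresses to MAC addresses, broadcasting only on a cache miss."""

    def __init__(self, broadcast):
        self._table = {}             # previously learned translations
        self._broadcast = broadcast  # callable used when the cache has no entry
        self.broadcasts_sent = 0

    def resolve(self, ip):
        if ip in self._table:        # a previous translation occurred
            return self._table[ip]
        self.broadcasts_sent += 1    # no entry, so ask every host on the LAN
        mac = self._broadcast(ip)
        if mac is None:
            raise LookupError(f"no host answered for {ip}")
        self._table[ip] = mac        # remember the answer for later packets
        return mac


if __name__ == "__main__":
    cache = ArpCache(simulated_broadcast)
    print(cache.resolve("192.0.2.10"), cache.broadcasts_sent)
    print(cache.resolve("192.0.2.10"), cache.broadcasts_sent)
```

A real ARP cache also ages out entries after a timeout, which this sketch omits for brevity.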

3.1.3 The Network Layer

The Internet Protocol (IP) represents a network layer protocol that enables IP datagrams to be routed between source and destination networks.

Figure 3.3 illustrates the formation of an IP datagram, showing the relationship of the IP header to the two transport layer headers commonly used in the TCP/IP protocol suite: TCP and UDP. Note that the application data is first prefixed with either a TCP or UDP header prior to being prefixed with an IP header.

The IP header consists of 20 bytes of information that is subdivided into specific fields. An option exists within the IP header that enables the header to be extended through the addition of optional bytes; however, this extension is rarely employed.


3.1.3.1 IP Header The top portion of Figure 3.4 illustrates the fields within the IPv4 header. The lower portion of that figure provides a brief description of each of the IPv4 header fields. For the purpose of content delivery, the key fields of interest are the TTL and Protocol fields as well as the 32-bit source IP address and 32-bit destination IP address fields.

3.1.3.1.1 TTL Field The Time to Live (TTL) field indicates the number of hops the IP packet can traverse prior to being discarded. The purpose of this field is to ensure that packets do not continuously flow through the Internet if the destination is not located. To prevent endless wandering, routers decrement the TTL field value and, if the result is zero, discard the packet. In a content delivery networking environment, the movement of Web server data closer to the requester commonly ensures that the decrement of the TTL field value never reaches zero, which would require a packet to be sent to the great bit bucket in the sky.

3.1.3.1.2 Protocol Field The Protocol field is 8 bits in length and indicates the type of transport packet carried in the IP datagram. Because the Protocol field is 8 bits in length, up to 256 protocols can be defined. Some of the more popular protocols are the Internet Control Message Protocol (decimal 1); the Transmission Control Protocol (TCP), which is defined by decimal 6 in the IP Protocol field; and the User Datagram Protocol (UDP), which is defined by decimal 17 in the IPv4 header’s Protocol field. Thus, the IPv4 Protocol field value defines the upper-layer protocol used to transport data. Later in this chapter, we will note that both TCP and UDP headers include Port fields whose values define the application carried at the transport layer.
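To make the TTL and Protocol fields concrete, the following Python sketch unpacks the fixed 20-byte portion of an IPv4 header and models a single router hop. The function names are hypothetical, header options are ignored, and any sample header passed to the parser would be built by hand rather than captured from a network.

```python
import socket
import struct

# The three Protocol field values named in the text.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}

# Version/IHL, ToS, total length, identification, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address.
IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"


def parse_ipv4_header(header):
    """Return selected fields from the first 20 bytes of an IPv4 header."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        IPV4_HEADER_FORMAT, header[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl_words": ver_ihl & 0x0F,   # header length in 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": PROTOCOL_NAMES.get(protocol, str(protocol)),
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


def forward(ttl):
    """Model one router hop: return the new TTL, or None if the packet is discarded."""
    ttl -= 1
    return ttl if ttl > 0 else None
```

A packet arriving with a TTL of 1 is therefore dropped by the next router, since the decrement yields zero.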

Figure 3.3 Forming an IP datagram. Application data is prefixed with a TCP/UDP header at the transport layer, and the result is prefixed with an IP header at the network layer to form the IP datagram.

3.1.3.1.3 Source and Destination Addresses Fields The remaining two fields in the IPv4 header of interest with respect to content delivery are the Source and Destination IP addresses. Each address is 32 bits in length, with the Source address indicating the originator of the packet, while the Destination address indicates the ultimate recipient of the packet.

As explained earlier in this chapter, under IPv4 there are five address classes, labeled A through E. The vast majority of data traffic on the Internet occurs through the use of address classes A, B, and C, while address class D is used for multicast transmission and address class E is reserved for experimental operations. Concerning address classes A, B, and C, the Destination addresses for each are subdivided into a network portion and a host portion. Thus, the use of those addresses conveys both the network where the packet is headed as well as the host on that network.

Legend:

VER                  Version number
IHL                  IP Header Length (number of 32-bit words in the header)
ToS                  Type of Service byte, now known as the Differentiated Services Code Point (DSCP)
Size                 Size of the datagram in bytes (header plus data)
Identification       16-bit number that, together with the source address, uniquely identifies the packet
FLAGS                Used to control if a router can fragment a packet
Fragment offset      A byte count from the start of the original packet set by the router that performs fragmentation
TTL                  Time to Live or the number of hops a packet can be routed over
Protocol             Indicates the type of packet carried
Checksum             Used to detect errors in the header
Source address       The IP address of the packet originator
Destination address  The IP address of the final destination of the packet

Figure 3.4 The IPv4 header.

Now that we have a general appreciation for the manner by which Source and Destination IPv4 addresses are used to convey information, let’s literally move up the protocol suite and examine how TCP and UDP are able to denote the application being transported. In doing so we will also examine the key differences between each transport-layer protocol.

3.1.4 The Transport Layer

In the TCP/IP protocol suite, the transport layer is equivalent to Layer 4 in the OSI Reference Model. While the Protocol field in the IPv4 header enables up to 256 higher layer protocols to be defined, the two transport layer protocols commonly used in the protocol suite are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).

3.1.4.1 TCP The Transmission Control Protocol (TCP) represents a reliable, connection-oriented protocol. It obtains reliability due to the fact that the protocol includes an error-detection and -correction capability. The protocol is connection-oriented, as it supports a three-way handshaking method under which the recipient must make its presence known prior to the actual exchange of data occurring.

Figure 3.5 illustrates the format of the TCP header. From the viewpoint of content delivery, the Source and Destination ports are of key concern, since their values identify the application being transported.

Both Source and Destination Port fields are 16 bits in length. Prior to discussing the use of these ports, a few words are in order concerning the other fields in the TCP header. Thus, let’s quickly review a few of those fields to obtain an appreciation for why TCP is considered to represent a reliable, connection-oriented Layer 4 protocol.


3.1.4.1.1 Sequence Number The Sequence Number field is 32 bits in length. This field contains the sequence number of the first byte in the TCP segment, unless the SYN bit (located in the flag field) is set. If the SYN bit is set, the sequence number becomes the initial sequence number (ISN), and the first data byte is ISN + 1.

3.1.4.1.2 Acknowledgment Number Field The Acknowledgment Number field is 32 bits in length. If the ACK control bit (located in the Flag field) is set, the Acknowledgment Number field contains the value of the next sequence number the sender of the TCP segment is expecting to receive. Once a connection is established, the next sequence number is always present in the Acknowledgment Number field. Thus, the Sequence Number and Acknowledgment Number fields not only provide a method for ensuring the correct order of segments at a receiver, but in addition provide a mechanism to note if a segment is lost.
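The relationship between the two fields can be sketched in a few lines of Python. The function names are hypothetical, and the sketch relies on the fact that 32-bit sequence numbers wrap around modulo 2^32; it models only in-order delivery, not retransmission.

```python
SEQ_MODULUS = 2 ** 32  # both fields are 32 bits, so values wrap at 2^32


def first_data_sequence(isn):
    """Sequence number of the first data byte after a SYN carrying the ISN."""
    return (isn + 1) % SEQ_MODULUS


def expected_ack(sequence_number, payload_length):
    """Acknowledgment number returned after a segment is accepted in order."""
    return (sequence_number + payload_length) % SEQ_MODULUS


def segment_is_out_of_order(expected_sequence, received_sequence):
    """True when the arriving segment does not start where the receiver expects."""
    return received_sequence != expected_sequence


if __name__ == "__main__":
    seq = first_data_sequence(1000)   # hypothetical ISN of 1000
    ack = expected_ack(seq, 500)      # receiver accepted 500 bytes
    print(seq, ack)
```

If the next segment arrives with a sequence number other than the acknowledged value, the receiver can infer that a segment was lost or reordered.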

3.1.4.1.3 Window Field The Window field is 16 bits in length. This field contains the number of data bytes beginning with the one indicated in the Acknowledgment field that the sender of the segment is willing to accept. Thus, you can view the entry in the Window field as a flow-control mechanism, since a small entry reduces the transmission of data per segment, while a larger entry increases the amount of data transmitted per segment.

3.1.4.1.4 Checksum Field The Checksum field is similar to the Window field with respect to field length, since it is also 16 bits. The Checksum field contains the 1’s complement of the 1’s complement sum of all 16-bit words in the TCP header and text. If a TCP segment contains an odd number of bytes, an additional padded byte of zeros is added to form a 16-bit word for checksum purposes; however, the padded byte is not transmitted as part of the TCP segment.

Figure 3.5 The TCP header.

The Checksum also covers a 96-bit pseudo header that is conceptually prefixed to the TCP header. The pseudo header consists of the Source and Destination Address fields, the Protocol field, and the TCP length field. The purpose of the Checksum covering those fields is to provide protection against misrouted segments, thereby enhancing the reliability of this transport protocol.
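The checksum computation described above can be sketched in Python. The function names are hypothetical, and the sketch assumes the pseudo header layout of two 4-byte addresses, one zero byte, the 1-byte Protocol value (6 for TCP), and a 2-byte TCP length, which totals 96 bits. The segment passed in is expected to carry zeros in its Checksum field while the checksum is being calculated.

```python
import socket
import struct


def ones_complement_sum(data):
    """Return the 16-bit 1's complement sum of data, zero-padding an odd length."""
    if len(data) % 2:
        data += b"\x00"   # padding is used for the sum only, never transmitted
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in
    return total


def tcp_checksum(src_ip, dst_ip, segment):
    """Return the TCP checksum over the pseudo header plus the TCP segment."""
    pseudo_header = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
                     + struct.pack("!BBH", 0, 6, len(segment)))
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF
```

A receiver can run the same function over the segment as received, with the transmitted checksum left in place; a result of zero indicates that no error was detected. Because the addresses are part of the sum, a segment delivered to the wrong destination address fails this check.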

Now that we have a general appreciation for TCP, let’s turn our attention to the second popular transport protocol in the TCP/IP protocol suite: the User Datagram Protocol (UDP).

3.1.4.2 UDP The User Datagram Protocol (UDP) represents a best-effort, nonreliable transport protocol. Unlike TCP, which requires the establishment of a connection prior to the transfer of data, when using UDP, data transfer occurs prior to knowing if a receiver is present. Thus, UDP relies on the application to determine whether, after a period of no response, the session should terminate.

Figure 3.6 illustrates the fields in the UDP header. Although both the TCP and UDP headers include 16-bit Source and Destination ports, the UDP header is streamlined in comparison to the TCP header. The UDP header has no flow-control capability and, as we will shortly note, has a very limited error-detection capability.

3.1.4.2.1 Length Field In examining the UDP header shown in Figure 3.6, the Length field consists of 16 bits that indicate the length in bytes of the UDP datagram to include its header and data.
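As a minimal sketch of how compact the UDP header is, the following Python code packs and unpacks its four 16-bit fields. The function names, port values, and payload are hypothetical; the Checksum field is simply left at zero here, which under IPv4 indicates that no checksum was computed.

```python
import struct

UDP_HEADER_FORMAT = "!HHHH"  # source port, destination port, length, checksum
UDP_HEADER_LENGTH = struct.calcsize(UDP_HEADER_FORMAT)  # 8 bytes


def build_udp_datagram(src_port, dst_port, payload):
    """Prefix payload with a UDP header whose Length covers header plus data."""
    length = UDP_HEADER_LENGTH + len(payload)
    header = struct.pack(UDP_HEADER_FORMAT, src_port, dst_port, length, 0)
    return header + payload


def parse_udp_header(datagram):
    """Return the four header fields of a UDP datagram as a dictionary."""
    src_port, dst_port, length, checksum = struct.unpack(
        UDP_HEADER_FORMAT, datagram[:UDP_HEADER_LENGTH])
    return {"source_port": src_port, "destination_port": dst_port,
            "length": length, "checksum": checksum}


if __name__ == "__main__":
    datagram = build_udp_datagram(49152, 53, b"query")
    print(parse_udp_header(datagram))
```

Compared with the 20-byte TCP header, there are no sequence, acknowledgment, or window fields to populate.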

3.1.4.2.2 Checksum Field The Checksum is a 16-bit 1’s complement of the 1’s complement sum of a pseudo header of information from the prefixed IP header, the UDP header, and the data (padded with zeroed bytes, if necessary) to ensure that a multiple of two bytes occurs. Similar to the TCP header Checksum, the UDP header Checksum provides protection against misrouted datagrams. However, unlike TCP, there is no method within UDP for acknowledging or retransmitting data, so a datagram that fails the checksum is simply discarded, requiring the application to take charge of any required error recovery.

Figure 3.6 The UDP header.

3.1.4.3 Port Meanings In our prior examination of TCP and UDP headers, we noted that both Layer 4 protocols have 16-bit Source and Destination Port fields. Because those fields function in the same manner for each protocol, we will discuss their operation as an entity.

3.1.4.3.1 Destination Port The Destination Port indicates the type of logical connection provided by the originator of the IP datagram. Here the term logical connection more specifically refers to the application or service transported by the TCP segment or UDP datagram, which is identified by a port number in the Destination Port field.

3.1.4.3.2 Source Port In UDP, the Source Port is optional and can be set to a value of zero by the originator when no reply is expected. When meaningful, as it always is for TCP, the nonzero value indicates the port of the sending process, which in turn indicates the port to which a reply should be addressed. Because the values of each field are port numbers, this author would be remiss if he did not discuss their ranges.

3.1.4.3.3 Port Number Ranges Each 16-bit Destination Port and Source Port field is capable of transporting a number from 0 through 65,535, for a total of 65,536 unique port numbers. Port numbers are divided into three ranges, referred to as Well-Known Ports, Registered Ports, and Dynamic and/or Private Ports. Well-Known Ports are those port numbers from 0 through 1,023, or the first 1,024 port numbers. Registered Ports are those port numbers from 1,024 through 49,151, while Dynamic and/or Private Ports are those port numbers from 49,152 through 65,535.
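The three ranges translate directly into a small lookup, sketched here in Python. The function name and the returned labels are hypothetical conveniences; the boundaries are those given in the text.

```python
def port_range(port):
    """Classify a 16-bit port number into one of the three port ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("a port number must fit in 16 bits (0 through 65,535)")
    if port <= 1023:
        return "Well-Known"
    if port <= 49151:
        return "Registered"
    return "Dynamic/Private"


if __name__ == "__main__":
    for port in (25, 8080, 50000):
        print(port, port_range(port))
```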

Well-Known Port numbers are assigned by the Internet Assigned Numbers Authority (IANA) and are used by system processes or by programs to identify applications or services. Table 3.1 lists some of the more common Well-Known Port numbers. Although port numbers listed in Table 3.1 are applicable to both TCP and UDP, port numbers are commonly used with only one protocol.

For example, FTP (File Transfer Protocol) is transported as a reliable, connection-oriented process that occurs through the use of TCP. In comparison, SNMP (Simple Network Management Protocol) is transported on a best-effort basis by UDP. However, some applications, such as Voice over IP, use a combination of TCP and UDP. For example, when dialing a telephone number, the dialed digits are transported by TCP, which is a reliable protocol. However, once a connection is established to the dialed party, digitized voice is transported via UDP. The reason for the latter results from the fact that real-time voice cannot be retransmitted if a bit error occurs. Thus, the application would either drop an errored packet or ignore the error when it reconstructs a short segment of voice transported via a UDP packet.

Now that we have an appreciation for the operation and utilization of the TCP/IP Transport Layer, we will conclude this chapter by turning our attention to the Domain Name System (DNS). In doing so, we will review how the DNS operates, not only to obtain an appreciation of how name resolution occurs, but also to obtain the knowledge necessary to appreciate how DNS can be used as a mechanism to support load balancing, a topic we will discuss in more detail later in this book.

3.2   The Domain Name System

When you enter a URL into your Web browser or send an e-mail message, you more than likely use a domain name. For example, the URL http://www.popcorn.com contains the domain name popcorn.com. Similarly, the email address [email protected].

Table 3.1 Common Well-Known Port Numbers

Port Number  Description
17           Quote of the Day
20           File Transfer Protocol (Data)
21           File Transfer Protocol (Control)
23           Telnet
25           Simple Mail Transfer Protocol
43           Whois
53           Domain Name Server

3.2.1 Need for Address Resolution

While domain names are easy to remember, they are not used by routers, gateways, and computers for addressing. Instead, computational devices are configured using dotted decimal digits to represent their IPv4 addresses.

Dotted decimal addresses are then converted into binary equivalents, which represent the true addresses of computational devices. Although many books reference IPv4 addresses as being assigned to computational devices, in actuality those addresses are assigned to device interfaces. This explains how routers and network servers with multiple network connections can have packets transmitted from each interface with a distinct source address as well as receive packets with an explicit destination address that corresponds to a particular interface. Because IPv4 Class A, B, and C addresses indicate both a network and host address, such addresses identify both a network for routing purposes as well as a particular device on a network.
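The conversion from dotted decimal to its binary equivalent can be shown with Python's standard `ipaddress` module. The function name is hypothetical, and the sample address is taken from a range reserved for documentation.

```python
import ipaddress


def dotted_to_binary(address):
    """Return the 32-bit binary form of a dotted decimal IPv4 address."""
    value = int(ipaddress.IPv4Address(address))  # the address as one integer
    bits = format(value, "032b")                 # zero-padded to 32 bits
    # Insert dots every 8 bits so each byte lines up with its decimal form.
    return ".".join(bits[i:i + 8] for i in range(0, 32, 8))


if __name__ == "__main__":
    print(dotted_to_binary("192.0.2.1"))
    # prints 11000000.00000000.00000010.00000001
```

The leading bits `110` in the first byte are what mark this particular address as Class C.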

The use of IPv4 addressing by computational devices means that a translation device that resolves the domain name into an IP address is required to provide routers with the information necessary to deliver packets to their intended destination. That translation or resolution service is referred to as the Domain Name Service and is the focus of this section.

3.2.2 Domain Name Servers

Computers that are used to translate domain names to IP addresses are referred to as domain name servers. There are a series of domain name servers that maintain databases of IP addresses and domain names, enabling a domain name to be resolved or translated into an IP address. Some companies operate a domain name server on their local area network, while other organizations depend upon the DNS operated by their Internet Service Provider.

If a browser user enters a URL for which no previous IPv4 address was found, the local DNS on the organization’s LAN will query the ISP’s DNS to determine if a resolution occurred at a higher level. Similarly, if the organization uses the services of the ISP’s DNS and a query does not result in a resolution, then that DNS will forward the query to a higher authority. The highest authority is referred to as the top-level domain.

3.2.3 Top-Level Domain

Each domain name consists of a series of character strings separated by dots (.). The leftmost string references the host, such as www or ftp. The rightmost string in the domain name references the top-level domain, such as gov or com.
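The left-to-right structure of a name can be illustrated with a short Python sketch. The function name and the returned keys are hypothetical, and the sketch makes the simplifying assumption that the registered domain is always the last two strings, which does not hold for every country-code registry.

```python
def split_domain_name(name):
    """Split a fully qualified name into host, domain, and top-level domain."""
    labels = name.rstrip(".").lower().split(".")
    if len(labels) < 2 or any(not label for label in labels):
        raise ValueError(f"not a usable domain name: {name!r}")
    return {
        "top_level_domain": labels[-1],                  # rightmost string
        "host": labels[0] if len(labels) > 2 else None,  # leftmost string
        "domain": ".".join(labels[-2:]),                 # assumed: last two strings
    }


if __name__ == "__main__":
    print(split_domain_name("www.popcorn.com"))
```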

When the Internet was initially established, there were only a handful of top-level domains. Those top-level domains included .com (commercial), .edu (educational), .gov (government), .mil (U.S. Department of Defense), .net (networks), and .org (organization). Since then, domain name registries have expanded considerably, as has the number of top-level domain name servers. Table 3.2 lists presently defined domain name registries other than those defined for countries. Concerning the latter, there are presently over 100 two-letter domain registries for countries, such as .ar (Argentina), .il (Israel), and .uk (United Kingdom). The IANA is responsible for defining domain name suffixes.

Table 3.2 Top-Level Domains

.aero    Aviation
.biz     Business organizations
.com     Commercial
.coop    Cooperative organizations
.edu     Educational
.gov     Government
.info    Information
.int     International organizations
.mil     U.S. Department of Defense
.museum  Museums
.name    Personal
.net     Networks
.org     Organizations

Within each top-level domain, there can be literally tens of thousands to millions of second-level domains. For example, in the .com top-level domain, you have Microsoft, Google, and Yahoo, as well as millions of other entries. Although every second-level domain within the .com top-level domain must be unique, there can be duplication across top-level domains. For example, lexus.com and lexus.biz represent two different domains. By prefixing a word to the domain, such as ftp.popcorn.com or www.popcorn.com, you obtain the name of a specific host computer in a domain. That computer has an IP address that is determined through the use of the Domain Name Service.

3.2.4 DNS Operation

When you enter a URL into your browser in the form of a domain name, that name must be converted into an IP address. That address will then be used by the browser to request a Web page from the computer whose interface is assigned that address. To obtain that address, the browser must use the facilities of a domain name server. Thus, the browser must know where to look to access the name server.

3.2.5 Configuring Your Computer

When you install your computer’s TCP/IP software, one of the first functions you need to perform is to configure your network settings. When you do so, you will set your computer’s IP address, its subnet mask, default gateway, and the address of the name server your computer should use when it needs to convert domain names to IP addresses.

Figure 3.7 illustrates the Microsoft Windows 2000 Internet Protocol (TCP/IP) Properties dialog box. Note that if you select the button “Use the following IP addresses,” you are able to specify the IP address, subnet mask, default gateway, and up to two DNS server addresses. However, if your organization uses the Dynamic Host Configuration Protocol (DHCP), you would then select the button labeled “Obtain an IP address automatically,” which would result in the DNS addresses being transmitted to the host from a DHCP server along with its IP address, subnet mask, and gateway address when the host connects to the network. If you’re working in a Windows environment, there are several tools you can consider using to obtain DNS and other addressing information. If you’re using an older version of Windows, such as Windows 95 or Windows 98, you can view current IP address assignments through the use of Winipcfg.exe. If you’re using Windows 2000 or Windows XP, you can use IPConfig from the command prompt.

The top portion of Figure 3.8 illustrates the use of the IPConfig program without any options. When used in this manner, the program returns the connection-specific DNS suffix, IP address, subnet mask, and default gateway address. Next, the IPConfig program was executed a second time; however, this time the “all” option was included in the command line. Note that the use of the “all” option provides additional information about the configuration of the computer, including the DHCP and DNS server addresses as well as DHCP leasing information.

Once a computer knows the IP address of its domain name server, it can request the server to convert a domain name into an IP address. If the name server received a prior request to obtain an IP address for a host with a particular domain name, such as www.popcorn.com, the server merely needs to access its memory to return the IP address associated with the request previously stored in cache memory to the computer making the resolution request. If the name server did not have prior knowledge of the resolution, it would then initiate contact with one of the root name servers.

Figure 3.7 Using the Windows 2000 Internet Protocol (TCP/IP) Properties dialog box.

3.2.6 Root Name Servers

Currently there are 13 root name servers in existence, with most of them located in the United States, while several servers are located in Japan and London. Each root server functions in a similar manner, responding to a DNS query with the address of a name server for the top-level domain for a particular query. That is, each root server knows the IP address for all of the name servers that support a particular top-level domain. Thus, if your browser was pointed to the URL www.popcorn.com and the local name server had not previously resolved the IP address for that host and domain address, the local name server would contact one of the root name servers. The root server would respond with the IP address of a name server for the top-level domain, which in our example is the .com domain, enabling your name server to access that server.

Figure 3.8 Using IPConfig to obtain information about the network settings associated with a computer.

Root servers are labeled A through M, with each name server having a file that contains information in the form of special records that contain the name and IP address of each root server. Each root server in turn is configured with the IP addresses of the name servers responsible for supporting the various top-level domains. Thus, a resolution request that cannot be serviced by the local domain server is passed to an applicable root server, which in turn returns the IP address of a name server for the top-level domain to the local DNS. The local name server then transmits a query to the top-level name server, such as a COM, EDU, or GOV name server, requesting the IP address for the name server for the domain in which the host that requires address resolution resides. Because the top-level domain name server has entries for all domain servers for its domain, it responds with the IP address of the name server that handles the domain in question. The local name server uses that IP address to directly contact the name server for the host and domain name it needs to resolve, such as www.popcorn.com. That name server returns the IP address to the local name server, which then returns it to the browser. The browser then uses that IP address to contact the server for www.popcorn.com to retrieve a Web page.
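The sequence of queries described above can be traced with a small Python simulation. No network traffic is involved: the three dictionaries are hypothetical stand-ins for a root server, a .com name server, and the name server for popcorn.com, and every IP address shown is a documentation-range placeholder rather than a real assignment.

```python
# Hypothetical data; each dictionary plays the role of one tier of name servers.
ROOT_SERVER = {"com": "192.0.2.1"}                          # TLD -> TLD name server
TLD_SERVERS = {"192.0.2.1": {"popcorn.com": "192.0.2.53"}}  # domain -> its name server
DOMAIN_SERVERS = {"192.0.2.53": {"www.popcorn.com": "198.51.100.80"}}


def resolve(name, cache=None):
    """Resolve name as a local name server would; return (address, query trace)."""
    cache = {} if cache is None else cache
    name = name.lower()
    if name in cache:  # answered from a prior resolution, no queries needed
        return cache[name], ["local cache"]
    labels = name.split(".")
    tld, domain = labels[-1], ".".join(labels[-2:])
    trace = []
    tld_server = ROOT_SERVER[tld]  # step 1: ask a root server
    trace.append(f"root server -> {tld} name server {tld_server}")
    domain_server = TLD_SERVERS[tld_server][domain]  # step 2: ask the TLD server
    trace.append(f"{tld} name server -> {domain} name server {domain_server}")
    address = DOMAIN_SERVERS[domain_server][name]  # step 3: ask the domain's server
    trace.append(f"{domain} name server -> {address}")
    cache[name] = address  # remember the answer for the next request
    return address, trace


if __name__ == "__main__":
    local_cache = {}
    for _ in range(2):
        print(resolve("www.popcorn.com", local_cache))
```

The first call walks all three tiers, while the second is answered from the local cache, mirroring the difference between a first-time and a repeated lookup.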

3.2.7 The NSLookup Tool

If you’re using a version of Microsoft Windows that has the TCP/IP protocol suite installed, you can use the NSLookup diagnostic tool to obtain information from domain name servers. NSLookup operates in two modes, referred to as interactive and noninteractive. The noninteractive mode is normally used when you need to obtain a single piece of data, while the interactive or program mode provides you with the ability to issue a series of queries.

The top portion of Figure 3.9 illustrates the use of NSLookup in its noninteractive mode of operation. In this example, NSLookup was used to obtain the IP address of the Web server www.eds.com. In the second example, NSLookup was entered by itself to place the program into its interactive mode of operation. The program responds by indicating the name server and its IP address, and then displaying the “>” character as a prompt for user input. At this point in time, you can enter an NSLookup command. In the interactive example, the “set all” command was entered to obtain a list of set options. Note that one option is the root server, with its current value set to A.ROOT-SERVERS.NET. This represents the root server that the local DNS server uses by default. Next, the server command was used to set the root server as the default name server. Note that this action returned the IP address of the root server.

3.2.8 Expediting the Name Resolution Process

Figure 3.9 Using Microsoft’s NSLookup in noninteractive and interactive query mode.

The name resolution process is expedited by name servers using caching. When a name server resolves a request, it caches the IP address associated with the name resolution process. The next time the name server receives a request for a previously resolved domain, it knows the IP address for the name server handling the domain. Thus, the name server does not have to query the root server, since it previously learned the required information and can simply retrieve it from cache memory. While caching can be very effective, it has two limitations. First, not all requests are duplicates of prior requests. Secondly, cache memory is finite. Thus, when a name server receives an IP address, it also receives a Time to Live (TTL) value associated with the address. The name server will cache the address until the TTL period expires, after which the address is purged to make room for new entries.
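The TTL-bounded caching described above can be sketched in a few lines of Python. This is an illustration rather than the way any particular name server is implemented: the class name, the injectable clock, and the sample address are all assumptions made for the example.

```python
import time


class TtlCache:
    """Cache name -> address, discarding each entry once its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the cache can be exercised without waiting
        self._entries = {}       # name -> (address, time at which the entry expires)

    def put(self, name, address, ttl_seconds):
        """Store an address together with the moment its TTL runs out."""
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached address, or None if it is absent or has expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]   # purge the stale entry to free the slot
            return None
        return address


if __name__ == "__main__":
    cache = TtlCache()
    cache.put("www.popcorn.com", "198.51.100.10", ttl_seconds=86400)  # hypothetical address
    print(cache.get("www.popcorn.com"))
```

A lookup that returns None is the case in which the name server must fall back to querying the root, top-level, and domain name servers again.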

3.2.9 DNS Resource Records

The key to the operation of name servers is the resource records. Resource records define data types in the domain name system. DNS records are coded in ASCII and translated into a binary representation for internal use by a DNS application.

The DNS system defines a number of Resource Records (RRs). The text representations of RRs are stored in what are referred to as zone files that can be considered to represent the domain name database.

3.2.9.1 SOA Resource Record

At the top of each zone file is a Start of Authority (SOA) record. This record identifies the zone name, an e-mail contact, and various time and refresh values applicable to the zone. The top portion of Figure 3.10 illustrates the format defined in RFC 1537 for the SOA record, while the lower portion of that illustration shows an example of the SOA record for the fictional popcorn.com domain.

SOA format

DOMAIN.NAME.  IN  SOA  Hostname.Domain.Name. Mailbox.Domain.Name. (
        1        ; serno (serial number)
        86400    ; refresh in seconds (24 hours)
        7200     ; retry in seconds (2 hours)
        2592000  ; expire in seconds (30 days)
        345600   ; TTL in seconds (4 days)
        )

SOA record example

POPCORN.COM.  IN  SOA  POPCORN.COM. gheld.popcorn.com. (
        24601    ; serial number
        28800    ; refresh in 8 hours
        7200     ; retry in 2 hours
        2592000  ; expire in 30 days
        86400    ; TTL is 1 day
        )

Figure 3.10 The SOA record format and an example of its use.

In examining the format of the SOA record shown in Figure 3.10, note that the trailing dot (.) after the domain name signifies that no suffix is to be appended to the name. The class of the DNS record is shown as IN, which stands for “Internet,” while SOA indicates the type of DNS record, which is Start of Authority. The mailbox is that of the individual responsible for maintaining DNS for the domain. In a zone file the mailbox is written with a dot in place of the @ sign, so gheld.popcorn.com. denotes the address gheld@popcorn.com. The serial number (serno) indicates the current version of the DNS database for the domain and provides the mechanism whereby other name servers can note that the database was updated. The serial number commences at 1 and is increased by 1 each time the database changes.

The Refresh entry tells the secondary name server how often to poll the primary for changes. The Retry entry defines the interval in seconds at which the secondary name server tries to reconnect to the primary in the event it failed to connect at the Refresh interval.

The Expire entry defines how long the secondary server should use its current entry if it is unable to perform a refresh, while the TTL value applies to all records in the DNS database on a name server.
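Because the SOA timers are written as raw seconds, it is easy to mistype one of them. The sketch below, in Python, extracts the five numeric fields from the text of an SOA record and restates each timer in days or hours so that it can be checked against the intended value. The function names are hypothetical, the parser handles only the simple layout shown here rather than full zone-file syntax, and the Expire value is written as 2,592,000 seconds, which is the number of seconds in 30 days.

```python
import re

# SOA record for the fictional popcorn.com domain, in the layout used in this section.
SOA_EXAMPLE = """\
POPCORN.COM.  IN  SOA  POPCORN.COM. gheld.popcorn.com. (
        24601    ; serial number
        28800    ; refresh
        7200     ; retry
        2592000  ; expire
        86400    ; TTL
        )
"""

FIELDS = ("serial", "refresh", "retry", "expire", "ttl")


def parse_soa_numbers(text):
    """Return the five numeric SOA fields, in zone-file order, as a dict."""
    without_comments = re.sub(r";[^\n]*", "", text)          # drop ';' comments
    start = without_comments.index("(") + 1
    end = without_comments.index(")")
    values = [int(token) for token in without_comments[start:end].split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numbers, found {len(values)}")
    return dict(zip(FIELDS, values))


def describe(seconds):
    """Express a timer in the largest whole unit that divides it evenly."""
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size and seconds % size == 0:
            count = seconds // size
            return f"{count} {unit}{'' if count == 1 else 's'}"
    return f"{seconds} seconds"


if __name__ == "__main__":
    soa = parse_soa_numbers(SOA_EXAMPLE)
    for field in FIELDS[1:]:
        print(f"{field:8} {soa[field]:>8} s = {describe(soa[field])}")
```

Running a check of this kind makes a dropped digit obvious: 259,200 seconds, for instance, is reported as 3 days rather than 30.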

3.2.9.2 Name Server (NS) Records

There is only one SOA record per domain. However, because there can be multiple name servers, there can also be multiple NS records. Name servers use NS records to locate one another, and there must be at least two NS records in every DNS entry. The format of an NS record is shown as follows:

DOMAIN.NAME.  IN  NS  Hostname.Domain.Name.

3.2.9.3 Address (A) Records

The purpose of the Address (A) record is to map the host name of a computer to its numeric IP address. The format of an Address record is indicated as follows:

Host.domain.name.  IN  A  www.xxx.yyy.zzz

3.2.9.4 Host Information (HINFO) Record

A Host Information (HINFO) record is optional. When used, it can be employed to provide hardware and operating-system information about each host. The format of an HINFO record is shown as follows:

Host.domain.name.  IN  HINFO  "cputype" "OS"

3.2.9.5 Mail Exchange (MX) Records

The purpose of a Mail Exchange (MX) record is to allow mail for a domain to be routed to a specific host. A host name can have one or more MX records, since large domains will have backup mail servers. The format of an MX record is shown as follows:

Host.domain.name.  IN  MX  NN  otherhost.domain.name.
                   IN  MX  NN  otherhost2.domain.name.

The preference numbers (NN) signify the order in which mailers will select MX records when attempting mail delivery to the host. The lower the number, the higher the host is in priority.

To illustrate the use of the MX record, assume that your organization’s mail server had the address mail.popcorn.com. Further assume that, for convenience, you want mail to be addressed to users at popcorn.com rather than at mail.popcorn.com. To accomplish this, your MX record would be coded as follows:

popcorn.com.  IN  MX  10  mail.popcorn.com.
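The way a mailer works through a set of MX records can be sketched as follows. This Python fragment is illustrative only: the backup host with preference 20 is a hypothetical addition to the popcorn.com example, and the is_reachable callback stands in for whatever connection test a real mailer performs.

```python
def delivery_order(mx_records):
    """Return mail hosts ordered by preference number, lowest (most preferred) first."""
    return [host for _preference, host in sorted(mx_records)]


def choose_mail_host(mx_records, is_reachable):
    """Return the most preferred host for which is_reachable(host) is true, else None."""
    for host in delivery_order(mx_records):
        if is_reachable(host):
            return host
    return None


# (preference, host) pairs; the second entry is a hypothetical backup server.
POPCORN_MX = [
    (20, "backup.popcorn.com."),
    (10, "mail.popcorn.com."),
]

if __name__ == "__main__":
    print(delivery_order(POPCORN_MX))
    # Pretend the primary server is down; delivery falls through to the backup.
    print(choose_mail_host(POPCORN_MX, lambda host: host != "mail.popcorn.com."))
```

Sorting on the preference number is all that is needed to reproduce the rule that the lower the number, the higher the priority of the host.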

3.2.9.6 Canonical Name (CNAME) Records

The Canonical Name (CNAME) record enables a computer to be referred to by an alias host name. The CNAME record format is shown as follows:

Alias.domain.name.  IN  CNAME  otherhost.domain.name.

It’s important to note that there must be an A record for the host prior to adding an alias. The host name in the A record is known as the canonical or official name of the host.
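The relationship between an alias and its canonical name can be shown with two small tables. In this Python sketch the host name server1.popcorn.com and its address are hypothetical, and the lookup is performed against in-memory dictionaries rather than a real name server.

```python
# Hypothetical zone contents: one A record and one alias that points to it.
A_RECORDS = {"server1.popcorn.com.": "198.51.100.10"}
CNAME_RECORDS = {"www.popcorn.com.": "server1.popcorn.com."}


def resolve_alias(name, max_hops=8):
    """Follow CNAME records until a name that has an A record is reached.

    Returns (canonical_name, address). Raises LookupError if the chain ends
    without an A record or is longer than max_hops (a likely loop).
    """
    for _ in range(max_hops):
        if name in A_RECORDS:
            return name, A_RECORDS[name]
        if name not in CNAME_RECORDS:
            raise LookupError(f"no A or CNAME record for {name}")
        name = CNAME_RECORDS[name]
    raise LookupError("alias chain too long (possible loop)")


if __name__ == "__main__":
    print(resolve_alias("www.popcorn.com."))
```

The LookupError raised when an alias points at a name without an A record is the failure that the rule above, adding the A record before the alias, is meant to prevent.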

3.2.9.7 Other Records

In addition to the previously mentioned DNS records, there are many other resource records. Those records range in scope from Pointer (PTR) records that provide an exact inverse of an A record (allowing a host to be recognized by its IP address) to the AAAA resource record, which holds the IPv6 address of a host (the A6 record type, defined earlier for the same purpose, has since been deprecated). A full list of DNS record types can be obtained from IANA DNS parameter listings. Readers are encouraged to use a search tool such as Bing or Google to research IANA DNS parameters.
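A PTR record is stored under a domain name built from the IP address itself, with the address components reversed. The short Python sketch below uses the standard ipaddress module to construct that name; the function name ptr_name and the sample addresses are assumptions made for illustration.

```python
import ipaddress


def ptr_name(address):
    """Return the reverse-lookup domain name under which a PTR record for the address lives."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    print(ptr_name("198.51.100.10"))   # IPv4 names fall under in-addr.arpa
    print(ptr_name("2001:db8::1"))     # IPv6 names fall under ip6.arpa
```

For an IPv4 address the four octets are reversed and placed under in-addr.arpa, which is why the PTR record can be described as the inverse of the A record.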

4 The CDN Model

Until now, we have only briefly covered the major reasons for having a content delivery system and a few of the methods that could facilitate the operation of a content delivery network (CDN). In this chapter we will probe more deeply into the content delivery model, examining so-called edge operations that move the content of customers to the edges of the Internet. In actuality, the term edge, while appropriate, can mean different things to different persons with respect to their physical location. However, prior to discussing edge operations, we will first build upon information previously presented in this book to obtain a better appreciation for the rationale for CDN. In doing so, we will view the Internet as a large transmission facility that has a series of critical links. Such a view will allow us to better understand bottlenecks and how a distributed content delivery network can overcome such bottlenecks. Because this author believes that both sides of a coin need to be shown, as we discuss edge operations we will also examine some of the limitations associated with the distribution of content across the Internet.

4.1   Why Performance Matters

Earlier in this book, we looked at the interconnection of communication carrier networks at peering points and how those locations could adversely affect the flow of data. In addition, we looked at the flow of traffic from users distributed across the globe accessing a common Web server and noted that some users would have their traffic flow over a large number of router hops to reach the server. Since router hops and the crossing of traffic at peering points correspond to transmission delays, as the distance between the user of the server and the server increases, so too will the delays. Those delays can affect the ability of potential customers of Web sites to interact with the site, to view videos, send e-mails, or perform other operations. Due to such factors, we can state that there is literally a price associated with a degraded level of performance, as some users may point their browsers elsewhere. Thus, the resulting economics associated with poor performance is worth noting.

4.1.1 Economics of Poor Performance

To understand the economics associated with poor performance, let’s assume that your organization’s Web site sells discount airline seats. Now let’s assume that a potential customer of the Web site enters his or her dates of travel and travel locations in order to request a discount price. However, in doing so, the potential customer may then wish to alter the travel dates and check the rates to one or more alternate but nearby locations, since airlines are notorious for having many travel restrictions as well as having rates between cities that enable considerable savings by flying either from or to a closely located airport instead of from the intended point of origination or destination. Because of the previously described travel discrepancies, the typical potential customer may need to enter data on a series of Web pages as well as flip through a series of page responses.

If a potential customer is comparing prices of several travel sites, she is typically pressed for time. Thus, potential customers who experience page access and display delays associated with router hops and peering points may in effect bail out from further accessing the Web site for which they are experiencing delays and proceed to another site. For example, suppose a person is viewing airline flights from New York to Los Angeles. That person might start her investigation of flights by searching the Web site for the cost of an economy coach ticket on the day she wishes to depart and return. Depending upon the availability and price of the ticket, she may modify the search using different travel dates and/or a different cabin. Thus, there are more than likely several searches a person would perform prior to making a decision. If the Web site being accessed is sluggish, there is a high probability that the person doing the searching might enter the URL of a competitor into her Web browser.

In the wonderful world of marketing, we are probably familiar with the adage, “Time is money.” We can equate this adage to the use of the Internet by noting that every bailout of a potential customer results in the potential loss of revenue. However, when customers are driven away due to what they perceive to be a poorly performing Web site, but which in reality are the delays resulting from the path by which data flows between the potential customer and the Web site, a lasting impression of poor site performance can occur. This means that the Web site operator not only loses the possibility of a current sale but, in addition, may lose the opportunity for future sales, since the potential customer now has a negative view of the Web site.

4.1.2 Predictability

Another problem that potential customers encounter when accessing a Web server across many router hops and peering points is a lack of predictability. A centrally hosted infrastructure results in potential customers at locations from Andorra to Zambia accessing a common location. Some routes may require the traversal of a large number of router hops and peering points, while other routes may require the traversal of a lesser number of router hops and perhaps a few or even no peering points. As you can imagine, depending upon the geographic location of the potential customer and Web server being accessed, different potential customers can be expected to encounter different delays, making the access and retrieval of Web pages anything but predictable. Delays, while slightly annoying when a Web page consists of text, can rise to a level of frustration when images and video can be selected by the viewer. For example, when viewing a video that is transmitted from a single server both over a long distance and through several peering points, the cumulative random delays can result in a high level of buffering occurring. This, in turn, more than likely results in a high level of viewer dissatisfaction, since every few moments the video will pause while the buffer on the viewer’s computer is refilled. Then, as the buffer allows a few seconds of “smooth” video to be viewed, the random delays result in the buffer being emptied and then refilled, causing the video to stop playing and typically displaying the word “buffering” at the bottom of the area in which the video is being viewed. In addition to the problems associated with the distance and number of peering points between viewer and server, the geographic distribution of potential customers also needs to be considered as well as the time of day when Web access occurs.

Data flow on the Internet has peaks and valleys that partially correspond to the typical workday. That is, for Monday through Friday, activity increases from a relatively low level prior to the 8 a.m. to noon period as workers arrive and perform both job-related and personal activities that require Internet access. From approximately noon until 2 p.m., activity tapers off as workers go to lunch and run errands. In the afternoon, activity peaks between 4 p.m. and 5 p.m. and then tapers off as people leave work. However, as workers arrive at home, some persons begin to use the Internet to perform a variety of activities that may not be possible or are restricted at work, ranging from checking personal e-mail to retrieving stock market quotations. Thus, there are several activity peaks during the late afternoon and evening. In addition, on weekends, when there is minimal access of the Internet from work, tens of millions of persons located around the globe access the Internet from home or from libraries and colleges and universities, creating variable traffic peaks and valleys throughout the weekend. To further compound a challenging situation, the distribution of activity varies by time zone, with, for example, users in Los Angeles and New York who access a server located in Chicago having a distribution of activity that is two hours behind (Los Angeles) and one hour ahead (New York) of Chicago users.

When traffic loads are considered along with the geographic location of potential customers, it becomes apparent that the potential customers of a centrally located computer infrastructure consisting of a Web server and back-end database servers will have different experiences with respect to Web-server access and page-retrieval operations each time they point their browsers to a particular Web site. Once again, this unpredictability can result in the loss of potential customers who decide to perform their search for products they desire on other Web sites, resulting in an additional effect upon the bottom line of the Web site operator.

4.1.3 Customer Loyalty

During the initial Internet boom from 1997 through 2000, many market research organizations viewed the popularity of sites only with respect to page clicks. In the euphoria of that period, the fact that few clicks were converted into purchases was irrelevant. Fortunately, the burst of the so-called Internet bubble resulted in a return to rational market research where the bottom line truly matters.

When potential customers cannot predictably access an organization’s Web site, they normally make a rational decision to go elsewhere. This decision can range from the entry of a new URL to the use of a search engine to locate another site providing a similar product. Over time, potential customers may even remove the current site from their browser’s Favorites list. Regardless of the action performed, the result will be similar in that, instead of acquiring a loyal customer who could be the source of repeated business, the potential customer is, in effect, driven away.

4.1.4 Scalability

Another problem associated with the centralized Web site model is scalability. The centralized model requires the Web site operator to add more equipment to one location to satisfy access increases that can occur from locations scattered over the globe.

In a distributed model where content delivery is moved to the literal edges of the Internet, an increase in Web access is distributed over many servers. As a result of this action, it may not be necessary to upgrade any Web server. In addition, if the organization is using the facilities of a content delivery network provider, the responsibilities associated with computer upgrades become the responsibility of the service provider. This means that if your organization enters into a contract with a service provider that includes a detailed service-level agreement, the service provider will upgrade its equipment at one or more locations when service becomes an issue. This upgrade, at most, should only temporarily affect one of many locations and thus should not be compared to a central site upgrade that can affect all customers on a global basis. Thus, the scalability issue also includes the effect upon potential customers, with a centralized model providing a high likelihood of a complete outage occurring during an equipment upgrade process, while a distributed model upgrade only affects potential customers whose traffic flows through a particular edge server.

4.1.5 Flexibility

As indicated in our prior discussion concerning scalability, the centralized approach can result in a complete outage during a hardware upgrade. Thus, a distributed approach, where content is moved forward to edge servers, provides more flexibility with respect to hardware upgrades. First, moving content onto many servers can forgo the need for a centralized hardware upgrade. Second, even if a centralized server is necessary to coordinate the distribution of data to edge servers, the upgrade of the central server can be planned to minimize its effect on the distribution of information to the edge servers. Thus, if a hardware upgrade becomes necessary and you plan correctly, it affects at most one of many edge servers at a time and allows your organization to better plan for partial outages. Consequently, the use of a content delivery network can provide a more flexible solution to your Web site data access requirements.

4.1.6 Company Perception

There is the well-known adage, “You are what you eat.” In the wonderful world of the Internet, the performance of your company’s Web site can have a significant influence upon how actual and potential customers view your organization and your organization’s brand perception. If users attempting to access your corporate Web site encounter significant delays, not only will this result in a number of users pointing their browsers elsewhere, but, in addition, it will result in a negative view of your organization. From a personal perspective, there are several Web sites that I prefer to avoid during the holiday season, as the specific vendors (whom I prefer not to name) have for several years apparently failed to upgrade their sites. Consequently, efforts to purchase an item become a major waste of time as customers stare at their Internet browsers waiting for a response to their requests. Unfortunately, such vendors fail to appreciate that the heavy investment in commercial advertising can come to naught if customers and potential customers decide to abandon their organization’s Web site. Thus, the performance of your organization’s Web site in precluding access delays can have a direct impact upon brand perception and customer loyalty.

4.1.7 Summary

Based upon information presented in this section, it’s obvious that Web performance matters. Because customers and potential customers consider access delays to represent Web server performance, the centralized server model has many limitations. As we previously noted, those limitations include a lack of predictability, the potential loss of customer loyalty, difficulty in scaling a centralized server to accommodate traffic growth, an impairment of organizational flexibility, and the possibility that the operation of a centralized site will provide delays that negatively impact the perception of your organization. Because these limitations can result in the loss of customer revenue, there is an economic penalty associated with them. That economic penalty can vary considerably, based upon the severity of one or more of the noted limitations as well as the type of merchandise or service sold by the Web site.

Prior to examining how moving content to edge servers can reduce latency and enhance server access, let’s obtain a more detailed view of the factors associated with Web server access delays. In doing so, we turn our attention to examining Internet bottlenecks in an effort to obtain a better understanding of the rationale for moving content toward actual and potential users.

4.2   Examining Internet Bottlenecks

Previously in this book, we noted that the distance between the user and a Web site in terms of the number of router hops and peering points significantly contributes to site access delays. In this section, we will probe more deeply into Internet bottlenecks and examine data flow from source to destination, noting the effect of a series of potential and actual bottlenecks upon Web server access.

4.2.1 Entry and Egress Considerations

Two of the often overlooked Internet bottlenecks are the entry and egress transport facilities used by a customer or potential customer to access a particular Web server.

The entry transport facility refers to the type of access the user employs to log onto the Internet and the activity over that transport facility. While the egress transport facility refers to the communications link from the Internet to a particular Web server, it also represents a reverse connection. That is, the egress transport facility with respect to a user becomes the access transport facility of the server. Similarly, the entry transport facility of the user can be viewed as the egress transport facility of the server. To eliminate possible confusion, we can note that the typical browser user sends requests in the form of URLs to a Web site to retrieve a Web page. The Web server responds with a Web page whose size in bytes is normally several orders of magnitude greater than the packet sent by the browser user that contains a URL request. Thus, instead of focusing our attention upon the transmission of the URL to the server, we can focus our attention upon the Web server’s response. In doing so, we can note that the user’s access method to the Internet results in a delay in the delivery of a server page based upon the operating rate of the access line. Similarly, the user’s egress transport facility can be viewed as a delay mechanism with respect to the server delivering a Web page to the Internet. Now that we have an appreciation for the manner by which we can focus our attention upon Web page delays with respect to the browser user’s view of access and egress, let’s take a closer look.

4.2.2 Access Delays

Previously, we noted that the access line connecting a browser user to the Internet represents the egress delay associated with delivering a Web page to the user. Because there are several types of transport facilities a browser user can employ to access the Internet, we need to consider each method when computing the effect of the access transport facility upon Web page egress delays. For example, a user might access the Internet via dial-up connection using the public switched telephone network at 56 kilobits per second (Kbps), over a DSL modem connection at 1.5 megabits per second (Mbps), a cable modem connection operating at 6 Mbps, or a corporate T1 connection operating at 1.544 Mbps. While the first two connection methods provide dedicated access to the Internet, the cable modem and corporate T1 connection both represent a shared access method, with achievable throughput based upon the number of users accessing the Internet and their activity. For example, the cable modem access to the Internet occurs via a shared Ethernet LAN (local area network); thus the number of users receiving data when a user requests a Web page will govern the overall response. Similarly, the number of corporate users communicating on a T1 transmission facility will govern the response they receive when requesting a Web page. We can obtain an average throughput per user from the use of elementary mathematics. For example, assuming each corporate user is performing a similar activity and 10 users are accessing the Internet, then the average throughput of each user becomes 1.544 Mbps/10 or 154,400 bps.

In actuality, a T1 line that operates at 1.544 Mbps uses 8,000 bits per second for framing. Thus, the actual data rate available to transport data over a T1 connection becomes 1.544 Mbps minus 8,000 bps or 1.536 Mbps. For most organizations estimating T1 performance, the variability of an estimate within a margin of error of 10% to 20% allows computations to occur using a data rate of 1.544 Mbps. However, as noted, a better measurement occurs when the data transport capacity of 1.536 Mbps is used for a T1 line.
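The per-user arithmetic for a shared T1 line is simple enough to capture in two short functions. The following Python sketch uses hypothetical function names and makes the same simplifying assumption as the text, namely that every active user performs a similar activity and therefore receives an equal share of the line.

```python
T1_LINE_RATE_BPS = 1_544_000   # nominal T1 operating rate
T1_FRAMING_BPS = 8_000         # bits per second consumed by framing


def t1_payload_bps():
    """Data rate left for user traffic once framing is subtracted."""
    return T1_LINE_RATE_BPS - T1_FRAMING_BPS


def average_share_bps(link_bps, active_users):
    """Average throughput per user when a link is divided equally among active users."""
    if active_users < 1:
        raise ValueError("at least one active user is required")
    return link_bps / active_users


if __name__ == "__main__":
    print(average_share_bps(T1_LINE_RATE_BPS, 10))   # share based on the nominal line rate
    print(average_share_bps(t1_payload_bps(), 10))   # share based on the payload rate
```

With 10 users, the two figures differ by only 800 bps per user, which illustrates why the nominal 1.544-Mbps rate is adequate for rough estimates.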

As previously discussed, most Internet entry actions consist of transmitting a short URL to access a server page. Thus, throughput delays associated with requesting a Web page do not significantly vary among the previously mentioned access methods. However, the opposite is not true. That is, there can be significant differences in Web page display delays based upon the method a user employs to access the Internet. For example, consider Table 4.1, which shows the delay or latency associated with delivering a Web page varying in size from 10,000 bytes to 300,000 bytes in increments of 10,000 bytes based upon four data rates.

In examining the entries in Table 4.1, let’s start with the leftmost column, which shows the Web page size. Most Web pages contain a mixture of text and graphics, with the latter primarily in the JPEG format that permits a high degree of image compression. Even so, it’s common for a typical Web page to consist of between 150,000 and 175,000 bytes. One notable exception to this average Web page size is the Google home page, which is shown in Figure 4.1. Note that the Google home page is streamlined, with only one graphic image on the page. This action facilitates the delivery of that home page to users regardless of the data transport mechanism they are using to access the Internet.

Table 4.1 Web Page Delays Based upon Page Size and the Speed of the Access Line Connection

Page size  56,000 bps 150,000 bps  1.544 Mbps      6 Mbps
  (bytes)   delay (s)   delay (s)   delay (s)   delay (s)
   10,000     1.42857     0.53333     0.05208     0.01333
   20,000     2.85714     1.06667     0.10417     0.02667
   30,000     4.28571     1.60000     0.15625     0.04000
   40,000     5.71429     2.13333     0.20833     0.05333
   50,000     7.14286     2.66667     0.26042     0.06667
   60,000     8.57143     3.20000     0.31250     0.08000
   70,000    10.00000     3.73333     0.36458     0.09333
   80,000    11.42857     4.26667     0.41667     0.10667
   90,000    12.85714     4.80000     0.46875     0.12000
  100,000    14.28571     5.33333     0.52083     0.13333
  110,000    15.71429     5.86667     0.57292     0.14667
  120,000    17.14286     6.40000     0.62500     0.16000
  130,000    18.57143     6.93333     0.67708     0.17333
  140,000    20.00000     7.46667     0.72917     0.18667
  150,000    21.42857     8.00000     0.78125     0.20000
  160,000    22.85714     8.53333     0.83333     0.21333
  170,000    24.28571     9.06667     0.88542     0.22667
  180,000    25.71429     9.60000     0.93750     0.24000
  190,000    27.14286    10.13333     0.98958     0.25333
  200,000    28.57143    10.66667     1.04167     0.26667
  210,000    30.00000    11.20000     1.09375     0.28000
  220,000    31.42857    11.73333     1.14583     0.29333
  230,000    32.85714    12.26667     1.19792     0.30667
  240,000    34.28571    12.80000     1.25000     0.32000
  250,000    35.71429    13.33333     1.30208     0.33333
  260,000    37.14286    13.86667     1.35417     0.34667
  270,000    38.57143    14.40000     1.40625     0.36000
  280,000    40.00000    14.93333     1.45833     0.37333
  290,000    41.42857    15.46667     1.51042     0.38667
  300,000    42.85714    16.00000     1.56250     0.40000

Note: The entries in the 1.544-Mbps (T1) column are computed at the 1.536-Mbps payload rate, that is, the T1 line rate less 8,000 bps of framing.

At the opposite end of Web page design with respect to graphic images are the home pages of the major television networks, such as ABC.com, CBS.com, FOX.com, and NBC.com, as well as portals such as Yahoo.com and MSNBC.com. In addition, many newspapers’ on-line Web sites, such as NYTimes.com and JPost.com, are packed with a large number of small graphic images that cumulatively result in a Web page that can easily exceed 150,000 bytes of data. For example, consider Figure 4.2, which shows the home page for Yahoo.com viewed on January 5, 2010, on a wide-screen HP TouchSmart all-in-one computer. If you focus your attention on Figure 4.2, you will note a series of small images or icons under the column labeled “My Favorites.” If you examine the four small pictures under the main picture, they function as a selector for the story and images that will be displayed as you move your cursor over each. Because the images and stories are downloaded, they add to the size of the Web page and cumulatively add up to produce a rather large download that would be poorly handled by a data rate less than a broadband speed.

You can obtain an appreciation for the size of graphic images by moving your cursor over an image and performing a right click operation. From the resulting pop-up menu, select “Properties,” which will display the size of the image in bytes as well as other information about the image.

Figure 4.3 illustrates an example of the display of the Properties box associated with the main image of Casey Johnson located to the right of the box. Note that this image, which is one of four images on the typical home page of Yahoo.com for any day of the week, consists of almost 16,000 bytes. Thus, when you add up the number of bytes for the series of small images and other graphics on the typical home page of Yahoo.com, the amount of data easily approaches 150,000 bytes.

Figure 4.1 The Google home page is optimized for delivery to a Web browser user.

Figure 4.2 The home page of Yahoo.com contains graphics scattered on the page that cumulatively result in a large download of bytes of data.

Figure 4.3 Viewing the Properties box associated with one of four images downloaded when accessing the home page of Yahoo.com.

Returning our attention to Table 4.1, let’s focus our attention upon columns 2 through 5. Those columns indicate the delay in seconds for the reception of a Web page of indicated size in bytes based upon the transport facility employed by the browser user. If we focus our attention on the entries in the row for a Web page size of 150,000 bytes, which represents an average Web page size, we can note that the page exit delay or latency will range from approximately 21 seconds when the user employs a 56-Kbps dial-up modem to a fifth of a second when using a cable modem operating at 6 Mbps. Similarly, at a Web page size of 200,000 bytes, the Web page delays range from approximately 28.6 seconds at 56 Kbps to 0.267 seconds at a cable modem operating rate of 6 Mbps. Thus, as you might expect, Web page delays associated with exiting the Internet via the browser user’s access line increase as the Web page size increases. In addition, the delay is inversely proportional to the operating rate of the access line. Thus, both the page size in bytes returned to the user as well as the Internet connection method play an important role in the overall delay. This information also indicates why most Web pages that include a variety of images are not suitable for viewing with a dial-up modem. Although it’s possible to avoid downloading images and replace them with text labels to speed up a download, the old adage that a picture is worth a thousand words applies in the modern Web environment and reinforces the importance of having the capability to rapidly download Web pages.
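Each entry in Table 4.1 follows from a single relationship: the page size in bits divided by the operating rate of the line. The short Python sketch below shows the computation. The function and dictionary names are hypothetical, and the T1 entry uses the 1.536-Mbps payload rate, since that is the rate that reproduces the values printed in the table’s T1 column.

```python
def transfer_delay_seconds(page_bytes, rate_bps):
    """Time needed to clock a page of page_bytes onto a line running at rate_bps."""
    return page_bytes * 8 / rate_bps


# Operating rates corresponding to the four delay columns of Table 4.1.
RATES_BPS = {
    "56,000 bps": 56_000,
    "150,000 bps": 150_000,
    "T1 payload": 1_536_000,
    "6 Mbps": 6_000_000,
}

if __name__ == "__main__":
    for page_bytes in (150_000, 200_000):
        row = ", ".join(
            f"{label}: {transfer_delay_seconds(page_bytes, rate):.5f} s"
            for label, rate in RATES_BPS.items()
        )
        print(f"{page_bytes:,} bytes -> {row}")
```

Doubling the page size doubles the delay, while doubling the line rate halves it, which is the inverse relationship between delay and operating rate.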

Now that we have an appreciation for the delays attributable to the type of access to the Internet employed by browser users, let’s turn our attention to the egress delays. In actuality, we are referring to the connection from the Internet to a particular Web server. As previously mentioned, since a Web page is several orders of magnitude larger in terms of bytes than a URL request, we can ignore the delay associated with the packet containing the URL flowing to the Web server. However, it’s important to note that a server has a fixed maximum capability with respect to the number of simultaneous users it can service. In actuality, this author is referring to the maximum number of open TCP/IP connections a server can support, which is usually limited to fewer than 2,000. Because the connections are open, we usually refer to them as the number of simultaneous open connections. One of the most common hacker methods to disrupt service is to transmit a sequence of connection requests while cycling through a list of IP addresses that are used as source addresses in each TCP/IP connection. Because the server responds to each request and sets a timer so that the connection will eventually time out, by loading the server with phony connection requests, the hacker makes it difficult, if not impossible, for legitimate users to access the server. Because, for practical purposes, we can ignore the delay associated with egress operations consisting of short URLs transported to a server, we can continue to focus our attention on the flow of Web pages toward the user.
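To make the effect of phony connection requests concrete, the following toy Python model treats the server as a fixed-size table of open connections, each with a timeout. It is only a sketch: the class name, the 30-second timeout, and the addresses are hypothetical, and a real TCP/IP stack is considerably more involved.

```python
class ConnectionTable:
    """Toy model of a server's table of open or half-open connections."""

    def __init__(self, capacity: int, timeout_s: float):
        self.capacity = capacity
        self.timeout_s = timeout_s
        self._expiry = {}  # source address -> time at which the entry times out

    def request(self, source: str, now: float) -> bool:
        """Try to admit a connection request at time `now`; True if a slot was free."""
        # Drop entries whose timer has expired.
        self._expiry = {s: t for s, t in self._expiry.items() if t > now}
        if len(self._expiry) >= self.capacity:
            return False
        self._expiry[source] = now + self.timeout_s
        return True


if __name__ == "__main__":
    # 2,000 is the upper bound mentioned in the text; the timeout is assumed.
    table = ConnectionTable(capacity=2_000, timeout_s=30.0)
    for i in range(2_000):  # requests from a cycling list of source addresses
        table.request(f"spoofed-{i}", now=0.0)
    print(table.request("legitimate-user", now=1.0))   # table is full
    print(table.request("legitimate-user", now=31.0))  # phony entries have timed out
```

Until the timers expire, every slot is held by a request that will never complete, which is why legitimate users are turned away.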

4.2.3 Egress Delays

Previously, we noted that a user’s browser connection to the Internet has a bearing on latency. Similarly, so does the connection of a Web site to the Internet. Because input delays are based upon the size of the data packet, the small amount of data in the form of a URL request means that we can ignore input delays without being significantly in error. Thus, similar to our investigation of user access delays, the primary egress transport facility delay involves the flow of a Web page over the transport facility that connects a Web site to the Internet. This allows us to simplify our analysis while being more correct than most bureaucrats.

Most Web sites are connected to the Internet by T1 or T3 lines, with the latter operating at approximately 45 Mbps. However, unlike our prior computations for the Internet access line that is normally not shared, the egress connection is shared by many users. Thus, we need to consider the average number of users accessing a Web server, as each user request results in the return of a Web page. While it’s true that different users will be requesting different Web pages that are formed from different text and graphic images, for simplicity we can consider that each Web page has a similar composition in bytes. In effect, we are looking for an average Web page size, with some users requesting a larger page size while other users are requesting a smaller Web page size. Again, when we discuss the size of a Web page, we are referring to the number of bytes of data contained on the page and not its length. Thus, we can modify our previous computations performed in Table 4.1 to reflect the activity of additional users at a Web site.

Table 4.2 provides a summary of the use of an Excel spreadsheet model to project the time delays in seconds associated with a Web page delivery. Similar to Table 4.1, the first column indicates varying Web page sizes in bytes, ranging from 10,000 bytes to 300,000 bytes in increments of 10,000 bytes. Columns 2, 3, and 4 indicate the delays associated with 1, 10, and 100 users querying a server and sharing a 1.5-Mbps T1 transport facility connecting the server to the Internet. Similarly, columns 5, 6, and 7 indicate the delays associated with 1, 10, and 100 users accessing a common Web server and sharing a T3 transport facility operating at approximately 45 Mbps that connects the server to the Internet.

Table 4.2 Time Delays in Seconds for Delivering a Web Page from a Server to the Internet

WEB PAGE SIZE   1,500,000-BPS DATA RATE (T1)      45-MBPS DATA RATE (T3)
(BYTES)           1-USER    10-USER   100-USER     1-USER    10-USER   100-USER
                   DELAY      DELAY      DELAY      DELAY      DELAY      DELAY
       10,000    0.05333    0.53333    5.33333    0.00178    0.01778    0.17778
       20,000    0.10667    1.06667   10.66667    0.00356    0.03556    0.35556
       30,000    0.16000    1.60000   16.00000    0.00533    0.05333    0.53333
       40,000    0.21333    2.13333   21.33333    0.00711    0.07111    0.71111
       50,000    0.26667    2.66667   26.66667    0.00889    0.08889    0.88889
       60,000    0.32000    3.20000   32.00000    0.01067    0.10667    1.06667
       70,000    0.37333    3.73333   37.33333    0.01244    0.12444    1.24444
       80,000    0.42667    4.26667   42.66667    0.01422    0.14222    1.42222
       90,000    0.48000    4.80000   48.00000    0.01600    0.16000    1.60000
      100,000    0.53333    5.33333   53.33333    0.01778    0.17778    1.77778
      110,000    0.58667    5.86667   58.66667    0.01956    0.19556    1.95556
      120,000    0.64000    6.40000   64.00000    0.02133    0.21333    2.13333
      130,000    0.69333    6.93333   69.33333    0.02311    0.23111    2.31111
      140,000    0.74667    7.46667   74.66667    0.02489    0.24889    2.48889
      150,000    0.80000    8.00000   80.00000    0.02667    0.26667    2.66667
      160,000    0.85333    8.53333   85.33333    0.02844    0.28444    2.84444
      170,000    0.90667    9.06667   90.66667    0.03022    0.30222    3.02222
      180,000    0.96000    9.60000   96.00000    0.03200    0.32000    3.20000
      190,000    1.01333   10.13333  101.33333    0.03378    0.33778    3.37778
      200,000    1.06667   10.66667  106.66667    0.03556    0.35556    3.55556
      210,000    1.12000   11.20000  112.00000    0.03733    0.37333    3.73333
      220,000    1.17333   11.73333  117.33333    0.03911    0.39111    3.91111
      230,000    1.22667   12.26667  122.66667    0.04089    0.40889    4.08889
      240,000    1.28000   12.80000  128.00000    0.04267    0.42667    4.26667
      250,000    1.33333   13.33333  133.33333    0.04444    0.44444    4.44444
      260,000    1.38667   13.86667  138.66667    0.04622    0.46222    4.62222
      270,000    1.44000   14.40000  144.00000    0.04800    0.48000    4.80000
      280,000    1.49333   14.93333  149.33333    0.04978    0.49778    4.97778
      290,000    1.54667   15.46667  154.66667    0.05156    0.51556    5.15556
      300,000    1.60000   16.00000  160.00000    0.05333    0.53333    5.33333

In examining the entries in Table 4.2, note that the delay in delivering a Web page from a server onto the Internet is primarily a function of three factors. Those factors include the Web page size in bytes, the transport facility operating rate, and the number of users requesting the delivery of Web pages over the common connection between the server and the Internet.
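The three factors combine in a simple way. The Python sketch below reproduces entries of Table 4.2 under the assumption that the spreadsheet model multiplies the page size in bits by the number of users and divides by the link rate. The T1 rate of 1,500,000 bps is inferred from the table values, and the function and constant names are illustrative.

```python
T1_BPS = 1_500_000   # rate consistent with the T1 columns of Table 4.2
T3_BPS = 45_000_000  # approximate T3 rate used in Table 4.2


def egress_delay_seconds(page_bytes: int, link_rate_bps: float, users: int = 1) -> float:
    """Delay to place `users` pages of page_bytes each onto a shared egress link."""
    return users * page_bytes * 8 / link_rate_bps


if __name__ == "__main__":
    for users in (1, 10, 100):
        t1 = egress_delay_seconds(150_000, T1_BPS, users)
        t3 = egress_delay_seconds(150_000, T3_BPS, users)
        print(f"{users:>3} users: T1 {t1:9.5f} s   T3 {t3:8.5f} s")
```

The model is linear in each factor, so a tenfold increase in users or page size produces a tenfold increase in delay, while a thirtyfold increase in link rate (T1 to T3) divides it by thirty.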

A careful examination of the data presented in Table 4.2 indicates that a popular Web site that has a T1 connection to the Internet can encounter significant Web page delivery delays when even 10 users are actively requesting Web pages. For example, at a Web page size of 150,000 bytes with 10 users sharing the line, the delay in placing a Web page onto the Internet is 8 seconds, which in the modern world of Internet usage may appear to be an eternity to many persons. When the number of users requesting Web pages increases to 100, the delay increases to 80 seconds, which is obviously such a significant amount of time that most persons accessing the Web site more than likely believe that the system is down, and they have either pointed their browsers elsewhere or begun a search for another site that will satisfy their requirements.

To reduce delay times, many organizations have upgraded their facilities by installing T3 connections to the Internet that operate at approximately 45 Mbps. If you examine the row in Table 4.2 associated with a Web page size of 150,000 bytes and move to the rightmost column, you will note that the use of a T3 transmission facility when there are 100 users results in a Web page delay of approximately 2.67 seconds. While this is significantly less than the 80 seconds associated with the use of a T1 line shared by 100 users, it still represents a large delay if a user needs to scroll through a series of Web pages. In addition, we need to note that the access and egress transport facility delays are cumulative, further adding to the delays experienced by a browser user accessing a Web server.
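Because the two delays are cumulative, a rough end-to-end figure is their sum. The short Python sketch below adds the egress delay on a shared T3 to the access delay on a 6-Mbps cable modem for a 150,000-byte page. It assumes the same simplified model as the tables (no protocol overhead, no router or propagation delay), and the names are illustrative.

```python
def transfer_seconds(page_bytes: int, rate_bps: float, sharers: int = 1) -> float:
    """Transfer time for a page over a link shared equally by `sharers` requests."""
    return sharers * page_bytes * 8 / rate_bps


def total_delay_seconds(page_bytes: int, access_bps: float,
                        egress_bps: float, egress_users: int) -> float:
    """Sum of the egress delay at the server and the access delay at the user."""
    egress = transfer_seconds(page_bytes, egress_bps, egress_users)
    access = transfer_seconds(page_bytes, access_bps)
    return egress + access


if __name__ == "__main__":
    total = total_delay_seconds(150_000, 6_000_000, 45_000_000, 100)
    print(f"T3 shared by 100 users plus 6-Mbps access line: {total:.2f} s")
```

With these inputs the egress side contributes about 2.67 seconds and the access side about 0.2 seconds, so the shared server link dominates the total.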


Because of the previously mentioned egress delays, large organizations that wish to operate a centralized service may opt to upgrade their transport facility to an Optical Carrier (OC), typically installing an OC-3 or OC-12 transport facility. An OC-3 facility operates at 155 Mbps and, in effect, can be considered to represent the capacity of 100 T1 facilities or approximately 3.5 T3 facilities. In comparison, an OC-12 facility operates at 622 Mbps. As you might expect, at higher data rates, the monthly cost of service dramatically increases. Although the cost of a leased facility varies by distance, to provide a general reference for readers, where a T1 line might cost $500 per month, an OC-12 might be over $15,000 per month. Thus, selecting an Optical Carrier transport facility is usually associated with large organizations. What they obtain in addition to a higher data rate is the ability, for some additional cost, to install concentric rings that provide a near-instantaneous ability to reroute data in the event of a failure on the primary ring. Because many large organizations, such as an airline or rental car agency, could lose a significant amount of revenue due to a communications failure, the use of Optical Carrier transport facilities with dual rings is gaining advocates. However, if your organization is looking for a different approach to minimizing egress delays, one potential solution is to consider the use of edge servers.
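The capacity and cost comparisons above can be checked with a few divisions. The Python sketch below uses the standard nominal line rates (1.544 Mbps for T1, 44.736 Mbps for T3, 155.52 Mbps for OC-3, 622.08 Mbps for OC-12) rather than the rounded figures in the text, and it takes the two monthly prices from the text purely as illustrative inputs. The dictionary and function names are hypothetical.

```python
LINE_RATE_MBPS = {"T1": 1.544, "T3": 44.736, "OC-3": 155.52, "OC-12": 622.08}


def equivalent_lines(big: str, small: str) -> float:
    """How many `small` facilities match the capacity of one `big` facility."""
    return LINE_RATE_MBPS[big] / LINE_RATE_MBPS[small]


def monthly_cost_per_mbps(monthly_cost_usd: float, facility: str) -> float:
    """Monthly cost divided by the facility's nominal rate in Mbps."""
    return monthly_cost_usd / LINE_RATE_MBPS[facility]


if __name__ == "__main__":
    print(f"OC-3 in T1 units: {equivalent_lines('OC-3', 'T1'):.1f}")
    print(f"OC-3 in T3 units: {equivalent_lines('OC-3', 'T3'):.2f}")
    print(f"T1 at $500/month:       ${monthly_cost_per_mbps(500, 'T1'):.2f} per Mbps")
    print(f"OC-12 at $15,000/month: ${monthly_cost_per_mbps(15_000, 'OC-12'):.2f} per Mbps")
```

Although the OC-12 costs far more per month in absolute terms, its cost per Mbps under these example prices is more than ten times lower than that of the T1.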

4.2.4 Benefits of Edge Servers

The distribution of Web server content onto edge servers can significantly reduce many of the delays computed in Table 4.2. This is because fewer browser users can be expected to access distributed Web sites in comparison to the number of users that can be expected to access a centralized site. Unfortunately, the distribution of server content onto edge servers will have no bearing on the delays associated with a user’s access line. This is because the user’s access line remains fixed unless the user was previously accessing a Web site from work and then went home and accessed the same site via a different access line. While this is certainly possible, for the vast majority of Web users, their access line can be considered as fixed along with the transmission delays associated with their access line. Thus, there are certain limitations associated with the distribution of server content that will not significantly improve operations over the use of a centralized Web site. Now that we have an appreciation for Internet entry and egress delays due to the transport facility operating rate and potential user sharing of the facility, let’s turn our attention to several additional bottlenecks. One bottleneck that we previously covered and that deserves a more detailed examination involves peering points and their effect upon Web servers responding to a browser user’s request.

4.2.5 Peering Points

The Internet represents a collection of networks that are interconnected in order to permit the flow of data from a computer located on one network to a computer connected to another network. Those networks that are interconnected can range in size from a local area network (LAN) with a handful of attached computers to networks operated by an Internet Service Provider (ISP) that can consist of hundreds of thousands of DSL or cable modem users, or even to large ISPs, such as America Online, that at one time supported approximately 20 million users but more recently had about 5 million users as former dial-up subscribers gravitated to DSL and cable modem connections.

4.2.5.1 Rationale Because of the global nature of the Internet, most networks are not directly connected to one another. Instead, networks were originally interconnected via the use of the transmission facilities of one or more third-party networks. An example of this interconnection via the use of third-party networks is illustrated in Figure 4.4. In this example, for data transmitted from a computer user located on network A to arrive at a computer located on network E, the transmission facilities of two other networks, such as networks B and C or networks B and D, must be employed to provide an interconnection to the destination network. Similarly, a computer user on network B that requires access to a computer located on network E would need to use the transmission facilities of either network C or network D. Because routers require time to examine the destination address in the IP header of a packet, check their routing tables, and route the packet from the interface on which it was received to another interface for transmission toward its ultimate destination, there is a delay or latency associated with each router that a packet traverses. As a packet crosses more networks, it also passes through additional routers, which cumulatively adds to the delay or latency encountered. In addition, as additional networks are traversed, the potential of encountering a network failure increases. Thus, as the number of networks traversed increases, so too does the delay as well as the probability that the data packet will not arrive at its intended destination.
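The cumulative effect described above can be sketched numerically. In the Python example below, the per-router delays add up along the path, while the chance of successful delivery is the product of each network's chance of not failing. The 2-ms router delay and the 1 percent failure probability are hypothetical values chosen only for illustration, and failures are assumed to be independent.

```python
from math import prod


def path_latency_ms(router_delays_ms):
    """Total forwarding delay: the sum of the delay added by each router."""
    return sum(router_delays_ms)


def delivery_probability(network_failure_probs):
    """Chance a packet crosses every network, assuming independent failures."""
    return prod(1.0 - p for p in network_failure_probs)


if __name__ == "__main__":
    for networks in (1, 2, 4, 8):
        latency = path_latency_ms([2.0] * networks)    # hypothetical 2 ms per router
        success = delivery_probability([0.01] * networks)  # hypothetical 1% per network
        print(f"{networks} networks: {latency:4.1f} ms, delivery probability {success:.4f}")
```

Each added network raises the latency and lowers the delivery probability, which is the motivation for keeping the number of traversed networks small.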

4.2.5.2 Peering and Transit Operations Although the term peering point will be used in this book, it is important to note that the alternate name of exchange point is also commonly used to reference the point where two or more networks interconnect. In fact, some Web sites that provide a listing of interconnections on the Internet satisfy this quirk in terminology by using both terms. Figure 4.5 illustrates the Web site home page at http://www.bgp4.as/internet-exchanges, which lists Internet exchange points around the globe. This site depends upon viewers sending e-mails to the Webmaster to notify the site of additions and updates. Because the site has links to each exchange point or peering point, you can use this site to view peering points on a global basis. However, when this author used this site, he discovered a few bad links that can result in a bit of frustration and might be due to the failure of one or more persons to periodically double-check links.

Figure 4.4 A computer user on network A needs to use the transmission facilities of third-party networks to reach a computer located on networks C, D, or E. (Diagram labels: Network A, Network B, Network C, Network D, Network E.)

Figure 4.5 The Web site www.bgp4.as provides a link to numerous peering points around the globe.

To obtain an appreciation for the wealth of information that can be obtained from a site that lists peering points, this author scrolled down the screen shown in Figure 4.5 until he reached the entry for the Danish Internet Exchange Point. By clicking on that entry, his browser was directed to DIX, a facility operated by UNIC at its Network Operations Center located in Lyngby, which is north of Copenhagen, Denmark. The site’s home page, which is shown in Figure 4.6, provides a brief history of DIX and includes three graphs that indicate the traffic load over the past 24 hours, for one week, and for one year.

If you focus your attention upon the first graph shown at the top of Figure 4.6, you will note that traffic peaks between approximately 9:30 and 10 p.m. Although the graph appears light and is not very visible in the screen capture, this author was able to view the maximum traffic load on the actual Web page at about 12 Gbps, with the maximum capacity being 13 Gbps according to the x-axis legend in the graph. Thus, DIX is approaching capacity during the evening, and routers are more than likely dropping packets, which results in some delays.

While Figure 4.6 provides three graphs that illustrate the daily, weekly, and yearly traffic, you actually need to do some analysis to determine the effect of traffic delays. If you click on the entry “Service Information” displayed in the left column of Figure 4.6, you will see a Web page that will provide you with such information as how networks can be connected to DIX and the fact that they use a Cisco 6509 switch that has both 1-Gbps and 10-Gbps ports. While the information is interesting, it does not provide the viewer with any latency information, which could be extremely valuable when attempting to determine the value of using edge servers. While you can make some educated guesses based upon the occupancy of data on the exchange, especially in the evening, they remain at best educated guesses.

Figure 4.6 The home page of the Danish Internet Exchange Point.

One of the more interesting peering or exchange points is the British Columbia Internet Exchange, whose Web address is http://www.bc.net. At this site you can select a transit exchange server and run either a bandwidth test or a diagnostic test, or both. The bandwidth test performs two TCP (Transmission Control Protocol) throughput tests between your desktop computer, which can be located anywhere on the Internet, and the specified NDT (Network Diagnostic Tool) server. First, data is streamed for 10 seconds from your desktop to the server, and then a second 10-second test is performed in the opposite direction. In comparison, the diagnostic test, which is more formally referred to as NPAD (Network Path and Application Diagnosis), is designed to diagnose network performance problems in the end-system on which your browser is running or in the network between it and the specified NPAD server.

To illustrate the capability of BCIE, this author ran the bandwidth test from his computer located in Georgia to the Vancouver server. Figure 4.7 illustrates the display of the test results, with the BCNET page shown in the background, while the foreground shows the test results and the display of detailed statistics. While the test was performed at a bit after 2 p.m. on a Tuesday, not exactly a period of anticipated heavy traffic, the test results illustrate how finding the correct test can provide you with the ability to determine both the bandwidth available for accessing a distant location and the round-trip delay. Note that in the Detailed Statistics box, it even informs you that no network congestion was observed. Thus, by using a Web browser from several locations to a site that offers a similar capability, you can determine the anticipated latency during such peak times as an early Friday afternoon or a Monday morning before lunch.

As an alternative to using the facilities of an exchange or peering point, you can use a test tool built into Windows and other operating systems. Two tools that are readily available are Ping and Traceroute, the latter called Tracert in Windows. Ping can be used both to verify that a distant site is reachable and to measure the round-trip delay to the site. In comparison, Tracert can be considered as Ping on steroids, as it displays the path to the destination as well as route information and the round-trip delay to each router on the path, which can be used to determine where potential bottlenecks reside. Later in this chapter, we will discuss the use of Tracert as well as illustrate its use.
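For readers who want to script these tools, the following Python sketch builds the appropriate command line for the local operating system and runs it. It relies on two facts: the trace tool is named tracert on Windows and traceroute elsewhere, and Ping takes its echo count with -n on Windows and -c on Unix-like systems. The function names and the host www.example.com are placeholders, and running the commands requires network access and the tools to be installed.

```python
import platform
import subprocess
from typing import List, Optional


def ping_command(host: str, count: int = 4, system: Optional[str] = None) -> List[str]:
    """Build a ping command; Windows uses -n for the echo count, others use -c."""
    system = system or platform.system()
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def trace_command(host: str, system: Optional[str] = None) -> List[str]:
    """Build a route-trace command: tracert on Windows, traceroute elsewhere."""
    system = system or platform.system()
    return ["tracert" if system == "Windows" else "traceroute", host]


def run(command: List[str]) -> str:
    """Run the command and return its text output."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling run(ping_command("www.example.com")) returns the round-trip times reported by Ping, while run(trace_command("www.example.com")) returns the per-router listing, whose exact layout differs between operating systems.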

4.2.5.3 Transit and Peering Operations Returning to our discussion of peering or data exchange between networks, let’s probe a bit more into this topic. When two network operators interconnect their facilities to exchange data, they need to decide upon the type of data exchange they are willing to support. If a network operator is willing to accept traffic from the other network that is destined for a different network, then the network operator is providing a transit facility to the other network.

Figure 4.7 Using the BCNET transmit exchange test facility.

That is, the network operator in effect has agreed to receive traffic from the other network and pass it through its network onto the Internet regardless of its destination. Because Internet traffic is bidirectional, this means that the network operator also agrees to receive traffic from the Internet and pass such traffic through its network.

Instead of agreeing to accept any traffic, let’s assume that both network operators are only willing to accept traffic from the other network that is destined for them. In this situation, the two networks are peering with one another. That is, as long as the data from one network is destined to the other, it can pass through the peering connection, while all other traffic will be blocked.
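The distinction between the two agreements reduces to a single test on the destination of the traffic. The Python sketch below expresses it as a predicate. It is a simplification for illustration only: the names are hypothetical, networks are identified by plain labels, and real operators enforce such policies through routing announcements rather than per-packet checks.

```python
from enum import Enum


class Agreement(Enum):
    TRANSIT = "transit"
    PEERING = "peering"


def accepts(agreement: Agreement, own_network: str, destination_network: str) -> bool:
    """Return True if traffic handed over by the neighbor is carried."""
    if agreement is Agreement.TRANSIT:
        return True  # carried regardless of its destination
    # Peering: only traffic destined for our own network passes.
    return destination_network == own_network


if __name__ == "__main__":
    print(accepts(Agreement.TRANSIT, "B", "E"))  # B carries A's traffic toward E
    print(accepts(Agreement.PEERING, "B", "E"))  # blocked: not destined for B
    print(accepts(Agreement.PEERING, "B", "B"))  # allowed: destined for B
```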

Because there is no free lunch in the field of communications, most organizations will charge a fee for providing a transit capability. That fee can vary, ranging from a fee structure based upon the bandwidth of the connection to an amount per megabyte (MB) or gigabyte (GB) of data that flows through one network from another network. A no-cost transit agreement usually occurs when a big network provider has interconnections to other similar-sized providers. In comparison, a for-fee transit agreement is usually established when smaller network providers are only connected to one or a few other networks. In fact, if you point your browser at any peering point or exchange point, you will more than likely find an entry labeled similar to “Services & Prices,” which when clicked upon takes you to a page that lists the various connection options and the cost associated with each option.

In contrast to transit operations, a peering point represents a location where many networks are interconnected to one another for the purpose of exchanging traffic on a peering basis. To illustrate the advantages associated with the use of peering points, consider the two groups of networks shown in Figure 4.8. In the left portion of Figure 4.8, peering is shown occurring among four networks without the use of a common peering point. In this situation, each of the n networks needs n − 1 communications circuits, one to every other network. Because each circuit is shared by the two networks it joins, this is equivalent to (n − 1)/2 circuits per network. Thus, in the example shown in the left portion of Figure 4.8, each network accounts for (4 − 1)/2 or 3/2 links. Because there are four networks to be interconnected, a total of (4 × 3)/2 or 6 communications circuits are required to interconnect each network to every other network.


In the right-hand portion of Figure 4.8, the use of a peering point is shown. In this example, the four networks are interconnected at a common location. Thus, each network only requires one communications circuit for the interconnection, or a total of four circuits for the four networks.

The advantages associated with the use of peering points involve both the simplicity of interconnections as well as economics. For example, if the collection of four networks previously shown in Figure 4.8 were increased by just one, then the total number of communications circuits required to interconnect every network to each other without peering points would be 5 × (5 − 1)/2 or 10. Similarly, an increase in the number of networks to be interconnected to six would result in the need for 6 × (6 − 1)/2 or 15 communications circuits. In comparison, the use of a peering point would only require each network to have one communications connection to the peering point, or a total of six communications connections in order to provide an interconnection capability for all six networks. Table 4.3 indicates the number of communications circuits that would be required to interconnect two or more networks with and without the use of a peering point.
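The two circuit counts are easy to compute for any number of networks. The Python sketch below applies the n × (n − 1)/2 formula for a full mesh and the one-circuit-per-network rule for a peering point; the function names are illustrative.

```python
def circuits_full_mesh(n: int) -> int:
    """Circuits needed to connect n networks directly to one another."""
    return n * (n - 1) // 2


def circuits_with_peering_point(n: int) -> int:
    """Circuits needed when each network has one link to a common peering point."""
    return n


if __name__ == "__main__":
    print("networks  full mesh  peering point")
    for n in range(2, 11):
        print(f"{n:>8}  {circuits_full_mesh(n):>9}  {circuits_with_peering_point(n):>13}")
```

The mesh count grows with the square of the number of networks while the peering-point count grows linearly, so the gap widens quickly as more networks join.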

Figure 4.8 Understanding the value of a peering point. (a) Peering without a peering point. (b) Peering with a peering point.

From an examination of the entries in Table 4.3, it is apparent that the use of a peering point can provide a significant reduction in communications links, especially as the number of networks to be interconnected increases. Because each communications link requires the use of a router port and routers can only support a finite number of serial ports, another router is required after the maximum support level is reached. Both routers and router ports represent costs that need to be considered. In addition, when peering occurs without the use of a peering point, the result can be a mesh network structure employed to interconnect networks to one another. Under this situation, configuring routers becomes more complex than when a peering point is employed, since the latter only requires the routing of data over a single communications path for one network to obtain connectivity with all other networks. Another factor that needs to be considered is latency or delay time. In a mesh network structure, the individual communications links normally operate at a fraction of the data rate of a peering point network. In addition, in the mesh structure, many routes are possible, so the router has to make more computations than a router at a peering point. As a result, the latency or delay is lower when a peering point is used. Thus, the cost of equipment, personnel time and effort, delays from packet transits, and the complexity of connecting networks without the use of a peering point have resulted in most network connectivity now occurring via peering points.

4.2.5.4 Global Structure of Peering Points The concept associated with peering points has now been adopted on a global basis, with locations throughout the United States, Europe, South America, Asia, and Australia. Sometimes the term metropolitan area exchange (MAE) is used, while other terms employed as synonyms include network access point (NAP) and Internet exchange (IX). Regardless of the term used, peering points are now the primary method by which ISPs interconnect their separate networks to enable traffic to flow throughout the Internet.

Table 4.3 Communications Circuits Required to Provide Network Interconnections

NUMBER OF NETWORKS      WITHOUT USING A     USING A
TO BE INTERCONNECTED    PEERING POINT       PEERING POINT
          2                    1                 2
          3                    3                 3
          4                    6                 4
          5                   10                 5
          6                   15                 6
          7                   21                 7
          8                   28                 8
          9                   36                 9
         10                   45                10


4.2.5.5 Representative Peering Points Previously in this book, we looked at the statistics provided by one peering point in Denmark and observed the use of the BCNET transmit-exchange test facility. In addition, we briefly discussed how peering points can now be encountered on a global basis, and we looked at one Web site that provided links to exchange points on a global basis. To obtain a better appreciation for the global nature of peering points, we will describe the European Internet Exchange (Euro-IX) and several European exchange points as well as examine the use of a Windows tool for determining the latency or delay associated with transmitting data from one location to another.

4.2.5.5.1 Euro-IX The European Internet Exchange (Euro-IX) was set up by the operators of European Internet Exchange points. The goal of Euro-IX is to assist ISPs looking to peer at a European exchange point.

Table 4.4 lists a majority of the European Internet Exchange points by country as of early 2010. The number of European Internet Exchange points had grown to 123 by early 2010 from approximately 90 when the first edition of this book was written in 2005. To obtain an appreciation of the use of a European Internet Exchange point, we will briefly examine three members of Euro-IX: the Vienna Internet exchange (VIX), the Belgium National Internet exchange (BNIX), and the London Internet exchange (LINX).

4.2.5.5.2 The Vienna Internet Exchange The Vienna Internetexchange(VIX)wasoriginallylocatedattheViennaUniversitycom-putercenter.Sincebeginningoperationsin1996,ithasexpandedtoasecondlocationwithinViennaatInterxion,Austria,whichislocatedin the21stdistrict in thenorthof thecity.VIXprovidesapeeringpointinthegeographiccenterofEurope.Bothlocationsusethesamestate-of-the-artEthernetswitchingtechnologybasedupontheuseofFoundry Networks BigIron RX-16 nonblocking high-performanceswitches.TheBigIronRXseriesofswitcheswasthefirsthardwaretoprovidesupport for2.2-billionpacket-per-secondswitchingandcanbeconsideredtorepresentanextremelyfastLayer2and3Ethernetswitch. Redundancy is supported at both locations through the useof redundant switch-fabrics through redundant power supplies and

Page 155: Ebooksclub.org a Practical Guide to Content Delivery Networks Second Edition

136 aPraCtiCalguidetoContentdeliverynetworks

Table 4.4 Members of the Euro-IX by Country

Austria Grazer Internet eXchange VIX: Vienna Internet eXchangeBelgium BnIX: Belgium national Internet eXchange FREEBIX: Free Belgium Internet eXchangeBulgaria SIX: Balkan Internet eXchange (Sofia)Croatia CIX: Croatian Internet eXchangeCzech Republic neutral Internet eXchange Commercial Brno Internet eXchange (Brno)Cyprus CYIX: Cyprus Internet eXchangeDenmark DIX: Danish Internet eXchangeEstonia TIX: Tallinn Internet eXchange TLLIX: Tallinn Internet eXchange (Tallinn)Finland FICIX-Espoo Finnish Communication and Internet eXchange (Espoo) FICIX-oulu Finnish Communication and Internet eXchange (oulu) TREX: Tampere Region Internet eXchangeFrance Equinix Paris Equinix Paris eXchange (Paris) EuroGix: A peering point FnIX6: eXchange in Paris FreeIX: A Free French eXchange GEIX: Gigabit European Internet eXchange (Paris) GnI: Grenoble network Initiative LYonIX: Lyon Internet eXchange MAE: Metropolitan Area Exchange (Paris) MAIX: Marseille Internet eXchange MIXT: Mix Internet eXchange and Transit PanAP: Paris network Access Point (Paris) PARIX: A Paris Internet eXchange PIES: Paris Internet eXchange Service PIX: Paris Internet eXchange PoUIX: Paris operators for Universal Internet eXchange SFInX: Service for French Internet eXchange

Page 156: Ebooksclub.org a Practical Guide to Content Delivery Networks Second Edition

theCdnmodel 137

Table 4.4 (continued) Members of the Euro-IX by Country

Germany ALP-IX: Alpen Internet eXchange (Munich) BECIX: Berlin Internet eXchange BCIX: Berlin Commercial Internet eXchange DE-CIX: Deutsche Commercial Internet eXchange ECIX: European Commercial Internet eXchange (formerly BLnX) (Berlin) ECIX: European Commercial Internet eXchange (Dusseldorf) ECIX: Hamburg European Commercial Internet Exchange (Hamburg) InXS: Internet Exchange Service (Munich and Hamburg) Franap: Frankfurt network Access Point MAE: Metropolitan Area Exchange (Frankfurt) KleyRex: Kleyer Rebstcker Internet eXchange (Frankfurt) MAnDA: Metropolitan Area network Darmstadt M-CIX: Munich Commercial Internet eXchange n-IX: nurnberger Internet eXchange Stuttgarter Internet eXchange (Stuttgart) Work-IX Peering Point (Hamburg) Xchangepoint, multinationalGreece AIX: Athens Internet eXchange GR-IX: Greek Internet eXchange (Athens) GR-IX-(b): Greek Internet eXchange (Athens)Holland R-IX: Rotterdam Internet eXchange (Rotterdam)Hungary BIX: Budapest Internet eXchangeIceland RIX: Reykjavik Internet eXchangeIreland CnIX: Cork neutral Internet eXchange (Cork) ExWest eXchange West (Galway) InEX: Internet neutral eXchange AssociationItaly MIX: Milan Internet eXchange (Milan) naMeX: nautilus Mediterranean eXchange Point (Rome) ToP-IX: Torino Piemonte eXchange Point (Torino) naMeX: nautilus Mediterranean eXchange Point (Rome) TIX: Tuscany Internet eXchange MinAP: Milan neutral Access Point (Milan) VSIX: VSIX nap del nord Est Padova FVG-IX: Friuli Venezia Giulia Internet eXchange (Udine)

continued

Page 157: Ebooksclub.org a Practical Guide to Content Delivery Networks Second Edition

138 aPraCtiCalguidetoContentdeliverynetworks

Table 4.4 (continued) Members of the Euro-IX by Country

Kazakhstan KAz-IX: Kazakhstan Traffic eXchange (Almaty)Latvia LIX: Latvia Latvian Internet eXchange (Riga) SMILE: Santa Monica Internet Local eXchange (Riga)Luxembourg LIX: Luxembourg Internet eXchange LU-CIX: Luxembourg Commercial Internet eXchange (Luxembourg)Malta MIX: Malta Internet eXchangenetherlands AMS-IX: Amsterdam Internet eXchange Gn-IX: Groningen Internet eXchange nDIX: Dutch German Internet eXchange nL-IX: nL Internet eXchange Gn-IX: Groningen Internet eXchange (Groningen) FR-IX: Friese Internet eXchange (Leeuwarden)norway BIX: Bergen Internet eXchange (Bergen) FIXo: Free Internet eXchange oslo (oslo) nIX: norwegian Internet eXchange (oslo) nIX2: norwegian Internet eXchange (oslo) SIX: Stavanger Internet eXchange (Stavanger) TIX: Tromsø Internet eXchange (Tromsø) TRDIX: Trondheim Internet eXchange (Trondheim)Poland LIX: Poland Lodz Internet eXchange (Lodz) PIX: Poznan Internet eXchange (Poznan) PLIX: Polish Internet eXchange (Warsaw) WIX: Warsaw Internet eXchangePortugal GIGAPIX: Gigabit Portuguese Internet eXchange (Lisbon)Romania BUHIX: Bucharest Internet eXchange Ronix: Romanian network for Internet eXchange InterLAn: Internet Exchange Bucharest, Cluj-napoca, Constanta, TimisoaraRussia CHEL-PP: Chelyabinsk Peering Point (Chelyabinsk) EKT-IX: Ekaterinburg Internet eXchange (Ekaterinburg) IX-nn: IX of nizhny novgorod nizhny (novgorod) KRS-IX: Krasnoyarsk Internet eXchange (Krasnoyarsk)

Page 158: Ebooksclub.org a Practical Guide to Content Delivery Networks Second Edition

theCdnmodel 139

Table 4.4 (continued) Members of the Euro-IX by Country

Russia (continued)
    MSK-IX: Moscow Internet eXchange (Moscow)
    NSK-IX: Novosibirsk Internet eXchange (Novosibirsk)
    PERM-IX: Perm Internet eXchange (Perm)
    RND-IX: RND-IX Rostov on Don
    SPB-IX: St.-Petersburg Internet eXchange (St. Petersburg)
    SMR-IX: SAMARA-IX (Samara)
    ULN-IX: Ulyanovsk Internet eXchange (Ulyanovsk)
    Ural-IX: Ural-IX (Ekaterinburg)
    VLV-IX: Vladivostok Internet eXchange (Vladivostok)
Scotland
    World.IX: European Commercial IX (Edinburgh)
    ScotIX: Scottish Internet eXchange
Republic of Slovenia
    SIX: Slovenian Internet eXchange (Ljubljana)
    SIX: Slovak Internet eXchange (Bratislava)
    SIX: Kosice Slovak Internet eXchange (Kosice)
    Sitelix: Sitel Internet eXchange (Bratislava)
Spain
    CATNIX: Catalonia Internet eXchange (Barcelona)
    ESPANIX: Spain Internet eXchange
    GALNIX: Galicia Internet eXchange
    MAD-IX: Madrid Internet eXchange
    EuskoNIX: Punto Neutro Vasco de Internet Bilbao
Sweden
    GIX: Gothenburg Internet eXchange (Gothenburg)
    IXOR: Internet eXchange point of the Oresund Region (Malmoe)
    Linkoping Municipal eXchange
    LIX: Lule Internet eXchange
    NETNOD: Internet eXchange (Stockholm)
    Netnod: Gothenburg Netnod (Gothenburg)
    Netnod: Malmoe Netnod (Malmoe)
    Netnod: Sundsvall Netnod (Sundsvall)
    Netnod: Lulea Netnod (Lulea)
    Norrnod: Norrnod Umea
    RIX-GH: Gaveleborg Regional Internet eXchange
    STHIX: Stockholm Internet eXchange (Stockholm)
Switzerland
    CIXP: CERN eXchange for Central Europe
    SWISSIX: Swiss Internet eXchange
    TIX: Telehouse’s Zurich eXchange
    Equinix: Zurich

power feeds. To provide redundancy for communications, each location includes diversity-routed optical-fiber cable transmission facilities. Both IPv4 and IPv6 peering are supported on the same peering LAN using the Border Gateway Protocol, Version 4 (BGP4) routing protocol.

A VIX member is required to be an ISP with its own Internet connectivity. Each VIX member is required to have its own autonomous system (AS) number with already-established global Internet connectivity. A VIX member must also provide Internet access to its customers at the IP level. Thus, a content provider would not qualify for VIX membership. Participants have the choice of connecting to VIX at either of the two locations currently established, while the redundant VIX infrastructure at both sites is provided and operated by the University of Vienna. Participants are expected to use their VIX connection as a complementary tool for optimization of regional Internet traffic flows. The only routing protocol that can be used across the VIX infrastructure is BGP4. The preferred method of connecting to VIX is to install a BGP4 peering router at one or both of the VIX locations and to connect the peering router port directly to the local VIX switch. As an alternative, customers can also connect to VIX through the use of a fiber-optic cable or Layer 2/Ethernet carrier link (DWDM, EoMPLS, VLAN) from a BGP4 peering router located abroad to one of the VIX locations.

Table 4.4 (continued) Members of the Euro-IX by Country

Ukraine
    Crimea-IX: Crimea Internet eXchange (Simferopol)
    DTEL-IX: Digital Telecom Internet eXchange (Kiev)
    UA-IX: Ukrainian Internet eXchange (Kiev)
    KH-IX: Kharkov Internet eXchange (Kharkov)
    Od-IX: Odessa Internet eXchange (Odessa)
    UTC-IX: Ukrtelecom Internet eXchange
United Kingdom
    LINX: London Internet eXchange (London)
    MaNAP: Manchester Network Access Point (Manchester)
    LIPEX: London Internet Providers eXchange (London)
    LoNAP: London Network Access Point (London)
    MCIX: Manchester Commercial Internet eXchange (Manchester)
    RBIEX: Redbus Interhouse Internet eXchange (London)
    PacketExchange: Distributed over 19 cities across the EU, North America, and Asia Pacific; based in London
    MerieX: Meridian Gate Internet eXchange (London)


4.2.5.5.2.1 Membership Costs VIX services are provided on a not-for-profit basis. This means that its tariffs are set to recover the cost of operation and do not include any profit. Table 4.5 indicates the VIX tariff in effect during 2005, when the first edition of this book was written, as well as in 2010, when the second edition was written. Thus this table provides an interesting example of the cost of peering over a five-year period. Note that some services were replaced by other services in the five-year period, such as the elimination of low-speed 10-Mbps switch ports. Also note that the pricing is shown in Euros. In early 2005, a Euro was worth approximately $1.40, while in 2010 the Euro was approximately $1.45, so readers can multiply Euro entries in the table by 1.40 to 1.45 (that is, increase them by 40% to 45%) to obtain an approximate charge in U.S. dollars.
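The Euro-to-dollar conversion described above can be sketched in a few lines of Python. This is only an illustration: the exchange rates are the approximate figures quoted in the text, the function name is hypothetical, and the two fees are the 1-Gbps switch-port entries from Table 4.5.

```python
# Approximate dollars per Euro as quoted in the text (rounded, not exact rates).
USD_PER_EUR = {2005: 1.40, 2010: 1.45}

def euros_to_dollars(fee_eur, year):
    """Convert a tariff entry in Euros to an approximate U.S. dollar amount."""
    return round(fee_eur * USD_PER_EUR[year], 2)

# 1-Gbps VIX switch-port fee: 1000 Euros/month in 2005, 600 Euros/month in 2010.
fee_2005 = euros_to_dollars(1000, 2005)
fee_2010 = euros_to_dollars(600, 2010)
print(f"2005: about ${fee_2005:.0f}/month, 2010: about ${fee_2010:.0f}/month")
```

Under these assumed rates the monthly fee drops from roughly $1400 to roughly $870, which matches the observation that peering costs fell over the five-year period.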

In addition to the tariffs listed in Table 4.5 during 2005, there was a supplemental charge of 150 Euros/month that was only applicable to the University of Vienna. This charge was for an ATM/OC-3 multimode port on a LS1010 switch. The ATM switches were eventually replaced by BigIron switches, and the additional fee has disappeared. Thus, when you compare fees over the past five years, it is obvious that they have decreased. Today, users of VIX have three fees to consider: a setup fee, a VIX switch-port fee, and a fee for housing equipment. The VIX switch-port fee is based upon the speed of the interconnection, while the housing fee is based upon the amount of shelf space required for equipment located at the Vienna Internet exchange.

Table 4.5 Vienna Internet Exchange Tariffs

Service                                   2005 fee (Euros)   2010 fee (Euros)
Setup                                     100/month          1000 per contract
VIX switch port
  10 Mbps (10 Base T)                     100/month          -
  100 Mbps (100 Base T)                   300/month          200/month
  1 Gbps (1000 Base SX)                   1000/month         600/month
  10 Gbps (10 GigBaseSR, 10 GigBaseLR)    -                  2000/month
Housing for shelf in 19-inch rack
  Up to 3 height units                    225/month          150/month
  4 to 5 height units                     375/month          50/month per additional
  6 to 9 height units                     675/month          -

4.2.5.5.3 Belgian National Internet Exchange Moving northwest from Vienna, Austria, we turn our attention to the Belgian National Internet exchange (BNIX). BNIX was established in 1995 by the Belgian National Research Network (BELNET) and represents a peering point where ISPs, content providers, and other business users can exchange traffic with each other in Belgium. In 2010, BNIX had two different locations. One location is at Interxion, an organization with headquarters in Amsterdam that designs and operates carrier-neutral Internet exchange centers across Europe. Interxion operates exchange points in many European cities, including Brussels. The second BNIX location is operated by Level(3), a global communications and information services company perhaps best known in some quarters for the large international fiber-optic network it constructed that interconnects 62 North American and 16 European locations. Level(3) opened its Brussels gateway in March 2000 in a state-of-the-art 5700-m² facility.

When this second edition was prepared during 2010, there were over 40 members that were connected to the BNIX network, which supports the exchange of IP multicast traffic as well as IPv4 and IPv6 traffic. From the BNIX Web site, you can obtain a listing of the members of the Internet eXchange to include their IPv4 addresses and, if used, their IPv6 addresses.

In 2005, BNIX was constructed based upon a distributed Layer 2 switched medium consisting of Fast Ethernet (100 Mbps) and Gigabit Ethernet (1000 Mbps) switches connected to one another using 10-gigabit Ethernet technology. According to BNIX, this interconnection method provides a high-speed, congestion-free interconnection facility that enables participating ISPs to exchange data without experiencing any significant bottlenecks. By 2010, BNIX had introduced the use of 10-Gbps Ethernet technology, which provided high-speed, congestion-free interconnections between members of the network.

4.2.5.5.4 London Internet Exchange, Ltd. In concluding our brief tour of representative European exchange points, we now focus our attention upon the London Internet Exchange, Ltd. (LINX). LINX represents the largest exchange point in Europe and is a founding member of Euro-IX.

LINX was founded in 1994 by a group of Internet service providers and has grown rapidly. In early 2010, LINX had 352 members and had accepted three new applications for membership in January. The Internet eXchange Point had 693 connected member ports and supported over 567 Gbps of peak traffic. Currently, LINX operates two physically separate networks based upon different architectures and equipment obtained from different communications vendors. One manufacturer is Extreme Networks, while the second vendor is Brocade Communications, formerly known as Foundry Networks. The two networks are deployed over 10 locations around London and interconnected through the use of multiple 10-Gbps Ethernet links via fiber-optic networks.

LINX represents a not-for-profit partnership between ISPs, providing a physical interconnection for its members to exchange Internet traffic through cooperative peering agreements. Candidates for becoming a LINX member must have an Autonomous System Number (ASN) and use the BGP4+ protocol for peering.

4.2.5.5.4.1 Membership Costs Although the LINX tariff has some similarities to the previously discussed VIX tariffs, there are also some significant differences between the two fee schedules. Table 4.6 provides a summary of the LINX tariff in effect during early 2005. In examining Table 4.6, note that in addition to a setup or joining fee, LINX was charging members a quarterly membership fee. While LINX was billing subscribers similar to VIX for port and rack space, LINX, unlike VIX, also had a traffic charge, which for large data exchanges could add up to a considerable expense.

Table 4.6 The London Internet Exchange Price List in 2005

Service            Payment schedule   GBP     Euro
Joining fee        Once               1000    1500
Membership fee     Quarterly          625     938
Port fees
  100 Mbps         Monthly            175     263
  1 Gbps           Monthly            644     966
  10 Gbps          Monthly            2415    3625
Traffic charge
  Per Mbyte        Monthly            0.60    0.86
Rack space
  Per unit         Monthly            50      75

While LINX was similar to VIX in 2005, in that both operate as not-for-profit entities, LINX at that time provided interconnections at up to 10 Gbps, which was a considerably higher peering rate than the operating rate provided by the Vienna Internet eXchange. Since 2005, both LINX and VIX have reduced their fees. In fact, effective 1 January 2010, the LINX membership fee was reduced to 1500 pounds per year, with 1-Gbps Ethernet and 10-Gbps Ethernet port fees also reduced. Port fees now vary based upon the type of LAN you connect with, Brocade or Extreme, and the port operating rate. Table 4.7 illustrates the monthly LINX port cost as of January 2010.

Although a 10-Gbps interconnection should minimize bottlenecks, in actuality potential bottlenecks depend upon the traffic exchanged at a particular point in time. Thus, in concluding our discussion of peering points, we will return to the use of the Traceroute program to examine peering point delays.

4.2.5.6 Peering Point Delays To examine peering point delays, we will use the Microsoft Tracert program, the implementation of Traceroute included in different versions of the Windows operating system. To use Tracert, you need to open an MS-DOS window in older versions of Windows, or what is now referred to as the Command Prompt window when using more modern versions of the Windows operating system, such as Windows 2000, Windows XP, Windows Vista, and the newly released Windows 7. When using a more modern version of the Windows operating system, you can locate the Command Prompt menu entry by selecting Start > Programs > Accessories > Command Prompt.

4.2.5.6.1 Using Tracert The use of Tracert can provide you with the ability to determine where bottlenecks are occurring when you encounter delays in accessing a server. Although most persons have the inclination to cite the server as the contributing factor when experiencing slow response time, it’s quite possible that the delay resides in the Internet.

Table 4.7 LINX Monthly Port Charges Effective January 2010

Port size      Ports on Brocade LAN   Ports on Extreme LAN
100-M ports    £160                   £160
1-G ports      £446                   £335
10-G ports     £1665                  £1250


To determine if the network represents most of the delay you are experiencing when accessing a server, you could first use the Ping program built into Windows. As previously mentioned in this book, Ping provides you with the round-trip delay in milliseconds (ms) to a defined IP address or host name. If the round-trip network delay appears to be reasonable, then the delay can be attributed to the server. In comparison, if the round-trip delay provided through the use of the Ping program is relatively lengthy, then the response time delay you are experiencing has a significant network component. When this situation arises, you can then use the Tracert program to determine where the delays in the network are occurring.

To illustrate the use of Tracert, let’s assume you are accessing www.londontown.com, a Web site that provides a variety of tourist services for persons visiting London, England. Figure 4.9 illustrates the home page of www.londontown.com. Note that from this site you can search for a hotel or bed-and-breakfast, arrange for airport transfers, book sightseeing tours, and even reserve theater tickets to the best shows in London. Because London is one of the most popular tourist destinations in the world, www.londontown.com represents a popular Web site.

Because the response to page requests to Londontown.com can be relatively long during an approaching holiday, let’s use the Tracert program to examine the delays associated with the route to that Web site. Figure 4.10 illustrates the use of the Tracert program. In this example, this author traced the route to Londontown.com while accessing the Internet from his workplace located in Macon, Georgia.

In examining the entries shown in response to the use of the Tracert program, note that each line of output corresponds to a “hop” that the data has to go through to reach its destination. The first hop represents the delay associated with the author’s connection to the Internet. Because all three tries have a latency under 10 ms, we can assume that the access line is not congested. Part of the reason for the lack of congestion can be traced to the time period when the Tracert program was used, which was in the early morning prior to a buildup in local access traffic.
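The hop-by-hop reading described above can also be done programmatically. The following is a minimal sketch, assuming Tracert-style lines that hold a hop number, three timing columns, and a host; the sample hosts and timings are invented for illustration and are not the values shown in Figure 4.10, and the function names are hypothetical.

```python
import re

# Hypothetical sample in the style of Windows Tracert output (illustrative only).
SAMPLE = """\
  1    <10 ms    <10 ms    <10 ms  192.0.2.1
  2    <10 ms     10 ms    <10 ms  router-a.example.net [198.51.100.7]
  3     20 ms     10 ms     20 ms  router-b.example.net [203.0.113.9]
  4     90 ms     91 ms     90 ms  router-c.example.net [203.0.113.77]
"""

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")      # hop number, then the rest
ATTEMPT = re.compile(r"(<?\d+)\s*ms|\*")         # "<10 ms", "20 ms", or "*"

def parse_hops(text):
    """Return a list of (hop_number, [ms or None, ms or None, ms or None])."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue
        tries = []
        for attempt in list(ATTEMPT.finditer(match.group(2)))[:3]:
            value = attempt.group(1)
            # "<10" is treated as its upper bound of 10 ms; "*" means no reply.
            tries.append(int(value.lstrip("<")) if value else None)
        hops.append((int(match.group(1)), tries))
    return hops

def worst_latency(tries):
    """Largest measured latency for one hop, or None if every try timed out."""
    known = [t for t in tries if t is not None]
    return max(known) if known else None

for number, tries in parse_hops(SAMPLE):
    print(number, tries, worst_latency(tries))
```

Treating "<10 ms" as 10 ms is a simplification: Tracert only reports that the delay was below that bound.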

Figure 4.9 The home page of Londontown.com.

The second hop results in the flow of data to a router located in Atlanta. The third hop also represents a router located in Atlanta. If you carefully read the router descriptions for the routers associated with hops 2 and 3, you will note that the router located at hop 2 is associated with bbnplanet, while the router at hop 3 is associated with “level3 communications,” a wholesale telecom carrier that has rapidly expanded and operates many Internet eXchange or peering points. Thus, between hops 2 and 3, data flows from one carrier to another due to a peering arrangement. Because both hops 2 and 3 are located in Atlanta, propagation delay is minimal, and by hop 3, two out of three computed delays are shown to be under 10 ms, while the third delay is shown to be 10 ms. For all three times, the delay is minimal.

For hops 3 through 5, traffic remains on the level3 network, where data is routed to Washington, D.C. Between hops 6 and 7, traffic exits the level3 network and enters AboveNet. The latter represents an all-optical network backbone that interconnects data centers in the United States and Europe, including locations in Northern Virginia in the United States and London in the United Kingdom. Thus, hops 7 through 9 correspond to the routing of data on the AboveNet network in the United States. Note that by hop 9, for two out of three time measurements, the delay is 20 ms, with the third delay time shown as 10 ms. Again, these are reasonable delays.

Figure 4.10 Using Tracert to observe the delays in reaching www.Londontown.com.

4.2.5.6.2 Propagation Delay From hop 9 to hop 10, data flows from the AboveNet router located in the United States to that vendor’s router located in the United Kingdom. Note that the time delay significantly increases from 10 to 20 ms at hop 9 to 90 ms at hop 10. This delay results from placing data onto a trans-Atlantic fiber cable and includes the propagation delay associated with data crossing the Atlantic Ocean.

Once data arrives in England at hop 10, the delay associated with reaching the Londontown.com Web site is negligible. This is most likely due to the use of relatively fast switches, which offset the congestion that could be expected because it is 6 hours later in London, where users are more actively surfing the Internet. Thus, the primary delay in this situation results from the approximately 70 ms required for data from a router located in the United States to reach a router located in the United Kingdom. This propagation delay represents 70/90 or approximately 78% of the total delay and could be avoided if the Web site operator established an agreement with a content delivery provider that resulted in the distribution of their server content onto a computer located in the United States. While the peering point delay shown in this example was vastly exceeded by the propagation delay, this is not always true. As indicated in this section, through the use of the Tracert program, you can determine the location where network delays occur, in effect obtaining a visual indication of network performance.
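The 70/90 calculation above generalizes to any trace: find the largest hop-to-hop increase and express it as a share of the delay at the later hop. The sketch below assumes per-hop round-trip delays are already available; the values for hops 9 and 10 are those quoted in the text, while the value for hop 8 and the function name are illustrative.

```python
# Round-trip delay in ms by hop number. Hops 9 and 10 use the figures quoted
# in the text (about 20 ms and 90 ms); hop 8 is an illustrative placeholder.
delays_ms = {8: 20, 9: 20, 10: 90}

def largest_jump(delays):
    """Return (from_hop, to_hop, increase_ms) for the largest hop-to-hop increase."""
    hops = sorted(delays)
    steps = ((a, b, delays[b] - delays[a]) for a, b in zip(hops, hops[1:]))
    return max(steps, key=lambda step: step[2])

start, end, jump = largest_jump(delays_ms)
share = jump / delays_ms[end]
print(f"hop {start} -> hop {end}: +{jump} ms, {share:.0%} of the {delays_ms[end]} ms delay")
```

With these inputs the largest increase is the 70 ms trans-Atlantic step between hops 9 and 10, or about 78% of the 90 ms delay observed at hop 10.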

4.3   Edge Operations

From our examination of Internet bottlenecks, it’s apparent that the centralized model of server-based content can result in several delays. Those delays include the bandwidth limitations associated with the access lines connecting the browser user and server to the Internet, peering point interconnection data rates and traffic, router hops traversed from the browser user to the server, and propagation delays based upon the distance between the two. By moving content from a centralized location to servers distributed geographically to areas that better match access requirements, many of the previously mentioned delays are minimized. For example, let’s assume that a multinational Japanese organization has its server content distributed to edge servers located in Europe, Africa, the United States, Australia, and China from a single server residing in Tokyo.


Without the distribution of server contents, all user requests would have to traverse several networks, multiple router hops, and more than likely pass through multiple peering points. However, with content now distributed onto servers located around the globe, browser users avoid most, if not all, of the previously mentioned bottlenecks. Thus, moving content closer to groups of potential browser users represents a method to eliminate potential peering point bottlenecks, reduce the number of router hops data must traverse, minimize propagation delays, and perhaps even allow data flow to remain on one ISP network. In addition, because a centralized server acquires browser user requests similar to a funnel with a single entry point in the form of a Web server’s access line, moving content to distributed edge servers removes a centralized Web site’s network access constraint. Now that we have a basic appreciation for the advantages associated with moving Internet content onto distributed servers, commonly referred to as edge servers, let’s turn our attention to how edge server operations are performed.

4.3.1 CDN Operation

There are several commercially available Content Delivery Network operators, each employing a slightly different method of operation. Because Akamai Technologies can be considered to represent the leader in the field of CDN providers, we will focus our attention upon the CDN operation of this vendor. In doing so, we will first note their support for an emerging standard in the form of a markup language that provides the ability to easily distribute content onto edge servers. Once this is accomplished, we will discuss their relatively new support for the distribution of high-definition video, which is rapidly growing in importance for many Web site operators.

4.3.2 The Akamai Network

Akamai traces its roots back to the emergence of the World Wide Web (WWW) in 1995, when MIT Professor of Applied Mathematics Tom Leighton, who literally worked in close proximity to the developer of the browser, Tim Berners-Lee, was intrigued by the need to better deliver Web content by moving the content onto distributed servers. He worked with the assistance of then graduate student Danny Lewin and others, and the result was the launching of a commercial service in April 1999, with Yahoo! being a charter customer.

The Akamai network in 2005 had grown to approximately 15,000 servers. By early 2010, the vendor’s server population had further increased to over 56,000. Those servers are distributed across the globe and are connected by approximately 1000 networks in more than 70 countries. Today Akamai provides a content delivery network facility that is used by over 1200 of the world’s leading electronic commerce organizations to provide content delivery services. In fact, according to the company’s Web site, Akamai delivers between 15% and 20% of all Web traffic. Its extensive customer base includes such organizations as Best Buy, the U.S. Department of Defense, FedEx Corporation, General Motors, IBM Corporation, QVC, Sony Entertainment Group, Toyota, Victoria’s Secret, and Yahoo!

4.3.2.1 Type of Content Support Through the year 2000, most content delivery network providers focused their efforts upon delivering static content. While the delivery of such content was satisfactory for many organizations, the growth in electronic commerce and development of tools to enhance Web pages with varying content increased the need to support both dynamic and personalized content. In addition, the growth over the past few years in the use of a variety of gadgets, ranging from smart cell phones to netbooks, to view television and high-definition video has had a tremendous effect upon the ability of many Web servers to support the growth in demand. Today, several content delivery network providers, including Akamai Technologies, are capable of distributing the entire contents of customers’ Web sites, including static and dynamic pages with high-definition video as well as various embedded objects. To obtain an appreciation for the manner by which content delivery operates, let’s first examine how browser requests are commonly fulfilled by a centralized electronic-commerce Web site.

4.3.2.2 Centralized Web Site Access Let’s assume that a browser user wishes to purchase the latest release of a movie on DVD. That person might access an electronic commerce site and search for a particular DVD, such as the Star Wars Trilogy DVD or the more recently released Avatar. When the browser user fills in a search line and clicks on a button, his or her search entry is forwarded from the Web server to an application server, as illustrated in the right portion of Figure 4.11. The application server uses the newly furnished string query to perform a database query. The application server then assembles a page based upon information provided by the database server as well as such common page components as the site’s logos, a navigation menu, and perhaps even selected advertising based upon the type of product queries. The browser user then receives the assembled page that allows him or her to add the DVD or DVDs to a digital shopping cart, select a suggested title generated by the application server, or perform another action such as “checkout.”

If we assume that another browser user makes the same request, some of the same series of steps will have to be performed again. That is, the browser user’s request will flow to the Web site, which in turn will pass the search query to an application server. That server might then check its cache memory to determine if the requested page can be rapidly delivered. If a copy of the page is not in memory, the application server will have to re-create the page. Although retrieval of the page from cache memory can slightly enhance response time, pages must still flow from the centralized site back to different browser users, resulting in a variable delay based upon the number of router hops and peering points that must be traversed, as well as traffic activity on the Internet, for a page to reach the requester.

Figure 4.11 Centralized-site electronic commerce operation. (Diagram labels: browser user, request, response, Internet, Web server, application server, back-end product database, image server, advertisement engine.)

4.3.2.3 Edge Server Model In comparison to the centralized Web site model, when Akamai edge servers are used, all initial Web page queries for a site that has contracted with the vendor flow to the vendor, while subsequent requests flow to an Akamai edge server. The edge server checks its internal cache to determine if the requested page was previously requested by another browser user and still resides in memory. Concerning cacheability of content, Akamai provides each Web site it supports with a metadata configuration file. The Web site manager uses that file to define how often specific pages can change.

For our previous DVD example, let’s assume that the electronic commerce site operator only changes the DVD pricing at most once per day. Then, the Web site manager would assign the DVD page a time-to-live (TTL) value of one day. Thus, the first time a browser user requests the page, it will be assembled by the Web site application server as previously illustrated in Figure 4.11. Because the page has a TTL value of one day, this page will be stored on Akamai edge servers for the same period of time, enabling subsequent requests for that page to be directly served to a browser user from a location closer to the user than a centralized Web site. Figure 4.12 illustrates the data flow associated with the use of Akamai edge servers.
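The TTL behavior described above can be illustrated with a toy cache. This is a generic sketch of TTL-based caching, not Akamai’s implementation; the class name, the origin callable, the URL, and the injected clock are all hypothetical and exist only so the example is self-contained.

```python
import time

class EdgeCache:
    """Toy edge cache: serves a stored page until its TTL expires."""

    def __init__(self, fetch_from_origin, clock=time.time):
        self._fetch = fetch_from_origin   # callable(url) -> (body, ttl_seconds)
        self._clock = clock               # injectable so expiry can be simulated
        self._store = {}                  # url -> (body, expires_at)
        self.origin_fetches = 0

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]               # still fresh: served from the edge
        body, ttl = self._fetch(url)      # first request or expired TTL
        self.origin_fetches += 1
        self._store[url] = (body, now + ttl)
        return body

ONE_DAY = 24 * 60 * 60
now = [0.0]                               # simulated clock, in seconds
cache = EdgeCache(lambda url: (f"<html>{url}</html>", ONE_DAY), clock=lambda: now[0])

cache.get("/dvd/avatar")                  # first request: assembled at the origin
cache.get("/dvd/avatar")                  # second request: served from the edge
now[0] = ONE_DAY + 1                      # one day later the TTL has expired
cache.get("/dvd/avatar")                  # fetched from the origin again
print(cache.origin_fetches)
```

Only two of the three requests reach the origin in this run; every request made while the TTL is still valid is answered from the edge.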

In examining the data flow shown in Figure 4.12, note that although the Web page was created dynamically at the centralized Web site, the entire page can be stored on the Akamai network. This results from the fact that, although the page was assembled for an individual user, there are no user-specific components, such as the personalization of a page via the use of cookies, that could prohibit the caching of the page.

Figure 4.12 Data flow using Akamai edge servers.

1. User requests page and is directed to closest Akamai server
2. Page assembled by application server based on first user request
3. Page sent to edge server(s) and stored based on TTL defined in metafile configuration
4. Subsequent requests for pages are delivered from the edge until the TTL value expires

(Diagram labels: Akamai servers, Web server, application server, image server, back-end database server.)


4.3.2.4 Limitations The key limitation associated with the distribution of Web server content onto edge servers concerns the use of cookies and agents for page personalization. Such sites as Yahoo, MSN, and other portals use cookies to create a dynamic and personalized user experience. For example, consider Figures 4.13 and 4.14. Figure 4.13 illustrates the Yahoo home page in early 2010 on the day of the Massachusetts senatorial election prior to this author signing into Yahoo to access his mail. In comparison, Figure 4.14 illustrates the Yahoo home page after this author signed into Yahoo. Note that in Figure 4.14, the page is slightly personalized, with “Hi, Gilbert” shown in bold under the right portion of the search bar. In addition, if this author were to click on the down arrow to the right of his name, he could then view his profile and his contacts, obtain a variety of account information, and see that he is now signed into Yahoo under a specific identifier. By using a cookie to create a dynamic and personalized experience, Yahoo also remembers that this author is signed in as he moves about the portal to check other features in addition to e-mail.

Although the use of cookies enables Web page personalization, sites using cookies are normally considered to be noncacheable. This means that the centralized Web site must maintain persistent connections to Akamai edge servers. Although the edge servers must then communicate with the centralized Web site, the original site only needs to have a finite number of connections to edge servers instead of tens of thousands or more connections to individual browser users. Despite the inability to cache dynamic pages, the serving of uncacheable content via edge servers offers several advantages. Those advantages include offloading CPU and memory demands from the centralized server and the ability of edge servers to respond to browser users faster than a central site can. In addition, due to the fact that edge servers are located around the globe, reliability of browser user access is increased.

Figure 4.13 The Yahoo home page prior to sign-in.

Figure 4.14 The Yahoo home page after the author signed in.

4.3.3 Edge Side Includes

Akamai Technologies, in collaboration with application server and content management organizations, including IBM and Oracle among others, developed a new markup language known as Edge Side Includes (ESI). Edge Side Includes represents a simple markup language used to define Web page fragments for dynamic assembly at the edge of the Internet, enabling an organization with a single server to have its contents easily distributed around the globe via a service agreement with a content delivery network provider. Through the use of ESI, files can be retrieved that can be used to dynamically construct a Web page in response to browser user requests. Each file can be controlled with its own TTL value, which defines the time it will reside in cache memory, enabling only small portions of a page to be retrieved from a central Web site to build a full page for delivery to a browser user. Thus, cookies could be retrieved from the central Web site, while static portions of a page might be stored in cache memory on edge servers. This action enables an Akamai edge server to assemble a Web page instead of the application server connected to a centralized Web site. Because pages are assembled closer to the browser user, the pages can be delivered faster. In addition, because more requests are serviced on edge servers, this action reduces traffic to the centralized Web server.

4.3.3.1 ESI Support The key to the ability to move dynamic content onto edge servers is the use of ESI. Both the application server and Akamai edge servers must support the ESI language to enable the applications that must be deployed if browser users are to obtain edge server support. The development of content using ESI begins at the centralized Web site with the development of templates and creation of fragments. This is followed by the local assembly of pages and their placement in cache memory at the central Web site and their distribution to Akamai edge servers for remote assembly and page caching. Thus, when a browser user request is directed to the central site, the user can first retrieve information from cache at the central Web site, a location referred to as the edge of the data center. Subsequent browser user requests are directed to an edge server for processing. Because the ESI language represents the mechanism by which application servers and edge servers communicate, let’s obtain an overview of its capabilities.

ESI represents a simple markup language used to define Web page components for dynamic assembly and delivery of Web applications onto edge servers. ESI represents an open-standard specification that is being coauthored by application server and content management vendors. At the time this book was prepared, vendors supporting the ESI effort included Akamai Technologies, ATG, BEA Systems, Circadence, Digital Island, IBM, Interwoven, Oracle, Sun, and Vignette.

The primary benefit of ESI is that its use accelerates the delivery of dynamic Web-based applications. The use of this markup language enables both cacheable and noncacheable Web page fragments to be assembled and delivered at the edge of the Internet. To provide this capability, ESI not only represents a markup language but, in addition, specifies a protocol for transparent content management delivery. By providing the capability to assemble dynamic pages from fragments, it becomes possible to limit the retrieval of data from a centralized Web site to noncacheable or expired fragments. This capability reduces the load on the centralized Web site, thereby reducing potential congestion as well as enhancing delivery of data to browser users.

ESI is an XML-based markup language that was designed to improve end-user performance while reducing the processing requirements on servers. ESI includes four key features. Those features are inclusion, conditional inclusion, environmental variables, and exception and error handling.

4.3.3.2 Inclusion and Conditional Inclusion Inclusion provides the ability to retrieve and include files to construct a Web page, with up to three levels of recursion currently supported by the markup language. Each file can have its own configuration, including a specified time-to-live value, thereby enabling Web pages to be tailored to site operator requirements. In comparison, conditional inclusion provides the ability to add files based upon Boolean comparisons.
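To make inclusion concrete, the sketch below mimics how a processor might expand `<esi:include src="..."/>` tags, stopping after three levels of nesting as described above. It is a deliberate simplification written for illustration: a real ESI processor fetches fragments over HTTP and handles many more tags and attributes, conditional inclusion is not shown, and the fragment paths, page text, and function name are hypothetical.

```python
import re

# Hypothetical in-memory fragment store standing in for fetches from the origin.
FRAGMENTS = {
    "/fragments/offer.html": '<p>Offer: <esi:include src="/fragments/price.html"/></p>',
    "/fragments/price.html": "9.99",
}

TEMPLATE = (
    "<h1>Site heading</h1>\n"
    '<esi:include src="/fragments/offer.html"/>\n'
    "<p>Footer</p>\n"
)

INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')

def assemble(markup, fragments, depth=0, max_depth=3):
    """Expand include tags, allowing at most max_depth levels of nested includes."""
    def expand(match):
        if depth >= max_depth:
            raise ValueError("include nesting deeper than max_depth")
        body = fragments[match.group(1)]
        # A fragment may itself contain include tags, so expand it recursively.
        return assemble(body, fragments, depth + 1, max_depth)
    return INCLUDE.sub(expand, markup)

print(assemble(TEMPLATE, FRAGMENTS))
```

The depth limit also guards against a fragment that includes itself, which would otherwise recurse without end.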

4.3.3.3 Environmental Variables A subset of standard Common Gateway Interface (CGI) environmental variables is currently supported by ESI. Those variables can be used both inside of ESI statements and outside of ESI blocks.

4.3.3.4 Exception and Error Handling Similar to HTML, ESI provides the ability to specify alternative pages to be displayed in the event a central-site Web page or document is not available. Thus, under ESI, users could display a default page when certain events occur. In addition, ESI includes an explicit exception-handling statement set that enables different types of errors to generate different activities.

4.3.3.5 Language Tags Similar to HTML, the ESI specification defines a number of tags. Table 4.8 summarizes the functions of seven key ESI tags.

4.3.3.6 The ESI Template Thebasicstructurethatacontentprovideruses to create dynamic content in ESI is referred to as a templatepage.Thatpage,whichisillustratedinFigure 4.15,containsoneormoreHTMLfragmentsthatareassembledtoconstructthepage.AsindicatedinFigure 4.15,thetemplateisformedthroughtheuseof

[Figure: a template containing (1) the “Welcome to Popcorn.com” logo and (2) the text “Summer Sale Clearance The Bin”, plus two fragments: “Act Quick Special” (special, TTL = 15 m) and “Daily Electronic Sale” (electronic, TTL = 1 d)]

Figure 4.15 A sample ESI template page.

Table 4.8 ESI Tags

ESI TAG | FUNCTION
<esi:include> | Include a separate cacheable fragment
<esi:choose> | Conditional execution under which a choice is made based on several alternatives, such as a cookie value or URL; every “choose” must contain at least one “when” element and can optionally include only one “otherwise” element
<esi:try> | Permits alternative processing to be specified in the event a request fails; valid children of “try” are “attempt” and “except”
<esi:vars> | Allows variable substitution for environmental variables
<esi:remove> | Specifies alternative content to be removed by ESI but displayed by the browser if ESI processing is not performed
<!--esi ... --> | Specifies content to be processed by ESI but hidden from the browser
<esi:inline> | Specifies a separate cacheable fragment’s body to be included in a template


such common elements as a vendor logo, navigation bars, and similar “canned” static elements plus dynamic fragments. The formed template represents a file that is associated with the URL that a browser user requests. The file consists of HTML code that is marked up with ESI tags that inform the cache server or delivery network to retrieve and include predefined HTML fragments, with the file constructed by combining HTML and ESI tags. In examining the ESI template page shown in Figure 4.15, note that the Welcome logo (1) and text (2) represent static content that can be permanently cached. The targeted advertisements (3 and 4) represent fragments that can only be stored in cache until their time-to-live values expire, after which edge servers must retrieve new fragments.

If we examine the sample ESI template shown in Figure 4.15, we can obtain an appreciation for how the use of ESI considerably facilitates the flow of data. In this example, the Web page consists of static boilerplate in the form of a vendor logo and page headings and navigation bars that can be continuously cached on an edge server. The two fragments have TTL values of 15 minutes and 1 day, respectively. Thus, one fragment must be retrieved four times an hour from the central Web site; however, once retrieved the fragment can be cached until the TTL value expires. Thus, the edge server only has to periodically update the contents of this fragment throughout the day. In comparison, the second fragment is only updated on a daily basis. Thus, an edge server only needs to retrieve one fragment every 15 minutes and the other fragment on a daily basis in order to keep the Web page up to date.
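The refresh arithmetic for the two fragments can be checked with a short Python sketch. It assumes, purely for illustration, that demand for the page is continuous, so that an edge server refetches a fragment each time its TTL expires. The `Fragment` class and the `origin_fetches_per_day` function are hypothetical names and are not part of ESI.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Fragment:
    name: str
    ttl_seconds: int  # how long an edge server may serve its cached copy


def origin_fetches_per_day(fragment: Fragment) -> int:
    """Origin fetches in 24 hours, assuming a refetch at every TTL expiry."""
    # Ceiling division: a partial TTL window still requires one fetch.
    return -(-SECONDS_PER_DAY // fragment.ttl_seconds)


special = Fragment("special", ttl_seconds=15 * 60)                # TTL = 15 m
electronic = Fragment("electronic", ttl_seconds=SECONDS_PER_DAY)  # TTL = 1 d

for fragment in (special, electronic):
    print(fragment.name, origin_fetches_per_day(fragment))
```

Under these assumptions the 15-minute fragment costs 96 origin fetches per day per edge server and the daily fragment costs one, regardless of how many browser users view the page.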

4.3.4 Edge Side Includes for Java

In addition to ESI for HTML, an extension to ESI provides support for Java. Referred to as Edge Side Includes for Java (JESI), its use makes it easy to program Java Server Pages (JSPs) using ESI. As a refresher, JSPs represent server-side software modules that are used to generate a user interface by linking dynamic content and static HTML through tags. Thus, through the use of JESI, you can facilitate the use of ESI tags within a JSP application. Table 4.9 provides a summary of JESI tags and their functions.


4.3.5 Statistics

One of the major advantages associated with the use of edge servers is the intelligence provided by some vendors in the form of statistical reports. For example, Akamai Site Wise reports provide information about which site pages are viewed, how long visitors remain on a site, the amount of time visitors spend on different pages, and similar information. Table 4.10 lists some of the Web site statistics provided by Akamai Site Wise reports as well as the potential utilization of such reports.

As indicated in Table 4.10, such reports provide a tool for tailoring marketing and advertising resources as well as providing organizations

Table 4.9 JESI Tags

JESI TAG | FUNCTION
<jesi:include> | Used in a template to indicate to the ESI processor the manner by which fragments are assembled to form a page
<jesi:control> | Used to assign an attribute to templates and fragments
<jesi:template> | Used to contain the entire contents of a JSP container page within its body
<jesi:fragment> | Used to encapsulate individual container fragments within a JSP page
<jesi:codeblock> | Used to specify that a piece of code should be executed before any other fragments are executed
<jesi:invalidate> | Used to remove and/or expire selected objects cached in an ESI processor
<jesi:personalize> | Used to insert personalized content into a Web page where the content is placed in cookies that the ESI processor uses to insert into a page

Table 4.10 Akamai Sitewise Statistics and Potential Utilization

STATISTIC | POTENTIAL UTILIZATION
Most requested page | Move popular content to home page; eliminate minimally used content; leverage popularity of content
Routes from entry | Identify points of entry to target advertising
Popular routes | Determine if page organization is properly constructed for visitor access
Transactions by product | Determine online revenue drivers; test price changes and compare product sales
Shopping-cart summary | Track fulfillment versus abandonment by product; contact customers that abandoned items with a special offer
Search-engine summary | Determine which search engines refer customers to target advertising
First time vs. returning customers | Determine value of different marketing programs; tailor content for repeat visitors; examine purchase patterns


with a window into the use of their Web content. Information about the access and use of Web pages can be a valuable resource for tailoring content to reflect the price sensitivity and product requirements of browser users, which can turn page hits into shopping-cart fulfillments. This in turn can result in increased revenues.

4.3.6 Summary

As indicated by our brief tour of ESI, this markup language represents the key to making edge operations practical. By providing the ability to subdivide Web pages into fragments, it becomes possible to cache dynamic portions of Web pages on a periodic basis, thereby reducing the amount of traffic that has to flow between browser users and a centralized Web site.

4.4 The Akamai HD Network

In concluding our discussion of Akamai, we will focus our attention upon the efforts of this content delivery network provider to support the rapidly evolving requirements for high-definition (HD) video. Through what is referred to as the Akamai HD Network, the company’s edge servers are being used to support adaptive bit-rate streaming video as well as digital video recorder technology for high definition based on Adobe’s Flash, Microsoft’s Silverlight, and Apple’s iPhone. Because the streaming of high-definition video is both storage and bandwidth intensive, most Web servers typically used VGA 640 × 480 graphics. However, beginning in 2007, several Web sites recognized the demand for better-quality video and began to offer high-definition video. Because the use of high-definition video can result in rather long buffering delays, slow start-up times, and periodic and annoying delays, it made sense to add high-definition video to edge servers on many content delivery networks. Akamai, among other providers, has responded by marketing its HD Network technology as a mechanism to move HD content closer to the browser user.

To obtain an appreciation for how the Akamai HD Network operates, let’s turn our attention to the manner by which a Web site would use the Akamai network to support dynamic streaming for Flash over HTTP.


4.4.1 Using the HD Network with Flash

When using Flash over HTTP, each video file will have to be encoded at several different bit rates or quality settings to support multibit-rate outputs. Similarly, if desired, audio files can be encoded at separate bit rates. The Flash player on each client controls the viewing process based upon heuristics built into the player that are customizable by the client, although most clients simply use default settings. As an example of customization, clients can program their player to commence operations at a low bit rate, which enables immediate viewing. As viewing continues, the player’s buffer fills, and the player can switch its operating mode to a higher bit rate. By monitoring the buffer occupancy level, bandwidth, and frame loss, the player can then determine if it should change its operational mode to support a different bit rate.
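The kind of heuristic described above can be illustrated with a minimal Python sketch. It is not the Flash player's actual algorithm: the function name, the 5-second buffer threshold, and the 0.8 safety factor are all assumptions chosen for illustration, and frame loss is ignored for brevity.

```python
def choose_bitrate(renditions_kbps, measured_kbps, buffer_seconds,
                   min_buffer_seconds=5.0, safety=0.8):
    """Pick an encoded bit rate from the available renditions.

    Hypothetical rule: start low while the buffer is nearly empty, then
    use the highest rendition that fits within a safety margin of the
    measured bandwidth.
    """
    ladder = sorted(renditions_kbps)
    if buffer_seconds < min_buffer_seconds:
        return ladder[0]                 # low bit rate enables immediate viewing
    budget = measured_kbps * safety      # leave headroom for bandwidth dips
    affordable = [rate for rate in ladder if rate <= budget]
    return affordable[-1] if affordable else ladder[0]


renditions = [300, 700, 1500, 3000]      # assumed encodings of one video, in kbps
print(choose_bitrate(renditions, measured_kbps=2000, buffer_seconds=1))
print(choose_bitrate(renditions, measured_kbps=2000, buffer_seconds=10))
```

With these assumed values the player starts at the lowest rendition while its buffer is nearly empty and moves up once the buffer has filled and the measured bandwidth supports a higher rate.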

When the Flash player switches its bit rate and goes through the Akamai HD Network, it sends a request to an edge server. The server then switches the bit rate at the next keyframe for video or next audio sample, or a combination of the two. Here, the term keyframe defines the starting and ending points of any smooth transition. They are called frames because their position in time is measured in frames on a strip of film, which in the digital world represents a sequence of frames displayed on a monitor. A sequence of keyframes defines which movement a viewer will see, whereas the position of the keyframes on a video defines the timing of the movement. Because only two or three keyframes over the span of a second or two do not create the illusion of movement, the remaining frames are filled with in-between frames. The idea is to create more than one keyframe, and then to set the desired effect values at each keyframe. For example, Adobe Premiere will create a gradual change in values between keyframes, which is referred to as interpolation. As an example of the use of keyframes, you could create a keyframe where the volume is −20 dB and another keyframe 5 seconds later where the volume is 0 dB. Then, Adobe Premiere will interpolate this to create a smooth 5-second volume increase.
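The volume example can be made concrete with a small Python sketch of interpolation between keyframes. It assumes simple linear interpolation, which is only one of the interpolation methods an editing tool may apply, and the `interpolate` function is a hypothetical helper rather than anything provided by Adobe Premiere.

```python
def interpolate(keyframes, t):
    """Linearly interpolate a value at time t (seconds).

    keyframes is a list of (time_seconds, value) pairs sorted by time.
    Values before the first or after the last keyframe are held constant.
    """
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    if t >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            # Fraction of the way from one keyframe to the next.
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Two keyframes: -20 dB at 0 seconds and 0 dB at 5 seconds.
volume_keyframes = [(0.0, -20.0), (5.0, 0.0)]
for second in range(6):
    print(second, interpolate(volume_keyframes, second))
```

Under the linear assumption the volume rises by 4 dB each second, reaching −10 dB at the midpoint of the 5-second span.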

You can view and work with keyframes when using Adobe Premiere by using either the Timeline or the Effect Controls window. When using Timeline, keyframes can be displayed in the timeline when a video track is expanded. When using the Effect Controls window, the right-hand side of the window represents a miniature timeline for the selected clip, which shows keyframes as diamond icons. This view allows you to view keyframes while providing you with the ability to manage their control. Returning our attention to the edge server in the content delivery network, as the edge server switches its delivery bit rate, the client player will temporarily use the data in its buffer. This action enables the bit-rate change to appear to be seamless to the client.

4.4.1.1 Selecting the Client Population One of the keys to success is to carefully determine your potential audience and adjust your streaming accordingly. For example, if your organization is in the business of providing movies for air travelers that will be downloaded onto portable gadgets with screens measuring less than 12 inches diagonally, it will not be advantageous to offer 1080p video, since both storage and transmission time would be extensive. Similarly, if your Web site offers downloads to be viewed on flat-screen TVs as well as portable gadgets, you would more than likely offer multiple versions of video based upon a mix of parameters that can be user selected.

4.4.1.2 Selecting Bit Rates Adobe offers readers a calculator for determining encoding bit rates for a given frame size. That calculator can be viewed at http://www.adobe.com/devnet/flash/apps/flv_bitrate_calculator/index.html. You can either use the calculator or determine an approximation by using the following formula:

Baseline bit rate (kbps) = (frame height × frame width × frame rate) / motion factor / 1024

where the motion factor is

7 for high-motion, high-scene changes
15 for standard motion
20 for low motion, such as a talking head, where movement is limited
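As a worked instance of the formula, the following Python sketch computes the baseline bit rate for a 1280 × 720 frame. The 30 frames-per-second rate is an assumed example value, and the function and dictionary names are illustrative only.

```python
# Motion factors taken from the formula above.
MOTION_FACTORS = {"high": 7, "standard": 15, "low": 20}


def baseline_bitrate_kbps(width, height, frame_rate, motion):
    """Approximate encoding bit rate in kbps for a given frame size."""
    return (height * width * frame_rate) / MOTION_FACTORS[motion] / 1024


for motion in ("high", "standard", "low"):
    rate = baseline_bitrate_kbps(1280, 720, 30, motion)
    print(f"{motion:8s} motion: {rate:7.1f} kbps")
```

For standard motion the arithmetic is 1280 × 720 × 30 = 27,648,000, divided by 15 to give 1,843,200, divided by 1024 to give 1800 kbps. The same frame with low motion works out to 1350 kbps.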

4.4.1.3 Selecting Frame Sizes When using Flash in your development effort, it’s important to recognize that the encoder operates by dividing the viewing window into multiple 16 × 16-pixel macroblocks. Those macroblocks are compared to one another for compression purposes.


Thus, when selecting the frame size, you should attempt to use widths and heights that are divisible by 16. If this is not possible, the next best divisor would be 8, followed by 4, which is preferable to anything else. Table 4.11 lists 20 popular frame sizes that are divisible by 16. If the frame sizes listed in Table 4.11 do not meet your requirements, you can then select a frame size divisible by 8, such as 1152 × 648 down to 128 × 72, or a frame size divisible by 4, such as 1216 × 684 down to 192 × 108. If you require a customized size, you can then select a frame size whose width and height are not divisible by 4; however, this will result in some processing delay based upon the manner by which the encoder uses 16 × 16-pixel macroblocks.
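The divisibility preference can be checked mechanically. The following Python sketch uses a hypothetical helper, `best_macroblock_divisor`, that reports the largest of 16, 8, or 4 dividing both dimensions of a frame, or `None` when none of them does.

```python
def best_macroblock_divisor(width, height):
    """Return 16, 8, or 4 if both dimensions are divisible by it, else None."""
    for divisor in (16, 8, 4):           # ordered from most to least preferred
        if width % divisor == 0 and height % divisor == 0:
            return divisor
    return None


# Frame sizes quoted in the text, plus one custom size for contrast.
for width, height in [(1280, 720), (1152, 648), (1216, 684), (1000, 562)]:
    print(f"{width} x {height}: {best_macroblock_divisor(width, height)}")
```

The three sizes quoted above fall into the 16, 8, and 4 groups respectively, while the custom 1000 × 562 size fits none of them.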

4.4.1.4 Profiles A profile defines the technique used by the encoder and decoder. Flash supports three profiles referred to as Baseline, Main, and High. The Baseline profile requires the least amount of processing power, while the High profile requires the most. If your potential audience will be using Intel Atom or similar processors, you would probably consider using the Baseline profile.

Table 4.11 Frame Sizes Divisible by 16

1152 × 768 | 1280 × 720 | 1024 × 576 | 768 × 512 | 768 × 432
720 × 480 | 640 × 480 | 576 × 432 | 512 × 384 | 528 × 352
512 × 288 | 480 × 320 | 448 × 336 | 384 × 288 | 320 × 240
256 × 192 | 256 × 144 | 240 × 160 | 192 × 144 | 128 × 96


4.4.1.5 Levels The level defines the maximum resolution and bitrate. When working with personal computers with Flash, you canmore than likely ignore the levels, since you can set the resolutionindependently,whichineffectsetsthelevel.

4.4.1.6 Keyframes One of the limitations of Flash is that it can only change bit rates at keyframe intervals. This means that you need to carefully consider the intervals at which keyframes appear in your video. For example, if you place keyframes too far apart, your video will react rather slowly to changes. In comparison, if you space keyframes closer than two seconds apart, your video will react more quickly to bit-rate changes. In general, you should consider keeping keyframes spaced at 2-second intervals or less for high-bit-rate applications. For lower bit-rate applications, keyframe intervals of up to 3 or 4 seconds can be considered.


5 Caching and Load Balancing

In previous chapters in this book, our primary focus was upon understanding the benefits of content delivery; reviewing the relationships between Web servers, application servers, and back-end database servers; and examining how a content delivery provider, such as Akamai, structures its network and uses a markup language to facilitate data delivery. While our prior explanation of content delivery network (CDN) operations was of sufficient detail to provide readers with a firm understanding of such operations, two key content delivery functions were glossed over. Those functions are caching and load balancing, both of which will be covered in more detail in this chapter.

The rationale for caching and load balancing being presented in this fifth chapter is based upon the structure of this book. The first three chapters provided a solid review of the rationale for content delivery and the interrelationship of Web requests and server responses, while Chapter 4 was focused upon the use of a CDN service provider. If you use a CDN service provider, that provider will more than likely perform caching and load balancing transparently. However, if your organization has more than one Web server or if your organization decides that the enterprise should perform content delivery, then caching and load balancing need to be considered. In this chapter, we will focus our attention upon both topics, examining how caching and load balancing operate as well as the advantages and disadvantages associated with each technology.

5.1 Caching

Caching represents a technique in which information that was previously retrieved is held in some type of storage to facilitate a subsequent request for the same information. There are two primary reasons behind the use of caching in a client–server environment on the Internet. First, caching will reduce delay or latency, as a request for data is satisfied from a cache that is located at or closer to the client than the server the client is accessing. Second, caching reduces network traffic. This reduction in network traffic occurs because data that is cached flows from the cache to the client instead of flowing from the server, reducing the length as well as the duration of the data flow.

Cache storage can include random access memory (RAM), flash memory, disk, or even a combination of different types of memory. There are several types of caches where data can be temporarily stored, ranging from a user’s browser to the server as well as other devices along the request/response path. To obtain an appreciation of caching, let’s turn our attention to the different types of caches that can be used to expedite the delivery of data, commencing with the browser cache.

5.1.1 Browser Cache

Earlier in this book, when we discussed the operation of browsers, we noted that a browser cache resulted in previously retrieved Web pages being stored on disk. Depending upon the settings in effect for your Web browser, your browser could check for a newer version of stored Web pages on every page-retrieval request, every time you start Microsoft’s Internet Explorer, automatically, or never. Here the selection of “every page” results in the browser checking whether a copy of the page to be viewed is cached, while “every time you start Microsoft’s Internet Explorer” means that the browser checks to see if a copy of the page to be viewed was put in cache on the current day. If you configure Internet Explorer for one of the first three options, a request for a Web page will result in the browser comparing the parameters of the Web page, such as its created and modified dates and file size, to any previously stored page in cache. If the properties of the requested and stored page do not match, the browser will then retrieve a new copy of the page. Obviously, if you selected the “never” option, the browser would not check the properties of the requested page. Instead, it would display the cached version of the page.

The primary purpose of a browser cache is to provide a more efficient method for retrieving Web pages. That is, instead of having to retrieve a previously retrieved Web page from the Internet, the page can be displayed from cache. This is not only more efficient, but in addition minimizes latency or delay while reducing traffic flow on the Internet.

You can appreciate the usefulness of a browser cache when you click on the browser’s “back” button or on a link to view a page you recently looked at. In such situations, the use of the browser cache results in the near instantaneous display of a Web page. In comparison, if cache is not used, the delay to display a page can be as long as 20 or more seconds when the page contains a lot of images and is retrieved via dialup. Even if you are connected to the Internet via DSL or cable modem, delays can be relatively annoying when you flip through a series of screens if your browser is configured to disable caching.

Figure 5.1 illustrates the use of two dialog boxes from the Firefox Web browser. Both dialog boxes result from the selection of the Options entry in the Tools menu. The left dialog box labeled Options shows the selection of the Network tab, illustrating that by default the version of Firefox used by this author allocated up to 50 MB of disk space for the cache. By clicking on the button labeled “Settings,” the dialog box on the right labeled “Connection Settings” is displayed. Through the use of this dialog box, you could configure your browser to access a proxy server, which more than likely will have its own cache and which we will shortly discuss.

5.1.2 Other Types of Web Caches

In addition to browser caches, there are several other types of caches whose operation directly affects the delivery of Web content. Those additional caches include proxy caches, gateway caches, and server caches.

5.1.2.1 Proxy Caches Web proxy caches operate very similarly to browser caches; however, instead of providing support for a single computer, they are designed to support hundreds to thousands of computers. Thus, you can view a proxy as a large-scale browser with respect to its cache operation.

A proxy server typically resides on the edge of an organization’s network, usually behind the router that provides connectivity to the Internet. The proxy cache can operate as a stand-alone device, or its functionality can be incorporated into another device, such as a router or firewall.


Figure 5.1 Through the use of the Tools menu, you can configure the amount of storage for cache as well as the use of various proxies.


Figure 5.2 illustrates the relationship between browser users, a stand-alone cache, and a router connected to the Internet. In order for browser users to effectively use the services of a proxy cache, they need to access the cache. To do so, browser user requests have to be routed to the proxy. One way to accomplish this is to use your browser’s proxy settings to tell the browser what services should be routed to the proxy. If you are using Internet Explorer, you would go to Tools > Internet Options > Connections and select the button labeled LAN Settings. This action will result in the Local Area Network (LAN) Settings dialog box being displayed, as illustrated in the left portion of Figure 5.3. If you click on the Proxy Server box, you would then be able to enter the address and port number of the proxy. As an alternative, you could click on the button labeled Advanced if your organization operates multiple proxy servers or if you wish to define exceptions to the use of a proxy server. The right portion of Figure 5.3 illustrates the Proxy Settings dialog box, which enables you to define multiple proxy servers as well as exceptions to the use of a proxy server.

In comparing browser caches to proxy caches, you can view the proxy cache as a type of shared browser cache, since a large number of browser users are served by the proxy. Similar to a browser cache, the use of a proxy cache reduces both delay or latency and network traffic. Because the proxy cache stores Web pages requested by many browser users, it is more efficient than an individual or even a series of individual browser caches. This is because the proxy cache stores pages previously accessed by a large group of users. This allows one user to

[Figure: workstations, a proxy cache server, and a router]

Figure 5.2 A proxy cache server supports a number of browser users.


be able to have his or her request for a Web page fulfilled by transmitting a Web page previously cached due to the activity of another user who visited the location now requested by the subsequent user. For example, assume several employees of an organization wish to access their Yahoo mail accounts during the day. As they point their browser to Yahoo.com, they would more than likely obtain the initial Yahoo! page from a proxy cache, which would expedite their initial access.
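The benefit of sharing one cache among many users can be illustrated with a minimal Python sketch. The `ProxyCache` class is a hypothetical in-memory model that ignores expiration and validation, and the stand-in origin function simply fabricates a page body rather than performing a real network request.

```python
class ProxyCache:
    """A shared cache: the first request for a URL goes to the origin server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.origin_requests = 0

    def get(self, url):
        if url not in self._store:
            self.origin_requests += 1        # cache miss: contact the origin
            self._store[url] = self._fetch(url)
        return self._store[url]              # cache hit for every later user


def fake_origin(url):
    # Stand-in for a real HTTP request to the origin server.
    return f"<html>page for {url}</html>"


proxy = ProxyCache(fake_origin)
for employee in ("alice", "bob", "carol"):   # three users behind the same proxy
    page = proxy.get("http://www.yahoo.com/")

print(proxy.origin_requests)
```

In this model three employees requesting the same page generate a single origin request, whereas three separate browser caches would each have had to retrieve the page once.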

5.1.2.2 Gateway Caches A gateway cache can be considered to represent a reverse proxy cache. This is because a gateway cache is typically installed by a Webmaster to make the Web site more scalable. In comparison, a proxy cache is commonly installed by a network manager to conserve bandwidth. In addition, from a network perspective, the proxy cache resides in front of a network used by a group of browser users, while the gateway cache resides in front of a Web server used by distributed browser users whose access requests can originate from different networks.

While a gateway cache is commonly installed at a Web site, this device can also be distributed to other locations, with a load balancer used to route requests from a Web server to an individual gateway cache. When gateway caches are distributed across the Internet, in effect you obtain a content delivery network (CDN) capability. Thus, among the primary operators of gateway caches are CDN operators, such as Akamai.

Figure 5.3 Internet Explorer as well as other browsers support the use of proxy servers for one or more Web applications.


5.1.2.3 Server Caches Server cache is a function of the operatingsystem used, the hardware platform, and the application programthat provides a Web-site capability. The operating system typicallystores in RAM previously accessed files, with the number of filesstoredafunctionofavailablememoryaswellasthesizeofeachfile.Incomparison,theapplicationprogramthatprovidesaWeb-servercapabilitymayallowyou tocacheanumberofpopularly requestedWebpagesinRAM,suchasthehomepagethatisretrievedwhenabrowseruseraccessesyoursite.

5.1.3 Application Caching

One example of application caching occurs through the use of ASP.NET, a Microsoft server-side Web technology used to create Web pages. Essentially, ASP.NET treats every element in an ASP.NET page as an object, compiling the page into an intermediate language. Then, a just-in-time (JIT) compiler converts the intermediate code into native machine code that is then executed on the host. Because the code is directly executed by the host, pages load faster than conventional ASP pages, where embedded VBScript or JScript had to be continuously interpreted.

Under ASP.NET, frequently accessed pages can be cached via the use of directives located at the top of each ASPX file. Page developers can declare a specific ASPX page for caching by including the OutputCache directive at the top of the file. The following example illustrates the format of the OutputCache directive.

<%@ OutputCache Duration="#ofseconds" Location="Any|Client|Downstream|Server|ServerAndClient|None" VaryByCustom="browser|customstring" VaryByHeader="headers" VaryByParam="parametername" %>

The above coding is an example of enabling caching for a page declaratively. You can also enable caching programmatically in a page’s code by manipulating the HttpCachePolicy object. Both methods work the same way for basic caching.

The first attribute, Duration, specifies how long in seconds to cache a Web page. Once a page is generated, ASP.NET will place it into cache. Then, until the duration is reached, subsequent requests for the same page will be served from cache. Once the specified duration is reached, the page is discarded. However, the next request for the page results in its generation and placement into cache, starting the process over again.

The second attribute, Location, enables you to specify where the cached Web page resides. The default setting of Location="Any" caches the page on the client that originated the request, on the Web server that receives the request, or on any proxy servers located between the client and server that support HTTP 1.1 caching. The remaining location attributes permit caching to occur at specific areas. For example, Location="Client" forces the page to be cached in the browser; Location="Server" results in the page being stored in the Web server cache; Location="ServerAndClient" uses the Web server or browser cache; and Location="Downstream" results in the page being stored anywhere other than the origin Web server, that is, in the client browser or in a proxy server that supports HTTP 1.1 caching.

The VaryBy attributes can be used to cache different versions of the same page. Differences in Web pages can result from different client browsers, the use of different query strings or form-content parameters, and different HTTP header values. For example, if your Web site provides several popular products whose descriptions are displayed via clients returning ProductID 27 and 54, and you specify VaryByParam="ProductID", ASP.NET processes the page and caches it twice, once for ProductID=27 and once for ProductID=54. In comparison, without the use of this attribute, only one version of the page would be cached. As indicated by this brief tour of ASP.NET caching, it provides a valuable technique to reuse previously performed processing that was used to create Web pages and apply it in subsequent requests for the same data. Now that we have an appreciation for the common types of Web page caches, let’s turn our attention to how caches operate.
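The combined effect of the Duration and VaryByParam attributes can be modeled with a short Python sketch. This is a language-neutral illustration and not ASP.NET code: the `OutputCache` class here is a hypothetical stand-in, the 60-second duration is an assumed value, and a fake clock is injected so that expiration can be shown without waiting.

```python
import time


class OutputCache:
    """Caches rendered pages per parameter value for a fixed duration."""

    def __init__(self, render, duration_seconds, vary_by_param,
                 clock=time.monotonic):
        self._render = render
        self._duration = duration_seconds
        self._vary_by_param = vary_by_param
        self._clock = clock
        self._entries = {}   # parameter value -> (time stored, page body)

    def get(self, params):
        key = params.get(self._vary_by_param)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._duration:
            return entry[1]              # within the duration: serve from cache
        body = self._render(params)      # expired or never cached: regenerate
        self._entries[key] = (now, body)
        return body


render_log = []                          # records every real page generation


def render_product_page(params):
    render_log.append(params["ProductID"])
    return f"<html>Product {params['ProductID']}</html>"


fake_now = [0.0]                         # mutable fake clock, in seconds
cache = OutputCache(render_product_page, duration_seconds=60,
                    vary_by_param="ProductID", clock=lambda: fake_now[0])

cache.get({"ProductID": "27"})           # generated and cached
cache.get({"ProductID": "54"})           # second version generated and cached
cache.get({"ProductID": "27"})           # served from cache
renders_before_expiry = len(render_log)

fake_now[0] = 61.0                       # the 60-second duration has elapsed
cache.get({"ProductID": "27"})           # generated again
print(renders_before_expiry, len(render_log))
```

In this model the page is generated once per ProductID value while the duration is in effect and is generated again only after the duration has elapsed.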

5.1.4 Cache Operation

Regardless of the type of cache, each cache operates according to a set of rules that defines when the cache services a Web page request. Some of the rules are set by an administrator, such as the browser operator or the proxy administrator. Other rules are set by the protocol, such as HTTP 1.0 and HTTP 1.1. In general, a cache examines traffic flow content and creates a cache entry based upon the rules it follows. Table 5.1 lists some of the more common rules followed by a cache. As we review HTML META tags and HTTP headers, we will note how the rules listed in Table 5.1 can be applied to traffic that the cache examines.

In examining the entries in Table 5.1, note that the term fresh means that data can be served immediately from cache without first checking with the origin server. In addition, an age-controlling header or a timer provides the mechanism to determine if data is within the fresh period.
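The first rules in Table 5.1 can be expressed as a small Python sketch. It is a deliberately simplified model rather than a complete HTTP cache: the function names are hypothetical, only the Cache-Control no-store and Pragma no-cache indications are examined, and either a Last-Modified or an ETag header is accepted as the validator.

```python
def is_cacheable(response_headers, secure_or_authenticated):
    """Apply simplified versions of the storage rules in Table 5.1."""
    headers = {name.lower(): value.lower()
               for name, value in response_headers.items()}
    if secure_or_authenticated:
        return False                                  # secure or authenticated request
    if "no-store" in headers.get("cache-control", ""):
        return False                                  # header forbids caching
    if headers.get("pragma", "") == "no-cache":
        return False                                  # HTTP 1.0 style indication
    if "last-modified" not in headers and "etag" not in headers:
        return False                                  # no validator present
    return True


def is_fresh(age_seconds, fresh_period_seconds):
    """Cached data is fresh while its age is within the fresh period."""
    return age_seconds < fresh_period_seconds


response = {"Last-Modified": "Thu, 19 Jul 2012 14:00:00 GMT",
            "Cache-Control": "max-age=900"}
print(is_cacheable(response, secure_or_authenticated=False))
print(is_fresh(age_seconds=600, fresh_period_seconds=900))
```

When `is_fresh` returns false the data is stale, at which point a real cache would ask the origin server to validate its copy before serving it.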

5.1.5 Cache Control Methods

The most common methods used to control the manner by which caching operates are HTML META tags and HTTP headers. As a review for some readers, META tags are HTML tags that provide information describing the content of a Web page. However, unlike HTML tags that display information, data within a META tag is not displayed, a property referred to as nonrendered.

5.1.5.1 META Tags META tags are optional, and many Web page developers do not use such tags. However, because the META tags are used by search engines to enable them to more accurately list information about a site in their indexes, the use of this type of HTML tag has grown in popularity.

5.1.5.1.1 Types of META Tags There are two basic types of META tags: HTTP-EQUIV tags and META tags that have a NAME attribute. META HTTP-EQUIV tags are optional and are the

Table 5.1 Common Cache Rules

Examine response headers to determine if data should be cached. If the header indicates data should not be cached, it is not cached.
If the response does not include a validator such as a last-modified header, consider data to be uncacheable.
If the request is secure or authenticated, do not cache data.
Consider cached data to be fresh if
  Its age is within the fresh period
  A browser cache previously viewed the data and has been set to check once per session
  A proxy cache recently viewed the data and it was modified a relatively long time ago
If data is stale, the origin server will be asked to validate it or inform the cache if the copy is still valid.


equivalent of HTTP headers. Similar to normal headers, META HTTP-EQUIV tags can be used to control or direct the actions of Web browsers. In fact, some servers automatically translate META HTTP-EQUIV tags into HTTP headers to enable Web browsers to view the tags as normal headers. Other Web-server application programs employ a separate text file that contains META data.

The second type of META tag, which is also the more popular type of tag, is a META tag with a NAME attribute. META tags with a NAME attribute are used for META types that do not correspond to normal HTTP headers, enabling specialized information to be incorporated into a Web page.

5.1.5.1.2 Style and Format META tags must appear in the HEAD of an HTML document and are normally inserted at the top of a document, usually after the <TITLE> element. The format of a META tag is shown as follows:

<META name="string1" content="string2">

In examining the META tag format, note that you do not need to have a </META> at the end of the tag. Table 5.2 provides an alphabetically ordered list of some of the major types of META tags and a brief description of their potential utilization. To obtain a better appreciation for the use of META tags, let’s examine a few examples. Suppose the content of a Web page should expire on July 19, 2012, at 2 p.m. GMT. Then, we would use the Expires META tag as follows:

<META name="Expires" content="Thu, 19 Jul 2012 14:00:00 GMT">

It should be noted that the Expires META tag uses dates that conform to RFC 1123.
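Because the date format is easy to get wrong by hand, it can help to generate it. The following Python sketch uses the standard library's `email.utils.format_datetime` function to produce an RFC 1123 style date for the expiration time in the example; the variable names are illustrative only.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# July 19, 2012, at 14:00:00 expressed in UTC (GMT).
expires = datetime(2012, 7, 19, 14, 0, 0, tzinfo=timezone.utc)

# usegmt=True emits the literal "GMT" zone used by HTTP-style dates.
expires_value = format_datetime(expires, usegmt=True)
meta_tag = f'<META name="Expires" content="{expires_value}">'
print(meta_tag)
```

Note that `format_datetime` accepts `usegmt=True` only for a timezone-aware datetime whose offset is UTC, which is why the timezone is set explicitly.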

For a second example of the use of META tags, let’s turn our attention to the Refresh META tag. The use of this tag provides a mechanism to redirect or refresh users to another Web page after a delay of a specified number of seconds occurs. Because the Refresh META tag is used within an HTTP-EQUIV tag, let’s first examine the format of the latter. This is shown as follows:


<META HTTP-EQUIV="varname" content="data">

Note that the use of the HTTP-EQUIV tag binds the variable name (varname) to an HTTP header field. When the varname is “Refresh,” the HTTP-EQUIV tag can then be used in the HEAD section of an index.html file to redirect a browser user to another location. For example, to redirect a browser user to www.popcorn.com after a 5-second delay, you could code the following tag:

<META HTTP-EQUIV="Refresh" content="5; url=http://www.popcorn.com">
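To show how the content value of a Refresh tag breaks down into its delay and target, the following Python sketch parses it. The `parse_refresh` function is a hypothetical helper that handles only the simple "seconds; url=target" form and assumes the target is written as an absolute URL.

```python
def parse_refresh(content):
    """Split a Refresh content value into (delay_seconds, url or None)."""
    delay_part, _, rest = content.partition(";")
    delay_seconds = int(delay_part.strip())
    rest = rest.strip()
    url = None
    if rest.lower().startswith("url="):
        url = rest[len("url="):].strip()
    return delay_seconds, url


print(parse_refresh("5; url=http://www.popcorn.com"))
print(parse_refresh("30"))   # no URL: reload the current page after 30 seconds
```

When no URL is present the tag simply causes the current page to be reloaded after the delay.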

Because we are discussing caching, we will conclude our brief review of META tags with the tag used to inform browsers and other products not to cache a particular Web page. That tag is the “pragma” tag, whose use is shown as follows:

<META HTTP-EQUIV="pragma" content="no-cache">

While META tags are relatively easy to use, they may not be effective. This is because they are only honored by browser caches that actually read HTML code, but may not be honored by proxy caches that very rarely read such code. In comparison, true HTTP headers can provide you with a significant amount of control over how both browser and proxy caches handle Web pages. Thus, let’s turn our attention to HTTP headers and how they can be used to control caching.

Table 5.2 Basic Types of META Tags

TAG            DESCRIPTION

Abstract       Provides a one-line overview of a Web page
Author         Declares the author of a document
Copyright      Defines any copyright statements you wish to disclose about your document
Description    Provides a general description about the contents of a Web page
Distribution   Defines the degree of distribution of the Web page (global, local, or internal)
Expires        Defines the expiration date and time of the document being indexed
Keywords       Provides a list of keywords that defines the Web page; used by search engines to index sites
Language       Defines the language used on a Web page
Refresh        Defines the number of seconds prior to refreshing or redirecting a Web page
Resource type  Defines the type of resource for indexing, which is limited to the document
Revisit        Defines how often a search engine should come to a Web site for reindexing
Robots         Permits a Web site to define which pages should be indexed

5.1.5.2 HTTP Headers

Through the use of HTTP headers, you can obtain a fairly substantial amount of control concerning the manner by which browsers and proxies cache data. Although HTTP headers cannot be viewed in HTML and are typically generated by Web servers, it’s possible to control some of their use based upon the server being accessed. Thus, in examining HTTP headers, we will primarily focus our attention upon cache control headers and their utilization.

5.1.5.2.1 Overview

Currently, there are approximately 50 HTTP headers defined in the HTTP/1.1 protocol that can be subdivided into four categories: entity, general, request, and response. An entity header contains information about an entity body or resource, while a general header can be used in both request and response messages. A request header is included in messages sent from a browser to a server, while a response header is included in the server’s response to a request. Because HTTP headers are transmitted by a server prior to sending HTML, they flow to any intermediate devices, such as a proxy cache, that can operate upon the contents of the header, as well as to the browser. The HTTP header data are read by the browser but are not displayed; thus they are transparent to the user.

The following illustrates an example of an HTTP response header transported under HTTP/1.1. In examining the header entries, note that the Date header is used to specify the date and time the message originated and represents a general type of header.

HTTP/1.1 200 OK
Date: Fri, 20 Jul 2012 14:21:00 GMT
Server: CERN/3.1
Cache-Control: max-age=7200, must-revalidate
Expires: Fri, 20 Jul 2012 16:21:00 GMT
Last-Modified: Mon, 16 Jul 2012 12:00:00 GMT
ETag: "4f75-316-3456abbc"
Content-Length: 2048
Content-Type: text/html
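As a rough illustration of how software along the path reads such a header, the following Python sketch splits the example into its status line and individual fields. The function name parse_response_header is hypothetical, and the sketch assumes one field per line with no folded (continued) header lines.

```python
RAW_HEADER = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 20 Jul 2012 14:21:00 GMT\r\n"
    "Server: CERN/3.1\r\n"
    "Cache-Control: max-age=7200, must-revalidate\r\n"
    "Expires: Fri, 20 Jul 2012 16:21:00 GMT\r\n"
    "Last-Modified: Mon, 16 Jul 2012 12:00:00 GMT\r\n"
    'ETag: "4f75-316-3456abbc"\r\n'
    "Content-Length: 2048\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)


def parse_response_header(raw: str):
    """Split a raw response header into (version, status, reason, fields)."""
    lines = raw.split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    fields = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lowercase.
        fields[name.strip().lower()] = value.strip()
    return version, int(status), reason, fields


version, status, reason, fields = parse_response_header(RAW_HEADER)
print(status, reason)
print(fields["cache-control"])
```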

The Server header seen here, as sent by the server in Figure 5.4, represents a response-type header that provides information about the software used by the server to respond to a request. Because revealing a specific software version can provide hackers with the ability to match vulnerabilities against a server, many times only basic information is provided by this header.

Of particular interest to us is the cache-control header, as this header specifies directives that must be obeyed by all caching mechanisms along the request/response path. Thus, in the remainder of this section, we will focus our attention upon this header. However, prior to doing so, we need to discuss the Expires HTTP header, as its use controls cache freshness.

5.1.5.2.2 Expires Header

The Expires HTTP header provides a mechanism that informs all caches on the request/response path of the freshness of data. Because the Expires header is supported by most caches, it also provides a mechanism to control the operation of all caches along a path. Once the time specified in the Expires header is reached, each cache must then check back with the origin server to determine if the data have changed.

The Expires HTTP response header can be set by Web servers in several ways. Web servers can set an absolute expiration time, a time based upon the last time that a client accessed the data, or a time based on the last time the data changed. Because many parts of a Web page contain static or relatively static information in the form of navigation bars, logos, and buttons, such data can become cacheable by setting a relatively long expiry time.

Figure 5.4 A typical HTTP/1.1 response header.

As indicated in Figure 5.4, the only valid value in an Expires header is a date in Greenwich Mean Time (GMT). Although the Expires HTTP header provides a basic mechanism for controlling caches, its use has two limitations. First, to be effective, each cache must have a clock synchronized with the Web server. If the clocks have different settings, the intended results from the setting of the Expires header may not be achieved, and a cache could incorrectly consider stale content as being fresh. Second, once set, an Expires time will eventually pass. This means that it is possible to forget to update an Expires time, which will then result in every request flowing to the Web server instead of being satisfied from cache, thereby increasing the delay or latency experienced by browser users as well as increasing bandwidth utilization.
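The clock-synchronization problem can be made concrete with a small Python sketch. It assumes a cache that decides freshness purely by comparing its own clock against the Expires value; the function name is_fresh_by_expires and the 15-minute skew are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_value: str, cache_now: datetime) -> bool:
    """Return True while the cache's own clock is before the Expires time."""
    return cache_now < parsedate_to_datetime(expires_value)


expires = "Fri, 20 Jul 2012 16:21:00 GMT"

# The true time is nine minutes past the expiration time.
true_now = datetime(2012, 7, 20, 16, 30, tzinfo=timezone.utc)

# A cache with an accurate clock treats the content as stale.
accurate = is_fresh_by_expires(expires, true_now)

# A cache whose clock runs 15 minutes slow still believes the content is fresh.
slow_clock = is_fresh_by_expires(expires, true_now - timedelta(minutes=15))

print(accurate, slow_clock)
```

In this sketch the cache with the slow clock would continue to serve content that the origin server considers expired.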

5.1.5.3 Cache-Control Header

The cache-control general header field within the HTTP/1.1 header specifies the manner by which all caching mechanisms along the request/response path should operate. The cache-control header includes one or more request and response directives that specify the manner by which caching mechanisms operate and, typically, override default caching settings. Our discussion of cache-control is mainly applicable to HTTP/1.1, as HTTP/1.0 caches may or may not implement cache control.

Table 5.3 indicates the cache-control header format and directives available under HTTP/1.1. Through the use of the cache-control header, a client or server can transmit a variety of directives in either a request or response message. Such directives will commonly override the default caching algorithms in use. Note that cache directives are unidirectional, in that the presence of a directive in a request does not mean that the same directive has to be included in the response. Also note that cache directives must always be passed through devices along the request/response path to the destination, even if an intermediate device, such as a proxy cache, operates upon a directive. This is because a directive can be applicable to several types of caches along the request/response path, and at the present time there is no method to specify a cache directive for a specific type of cache.

Cache control can be subdivided into request and response headers. Each cache-control header can include one or more directives that define the manner by which cache control should operate. HTTP/1.0 offered only the Pragma: no-cache request header, while HTTP/1.1 introduced the cache-control header, whose response directives provide Web sites with more control over their content. For example,

Cache-Control: max-age=7200, must-revalidate

Included in an HTTP response, this header informs all caches that the content that follows is considered fresh for two hours (7200 seconds). In addition, the must-revalidate directive informs each cache along the request/response path that it must comply with any freshness information associated with the content.
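The way a cache might break such a header into its individual directives can be sketched in a few lines of Python. The function name parse_cache_control is hypothetical, and the sketch makes a simplifying assumption: it splits on commas, so it does not handle a quoted list of several field names such as private="a, b".

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; a bare directive has no argument.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


parsed = parse_cache_control("max-age=7200, must-revalidate")
print(parsed)

# The freshness period in hours, derived from max-age.
hours = int(parsed["max-age"]) / 3600
print(hours)
```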

Table 5.3 Cache-Control Header Format and Directives

cache-control = "Cache-Control" ":" 1#cache-directive

cache-directive = cache-request-directive | cache-response-directive

cache-request-directive =
      "no-cache"
    | "no-store"
    | "max-age" "=" seconds
    | "max-stale" [ "=" seconds ]
    | "min-fresh" "=" seconds
    | "no-transform"
    | "only-if-cached"
    | cache-extension

cache-response-directive =
      "public"
    | "private" [ "=" <"> 1#field-name <"> ]
    | "no-cache" [ "=" <"> 1#field-name <"> ]
    | "no-store"
    | "no-transform"
    | "must-revalidate"
    | "proxy-revalidate"
    | "max-age" "=" seconds
    | "s-maxage" "=" seconds
    | cache-extension

5.1.5.4 Directive Application

Prior to describing the function of the directives associated with the cache-control header in detail, a few words are in order concerning the application of a directive. When a directive appears without any 1#field-name parameter, the directive then is applicable to the entire request or response. If a directive appears with a 1#field-name parameter, it is then applicable only to the named field or fields and not to the remaining request or response.

Now that we have an appreciation for the applicability of directives, let’s turn our attention to the function of the specific directives that can be included in a cache-control header. In doing so, we will first examine the cache-request directives listed in Table 5.3.

5.1.5.5 Cache-Request Directives

In examining the entries in Table 5.3, you will note that there are seven distinct cache-request directives as well as a mechanism that permits the cache-control header to be extended. The latter occurs by assigning a token or quoted string to a cache extension. As we review the operation of each cache-request directive, we will also discuss, when applicable, its use as a cache-response directive.

5.1.5.5.1 The No-Cache Directive

The purpose of the no-cache directive is to force caches along the request/response path to submit a request to the origin server for validation prior to releasing a cached copy of data. This function can be used to maintain freshness without giving up the benefits of caching. In addition, this directive can be used along with the public directive to ensure that authentication is respected.

If the no-cache directive does not include a field name, a cache must not use the response to satisfy a subsequent request without a successful revalidation with the origin server. If the no-cache directive specifies at least one field name, then a cache can use the response to satisfy a subsequent request. In this situation, however, the specified field name(s) are not sent in the response to a subsequent request without a successful revalidation with the origin server, enabling the server to prevent the reuse of certain HTTP header fields in a response while enabling the caching of the rest of the response.
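The two cases just described can be expressed as a short Python sketch. The function name reusable_headers and its calling convention are assumptions made for illustration: None means the stored response carried no no-cache directive, an empty list means a bare no-cache, and a non-empty list holds the named fields.

```python
def reusable_headers(stored: dict, no_cache_fields):
    """Return (revalidate_first, headers usable without revalidation)."""
    if no_cache_fields is None:
        # No no-cache directive: the stored headers may be reused as-is.
        return False, dict(stored)
    if not no_cache_fields:
        # Bare no-cache: nothing may be reused until the origin server revalidates.
        return True, {}
    # no-cache with field names: reuse the response but withhold the named fields.
    blocked = {name.lower() for name in no_cache_fields}
    kept = {k: v for k, v in stored.items() if k.lower() not in blocked}
    return False, kept


stored = {"Content-Type": "text/html", "Set-Cookie": "id=42", "ETag": '"abc"'}

print(reusable_headers(stored, None))
print(reusable_headers(stored, []))
print(reusable_headers(stored, ["Set-Cookie"]))
```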

5.1.5.5.2 The No-Store Directive

In comparison to the no-cache directive, the no-store directive is relatively simple. Its use instructs caches not to keep a copy of data under any condition. Thus, the no-store directive can be used to prevent the inadvertent release or retention of sensitive information. The no-store directive applies to the entire message, and can be sent either in a response or in a request. If sent in a request, a cache will not store any part of either this request or any response to it. If sent in a response, a cache will not store any part of either this response or the request that solicited the response. This directive applies to both nonshared and shared caches.

The purpose of the no-store directive is to satisfy the requirements of certain users who are concerned about the accidental release of information via unanticipated accesses to cache data structures. Although the use of the no-store directive can result in an improvement of privacy, it should not be considered as a reliable or sufficient mechanism for ensuring privacy. For example, it is possible that malicious or compromised caches might not recognize or obey the no-store directive, while communications networks are vulnerable to a range of hacking methods, including eavesdropping.

5.1.5.5.3 The Max-Age Directive

The purpose of the max-age directive is to enable the client to indicate that it is willing to accept a response whose age is less than or equal to the specified time in seconds. That is, the max-age directive specifies the maximum amount of time that data will be considered as fresh. Unless a max-stale directive (described next) is also included, the client will not accept a stale response.

5.1.5.5.4 The Max-Stale Directive

The purpose of the max-stale directive is to enable a client to indicate that it is willing to accept a response that has exceeded its expiration time. If a max-stale directive is included with a value, the client will accept a response that has exceeded its expiration time by no more than the number of seconds specified in the max-stale directive. By using a max-stale directive without a value, the client indicates that it is willing to accept all stale responses.

5.1.5.5.5 The Min-Fresh Directive

The purpose of the min-fresh directive is to enable a client to accept a response whose freshness lifetime is equal to or greater than its current age plus the specified time in seconds. Thus, the use of this directive enables a client to specify that it wants a response that will be fresh for at least the number of seconds specified in the min-fresh directive.
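The max-age, max-stale, and min-fresh request directives can be summarized in one small decision function. The Python sketch below is a simplification under stated assumptions: age and lifetime are given in seconds, each directive is checked independently, and the name acceptable and the ANY constant (standing for max-stale without a value) are hypothetical.

```python
ANY = float("inf")  # stands for a max-stale directive sent without a value


def acceptable(age, lifetime, max_age=None, max_stale=None, min_fresh=None):
    """Decide whether a cached response satisfies the client's request directives."""
    # max-age: the response may be no older than the stated number of seconds.
    if max_age is not None and age > max_age:
        return False
    # min-fresh: the response must stay fresh for at least min_fresh more seconds.
    if min_fresh is not None and lifetime < age + min_fresh:
        return False
    # A stale response is accepted only if max-stale covers the excess.
    staleness = age - lifetime
    if staleness > 0:
        return max_stale is not None and staleness <= max_stale
    return True


print(acceptable(age=100, lifetime=7200))                  # fresh response
print(acceptable(age=100, lifetime=7200, max_age=60))      # too old for this client
print(acceptable(age=7300, lifetime=7200))                 # stale by 100 seconds
print(acceptable(age=7300, lifetime=7200, max_stale=300))  # staleness tolerated
print(acceptable(age=7000, lifetime=7200, min_fresh=600))  # only 200 seconds left
```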

5.1.5.5.6 The No-Transform Directive

The implementers of proxy and other types of caches found it useful to convert certain types of data, such as images, to reduce storage space. Such transformations can cause problems. For example, in the interest of reducing storage, the transformation of an X-ray stored as a lossless image into a lossy JPEG image could result in the loss of important medical information. To prevent this situation from occurring, a no-transform directive is used, since its presence informs caches on the request/response path not to perform any transformation of data. Thus, if a message includes the no-transform directive, an intermediate cache or proxy will not change those headers subject to the no-transform directive. This also implies that the cache or proxy will not change any aspect of the entity-body that is specified by these headers, including the value of the entity-body itself.

5.1.5.5.7 The Only-if-Cached Directive

The purpose of the only-if-cached directive is to enable a client to obtain only those responses that a cache has already stored, without the cache reloading or revalidating data with an origin server. This directive is commonly employed under poor network connectivity conditions, and its use informs a cache to respond using either a cached entry applicable to the request or a 504 (Gateway Timeout) status.
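A minimal Python sketch of this behavior follows. The cache is modeled as a plain dictionary, and the names lookup and fetch_from_origin are hypothetical; a real cache would also check the freshness of the stored entry.

```python
def lookup(cache: dict, url: str, only_if_cached: bool, fetch_from_origin):
    """Return (status, body) for url, honoring the only-if-cached directive."""
    if url in cache:
        return 200, cache[url]
    if only_if_cached:
        # The cache may not contact the origin server, so it reports 504.
        return 504, "Gateway Timeout"
    body = fetch_from_origin(url)
    cache[url] = body
    return 200, body


origin_calls = []


def fake_origin(url):
    """Stand-in for a real origin server; records each request it receives."""
    origin_calls.append(url)
    return f"content of {url}"


cache = {"/logo.gif": "cached logo"}

print(lookup(cache, "/logo.gif", True, fake_origin))   # served from cache
print(lookup(cache, "/news.html", True, fake_origin))  # not cached: 504
print(lookup(cache, "/news.html", False, fake_origin)) # fetched and stored
```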

5.1.5.5.8 Cache-Control Extensions

The purpose of cache-control extensions is to enable the cache-control header field to be extended. The extension occurs through the use of one or more cache-extension tokens, each with an optional assigned value. There are two types of cache-control extensions: informational and behavioral. An informational extension does not require a change in cache behavior and does not change the operation of other directives. In contrast, behavioral extensions operate as modifiers to existing cache directives. When a cache-control extension is specified, applications that do not understand the extension will default to complying with the standard directive. In comparison, if the extension directive is supported, this new directive will modify the operation of the standard directive. This permits extensions to the cache-control directives to be made without requiring changes to the base protocol.

Now that we have an appreciation for cache-request directives, let’s turn our attention to cache-response directives.

5.1.5.6 Cache-Response Directives

As indicated in Table 5.3, there are nine cache-response directives as well as a mechanism to extend the cache-control header. Because we previously discussed several cache-request directives that are also applicable to cache responses, we’ll primarily focus our attention upon cache-response directives that are not applicable for use on a request path. However, as we review previously mentioned directives, we will briefly note whether the directive functions in a similar manner to its use in a request header.

5.1.5.6.1 The Public Directive

The purpose of the public directive is to indicate that a response can be cached by any cache. This cacheability holds even if data would normally be noncacheable. Thus, the public directive can be used to make authenticated responses cacheable even though such responses are normally noncacheable.

5.1.5.6.2 The Private Directive

The purpose of the private directive is to indicate that all or a portion of a response message is intended for a single user and must not be cached by any shared cache; however, a private nonshared cache can cache the response. Thus, an origin server can use this directive to indicate that specified portions of a response are only applicable for a single user. Note that if the private-response directive specifies one or more field names, the specified field names must not be stored by a shared cache; however, the remainder of the message may be cached. Also note that the usage of the word private only controls where the response may be stored; it does not ensure the privacy of the message content.

5.1.5.6.3 The No-Cache Directive

The no-cache directive functions in the same manner as in a cache request. That is, its use forces caches to submit a request to the origin server prior to releasing cached data. Note that most HTTP/1.0 caches will not recognize or obey this directive.

5.1.5.6.4 The No-Store Directive

Similar to the no-cache directive, the no-store directive also functions in the same manner as in a cache request. That is, its use instructs caches not to keep a copy of data in cache. This directive applies to both nonshared and shared caches and requires a cache not to store the information in nonvolatile storage, as well as to make a best-effort attempt to remove the information from volatile storage as promptly as possible after forwarding it.

5.1.5.6.5 The No-Transform Directive

The no-transform directive also functions in the same manner as previously described when we covered cache-request directives. That is, the use of this directive does not allow caches to transform data, such as changing a lossless image into a lossy image to reduce its storage requirements.

5.1.5.6.6 The Must-Revalidate Directive

The purpose of a must-revalidate directive is to ensure that a cache does not use an entry after it becomes stale. Thus, when the must-revalidate directive is present, a cache will first revalidate a stale entry prior to using it to respond to a request. A common use of the must-revalidate directive is to support reliable operation for certain protocol features. Servers also use the must-revalidate directive if, and only if, a failure to revalidate a request could result in an incorrect operation, such as an unexecuted financial transaction.

5.1.5.6.7 The Proxy-Revalidate Directive

The proxy-revalidate directive is similar in functionality to the must-revalidate directive. However, this directive does not apply to nonshared user-agent caches. One common application for the proxy-revalidate directive is on a response to an authenticated request. In this situation, the proxy-revalidate directive enables a user cache to store and later return the response without having to revalidate it, since it was previously authenticated by the user.

5.1.5.6.8 The Max-Age Directive

Similar to its use in the cache-request header, the max-age directive specifies the amount of time data will be considered fresh. That is, this directive indicates that a response is to be considered stale after its age is greater than the specified number of seconds.

5.1.5.6.9 The S-Maxage Directive

The s-maxage directive is similar to the max-age directive. However, this directive has one significant difference in that it is only applicable to shared caches, such as a proxy cache. Note that the maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header.
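The order of precedence among s-maxage, max-age, and the Expires header can be sketched in Python as follows. The function name freshness_lifetime is hypothetical, header names are assumed to be stored in lowercase, and the header values are assumed to be well formed.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict, shared: bool):
    """Freshness lifetime in seconds, or None if no explicit value is given."""
    directives = {}
    for part in headers.get("cache-control", "").split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg if sep else None
    # s-maxage takes priority, but only for shared caches such as a proxy.
    if shared and "s-maxage" in directives:
        return int(directives["s-maxage"])
    # max-age overrides the Expires header.
    if "max-age" in directives:
        return int(directives["max-age"])
    # Otherwise fall back to Expires minus Date.
    if "expires" in headers and "date" in headers:
        delta = parsedate_to_datetime(headers["expires"]) - parsedate_to_datetime(headers["date"])
        return int(delta.total_seconds())
    return None


headers = {
    "date": "Fri, 20 Jul 2012 14:21:00 GMT",
    "expires": "Fri, 20 Jul 2012 16:21:00 GMT",
    "cache-control": "max-age=600, s-maxage=60",
}

print(freshness_lifetime(headers, shared=True))   # proxy cache uses s-maxage
print(freshness_lifetime(headers, shared=False))  # browser cache uses max-age
```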

5.1.5.6.10 The Cache-Extension Directive

In completing our review of cache-response directives, the use of cache extension is similar to that described for its use in a cache-request header. That is, its use enables the header to be extended.

5.1.6 Windows DNS Caching Problems

This author would be remiss if he did not mention a common caching problem associated with the use of Windows 2000 and Windows XP. Both versions of Windows cache unsuccessful DNS (Domain Name System) lookup attempts. This means that failed attempts to contact a Web site are stored. Thus, you may not be able to view a particular Web site when using either version of Windows until the cached result expires.

By default, the Windows DNS cache expiration time is 5 minutes. Thus, you can wait 5 minutes and then try again to access a particular URL. For those who do not want to wait until the cached result expires, you can flush the Windows DNS cache. To do so, you would perform the following operations:

1. For Windows XP, in the Windows Taskbar, select the Start menu. Then, select Run.

2. In the text box, enter ipconfig /flushdns. This will result in the DNS cache being flushed.

3. Now reload the Web-site URL.

5.1.7 Viewing HTTP Headers

There are several methods available for viewing the full headers contained in HTTP. You can manually connect to a Web server using a Telnet client directed to port 80. To do so, you would enter the command “open www.xyz.com 80”. Once you connect to a particular site, you can then use the GET command to request the representation. For example, if you want to view the headers for http://www.xyz.com/index.html, you would first connect to www.xyz.com on port 80. Then, you would type

GET /index.html HTTP/1.1 [return]
Host: www.xyz.com [return][return]

You would then press the Return key once to display each line in the header. Unfortunately, due to security concerns, most Web-site operators do not support Telnet into their computers. Thus, a good alternative is to use an HTTP header viewer.
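The manual session described above can also be approximated with a short Python sketch that uses only the standard socket module. The function names build_request and fetch_header_block are hypothetical, www.xyz.com is the placeholder host used in the text, and the added Connection: close line is an assumption that asks the server to close the connection once it has replied.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build the same request one would type into a Telnet session."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close after responding
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_header_block(host: str, path: str = "/", port: int = 80,
                       timeout: float = 10.0) -> str:
    """Send the request and return everything up to the end of the headers."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        data = b""
        # Read until the blank line that separates headers from the body.
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")


# Show the request text without contacting any server.
print(build_request("www.xyz.com", "/index.html").decode("ascii"))
```

Calling fetch_header_block with a real host name would print that server's status line and response headers; the call is left out here because it requires network access.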

One interesting HTTP header viewer can be used at www.websniffer.com. Figure 5.5 illustrates the HTTP header viewer screen into which this author entered the URL of the U.S. Office of Personnel Management (www.opm.gov), which is a U.S. government agency. In examining Figure 5.5, you will note that the HTTP request header is shown followed by the HTTP response issued by the queried site, in this example www.opm.gov. From the response header, we can determine some interesting information about the queried site. First, the software operating on the server is identified. After the date and time are displayed, information about the use of ASP.NET, connection, content length, and content type is displayed, followed by information about cache control. Once data concerning the HTTP response header is displayed, Web-Sniffer displays the content of the URL as source HTML code.

Because we are concerned with cache control, we will not discuss the other headers. Instead, let’s turn our attention to the cache-control directive, which is shown as “private.” This means that responses from this site are intended for a single user and must not be cached by any shared cache, although a private nonshared cache can cache the response.

Another interesting site to consider for viewing HTTP headers is www.webmaster-toolkit.com. In Figure 5.6, we entered the same URL, but note that the order of data in the headers does not match the prior example. This is probably due to the design of each program, since most users working with headers are concerned about the setting of cache control.

Figure 5.5 Using an HTTP header viewer.

Figure 5.6 Viewing HTTP headers for www.opm.gov via webmaster-toolkit.com.

In concluding our use of an HTTP header viewer, let’s view the headers of the New York Times. Figure 5.7 shows the headers for www.nytimes.com via the use of the webmaster-toolkit.com site. Note that this author scrolled down the resulting “headers” page so that readers could see that the URL nytimes.com was entered as the site to send a request that elicited a response. In examining Figure 5.7, once again let’s focus our attention upon cache control. Note that the New York Times Web site uses a no-cache directive in its response, forcing caches to submit a request to the New York Times Web-site server prior to releasing cached data. As an educated guess, the reason behind the use of the no-cache directive is probably the fact that the New York Times changes its page composition quite frequently. In fact, the right side of its home page typically includes a Markets section showing stock averages for the Dow Jones index, the S&P 500, and NASDAQ, whose values are updated every few minutes when the financial markets are open.

5.1.8 Considering Authentication

In an electronic commerce environment, many Web pages are protected with HTTP authentication. In such situations, pages protected with HTTP authentication are considered private and are not kept by proxies and other shared caches. Some Web sites that wish such pages to be cached can do so through the use of the public directive. If a Web site wants those pages to be both cacheable and authenticated for each user, you would use both public and no-cache directives, shown as follows:

Cache-Control: public, no-cache

This pair of cache-control directives informs each cache that it must submit client-authentication information to the origin server prior to releasing data from the cache.

5.1.9 Enhancing Cacheability

Figure 5.7 Viewing the HTTP headers for www.nytimes.com.

In concluding our discussion covering caching, let’s focus our attention upon a core series of techniques that can be employed by Web-site operators to enhance the cacheability of their pages. Table 5.4 lists eight techniques that you can consider to improve the cacheability of data. Although some techniques may be more obvious than others, let’s discuss each in the order that they appear in Table 5.4.

Table 5.4 Techniques for Enhancing Cacheability of Data

Minimize use of SSL/TLS
Use URLs consistently
Use a common set of images
Store images and pages that infrequently change
Make caches recognize scheduled updated pages
Do not unnecessarily change files
Use cookies only when necessary
Minimize the use of POST

• Minimize use of SSL/TLS: When SSL/TLS (secure sockets layer/transport layer security) is used, pages flowing between the client and server are encrypted. Because encrypted pages are not stored by shared caches, reducing the use of SSL/TLS pages enhances the overall cacheability of Web-site data.

• Use URLs consistently: Using the same URL can make your site cache friendly, enabling a larger percentage of pages to be cached. This is especially true when your organization provides the same content on different pages to different users.

• Use a common set of images: If your Web site uses a common set of images and your pages consistently reference them, caching can become more frequent.

• Store images and pages that infrequently change: If you carefully examine your Web-site content, you will more than likely note many images and Web pages that are either static or that infrequently change. By using a cache-control max-age directive with a relatively large value, you can make caches take advantage of infrequently changing or static data.

• Make caches recognize scheduled updates: As indicated earlier in this section, certain pages, such as the New York Times home page, regularly change when financial markets are open. You can make caches recognize the scheduled change of such pages by specifying an appropriate max-age or expiration time in the HTTP header.

• Do not unnecessarily change files: One of the problems associated with caching is that it is relatively easy for a Web site to associate a large number of files with falsely young Last-Modified dates. For example, when updating a site by copying every file instead of just files that have changed, each file will then appear to be recently modified and, as a result, adversely affect caching. Thus, Web-site operators performing backups and restores or other site-update operations should restrict file updates to those files that have actually changed.

• Use cookies only when necessary: The purpose of a cookie is to identify the prior action of a user. Thus cookies, when cached by a proxy server, provide no additional benefit over being cached by a browser. This means that more effective caching can occur if cookies are not used or their use is limited to dynamic pages.

• Minimize the use of POST: While responses to information sent in a query via the use of GET can be stored by caches, the same is not always true of POST. That is, responses to POST are not maintained by most caches. Thus, minimizing the use of POST makes your data more cacheable.
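The advice against unnecessarily changing files lends itself to a brief illustration. The following Python sketch copies only those files whose content actually differs between a staging directory and the live site, so untouched files keep their existing Last-Modified dates. The function name publish_changed_files and the directory layout are assumptions; the sketch uses only the standard filecmp, shutil, and pathlib modules.

```python
import filecmp
import shutil
from pathlib import Path


def publish_changed_files(src: Path, dst: Path) -> list:
    """Copy only files whose content differs; return their relative paths."""
    copied = []
    for source in sorted(p for p in src.rglob("*") if p.is_file()):
        target = dst / source.relative_to(src)
        # shallow=False compares file contents rather than just size and time.
        if target.exists() and filecmp.cmp(source, target, shallow=False):
            continue  # identical content: leave the live file untouched
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also carries over the source mtime
        copied.append(str(source.relative_to(src)))
    return copied
```

A site update would then call publish_changed_files(Path("staging"), Path("htdocs")), where both directory names are placeholders, instead of copying the entire tree.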

5.2   Load Balancing

In comparison to caching, which can be performed at many locations on the request/response path, load balancing is usually performed at the location where servers reside. In this section we will examine the rationale for load balancing, the different algorithms that can be used to distribute HTTP requests among two or more servers, and the different types of load balancing available along with a discussion of their advantages and disadvantages. In addition, we will examine the configuration and operation of a relatively low-cost method of load balancing that occurs through the use of DNS.

5.2.1 Types of Load Balancing

There are several types of load balancing used on the Internet. Two of the primary types of load balancing involve communications and servers. Concerning communications load balancing, the goal is to attempt to distribute the communications load evenly among two or more computer systems. In a computer environment, load balancing can occur at different places, with a main server distributing requests to multiple back-end processors representing a typical example of computer load balancing.

5.2.2 Rationale

Web servers are similar to other computers in that they have a set amount of processing power and RAM. Both the processing power and RAM, as well as the speed of the server’s network connection, limit the number of pages per unit time that can be served to clients. Thus, when an organization has only one Web server, its ability to respond to all incoming requests may be degraded as traffic increases. Clients will begin to notice degraded server performance as requested pages load slowly or they experience timeouts and fail to connect to the server.

In an attempt to alleviate the previously mentioned problem, Web-site operators have two options. First, they can attempt to upgrade their existing facility, either through the addition of processors, if the platform used supports additional hardware, or by increasing RAM, allowing more pages to be cached. However, an increase in traffic to the Web site can lead to the point where the continued upgrade of server hardware is no longer cost effective. At that point, the second option applies: one or more servers need to be added to enable the load to be shared among a group of servers. The distribution of the load among two or more servers is referred to as computer load balancing, which is the subject of this section.

5.2.3 Load Balancing Techniques

There are several techniques that can be employed to obtain a load balancing capability. Load balancing can occur through software or hardware or a combination of the two. For example, when a software program is used for load balancing, it operates on a hardware platform in front of a series of Web servers used by an organization. The software typically operates by listening on the port where external clients connect to access a service, such as port 80 for Web traffic. The load balancer will then forward requests to one of the back-end servers, which will commonly respond via the load balancer. This action enables the load balancer to reply to clients without their being aware of the fact that the site they are accessing has multiple servers. In addition, the client is precluded from directly accessing the back-end servers, which can enhance a site’s security. Depending upon the type of load balancer used, the failure of a back-end server may both raise an alarm to notify a system administrator of the failure and be compensated for by the balancer. As an alternative to acquiring hardware or software for load balancing operations, it’s possible to use the round-robin capability of the Domain Name System (DNS). Unfortunately, as we will shortly note, there is no easy method to compensate for the failure of a back-end server under DNS load balancing. In the remainder of this section, we will briefly review several types of load balancing techniques prior to examining them in detail.

5.2.3.1 DNS Load Balancing

Perhaps the easiest technique used to implement load balancing is obtained through the modification of DNS entries. This technique, which is referred to as DNS load balancing, requires a site to have two or more servers, each running the same software. The address of each server is then stored in the site’s domain name server (DNS) under one name. The DNS server then issues those addresses in a round-robin manner. Thus, the first client to access the site obtains the address of the first Web server, the second client obtains the address of the second server, and so on. Later in this section, we will discuss DNS load balancing in more detail.
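The round-robin issuing of addresses can be sketched in a few lines of Python. This is a simplified model rather than an actual DNS server: the class name RoundRobinResolver, the name www.example.com, and the three addresses are all assumptions used for illustration.

```python
from itertools import cycle


class RoundRobinResolver:
    """Hand out the addresses registered under one name in rotating order."""

    def __init__(self, records: dict):
        # One endless rotation per host name.
        self._rotations = {name: cycle(addrs) for name, addrs in records.items()}

    def resolve(self, name: str) -> str:
        return next(self._rotations[name])


resolver = RoundRobinResolver(
    {"www.example.com": ["198.78.64.1", "198.78.64.2", "198.78.64.3"]}
)

# Four successive clients; the fourth wraps around to the first server.
answers = [resolver.resolve("www.example.com") for _ in range(4)]
print(answers)
```

Note that nothing in this rotation checks whether a server is actually running, which is the weakness of the DNS approach.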

5.2.3.1.1 IP Address Mapping

A second load balancing technique involves mapping the name of a Web site to a single IP (Internet protocol) address. That IP address represents the address of a computer or network appliance that intercepts HTTP requests and distributes them among multiple Web servers. This type of load balancing can occur through the use of both hardware and software. Although the use of a load balancing appliance is more expensive than DNS modifications, it enables a more even load balancing to be achieved. This is because a load balancing appliance can periodically check to see if each server it supports is operational and, if not, adjust its serving mechanism. In comparison, the DNS approach cannot check for the availability of servers and would then periodically forward client requests to an inoperative server, assuming a server in the DNS entry became inoperative. This explains why, when you attempt to access a Web site, you may first receive an error message, while a subsequent access request results in the retrieval of the desired Web page. This is because the second request results in the DNS server returning the IP address of a different server. Thus, when multiple requests are required to access a Web server, there is a high probability that the site uses DNS load balancing and at least one server is off-line.
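To make the contrast with DNS concrete, the following minimal Python sketch shows how a balancing appliance might skip a server that has failed an availability check. The class and method names are hypothetical and are not taken from any particular product; a real appliance would also run the checks itself on a timer.

```python
class HealthAwareBalancer:
    """Rotate over back-end servers, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to consider

    def mark_down(self, server):
        # Called when an availability check fails.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Called when a previously failed server answers again.
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, or None if every server is down."""
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if candidate in self.healthy:
                return candidate
        return None
```

Because the failed server is simply passed over, clients never see the error message that plain DNS rotation can produce.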

5.2.3.1.2 Virtual IP Addressing

A third load balancing technique involves the use of software to configure multiple servers with the same IP address, a technique referred to as virtual IP addressing. Under this technique, multiple servers can then respond to requests for one IP address. For example, suppose you have three Windows servers, assigned IP addresses 198.78.64.1, 198.78.64.2, and 198.78.64.3, respectively. You could use virtual IP addressing software to configure all three servers to use the common virtual IP address of 198.78.64.5. You would then designate one server as the scheduler, which would receive all inbound traffic and route requests for Web content to the other two servers based upon the load balancing parameters you set. Under virtual IP addressing, the failure of the scheduler server would be compensated for by assigning backup scheduling duties to another server.

5.2.3.2 Load Balancing Methods

There are several load balancing methods that can be used to distribute HTTP requests among two or more servers. Table 5.5 lists three of the more popular load balancing algorithms in use.

5.2.3.2.1 Random Allocation

Under the random allocation method of load balancing, HTTP requests are assigned in a random manner to servers. Although the random method of allocation is easy to implement, it’s possible for one server to periodically be assigned more requests than another server, although on average all servers over time should have the same load.
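As a rough illustration of random allocation, the following Python sketch assigns a batch of requests to servers at random and tallies the result. The function name and the server names used with it are assumptions made for the example.

```python
import random
from collections import Counter


def random_allocation(servers, n_requests, seed=None):
    """Assign each request to a randomly chosen server and count the totals."""
    rng = random.Random(seed)  # a seed makes a run repeatable
    return Counter(rng.choice(servers) for _ in range(n_requests))
```

Over a few thousand requests the per-server totals come out close to one another but rarely identical, which is the short-term imbalance described above.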

Table 5.5 Types of Load-Balancing Algorithms

Random allocation
Round robin
Weighted round robin

5.2.3.2.2 Round-Robin Allocation

A second method that can be used to assign HTTP requests to multiple servers is on a rotating or round-robin basis. Typically, the first request is allocated to a server selected randomly from a group of servers so that not all initial requests are assigned to the same server. For subsequent requests, a circular order is employed. Although a round-robin allocation method divides requests equally among available servers, it does not consider server processing capabilities. That is, if one server has twice the processing capability of another server, a round-robin allocation method would result in the more powerful server having half the relative loading of the less powerful server.
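The rotation can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the server list is fixed; the function name is hypothetical.

```python
import random


def round_robin(servers, seed=None):
    """Yield servers in circular order, starting from a randomly chosen one."""
    i = random.Random(seed).randrange(len(servers))  # random starting point
    while True:
        yield servers[i]
        i = (i + 1) % len(servers)  # wrap around to the first server
```

Each call to next() on the generator returns the server for the next request, so every server receives the same number of requests regardless of its processing capability.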

5.2.3.2.3 Weighted Round-Robin Allocation

The weighted round-robin allocation method was developed in response to the fact that there are normally processing differences between servers. Under the weighted round-robin load allocation method, you can assign a weight to each server within a group. Thus, if one server is capable of processing twice the load of a second server, the more powerful server would be assigned a weight of 2, while the less powerful server would be assigned a weight of 1. Using the weight information, the more powerful server would be assigned two HTTP requests for each request assigned to the less powerful server. Although the use of a weighted round-robin allocation method better matches requests to server processing capability, it does not consider the processing times required for each request. In reality, the latter would be extremely difficult, since it would require every possible request to be executed on each server in the group. Then, the processing requirements would have to be placed in a database on the load balancer.
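One simple way to picture weighted round robin is to repeat each server in the rotation as many times as its weight. The Python sketch below takes that approach; it is only one of several possible schedules, and the server names used with it are assumptions.

```python
from itertools import cycle


def weighted_round_robin(weights):
    """weights maps each server name to a positive integer weight."""
    # A server with weight 2 appears twice in every pass through the schedule.
    schedule = [server for server, w in weights.items() for _ in range(w)]
    return cycle(schedule)
```

With weights of 2 and 1, the more powerful server receives two requests for each request sent to the less powerful one. Production balancers often interleave the turns rather than sending consecutive requests to the same server.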

5.2.4 Hardware versus Software

There are several differences between hardware and software load balancers that need to be considered prior to selecting a load balancing platform. A hardware load balancer is usually considerably more expensive than its software-based cousin. Although the hardware-based load balancer can provide a higher servicing capability, it may lack some of the configuration options available with software-based load balancers.

Today, a majority of load balancers are software based. In fact, some Web server and application server software packages include a software load balancing module. Because one of the most popular load balancing methods, DNS load balancing, is software based and requires no additional hardware, we will conclude this section with a discussion of its operation.

5.2.5 DNS Load Balancing

As a review, the purpose of the DNS is to respond to domain name lookup requests transmitted by clients. In response to such requests, the DNS returns the IP address that corresponds to the requested domain name. For example, assume a client wants to access our illustrative site www.popcorn.com. Let’s assume its IP address is 198.78.46.8. Then, when the client enters www.popcorn.com into the browser’s URL field, the browser first checks its internal cache to see if the URL was previously resolved into an IP address. Assuming it wasn’t, the Web browser communicates with a DNS server for the required IP address. In actuality, the browser may access several DNS servers, such as one on the local LAN, until it retrieves the applicable IP address. That address can be controlled by the DNS server associated with the domain popcorn.com, which in effect enables DNS load balancing to occur.

Under DNS load sharing, several IP addresses are associated with a single host name. For our example we will assume that the domain popcorn.com has a very active access history of clients purchasing goodies and that the domain now has four servers whose IP addresses are 198.76.41.1 through 198.76.41.4. Then, when a request flows to the DNS server to resolve a domain name, it responds with one of the four IP addresses that are served up in a round-robin or load-sharing manner. Thus, the use of IP addresses in a round-robin manner results in a form of load balancing.
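The rotation of addresses can be modeled with a short Python sketch. This is a simplified model rather than a real name server: the class name is hypothetical, and a production name server typically returns the full address list in rotated order instead of a single address.

```python
from collections import deque


class RoundRobinResolver:
    """Toy resolver that hands out the addresses of a host name in rotation."""

    def __init__(self, records):
        # records maps a host name to the list of its IP addresses
        self._records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        addrs = self._records[name]
        answer = addrs[0]
        addrs.rotate(-1)  # move the address just served to the back of the line
        return answer
```

Four successive lookups of the same name return the four addresses in turn, and the fifth lookup starts over with the first address.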

5.2.6 DNS Load-Sharing Methods

There are two basic methods of DNS-based load sharing you can consider. Those methods include the use of CNAMES and the use of A records.

5.2.6.1 Using CNAMES

One of the most common implementations of DNS is the Berkeley Internet Name Domain (BIND). Depending upon which version of BIND is used, you can implement load balancing through the use of multiple CNAMES or multiple A records.

Under BIND 4, support is provided for multiple CNAMES. Let’s assume your organization has four Web servers configured with IP addresses 198.78.46.1, 198.78.46.2, 198.78.46.3, and 198.78.46.4. You would then add the servers to your DNS with address (A) records as shown here.

Web1    IN    A    198.78.46.1
Web2    IN    A    198.78.46.2
Web3    IN    A    198.78.46.3
Web4    IN    A    198.78.46.4

Note that the Web server names (Web1, Web2, Web3, and Web4) can be set to any name you want, but they need to match canonical names added to resolve www.popcorn.com in our example to one of the servers through the following entries.

www    IN    CNAME    Web1.popcorn.com.
       IN    CNAME    Web2.popcorn.com.
       IN    CNAME    Web3.popcorn.com.
       IN    CNAME    Web4.popcorn.com.

Based upon the preceding, the DNS server will resolve the www.popcorn.com domain to one of the four listed servers in a rotated manner, in effect spreading requests over the four servers in the server group.

5.2.6.2 Using A Records

While the previously described method works under BIND 4, multiple CNAMES for one domain is not a valid DNS server configuration for BIND 8 and above. For BIND 8 name servers, you would include an explicit multiple CNAME configuration option such as the one shown here:

options {
    multiple-cnames yes;
};

If you are working with BIND 9 and above, then you should use multiple A records as indicated below to effect load balancing via DNS.

www.popcorn.com.    IN    A    198.78.46.1
www.popcorn.com.    IN    A    198.78.46.2
www.popcorn.com.    IN    A    198.78.46.3
www.popcorn.com.    IN    A    198.78.46.4

As an alternative to the above, many times a Time To Live (TTL) field is added to A records. The TTL value indicates the maximum time that information should be held to be reliable. By setting a low TTL value, you can ensure that the DNS cache is refreshed faster, which will improve load sharing on your organization’s Web servers. Unfortunately, the trade-off of this improvement is the fact that the load on your organization’s name server will increase. The following example shows the use of a 60-second TTL value when A records are used.

www.popcorn.com.    60    IN    A    198.78.46.1
www.popcorn.com.    60    IN    A    198.78.46.2
www.popcorn.com.    60    IN    A    198.78.46.3
www.popcorn.com.    60    IN    A    198.78.46.4

The use of CNAMES or A records to obtain a DNS round-robin method of load balancing is transparent to clients. Because it employs existing hardware and software, it’s very cost effective, requiring only a minimum of effort, and is popularly used by small- and medium-sized organizations. Unfortunately, as previously mentioned, there is no way for DNS to detect the failure of a server, which means that, upon the failure of a server, DNS will continue to route clients to that server. Thus, if there are four servers defined by the use of A records and one fails, then 25% of client requests will be sent to a server that cannot respond to client requests.
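The 25% figure follows directly from the rotation, as the following Python sketch illustrates. It assumes a strict rotation with no client-side caching or retries, and the function name is hypothetical.

```python
from itertools import cycle, islice


def misdirected_share(addresses, failed, n_requests):
    """Share of requests that plain DNS rotation hands to a failed server."""
    rotation = islice(cycle(addresses), n_requests)
    misses = sum(1 for addr in rotation if addr in failed)
    return misses / n_requests
```

With four addresses and one failed server, one request in four is misdirected; with two failed servers the share rises to one half.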

5.2.7 Managing User Requests

One of the problems you may have in the back of your mind is how to ensure that a particular server selected in a round-robin method via DNS load balancing knows the status of a client. After all, your Web site would rapidly lose surfers if each client had to reenter previously entered information each time they were directed to a different server. The technical term for keeping track of where a particular client is with respect to its use of a server is server affinity. Although DNS load balancing does not directly provide server affinity, a site can use one of three methods to maintain session control and user identity via HTTP, which is a stateless protocol. Those methods include the use of cookies, hidden fields, and URL rewriting. While the use of cookies should be recognized by readers and was previously mentioned in this book, the use of hidden fields and URL rewriting has not been mentioned and deserves a bit of explanation.

5.2.7.1 Hidden Fields

A hidden field is used to pass along variable data from one form to a page without requiring the client to reenter data. A hidden field is similar to a text field; however, instead of being displayed on a Web page, the hidden field, as its name implies, does not show on the page. In addition, a client cannot type anything into a hidden field, which leads to the purpose of the field, which is to provide information that is not entered by the client. Thus, a Web server can create and enter data into a hidden field, which enables a second server that is accessed via DNS load balancing to manage the user without having the client reenter previously entered data.

5.2.7.2 Settings

The hidden field under HTML code is assigned by the use of the type attribute as shown here:

<input type="hidden" />

Through the use of the name and value settings, you can add a name and value to a hidden field. The name setting adds an internal name to the field, which enables a program to identify the fields, while the value setting identifies the data that will be transmitted once the form is submitted. To illustrate the use of a hidden field, consider the HTML code shown in Table 5.6.

In the example shown in Table 5.6, the action attribute specifies where the form data will be submitted, which in this example is a CGI script on the server. Next, the method=post entry represents a preferred method for sending form data. When a form is submitted via POST, the user does not see the form data that was sent.

The use of the input type defines a hidden field that has the field name hidefield and value Cart_ID, which will be referenced and updated. Assuming that there are multiple servers using DNS load balancing and that they are, in turn, connected to a common back-end database that maintains cart status information, any server accessed by the DNS load balancer will allow the cart information to be retrieved from the back-end database without further action by the client.
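The server side of this exchange might look like the following Python sketch. It assumes the form arrives as a URL-encoded POST body, and it stands in for the common back-end database with an in-memory dictionary; the cart identifier, the cart contents, and the function name are all hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical stand-in for the shared back-end database of cart status.
CART_DB = {"cart-1001": ["popcorn tin", "caramel corn"]}


def cart_for_request(post_body, db=CART_DB):
    """Look up the cart named by the hidden field 'hidefield', if present."""
    fields = parse_qs(post_body)  # e.g. {"hidefield": ["cart-1001"], ...}
    cart_ids = fields.get("hidefield", [])
    if not cart_ids:
        return None  # the form carried no hidden cart identifier
    return db.get(cart_ids[0])
```

Because every server consults the same store, it does not matter which server the DNS rotation selects for a given request.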

5.2.7.3 URL Rewriting

In addition to the use of cookies and hidden fields, you can also consider the use of URL rewriting as a mechanism to provide server affinity. Simply stated, URL rewriting is the process of intercepting an incoming Web request and redirecting the request to a different resource. When URL rewriting is employed, the URL being requested is first checked. Based on its value, the request is redirected to a different URL. As you might expect, URL rewriting can both be complex and provide the potential for unscrupulous persons to learn more about your Web site than they need to know. Thus, cookies and hidden fields are the preferred methods used to obtain server affinity.
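The check-then-redirect step can be sketched in Python with a small table of rules. The paths and rules below are invented for illustration and do not describe any particular Web server's rewriting module.

```python
import re

# Hypothetical rules: (pattern for the requested path, replacement path).
REWRITE_RULES = [
    (re.compile(r"^/catalog/(\d+)$"), r"/catalog.cgi?item=\1"),
    (re.compile(r"^/old-store/(.*)$"), r"/store/\1"),
]


def rewrite(path):
    """Return the rewritten path for the first matching rule, else the path."""
    for pattern, replacement in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return path  # no rule applies, so the request is left unchanged
```

Even this small table hints at why rewriting grows complex: the order of the rules matters, and each rule exposes something about how the site is organized.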

Table 5.6 Coding a Hidden Field in HTML

<html>
  <head>
    <title>Hidden Field Example</title>
  </head>
  <body>
    <form name="clientstatus" action="http://www.popcorn.com/clientstatus.cgi" method="POST">
      <input type="hidden" name="hidefield" value="Cart_ID">
      …
    </form>
  </body>
</html>

6 The CDN Enterprise Model

The purpose of this chapter is to examine content delivery with respect to the enterprise. Commencing with a discussion of when and why content delivery should be accomplished in-house, we will turn to several techniques that can be employed to facilitate the delivery of content, including audio, video, and text. Because the most effective method to determine an appropriate content delivery mechanism requires knowledge of where data must flow, and the quantity of data within each flow, we need knowledge of the enterprise’s operating environment. This author would be remiss if he did not include a discussion of one of the built-in resources available for access on most Web servers: their traffic logs. Now that we have a general appreciation concerning where we are headed, let’s proceed and turn our attention to investigating the Content Delivery Network (CDN) enterprise model.

6.1   Overview

The sizes of business enterprises can vary significantly. Some enterprises that host a Web site could be small retail stores, such as the mythical www.popcorn.com that this author has employed throughout this book. In comparison, other types of enterprises can range in size from Fortune 100 and Fortune 500 industrial organizations to large banks and insurance companies that were once considered too big to fail, all in competition with the so-called mom-and-pop stores that now are technically savvy and have effective Web sites to market their products. Thus, an enterprise model that may be well suited for one organization could be ill suited for a different-sized organization. Even if two organizations are similar in size, they may have differences in the type and location of customers that they attract, similar to the difference between an automobile manufacturer and a heavy-equipment manufacturer.

It would be very difficult, if not impossible, to discuss a series of CDN enterprise models that could be associated with different types of organizations. In recognition of this fact, the author has used a different approach to the topic of this chapter. Instead of attempting to match CDN enterprise models to organizations, a better approach, in this author’s opinion, is to discuss such models in a structured order. The structured order will let readers consider one or more models that could be well suited to the operation of their organization, providing them with the ability to select the most appropriate model. As we examine different CDN enterprise models, we will commence our effort by discussing simple data-center models applicable to organizations that operate a single Web site. Then we will examine the more complex CDN enterprise models that are applicable for organizations capable of operating multiple data centers. For the purpose of this chapter, we will use the term data center quite loosely to describe a facility where one or more Web servers are located. To digress a bit, this author was one of the first Webmasters when he configured a desktop computer to function as a Web server. That server operated in a regular environment, and the room in which the server was located became the author’s data center.

6.1.1 Rationale

Any discussion of the use of an in-house content delivery mechanism should commence with an investigation of the rationale for doing so. After all, for many organizations that lack the staff, time, or equipment, the simple solution is to outsource content delivery. However, in doing so, a distinction needs to be made between having your Web site hosted and having your Web site hosted with a content delivery capability. Concerning the former, there are many organizations that have data centers where your organization can have either a dedicated server or a virtual server, with the latter representing a Web site operating on a computer that runs many applications that can include other organizational Web servers, FTP servers, mail servers, and so on. While your organization will not have to obtain hardware and software as well as install and operate communications facilities, the use of a hosting facility does not mean that it employs the services of a CDN nor that it offers a CDN facility that matches the locations where you expect clusters of clients to originate their access to your Web server. Thus, you need to ask applicable questions prior to having your Web site hosted to obtain information about the hosting provider’s CDN capability or lack of that capability.

Table 6.1 lists five key reasons that can justify developing an in-house content delivery capability. In the following paragraphs, we will turn our attention to each of the reasons listed in Table 6.1.

6.1.1.1 Concentrated Customer Base

Having a concentrated customer base can considerably facilitate the development of an in-house CDN capability. This is because a concentrated customer base (such as persons that utilize online ticket ordering for sports events at an arena in Macon, Georgia, or a similar activity that typically originates a vast majority of requests from the immediate geographic area) generates local traffic. While it’s possible a person in Los Angeles or London may also need some tickets, you will not have any significant access requirements outside the middle Georgia geographic area. Thus, the operator of the Web site does not have to be concerned with reducing latency for potential Web clients originating traffic from areas outside the middle Georgia area, including areas outside the United States. Instead, the primary focus of such Web-site operators should be on maintaining availability to the site. Depending upon the traffic volume to the Web site, the site operator may consider operating multiple servers using a load-balancing technique to provide a high level of access to its facility.

Table 6.1 Rationale for Developing an In-House Content Delivery Capability

• Concentrated customer base
• Distributed locations available for use
• Knowledgeable staff
• Control
• Economics

6.1.1.2 Distributed Locations Available for Use

A second reason for considering an in-house content delivery capability is in the case where an organization has distributed locations available for supporting Web servers, thereby making it possible to create an organization-wide CDN. For example, consider an organization headquartered in Stockholm, Sweden, that has branch offices located in New York City, Singapore, Sydney, Tokyo, Paris, London, and Moscow. Suppose that the organization currently operates a single Web server in Stockholm, and the use of a network traffic analyzer or Web-site statistics indicates that users accessing the server from Japan and Australia were experiencing significant latency delays that resulted in a number of terminated sessions. Because the Sweden-based organization has branch offices in Tokyo and Sydney, it may be possible to install servers at those two locations to enhance access from Australia and Japan. In addition, by using such latency tools as Ping and Tracert, you could determine if it was practical to enhance service to Singapore by having clients in that location access a server in Tokyo or Sydney. With appropriate programming, traffic directed initially to the Web server located in Sweden could be redirected to a server located either in Tokyo or Sydney, based upon the origin of the initial data flow to the server in Sweden. In this example, using two distributed locations outside the main office would enable locations that experience significant latency to obtain speedier access to data because the Web servers would be located considerably closer to the browser user. In addition, either or both new server locations could be considered to serve other potential clients in the Oceania and Pacific region.
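The redirection logic described above can be sketched in Python. The host names are hypothetical, the country of origin is assumed to have been determined elsewhere, and the round-trip times passed to the second function are assumed to come from measurements such as those produced by Ping.

```python
# Hypothetical host names for the three server locations.
REGIONAL_SERVERS = {"JP": "tokyo.example.com", "AU": "sydney.example.com"}
HOME_SERVER = "stockholm.example.com"


def server_for_country(country_code):
    """Pick the regional server for a client's country, else the home server."""
    return REGIONAL_SERVERS.get(country_code.upper(), HOME_SERVER)


def lowest_latency_server(rtt_ms):
    """Given measured round-trip times in milliseconds, pick the fastest server."""
    return min(rtt_ms, key=rtt_ms.get)
```

The second function mirrors the Singapore question: measure the round-trip time from Singapore to Tokyo and to Sydney, then direct Singapore clients to whichever location answers faster.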

6.1.1.3 Knowledgeable Staff

It makes no sense to develop an in-house content delivery networking capability if an organization lacks experienced or trainable personnel to support distributed servers and the communications they require. The availability of a knowledgeable staff or the resources necessary to hire and/or train applicable personnel is a prerequisite for being able to develop an in-house content delivery networking capability. Thus, you need to examine the existing personnel skill set and their ability to tackle a new CDN project, as well as consider the availability of funding for additional personnel and the training of both current and, if required, additional employees.

6.1.1.4 Control

A key advantage associated with in-house content delivery versus using a third party to provide a CDN capability is control. By assuming responsibility for content delivery, an organization becomes capable of reacting faster to changing requirements. In addition, your employees will develop skill sets that a third-party CDN provider may not achieve, since your organization should know your customers better than a third party.

Another important area of control involves reacting to a changing environment. If you operate your own CDN, you can prioritize the work of your employees to satisfy changes that you or upper management deem important. In comparison, a third party may require a lengthy contract modification to have the service provider initiate one or more content delivery changes to satisfy changing organizational requirements.

6.1.1.5 Economics

One of the most important reasons for developing an in-house content delivery networking capability is economics. Can an organization save money by performing content delivery operations in-house instead of signing a contract that places the responsibility for content delivery in the hands of a third party? The answer to this question can be quite complex because so many factors need to be considered beyond a direct dollar-and-cost accounting. For example, although it may be more expensive for an organization to establish an in-house CDN, the ability to train employees and directly control the operation of the network could represent a rationale for deciding against a pure economic analysis in which you compare your cost to the cost associated with the use of a third party.

Although economics is an important decision criterion, many times there can be other factors that, when considered, result in a decision that may not make sense from a pure economic perspective. Even if a decision does make economic sense, you may need to consider the availability of funds if your organization needs to borrow money to take a more economical route. During what is now referred to as the “great recession,” many banks froze small- to medium-size organizations from obtaining loans no matter what their balance sheets looked like. Thus, the Web-site manager must consider a wide range of factors that are in the economic area but beyond a pure economic analysis of in-house versus third-party CDN capability, including the availability of funds and the terms if such funds are available.

6.1.2 Summary

Now that we have an appreciation for the main factors that can govern a decision to perform content delivery in-house, we need to determine whether our organization requires establishing a content delivery mechanism beyond operating a single Web server. We need a way to analyze the existing traffic flow that will allow us to recognize existing and potential bottlenecks that could be alleviated by establishing a content delivery networking capability.

6.2   Traffic Analysis

There are several methods an organization can use to determine the need for some type of content delivery networking capability. Those methods include the analysis of Web logs and the use of other networking logs that are built into many server operating systems, as well as the use of a network protocol analyzer that has a programming capability. You can even use cookies as a mechanism to track how many times a visitor returns to your Web site as well as to generate additional information that enhances your knowledge of the activities of clients accessing your server. Through the use of a network analyzer’s programming capability, you can create a program that, for example, can count traffic origination by country and the time of day each access occurred, providing you with detailed information about the access habits of customers and potential customers. Although such information is valuable, there are numerous types of protocol analyzers, each with different programming constraints. On the other hand, Web logs and other programs built into several server operating systems, as well as the use of cookies, are more readily available without additional cost. Consequently, we will turn our attention to server operating-system programs and cookies in this chapter.
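A count of traffic origination by country and time of day can be sketched in Python as follows. The sketch assumes that each access has already been reduced to a pair consisting of a country code, obtained by some external lookup, and a time stamp in the day/month/year form used by common Web logs; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def traffic_by_country_and_hour(accesses):
    """Count accesses per (country, hour of day).

    accesses is an iterable of (country_code, time_stamp) pairs, where the
    time stamp looks like '13/Jun/2009:17:42:18 +1000'.
    """
    counts = Counter()
    for country, stamp in accesses:
        hour = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").hour
        counts[(country, hour)] += 1
    return counts
```

A table built this way shows at a glance which regions generate traffic and when, which is the information needed to judge where additional servers would help.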

6.2.1 Using Web Logs

Regardless of the software used to provide a Web-server operating capability, each program will have at least one commonality: the ability to record Web activity into logs that can be analyzed to provide information about users accessing the server. Such information, at a minimum, will be able to inform you about the locations where traffic originated and the time the traffic originated, enabling you to literally open a window of observation concerning your client base. Because different Web-server application programs vary with respect to their capability, we will focus our attention on the type of logs generated by the Apache open-source solution for Web-site operations.

6.2.1.1 Apache Access Logs

Apache represents a popular Web-site solution that includes a very flexible Web-logging capability. In addition to supporting error logging of messages encountered during operations, Apache has the ability to track Web-site activity. Apache generates three types of activity logs: access, agent, and referrer. Information recorded into those logs tracks accesses to your organization’s Web site, the type of browsers being used by each client to access the site, and the referring Uniform Resource Locators (URLs) of the sites from which visitors arrived. Using various configuration codes, it is possible to capture every piece of information about each inbound access request to a Web site. Table 6.2 lists examples of the Apache strings used to log information to the log file, and a short description of the data logged as a result of a particular configuration string.

Although referred to as an access log, in reality this log should be called an activity log, as it records activity occurring on the Web server being accessed. In its standard configuration, Apache records all access attempts by clients and all server-side errors, with the latter able to be adjusted through the use of a control parameter. In addition, you have the ability to create custom logs. For example, you can arrange to log data identifying the browsers used by clients, which might assist your development programmers to structure their efforts. Because logging of everything can rapidly expand the use of disk space when a server is popular, you should carefully select the data you really need to log. In comparison to the use of access logs, error logs can be used to identify problems with your site. Both logs, but especially the access log, can experience rapid growth when a server is heavily used. While you might be tempted to delete an access log with the expectation that Apache will start a fresh one, it’s important to note that Apache keeps track of the access log file size and will continue to try to write at what it thinks should be the current end of file. However, there is a helper program in the /bin directory that allows you to “rotate” log files, whereby existing log files are renamed, and Apache is told to continue writing at the beginnings of the new log files.

6.2.1.2 Access Records

An access record includes such information as the client’s IP (Internet protocol) address; the date and time; a record of the request, such as a GET, POST, or PUT; the HTTP response code; and the size of the response message. In addition, it’s possible to configure the Apache log file specifications so that a reverse lookup is performed on the client’s IP address, enabling the host name and domain to be obtained and placed into the log file.

Table 6.2 Apache Logging Scripts

STRING  DESCRIPTION
%a      Remote IP address
%A      Local IP address
%b      Bytes sent, excluding HTTP headers, in CLF format
%B      Bytes sent, excluding HTTP headers
%C      Content of cookies in request sent to server
%D      Time taken to serve the request (microseconds)
%e      Content of environmental variable
%f      File name
%h      Remote host
%H      The request protocol
%i      Contents of header line(s) in the request sent to the server
%I      Bytes received including request and headers
%l      Remote log name
%m      The request method
%n      The contents of a specific note from another module
%o      The contents of header line(s) in the module reply
%O      Bytes sent, including request and headers
%p      The canonical port of the server serving the request
%P      The process ID of the child that serviced the request
%q      The query string, if it exists
%r      The first line of the request
%s      The status of the original request
%t      The time that the server finished processing a request
%T      The time required to service the request (seconds)
%U      The URL path request
%v      The canonical server name of the server servicing the request
%X      The connection status when the response is complete

6.2.1.3 HTTP Response Codes

Most of the HTTP status codes (see Table 6.3) will be 200s, which indicate the successful return of the requested data. However, there are other response codes that can be used to identify potential hacker attacks or client errors. For example, a series of 401 responses, each indicating an “authorization required” challenge, followed by requests with different user names suggests that someone is probably trying to guess entries into a password-protected file. Another interesting code is 404, which defines “resource not found.” This error code could reflect an error by the client; however, the appearance of a series of 404 responses in the log could indicate the presence of bad HTML links in resources on your site. In addition, the response sizes could also indicate potential problems. If the recorded sizes are often less than the actual resource sizes, this indicates that your clients are breaking connections before downloads are complete. This could indicate that either your server does not have sufficient capacity to service users or that your communications connection to the Internet does not have sufficient bandwidth to service your client base.

Table 6.3 Meanings of the Most Commonly Encountered HTTP Status Codes

CODE  MESSAGE               MEANING
200   OK                    Indicates a successful request resulting in a file being returned
206   Partial Content       Indicates that a file was only partially downloaded
301   Moved Permanently     The server indicated that the requested file is now located at a new address
302   Found                 Indicates that the user was redirected; since it’s not a permanent redirect, no further action needs to be taken
304   Not Modified          Indicates that the browser made a request for a file that is already present in its cache; a 304 status code indicates that the cached version has the same time stamp as the “live” version of the file, so the browser doesn’t need to download it; if the “live” file was newer, the response code would be 200
400   Bad Request           Indicates that the server could not make sense of the request
401   Unauthorized          Indicates that an attempt was made to access a directory or file that requires authentication by entering a user name and password; subsequent requests would contain a user name and password, resulting in either a 200 status code indicating a user was authenticated or a 401 status code indicating the authentication failed
403   Forbidden             Indicates that the server has blocked access to a directory or file
404   Not Found             Indicates that the requested file does not exist on the server; this status code normally indicates a broken internal or external link
408   Request Timeout       Indicates that the client/server connection process was so slow that the server decided to terminate the session
410   Gone                  The server indicates that the requested file used to exist but has now been permanently removed
414   Request-URI Too Long  Indicates that the request was too long; this status code can indicate that an attempt has occurred to compromise the server using a buffer-overflow technique

The following log entry illustrates an example of a successful request logged into your server’s access log.

198.78.36.8 - - [13/Jun/2009:17:42:18 +1000] "GET /images/mercadesauto.jpg HTTP/1.1" 200 1076

The previous log entry informs us that the client’s IP address was 198.78.36.8; the request was for a static image file named mercadesauto.jpg; the HTTP/1.1 protocol was used; the server returned status code 200; and 1076 bytes of data were transmitted to the client. Note that GET should not normally be used for operations that cause side effects, such as taking actions in a Web application. This is because a GET may be issued arbitrarily by Web robots, which do not need to consider the side effects of a request.
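As a rough illustration of how the fields in such an entry can be separated, the following Python sketch parses the log line shown above. It assumes the standard common log format; the regular expression and the function name parse_clf_line are illustrative choices and are not part of Apache.

```python
import re

# One common log format (CLF) entry:
# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of fields from one CLF entry, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" in the size column means no body bytes were sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


entry = parse_clf_line(
    '198.78.36.8 - - [13/Jun/2009:17:42:18 +1000] '
    '"GET /images/mercadesauto.jpg HTTP/1.1" 200 1076'
)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Returning None for lines that do not match lets a caller skip damaged entries rather than stop in the middle of a large log.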

6.2.2 Using Logging Strings

To illustrate the potential of the Apache logging strings listed in Table 6.2, let’s assume that your organization wants to log the remote host, the date and time of the request, the request to your Web site, and the number of bytes transmitted in response to the request. To accomplish this, you would enter the following commands:

LogFormat "%h %t %r %b" common
CustomLog logs/access_log common

The string LogFormat starts the line and informs the Apache program that you are defining a log format using the name “common,” which is then associated with a particular log format string. The string consists of a series of percent directives, each of which selects one item to record, and the result is a single log file instead of individual access logs. Apache is then instructed to log access information in the file logs/access_log, using the format defined in the previous line. In addition to being able to create a single log file for large Web sites that have multiple servers, a Web-site operator can analyze the data flow to multiple servers located within a physical location or scattered around the globe.

For the previous example, the sequence of log entries will begin by including the IP address of the client that made the request to the server. This logging results from the use of the %h entry in the first line in the above example. Next, the %t entry results in the logging of the time that the server completed processing the request. The format of logging is day/month/year:hour:minute:second zone, where day is two digits; month is three letters; the year is four digits; the hour, minute, and seconds are two digits; and the zone is four digits prefixed with a plus (+) or minus (-) with respect to Greenwich Mean Time. The following is an example of the entry of data into an access log from the use of the %t string:

[14/Sep/2009:17:23:45 -0600]

The last entry in our example, %b, indicates the number of bytes returned to the client excluding the HTTP headers, in common log format (CLF).
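The %t layout described above maps directly onto a date-format string. The following Python sketch, which assumes English month abbreviations and uses the illustrative function name parse_apache_time, converts the sample entry into a time-zone-aware value:

```python
from datetime import datetime, timedelta


def parse_apache_time(stamp):
    """Parse a %t value such as '[14/Sep/2009:17:23:45 -0600]'."""
    # day/month/year:hour:minute:second zone, with the brackets removed
    return datetime.strptime(stamp.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


when = parse_apache_time("[14/Sep/2009:17:23:45 -0600]")
print(when.isoformat())
print(when.utcoffset() == timedelta(hours=-6))
```

Keeping the offset attached to the value makes it possible to compare entries from servers that log in different time zones.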

6.2.3 Web-Log Analysis

Although Apache and other Web application programs can be used to create logs containing predefined activity, such activity represents raw data. To convert the raw data into a meaningful report requires a reporting tool. Although some Web application programs also include a reporting module that, when invoked, will operate against the log to generate a report, there are times when a Web-site manager will require additional information. This resulted in the development of a series of application programs and collections of scripts used to generate reports from Web logs. Although this author will leave it to readers to decide which application programs or collection of Web-log analysis scripts are most suitable for their operating environment, this author would be remiss if he did not mention how readers can access applicable software.

If you are operating the open-source Apache server, you can go to almost any search engine and enter the string “apache log analysis tool” or just “apache log analysis.” In early 2010, this author entered the first string at bing.com, Microsoft’s relatively new search tool. The result was 2,220,000 hits, as indicated in Figure 6.1. In examining Figure 6.1, in addition to the large number of hits on the search term “apache log analysis,” note that if you move your cursor over the greater-than (>) sign, you can obtain additional information about the page hit. In this example, Webalizer is a “fast, free Web server log analysis program,” according to the additional information provided by the search tool. Although this author will leave it to the reader to select an applicable log-analyzer tool, it should be mentioned that some sites contain malicious software, which can be identified by having an up-to-date virus checker. You can see that the sixth entry was flagged by the virus checker used by this author as having what some people consider as adware, spyware, or other potentially unwanted programs. Although we will not discuss the use of Apache log-analysis tools, we can discuss some of the common reports generated by log-analysis programs and scripts. This will give us a solid indication of the type of data we can obtain from Web logs and why we need some additional tools to determine if our organization requires a content delivery capability.

Figure 6.1 Searching bing.com for an Apache log analysis tool.

6.2.4 Top Referring Domains

The Top Referring Domains report is one of the more important reports provided by many application programs and Web-log analysis scripts. This report lists the URLs that browsers reported as having directed them to various Web pages on your server. Typically, application programs and Web-log analysis scripts let the user configure the program or script to list the top 10, 25, 50, or another number of top referring domains.
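To make the idea concrete, the following Python sketch tallies referring domains from a list of referrer URLs. It assumes the referrer values have already been extracted from the log (for example, by logging the Referer request header); the sample URLs and the function name top_referring_domains are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def top_referring_domains(referrers, limit=10):
    """Count referrer host names and return the `limit` most common."""
    counts = Counter()
    for url in referrers:
        host = urlparse(url).hostname
        if host is None:  # e.g. "-" is logged when no referrer was sent
            continue
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    return counts.most_common(limit)


# Hypothetical referrer values taken from an access log.
sample = [
    "http://www.google.com/search?q=popcorn",
    "http://www.google.com/search?q=roasted+popcorn",
    "http://www.bing.com/search?q=popcorn",
    "-",
    "http://www.google.co.uk/search?q=popcorn",
]
report = top_referring_domains(sample)
for domain, referrals in report:
    print(referrals, domain)
```

The limit argument corresponds to the configurable top 10, 25, or 50 setting mentioned above.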

Through the use of the Top Referring Domains report, you can note which sites are providing referrals to your organization’s Web site. For example, consider the top ten referring domains report that is shown in Table 6.4. From an examination of the entries in the referenced table, you can note that the primary referring domains represent search engines. Although the three highest referring sites in our example (google.com, bing.com, and yahoo.com) are located in the United States, other search engines providing a referring service are located in the United Kingdom (uk), Canada (ca), Germany (de), and France (fr). While a majority of referring comes from the first three search engines in the United States that are listed at the top of Table 6.4, there is also a significant amount of referring occurring from search engines located in Western Europe in this example. Although the referring from search engines located in Western Europe could indicate that the placement of a server outside the continental United States might be warranted, the report does not indicate the delay or latency associated with the referrals. This means that we need additional information prior to being able to reach an intelligent and informed decision concerning the placement of a server outside of the continental United States. Unfortunately, we cannot obtain such information directly from Web logs. Instead, we can either examine the number of dropped sessions, which may indicate problems clients had accessing the Web site, or we can use a protocol analyzer or such built-in operating-system tools as Ping and Tracert.

One of the uses of the top referring domains that can affect the need for content delivery is advertising. For example, consider Figure 6.2, which illustrates the home page of Yahoo France during early 2010. Note that prominently displayed is an advertisement from ING Direct. ING is an abbreviation of Internationale Nederlanden Groep, which in English is referred to as the International Netherlands Group. ING is the owner of ING Direct, a virtual bank with operations in Australia, Canada, France, Italy, Spain, the United Kingdom, the United States, and other countries.

6.2.5 Considering Status Codes

To verify whether your organization is receiving the value associated with advertising expenditures, you can consider both the number of referrals from the Web site you are advertising on as well as the potential results from adding a content delivery capability. For example, assume your organization operates an Apache server. Further, assume that you are searching the results of the Apache log files that you created to include the use of the %s parameter, which records the status code that the server transmits back to each client. In Table 6.3, some of the more commonly encountered status codes resulting from the use of the %s parameter are listed. Note that the codes can be subdivided into successful responses beginning at 200, redirections beginning at 300, and client errors beginning at 400.

Table 6.4 An Example of a Top-10 Referring Domains Report

REFERRALS  DOMAIN
127,183    google.com
123,742    bing.com
114,784    yahoo.com
 87,413    google.co.uk
 67,137    google.ca
 47,412    msn.com
 46,193    yahoo.de
 45,237    google.fr
 39,762    yahoo.fr
 12,756    cnn.com

Figure 6.2 The Yahoo France home page during early 2010.

Returning to our Apache application, assume that an analysis of the log revealed that there were a large number of 408 status codes. Because the 408 status code indicates a request timeout that resulted in the server aborting the session, this would indicate that clients were having difficulty accessing the server. While this could be a bandwidth problem, your ISP usually provides graphs of your bandwidth utilization on a daily, weekly, and monthly basis. Thus, you might be able to quickly rule out bandwidth as the culprit. If you rule out bandwidth on your Internet access line, then you need to check latency from the general location of clusters of clients to the location of your server. If the latency is causing timeouts, then you need to decide if placing one or more servers in Western Europe for content delivery is worth the cost. While this could make advertising more fruitful, there are other factors that need to be considered, such as sales and marketing of any products your organization sells as well as an improvement in its image to the client population in Western Europe. Because each organization is unique, this author has presented the tools to allow readers to make the technical analysis while leaving the actual decision process to the reader.
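A short script can show how prominent 408 responses are in a log. The following Python sketch assumes the status codes have already been extracted from the log; the sample values are hypothetical, and the 5xx server-error class is included only for completeness.

```python
from collections import Counter

# Status-code classes keyed by the leading digit of the code.
STATUS_CLASSES = {
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def summarize_statuses(codes):
    """Return (count per code, count per class) for a list of status codes."""
    by_code = Counter(codes)
    by_class = Counter(STATUS_CLASSES.get(code // 100, "other") for code in codes)
    return by_code, by_class


def timeout_share(codes):
    """Fraction of all responses that were 408 (request timeout)."""
    return codes.count(408) / len(codes) if codes else 0.0


# Hypothetical status codes extracted from an access log.
codes = [200, 200, 304, 408, 404, 408, 200, 408]
by_code, by_class = summarize_statuses(codes)
print(by_class)
print(f"408 share: {timeout_share(codes):.1%}")
```

What share of timeouts should prompt a latency investigation is a judgment for each site; the script only makes the proportion visible.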

Although we will shortly discuss how we can use a protocol analyzer to assist in our ability to obtain information for our decision-making process, let’s first complete our discussion of Web-log utilization. To do so, we will continue by examining some of the statistics we can obtain from various logs.

6.2.6 Web-Log Statistics

Because Apache and other Web-server programs enable you to log most, if not all, of the contents of HTTP headers, it’s possible to obtain valuable statistics by summarizing the contents of logs. Two of the more important statistics you can obtain from Web-server logs are access to your server based upon the country of client origination and the time zone from where access occurred.

A distribution of access to your organization’s Web site by origination country provides you with a good indication of where traffic originated in its flow to your Web server. This information can be helpful for determining the general location of server requests and the effect of your organization’s advertising and marketing efforts. However, since we are concerned with facilitating content delivery, we will leave advertising and marketing concerns to other publications.

6.2.7 Reverse Mapping

Prior to continuing our examination of origination country information, let’s digress a bit and discuss how we can obtain the origination country and other domain information, since requests to a Web server include only a source IP address. Thus, a mapping from an IP address to a domain address is required. This mapping is referred to as a reverse IP lookup. Reverse IP represents an easy way to determine all the .com, .net, .org, .edu, and other domains from which clients access your organization’s Web server. There are many uses for a reverse IP capability that go beyond the ability to identify your client access base. For example, if your organization is considering the use of a large shared host computer operated by a third party, you can use a reverse IP capability to determine who is using the host prior to signing a contract. In addition, if we assume that your organization is a small biotechnological firm and that the server already hosts your primary competitor, you might not be able to use the server regardless of its client base. Another possible use of a reverse IP capability is to identify phishing and scam sites, which often come in groups. If you locate one and do a reverse IP lookup, you might find several other scam sites hosted on the same server.

A reverse IP capability is built into most Web-server log-analysis programs. In addition, there are numerous third-party reverse IP lookup sites on the Internet that can be accessed via a search engine. When using a reverse-lookup program or program capability, you should carefully consider its effect upon the logging to your computer. This is because a reverse IP lookup is a time-consuming process and can slow down a busy Web server. Due to this fact, many organizations will copy the access log to another system to run a reverse IP lookup or will run the reverse IP lookup on the Web server during a slow evening shift to minimize its impact on clients accessing the server.

The reverse IP address lookup is typically accomplished through a reverse Domain Name System (DNS) lookup, also known as a reverse DNS resolution or simply reverse mapping. Normally a DNS query takes the form “what is the IP address of host www in domain abc.com.” However, there are times when we need to determine the name of a host given its IP address, such as when we want to analyze a Web-server log to determine additional information about the clients accessing the server. Through reverse mapping, we can determine the domain name associated with a given IP address. The reverse DNS database of the Internet is rooted in the Address and Routing Parameter Area (ARPA) top-level domain of the Internet. For IPv4 addresses, the in-addr.arpa series of address records serve as pointers to applicable root servers for a reverse lookup. For IPv6 addresses, a series of ip6.arpa records function as pointers to applicable root servers for a reverse DNS lookup. From the applicable root server, the process of reverse resolving of an IP address occurs by searching for the applicable named.rev file that sets up inverse mapping. That file will contain the names of the primary and secondary name servers in an organization’s local domain, plus pointers to those servers as well as any other nonauthoritative name servers. The names of the primary and secondary master servers are indicated by NS records, while the pointers are indicated by the use of PTR records. The file also needs a Start of Authority (SOA) record to show the start of a zone and the name of the host on which the named.rev file resides. Here, the term zone represents all of the hosts that constitute an entity. For example, consider the following extract from a DNS file shown in Figure 6.3. In this example, there are three hosts in the reverse-mapped zone.

Although we are primarily concerned about the reverse IP lookup process, a few words describing the major entries in the DNS file extract shown in Figure 6.3 are warranted. The first entry in each of the zone files is the Start of Authority (SOA) resource record. The SOA record indicates the authoritative name server for this domain. Because the SOA record indicates the beginning of a zone, there can be only one SOA record for each zone.

6.2.8 SOA Record Components

The format of the SOA record is:

zone IN SOA origin contact (
        serial
        refresh
        retry
        expire
        minimum/time to live
        )

The components of the SOA record are described as follows:

zone: This is the name of the zone. Normally the SOA field contains an at sign (@).

IN: IN is used to state that the address class is the Internet class.

SOA: The type of resource record is SOA. All the information that follows this is part of the data field and is part of the SOA record.

origin: This is the host name of the primary name server for the domain, and it is normally written in the fully qualified domain name format with a trailing dot added, such as www.popcorn.com. (including the final dot).

contact: This is the e-mail address of the person responsible for this domain. Note that the at sign (@) in the e-mail address is replaced by a dot.

serial: This number can be considered to represent the version number of the zone file. You need to change the serial number every time you update the zone data, as this field is used by secondary name servers to determine if the zone file on the primary server has been updated. That is, when the secondary server requests the SOA record from the primary, it compares the serial number received to the serial number in its cache. If the serial number received from the primary has increased, the secondary server requests a full zone transfer. Otherwise, the secondary server assumes it has the most current zone data.

refresh: This is the length of time (in seconds) that the secondary name server should wait prior to checking with the primary server to see if the zone data was modified.

retry: The retry time is the amount of time, in seconds, that the secondary name server should wait prior to attempting another zone refresh after a failed attempt. This number should not be set too low, as rapidly retrying to access a down system will consume network resources. A setting of one hour (3600) is commonly used.

expire: Expire defines how long (in seconds) the secondary name server should keep the data without receiving a zone refresh. If there has been no answer from the primary server to refresh requests after repeated retries for the amount of time specified in the expire, the secondary should discard its data.

minimum/time to live: This entry represents the amount of time (in seconds) that resource records from this zone should be held in a remote host’s cache. It is recommended that this value be large, as a small or low value will force remote servers to repeatedly query for unchanged data. A commonly used value is 86400, which represents a 24-hour time period.

zone                  IN SOA  origin contact (
                              serial
                              refresh
                              retry
                              expire
                              minimum )
zone                  IN NS   name server name
in-addr.arpa address  IN PTR  host1.zone.
in-addr.arpa address  IN PTR  host2.zone.
in-addr.arpa address  IN PTR  host3.zone.

Figure 6.3 Extract from a DNS file.
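The way a secondary name server uses the serial field can be sketched in a few lines of Python. This is a simplified illustration and not how any particular name server is implemented: the class SoaRecord, the function names, and the serial values are hypothetical, and the plain comparison ignores the wraparound rules that real DNS software applies to serial numbers.

```python
from dataclasses import dataclass, replace


@dataclass
class SoaRecord:
    origin: str   # primary name server, fully qualified with trailing dot
    contact: str  # responsible person's e-mail, "@" written as "."
    serial: int   # version number of the zone data
    refresh: int  # seconds between checks by the secondary
    retry: int    # seconds to wait after a failed refresh
    expire: int   # seconds before unrefreshed data is discarded
    minimum: int  # seconds remote hosts may cache records


def needs_zone_transfer(cached, received):
    """A secondary asks for a transfer only if the primary's serial increased."""
    return received.serial > cached.serial


def contact_to_email(contact):
    """Turn 'root.www.popcorn.com.' back into 'root@www.popcorn.com'."""
    local, _, domain = contact.rstrip(".").partition(".")
    return f"{local}@{domain}"


cached = SoaRecord("www.popcorn.com.", "root.www.popcorn.com.",
                   serial=2010081101, refresh=3600, retry=600,
                   expire=3600000, minimum=86400)
received = replace(cached, serial=cached.serial + 1)  # zone was updated
print(needs_zone_transfer(cached, received))
print(contact_to_email(cached.contact))
```

The contact conversion assumes the local part of the e-mail address contains no dot of its own.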

To illustrate an example of the potential entries in a named.rev file, let’s assume that the domain popcorn.com consists of three hosts: a Web server, an ftp server, and a client host with the name baked.popcorn.com. Since the named.rev file sets up inverse mapping, it needs to include the names of the primary and secondary name servers in your local domain to include pointers to those servers. The names of the primary and secondary servers will be indicated through the use of NS records, while the use of an SOA record is required to indicate the beginning of a zone as well as the name of the host on which the named.rev file resides. Assuming that this host is www.popcorn.com, whose IPv4 address is 198.78.46.1, that the ftp server is .2 on the subnet, and that the client with the name baked is at .3, then Figure 6.4 illustrates a more detailed named.rev file.

In examining the entries in the named.rev file illustrated in Figure 6.4, note that a reverse name resolution zone requires the first three blocks of the IP address reversed followed by .in-addr.arpa. This enables the single block of IP numbers used in the reverse name resolution zone file to be associated with the zone. Then, the NS records point to the name servers while the PTR records have the following format:

<last-IP-digit> IN PTR <fully-qualified-domain-name-of-system>

As an alternative to the previously mentioned reverse lookup, some third-party vendors offer a table lookup process that is interesting. They convert an IP address into a decimal number and then search their database to provide a variety of information to include the domain and country of the associated IP address. The actual IP number conversion multiplies the first dotted decimal number by 256*256*256, which is added to the second dotted decimal number multiplied by 256*256, which is added to the third dotted decimal number multiplied by 256, which is then added to the fourth dotted decimal number. That is, assuming the IP address is a.b.c.d, then the IP number becomes:

X = a*(256*256*256) + b*(256*256) + c*(256) + d
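The conversion can be checked with a few lines of Python. The function name ip_to_number is an illustrative choice, and the standard ipaddress module is used only to cross-check the arithmetic.

```python
import ipaddress


def ip_to_number(ip_text):
    """Apply X = a*(256*256*256) + b*(256*256) + c*(256) + d to 'a.b.c.d'."""
    a, b, c, d = (int(part) for part in ip_text.split("."))
    return a * (256 * 256 * 256) + b * (256 * 256) + c * 256 + d


number = ip_to_number("198.78.46.1")
print(number)

# The standard library produces the same integer for an IPv4 address.
print(number == int(ipaddress.IPv4Address("198.78.46.1")))
```

For 198.78.46.1 the terms are 3,321,888,768 + 5,111,808 + 11,776 + 1, giving 3,327,012,353.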

$TTL 2d ; 172800 seconds
$ORIGIN 46.78.198.in-addr.arpa.
@    IN SOA  www.popcorn.com. root.www.popcorn.com. (
             1.5      ; serial number
             3600     ; refresh 1 hour
             600      ; retry 10 minutes
             3600000  ; expire 1000 hours
             86400 )  ; minimum 24 hours
     IN NS   ns1.popcorn.com.
     IN NS   ns2.popcorn.com.
1    IN PTR  www.popcorn.com.
2    IN PTR  ftp.popcorn.com.
3    IN PTR  baked.popcorn.com.

Figure 6.4 An example of a named.rev file.
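The reverse names used in a file such as the one in Figure 6.4 can be derived mechanically from an IP address. The following Python sketch relies on the reverse_pointer attribute of the standard ipaddress module; the helper names and the IPv6 documentation address 2001:db8::1 are illustrative assumptions.

```python
import ipaddress


def reverse_name(ip_text):
    """Return the in-addr.arpa (IPv4) or ip6.arpa (IPv6) name for an address."""
    return ipaddress.ip_address(ip_text).reverse_pointer


def reverse_zone_and_label(ipv4_text):
    """Split an IPv4 reverse name into (last-octet label, reverse zone)."""
    label, _, zone = reverse_name(ipv4_text).partition(".")
    return label, zone


print(reverse_name("198.78.46.1"))
print(reverse_zone_and_label("198.78.46.3"))
print(reverse_name("2001:db8::1"))
```

The label returned for 198.78.46.3 is the value written at the start of that host's PTR line, and the zone is the name that follows $ORIGIN.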

Now that we have an appreciation for the reverse lookup process, let’s continue our exploration of Web-log statistics and turn our attention to the origination country.

6.2.9 Origination Country

As we return our focus to the origination country information, note that some reports simply list the number of Web page views by originating country in descending order. While this type of report refines raw log data, you more than likely will have to take pencil to paper and use a calculator to further analyze this report. For example, assume that you need to group a number of Western and Eastern European countries, as your organization has offices in both general locations, and you are considering providing a content delivery capability in each area. Concerning Eastern European countries, let’s assume that your organization markets farming equipment and that you already have branch offices in Prague, Warsaw, and Budapest. Let’s also assume that your clients currently access your Web server in the United States but that your Web manager noted a high number of server-based timeouts occurring from Eastern Europe. Thus, you might then consider establishing a Web server in a branch office; however, the question now becomes one of selecting an applicable office. Since we will not investigate the cost of labor, office space, and communications (something that you would ordinarily perform on your own), we can focus our attention upon the client country of origin. However, since most Web-log analysis programs simply list totals by country, you will need to consider grouping different countries together as well as using Ping and Tracert to determine, from a technical perspective, where the new server should be located.
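The grouping step can also be done with a short script instead of a calculator. In the following Python sketch, the region assignments, the two-letter country codes, and the page-view figures are all hypothetical; a real analysis would use the per-country totals produced by your log-analysis program.

```python
from collections import defaultdict

# Hypothetical mapping of country codes to the regions being evaluated.
REGIONS = {
    "CZ": "Eastern Europe", "PL": "Eastern Europe", "HU": "Eastern Europe",
    "GB": "Western Europe", "DE": "Western Europe", "FR": "Western Europe",
}


def views_by_region(views_by_country):
    """Sum per-country page views into per-region totals."""
    totals = defaultdict(int)
    for country, views in views_by_country.items():
        totals[REGIONS.get(country, "Other")] += views
    return dict(totals)


# Hypothetical per-country page views from an origination-country report.
sample = {"US": 5000, "CZ": 120, "PL": 340, "HU": 90, "DE": 800, "FR": 650}
totals = views_by_region(sample)
for region, views in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{region}: {views}")
```

The totals indicate only where demand originates; latency measurements with Ping and Tracert are still needed before choosing a server location.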

6.2.10 Originating Time Zone

Another summary report available from most Web-server log-analysis programs groups page views according to the time zone of the browser user. This type of report usually lists page views in descending order based upon the time zone of origination. That time zone is stated in reference to Greenwich Mean Time (GMT) with a plus or minus hourly offset. For example, the Eastern time zone in the United States, which encompasses states from Maine in the north to Florida in the south, is GMT-05:00. In comparison, Moscow, Russia, is located in GMT+03:00, while Tokyo, Japan, is located in GMT+09:00.

While a Top Time Zone report can provide valuable information, it’s possible that this report could provide some misleading information. This is because a time zone covers an area of the globe between the North and South Poles. Thus, a large number of page views occurring from GMT-05:00, which is the previously mentioned Eastern time zone, could originate from New York, Florida, or even Peru, because all of those locations share the same offset from GMT. This indicates that, in most cases, you need to supplement the report of Top Time Zone page views with knowledge about the locations from which the traffic originated. An exception to this is when considering system maintenance. In this situation, the use of the Top Time Zone page views could assist you in determining the impact of taking down your Web server for periodic system maintenance.

6.2.11 Other Statistics

In addition to the originating country and originating time zone reports, a Web-log-reporting program or the use of a collection of scripts can be expected to generate a series of statistics. Table 6.5 provides a list of the general type of statistics you can typically obtain from the use of a Web-log analysis and reporting program or commercial script.

Table 6.5 Web-Log Analysis Statistics

MONTHLY, WEEKLY, OR DAILY STATISTICS

• Total page views
• Total visitors
• Unique visitors
• Hourly unique visitors
• Average page views per hour
• Page views per visit
• Busiest day in reporting period
• Busiest hour in reporting period

In examining the entries in Table 6.5, you will note that the statistical information provides general summary information about the activities occurring at a particular Web site. Although this information may be suitable for examining the need to upgrade an existing Web server, the summary information is usually not suitable for deciding if an in-house content delivery mechanism is warranted based on existing traffic. The originating-country report supplemented by an originating-time-zone report can be used to consider the need for a distributed content delivery capability. In comparison, the summary statistics are usually better suited for evaluating the capacity and processing power of an existing Web server or group of servers against the current workload request flowing to the server or group of servers. For example, the average-page-views-per-second statistic can be compared with the capability of your server to generate Web pages. If this statistic increases over time and begins to approach the capacity of your server, this indirectly informs you that it’s time to consider other options, such as upgrading your server, installing a load balancer and another server, or using the services of a third party.

While it’s true that not all Web pages have the same composition, and that the variance in Web page composition will affect the pages per second (PPS) rate that a hardware platform can generate, working with an average page view per second and comparing that value to the average server page-generation capability provides a reasonable gauge concerning server performance. As the page-views-per-second value approaches the server’s PPS generation capability, the load on the server increases to a point where the server must be upgraded, a load balancer and additional server must be installed, or a third party with sufficient capacity must be considered.
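The comparison described above amounts to simple arithmetic, as the following Python sketch shows. The daily page-view total and the server capacity of 80 pages per second are hypothetical figures chosen for illustration, and an average over a full day will understate the load during the busiest hour.

```python
def average_pps(total_page_views, period_seconds):
    """Average page views per second over a reporting period."""
    return total_page_views / period_seconds


def utilization(observed_pps, capacity_pps):
    """Fraction of the server's page-generation capability in use."""
    return observed_pps / capacity_pps


daily_views = 4_320_000        # hypothetical total page views in one day
seconds_per_day = 24 * 60 * 60

pps = average_pps(daily_views, seconds_per_day)
load = utilization(pps, capacity_pps=80.0)  # hypothetical server capability
print(f"average PPS: {pps:.1f}, utilization: {load:.1%}")
```

Tracking this ratio from one reporting period to the next shows whether the workload is approaching the server's capability.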

6.2.12 Other Analysis Tools

Although you may not realize it, there are a variety of analysis tools you can consider, ranging in scope from commercial and freeware products to the use of cookies. Concerning commercial products, one of the earliest programs developed to analyze Web-server logs in a Windows operating environment, Webtrends, is still being marketed in its ninth incarnation. This commercial program provides users with the ability to obtain real-time updates for key metrics as well as receive real-time alerts. Marketed under the moniker Webtrends Analytics 9, this data-collection and analysis program is both scalable and customizable. In the Apache server environment, one popular log-analysis tool is Webalizer, which can be downloaded from several sites. Because it is written in C, it is very fast, and its reports are usually sufficient for obtaining data about many important metrics, such as the number of hits and the pages that are being accessed.

If you have a commercial account on a number of Web sites, you may be able to access traffic reports that can provide you with information that can be used to consider if you should maintain what is essentially a third-party hosting service or use your own server. For example, this author operates an eBay store selling U.S. postage stamps. One of the perks of having an eBay store is the ability to obtain traffic reports, such as the report generated and shown in Figure 6.5.

If you examine Figure 6.5, you will note that the left column indicates the types of reports a store operator can obtain. In comparison, the right column on the page shows three graphs: page views, visits, and home-page views for the current month. A page view is counted every time a visitor views your listings, a page within your store, and other pages tracked by eBay. A visit is defined by eBay as a sequence of consecutive pages viewed by a single visitor for 30 minutes without a break, while the “storefront” displays the number of times a storefront home page was accessed. Because each report occurs on a monthly basis and compares the current month to the prior four weeks and a year ago, you can see a bar graph (current month) and two line graphs (prior 4 weeks and prior 52 weeks). Note that because the traffic report was executed on the 12th of the month, the bar graph terminates at that point, while the two line graphs continue until the end of the month. By clicking on “view full report,” you can view the report in its entirety. Figure 6.6 illustrates an example of the full report concerning page views. Note that this report provides some very interesting information concerning activity of eBay users interested in U.S. postage stamps. In this example, you will note that the visits report indicates graphically that, up until the 12th of the month, the number of visits for the month, the prior 4 weeks, and the prior year was relatively stable in spite of the Great Recession. If you turn your attention to the lower portion of Figure 6.6, you will see that under the “details” heading you can view numeric quantities. Thus, the traffic reports can provide users of eBay and other commercial sites with valuable data.
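The distinction between page views and visits can be illustrated with a small Python sketch. It is one reading of the visit definition quoted above and not eBay's actual method: a new visit is assumed to begin whenever more than 30 minutes pass between consecutive page views by the same visitor. The function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

VISIT_GAP = timedelta(minutes=30)


def count_visits(page_view_times, gap=VISIT_GAP):
    """Count visits for one visitor from the times of that visitor's page views."""
    visits = 0
    previous = None
    for moment in sorted(page_view_times):
        # A pause longer than the gap ends the old visit and starts a new one.
        if previous is None or moment - previous > gap:
            visits += 1
        previous = moment
    return visits


# Hypothetical page-view times for a single visitor.
views = [
    datetime(2010, 3, 12, 9, 0),
    datetime(2010, 3, 12, 9, 10),
    datetime(2010, 3, 12, 9, 35),
    datetime(2010, 3, 12, 11, 0),
]
visit_count = count_visits(views)
print(len(views), "page views,", visit_count, "visits")
```

Here the first three page views fall within one visit, and the 85-minute pause before the last page view starts a second visit.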

Figure 6.5 An example of a traffic report generated by eBay for a store.

Figure 6.6 Viewing detailed eBay-provided visit data.

6.2.13 Cookies

Although cookies have received some bad publicity due to their use by some Web sites, you can put them to practical use as a mechanism to track Web-server usage. Prior to discussing how this can be accomplished, let’s first review what a cookie is and the different parameters that can be associated with it.

A cookie is a rather simple text message that can be placed in the client computer’s memory or written as a file onto the hard drive on the client computer. Although the primary purpose of a cookie is to store state information concerning a client-server transaction, such as items in a shopping cart, the use of cookies has considerably expanded since its humble beginnings as a method to overcome the stateless nature of the HTTP protocol. Today, you can use cookies to determine the duration of user visits on different Web pages as well as generate other types of statistics without having to use a log-analysis program or to supplement the use of this type of program.

If you’re using a recent version of Microsoft’s Internet Explorer, you can view cookies by selecting Internet Options from the Tools menu. Then you would select the General tab and click on Settings, then select Files, which will provide a view of all temporary files as well as cookies. You can also configure both Internet Explorer and most other modern browsers to either accept all cookies or to alert you each time a server attempts to provide you with a cookie. To do so, if you’re using a modern version of Internet Explorer, you would select Internet Options from the Tools menu. Then, by clicking on the Privacy tab, you can move a slider bar to the setting you prefer.

You can obtain an indication of the pervasive use of cookies by examining Figure 6.7, which illustrates the use of cookies by the Internal Revenue Service (IRS) when this author was using his Firefox browser. Of course, clients can configure their Web browsers to not allow cookies, but few users do so. In examining Figure 6.7, note that the left dialog box appears when you select the Options entry from the Tools menu in Firefox. By clicking on the “Show Cookies” bar, the right dialog box is displayed. Then, this author scrolled down to the IRS site and viewed the two cookies a visit to www.irs.gov deposited on his computer.

If you do not set the cookie’s expiration, the cookie is created, but it is not stored on the user’s hard disk. Instead, the cookie is maintained as part of the user’s session information. When the user closes the browser, the cookie is discarded. This type of cookie is referred to as a nonpersistent cookie. A nonpersistent cookie like this is useful for information that needs to be stored for only a short time or that for security reasons should not be written to disk on the client computer. For example, nonpersistent cookies are useful if the user is working on a public computer, where you do not want to write the cookie to disk. It’s important to note that users can clear one or more cookies on their computers at any time they desire. This is true regardless of the cookie’s expiration date.

6.2.13.1 Cookie Basics When a cookie is transmitted from a server to a client’s browser, an additional line is added to the HTTP headers. For example, consider the following:

Content-type: text/html
Set-Cookie: roasted=2; path=/; expires=Wed, 11-Aug-2010 14:42:00 GMT; domain=popcorn.com

This header results in a cookie named roasted that has a value of 2; perhaps the user visited popcorn.com and placed two bags of roasted popcorn in his or her shopping cart. The cookie has a path of /, which means it is valid for the entire site; however, the cookie expires on 11 August 2010 at 2:42 p.m. GMT. The domain=popcorn.com parameter tells the browser to send the cookie when requesting an arbitrary page of the domain popcorn.com, with an arbitrary path. Cookies are defined in several RFCs (Requests for Comments), and when you develop cookies on the server side, you define their style as cookie based on RFC 2109 or cookie2 based on RFC 2965. Regardless of the style used, their attribute names are case insensitive.

Figure 6.7 Viewing cookies on Firefox.

When a cookie is transmitted from the client browser to the server, its header is changed slightly. The following example illustrates the reverse transmission:

Content-type: text/html
Cookie: roasted=2

The preceding makes the server aware of a cookie named roasted whose value is 2. Thus, as long as the cookie hasn’t expired, good old popcorn.com knows that the client’s shopping cart had two bags of roasted popcorn in it.
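The two header directions described above can be sketched with Python’s standard http.cookies module. This is only an illustrative sketch: the helper names parse_set_cookie and cookie_request_header are hypothetical, and the header value is the popcorn.com example from the text.

```python
from http.cookies import SimpleCookie

# The Set-Cookie value from the popcorn.com example in the text.
SET_COOKIE_VALUE = (
    "roasted=2; path=/; expires=Wed, 11-Aug-2010 14:42:00 GMT; "
    "domain=popcorn.com"
)


def parse_set_cookie(value):
    """Return the cookie's name, value, and attributes as a plain dict."""
    jar = SimpleCookie()
    jar.load(value)
    name, morsel = next(iter(jar.items()))
    return {
        "name": name,
        "value": morsel.value,
        "path": morsel["path"],
        "domain": morsel["domain"],
        "expires": morsel["expires"],
    }


def cookie_request_header(parsed):
    """Build the line a browser sends back: only the name and value."""
    return f"Cookie: {parsed['name']}={parsed['value']}"


parsed = parse_set_cookie(SET_COOKIE_VALUE)
print(parsed)
print(cookie_request_header(parsed))
```

Note that the path, expiration, and domain stay with the browser; only the name-value pair travels back to the server.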

From the above examples, we can note that a cookie can carry five parameters: the name of the cookie, its value, its expiration date, and the path and domain for which the cookie is valid. In addition, a cookie can indicate the need for a secure connection to use the cookie. The first two parameters (name and value) are required, while the other four are optional and can be set either manually or automatically. Concerning the path, it sets the URL path the cookie is valid within. Pages outside of the specified path cannot read or use the cookie. If it is not set, the default is the URL path of the document creating the cookie. The use of the domain parameter extends the flexibility of the path parameter. That is, if a site has multiple servers, you can use the domain parameter to make the cookie accessible to pages on any server in the domain. To do so, assuming that our example site popcorn.com has multiple servers, you would use the domain parameter as follows, where the leading dot makes the cookie valid for every host in the domain, such as www.popcorn.com:

domain=.popcorn.com

If the cookie requires a secure connection, a flag is used to indicate this fact. A TRUE or FALSE value is used to indicate whether a secure connection, such as SSL or TLS, is required.
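To make the interaction of the path, domain, and secure parameters concrete, the following Python sketch shows one way a browser could decide whether to return a cookie with a request. It is a simplified illustration, not the matching algorithm of any particular browser or RFC; the Cookie class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cookie:
    name: str
    value: str
    domain: str          # for example ".popcorn.com"
    path: str = "/"
    secure: bool = False


def domain_matches(host, cookie_domain):
    """True if host is the cookie domain itself or a host inside it."""
    bare = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == bare or host.endswith("." + bare)


def path_matches(request_path, cookie_path):
    """True if the requested path lies at or below the cookie path."""
    if request_path == cookie_path:
        return True
    prefix = cookie_path if cookie_path.endswith("/") else cookie_path + "/"
    return request_path.startswith(prefix)


def should_send(cookie, host, request_path, is_secure_connection):
    """Apply the secure flag first, then the domain and path checks."""
    if cookie.secure and not is_secure_connection:
        return False
    return (domain_matches(host, cookie.domain)
            and path_matches(request_path, cookie.path))


cart = Cookie("roasted", "2", domain=".popcorn.com", path="/cart", secure=True)
print(should_send(cart, "www.popcorn.com", "/cart/view", True))
print(should_send(cart, "www.popcorn.com", "/cart/view", False))
```

The first call satisfies all three conditions; the second fails only because the connection is not secure.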

6.2.13.2 Writing Cookies Cookies are transmitted to a client’s browser via the HttpResponse object (http://msdn.microsoft.com/en-us/library/system.web.httpresponse.aspx), which exposes a collection called Cookies (http://msdn.microsoft.com/en-us/library/system.web.httpresponse.cookies.aspx). You can access the HttpResponse object through the use of several programs, such as Perl or JavaScript, or through the programming capability of your server. The browser is responsible for managing cookies on a user system, and any cookies that you want to send to the browser must be added to this collection.

When creating a cookie, you specify as a minimum a Name (http://msdn.microsoft.com/en-us/library/system.web.httpcookie.name.aspx) and Value (http://msdn.microsoft.com/en-us/library/system.web.httpcookie.value.aspx). Each cookie must have a unique name so that it can be identified when reading it from the browser. Because cookies are stored by name, naming two cookies with the same name will cause one to be overwritten. You can also set a cookie’s date and time expiration. Expired cookies are deleted by the browser when a user visits the site that wrote the cookies. The expiration of a cookie should be set for as long as your application considers the cookie value to be valid. For example, a cookie that effectively never expires can result from setting the expiration date to be 20 years in the future; however, a user can still delete the cookie. A cookie can be quite complex, as it can be up to 4096 characters in length. Thus, a Web-server operator can create a cookie with time and date information as well as page of entry to the server that can be used as a mechanism for tracking client information.
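The text describes cookie creation in terms of Microsoft’s HttpResponse object; as a language-neutral illustration, the sketch below builds an equivalent Set-Cookie line with Python’s standard library instead. The function name build_set_cookie and the visitor_id cookie are hypothetical, and the 4096-character check simply mirrors the limit quoted above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie

MAX_COOKIE_LENGTH = 4096  # size limit quoted in the text


def build_set_cookie(name, value, lifetime_days, now=None):
    """Return a Set-Cookie header line that expires lifetime_days from now."""
    if now is None:
        now = datetime.now(timezone.utc)
    expires = now + timedelta(days=lifetime_days)

    jar = SimpleCookie()
    jar[name] = value                      # required: name and value
    jar[name]["path"] = "/"                # optional: valid for the whole site
    jar[name]["expires"] = format_datetime(expires, usegmt=True)

    cookie_text = jar[name].OutputString()
    if len(cookie_text) > MAX_COOKIE_LENGTH:
        raise ValueError("cookie exceeds 4096 characters")
    return "Set-Cookie: " + cookie_text


# A cookie that "effectively never expires": roughly 20 years ahead.
print(build_set_cookie("visitor_id", "abc123", lifetime_days=20 * 365))
```

Omitting the expires attribute altogether would instead yield a nonpersistent cookie that the browser discards when it closes.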

6.2.13.3 How a Cookie Moves Data Cookie data can be a collection of simple name-value pairs stored on your hard disk by a Web site. The Web site stores the data, and later it receives it back. A Web site can only receive data in the form of cookies that it previously stored on your computer. It cannot examine any other cookie, nor anything else on your machine.

When you type a URL into a Web browser, a Web site might make use of the data in your cookie file, resulting in the data moving as follows:

• When you enter the URL of a Web site into your browser, your browser sends a request to the Web site for the page defined by the URL.

• Your browser looks on your computer for a cookie file that the Web site previously sent. If it finds a cookie file, your browser will send all of the name-value pairs in the file to the server along with the URL. If it finds no cookie file, it will send no cookie data.

• The Web server receives the cookie data and the request for a page. If name-value pairs are received, the Web site can use them according to some predefined program.

• If no name-value pairs are received, the Web site employing cookies knows that you have not visited before. The server creates a new ID for you in its database and then sends name-value pairs to your machine in the HTTP header for the Web page it transmits. Your computer then stores the name-value pairs on your hard disk.

• The Web server can change name-value pairs or add new pairs whenever you visit the site and request a page.
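The steps above can be sketched as a small Python simulation. The Browser and Site classes, the visitor_id cookie name, and the in-memory dictionaries standing in for the cookie file and the site database are all hypothetical; a real browser and server exchange these values through HTTP headers.

```python
import uuid


class Site:
    """Hypothetical Web site that assigns an ID cookie to new visitors."""

    def __init__(self):
        self.database = {}  # visitor ID -> number of page requests

    def handle_request(self, url, cookies):
        """Return (page, cookies_to_set) for one page request."""
        visitor_id = cookies.get("visitor_id")
        if visitor_id not in self.database:
            # No usable cookie received: treat as a first visit.
            visitor_id = uuid.uuid4().hex
            self.database[visitor_id] = 0
            cookies_to_set = {"visitor_id": visitor_id}
        else:
            cookies_to_set = {}
        self.database[visitor_id] += 1
        return f"page for {url}", cookies_to_set


class Browser:
    """Hypothetical browser keeping one set of name-value pairs per site."""

    def __init__(self):
        self.cookie_file = {}  # site name -> {name: value}

    def visit(self, site_name, site, url):
        """Send stored pairs with the request, then store any new pairs."""
        sent = dict(self.cookie_file.get(site_name, {}))
        page, cookies_to_set = site.handle_request(url, sent)
        self.cookie_file.setdefault(site_name, {}).update(cookies_to_set)
        return page, sent


site = Site()
browser = Browser()
_, first_sent = browser.visit("popcorn.com", site, "/")
_, second_sent = browser.visit("popcorn.com", site, "/cart")
print(first_sent)    # nothing to send on the first visit
print(second_sent)   # the stored ID is returned on the second visit
```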

6.2.13.4 How Web Sites Use Cookies The general purpose behind the use of cookies is to enable a server to store information on your computer. Such information allows a Web site to “remember” the “state” your browser is in, such as if you placed items in a shopping cart and what those items are. As a minimum, a cookie placed on your computer informs a server that you previously visited the site.

Because of the common use of caching, proxy servers, and DNS extensions, the only method for a site to accurately count visitors is to set a cookie with a unique ID for each visitor. Web sites can use cookies in many different ways. For example, a common use of cookies is to determine the number of visitors to a site, the number of new versus repeat visitors, and how often a visitor has visited the site. The first time a visitor arrives at a Web site, the site creates a new ID in its database and transmits the ID as a cookie. The next time the user returns, the site can increment a counter associated with that ID in the database, resulting in the ability of the site to know how many times that visitor returned to the site. By simply running through the database and comparing the counter value associated with each ID, a simple program can determine the total number of unique visitors.
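The “simple program” mentioned above might look like the following Python sketch. It assumes the database is a mapping from visitor ID to visit count; the function name and the sample IDs are hypothetical.

```python
def visitor_statistics(visit_counts):
    """Summarize a mapping of visitor ID -> number of visits."""
    unique = len(visit_counts)
    repeat = sum(1 for count in visit_counts.values() if count > 1)
    return {
        "unique_visitors": unique,
        "new_visitors": unique - repeat,      # seen exactly once
        "repeat_visitors": repeat,            # seen more than once
        "total_visits": sum(visit_counts.values()),
    }


# Hypothetical database contents.
sample_database = {"id-001": 1, "id-002": 4, "id-003": 2}
print(visitor_statistics(sample_database))
```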

6.2.13.5 Problems with Cookies This author would again be remiss if he did not mention some of the problems and limitations associated with cookies. Those problems include the fact that many computers are shared and that people use multiple computers, as well as certain privacy issues.

6.2.13.5.1 Shared Computers Any computer that is used in a public area, as well as computers in an office environment or at home, can be shared by multiple people. For example, assume you’re at the public library and you use a computer to make a purchase from an online store. The store will leave a cookie on the computer. In the early stages of Web development, cookies sometimes included credit card information, which made it possible for someone to use the same computer to purchase something from the store using your account. Even though credit card information is not typically sent in cookies, at the very least it’s a good idea to clear all temporary files and cookies after you use a browser on a computer accessible by other persons.

6.2.13.5.2 Multiple Computer Usage A separate issue from shared computers is the use of multiple computers. People often use more than one computer during the day. For example, I have a computer in my kitchen, another computer in my office, and two laptops that I frequently use. Unless the site is specifically programmed to solve this problem, I will have four unique cookie files sent from a site that I visit using each computer, one on each device. Thus, any site that I visit from each computer will track me as four separate users, resulting in an extreme level of annoyance when I am forced to set my preferences over and over. Sites that allow registration and store preferences at a central location, such as Amazon and eBay, make it easier for a user to use the same account on different computers.

6.2.13.5.3 Privacy Issues Although cookies can be considered as being benign text files, they provide lots of useful information about the habits of a user. When you visit a Web site, it’s possible for the site to not only track the pages you view and the ads you click upon, but, in addition, if you purchase an item, the site now has your name and address. Then it becomes possible for the site operator to market such information to others.

Different sites have different policies concerning the marketing of information. Unfortunately, you may have to carefully search through the contents of a site to determine its policy, which is why many persons use various “spyware” removal programs to remove certain types of cookies. One company, called DoubleClick, which is now owned by Google, is known for its banner ads on many Web sites as well as its ability to place small GIF files on sites that allow the company to load cookies onto your computer. DoubleClick can then track your movements across multiple sites. Because of this practice, DoubleClick is often associated with the controversy over spyware: some of its browser cookies are set to track users as they travel from site to site and record the commercial advertisements a client views and the ads they click upon. Due to its cookie operations, DoubleClick deposits on client browsers are considered to be malware by several commercial organizations that detect their presence and provide the client browser operator with the ability to remove such cookies.

6.2.14 Other Logging Information

Previously, we noted the use of access logs and cookies to obtain information about the use of server resources. To ensure that readers have a solid understanding of server performance, we will conclude our discussion by turning to a tool built into server software. Because Microsoft’s Windows operating system represents the dominant operating system used by Web application programs, we will look at the Performance Monitor tool built into Windows server to obtain an appreciation for how we can open a window to view the performance of a server’s hardware platform.

6.2.15 Microsoft’s Performance Monitor

Microsoft developed Performance Monitor to view, log, and chart the values of various performance-related counters. Performance Monitor can be used to spot trends that signify whether a hardware upgrade or replacement should be considered or is necessary. Performance Monitor is a tool for examining the ability of the operating system and its hardware platform to satisfy operational requirements. The name Performance Monitor was used by Microsoft to reference this tool when it was bundled with Windows NT. When Microsoft introduced the Windows 2000 server and the Microsoft Management Console (MMC), it changed the name from Performance Monitor to Performance. In later versions of Windows, the title of the product has reverted back to Performance Monitor, which will be used throughout the following discussion.

6.2.15.1 Activating Performance Monitor Figure 6.8 illustrates how the built-in Performance Monitor can be activated using a Windows 2000 server. Although the Start menu has changed and will undoubtedly change again in the next release of Windows, this figure provides readers with a general indication of how to access the product. As shown in Figure 6.8 by the sequence of highlighted menu entries, you would select Programs > Administrative Tools > Performance to invoke the Performance Monitor. Although the graphic display will change based upon the version of Windows used, Performance Monitor can normally be found under Administrative Tools.

Figure 6.8 Accessing Windows 2000 server’s performance monitor provides the ability to display graphs of system performance.

Under Windows 2000 server and other more recent Microsoft products, you access Performance Monitor via the MMC, as shown in Figure 6.9. Note that the left portion of the display provides a tree of activities that can be invoked. The right portion of the display provides a graph of the objects you previously selected. After you select one or more objects, the display will show each object in a different color during the selected time period and indicate, for each graphed entry, its latest value (last) as well as average, maximum, and minimum values. Note that similar to other windows in Microsoft Windows, you can easily resize the display.
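The last, average, maximum, and minimum values that Performance Monitor reports for each graphed entry are straightforward to reproduce for any series of samples. The Python sketch below is an illustration only; the function name and the sample utilization figures are hypothetical and do not come from Performance Monitor itself.

```python
from statistics import mean


def summarize_counter(samples):
    """Return the last, average, minimum, and maximum of a sample series."""
    if not samples:
        raise ValueError("at least one sample is required")
    return {
        "last": samples[-1],
        "average": mean(samples),
        "minimum": min(samples),
        "maximum": max(samples),
    }


# Hypothetical processor utilization samples, in percent.
processor_time = [10, 30, 20]
print(summarize_counter(processor_time))
```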

6.2.15.2 Adding Counters and Instances If you examine the lower portion of the graph shown in Figure 6.9, you will note that one counter was previously selected. That counter indicates the percentage of processor utilization, which can be an important indicator concerning the need to either upgrade or replace an existing server hardware platform.

In a Microsoft Windows environment, a server’s operating system can support hardware with multiple processors. Each processor is referred to as an instance. Right-clicking on the graph displays a dialog box labeled “Add Counters,” as illustrated in Figure 6.10, from which you can select an applicable counter or group of counters to be plotted.

Figure 6.9 Windows 2000 server’s performance monitor runs under the Microsoft Management Console.

Note that, through the use of Performance Monitor, you can select one or more counters for each instance or processor. In the example shown in Figure 6.10, the “% Processor Time” counter was selected from the box labeled “Select counters from list.” Because the hardware platform (computer) running Windows Server had only one processor, the right area of the display shows Total selected when the button labeled “Select Instances from list” was clicked. Otherwise, if there were multiple processors on the hardware platform running Windows server, the counter could be selected for an individual processor or for all processors.

Figure 6.10 System overview properties: the add counters dialog box provides a mechanism to define the counter whose values you wish to monitor.

A second method you can use to select counters to monitor performance is by clicking on the Performance Logs and Alerts entry in the tree portion of the window. Three sub-branches display when you expand the Performance Logs and Alerts branch. Those sub-branches are shown in the left portion of Figure 6.11. In this example, the Counter Logs entry is shown as selected, displaying two previously defined logs in the right portion of the window. A dialog box labeled System Overview Properties is displayed when you right-click on either entry or a blank display. That dialog box is shown in the right foreground of Figure 6.11. Note that there are three tabs in the System Overview Properties dialog box. Those tabs are labeled General, Log Files, and Schedule, with the tab labeled General shown positioned in the foreground.

The General tab indicates the name and location of the current log file. You can add or remove counters and define the sampling interval. The tab labeled Log Files lets you control various aspects of a log file, such as its format, the amount of storage to be used, and the assignment of a comment to the file. In comparison, the Schedule tab lets you define when logging occurs.

6.2.15.3 Working with Performance Monitor To better illustrate some of the functionality and capability of the Performance Monitor, let’s modify the previously selected log file. You can use the buttons labeled Add or Remove to do so. If you want to add some counters, click on the Add button shown in the General tab located in the System Overview Properties dialog box, which displays a Select Counters dialog box.

Figure 6.11 Double clicking on a log results in the display of the system overview properties dialog box, showing the counters in the log.

The left portion of Figure 6.12 illustrates the selection of the Add button, while the right portion of the display shows the Select Counters dialog box. If you focus on the three buttons in the Select Counters dialog box, you will note a gray-colored button labeled Explain, which was selected by this author. Selecting that button displays a textual explanation about each highlighted counter. In the example shown in Figure 6.12, the “% User Time” counter was highlighted, which displays an explanation concerning what the counter signifies. As indicated by the explanation, “% User Time” represents the percentage of non-idle processor time spent in user mode. The term user mode references the mode in which applications, environment subsystems, and integral subsystems operate, a mode that does not require direct access to hardware or all memory. As the “% User Time” value increases, the load on the processor increases.

Once you select the counters to be monitored and select a schedule, you can view the effect of logging via the Performance Monitor’s graphing capability. To illustrate the graphing capability, the selected counters shown in the left portion of Figure 6.12 were plotted as a line chart and as a bar chart. Figure 6.13 illustrates the plot of the selected counters in a line chart format, with the five counters listed at the bottom of the graph as textual data. The vertical bar above the A in Average indicates the current time of the plot as the vertical line moves from left to right across the display.

Figure 6.12 Obtaining an explanation about a highlighted counter.

To change the composition of a graph, you select an icon above the graph. If you look at the cursor arrow shown in Figure 6.13, you will note that it is positioned on a line graph icon, the fifth icon from the left in the series of icons. When this icon is selected, it results in the display of a line graph of the selected counter values. If you move the cursor to the fourth icon from the left, your cursor will be positioned on a bar chart icon. Clicking on that icon will result in the type of graph being changed from a line chart to a bar chart format, as shown in Figure 6.14.

In examining Figure 6.14, note that only the display of the graph changed when a different graph icon was selected. The lower portion of the graph, which contains the color legend for each counter, remains as is. Thus, the use of icons representing various types of graphs provides users with the ability to select a type of graph that satisfies their requirements. If you compare Figures 6.13 and 6.14, in this particular example of the values of five counters, the line graph appears more meaningful. While you would probably prefer the line graph in this situation, in other situations a bar graph may be the preferred graph to use. Because changing graph types is no more than a click away, users can easily experiment with displaying the different types of graphs supported by Performance Monitor.

6.2.15.4 Summary Windows Performance Monitor and other tools can be valuable for determining the utilization of hardware, processor, memory, and disk resources. In addition, their use can assist you in spotting trends that will provide you with a mechanism to plan upgrades rather than react to problems.

Figure 6.13 Viewing a line graph of the values of five selected counters.

By carefully monitoring counters associated with the utilization of processors, memory, and disk activity, you can obtain a valuable insight to potential utilization problems prior to those problems actually occurring. This in turn will provide you with the information to consider several options to improve the content delivery capability of your organization’s Web server or servers. These options can include adding processors, memory, and more or faster disk storage to existing hardware, adding servers, or even replacing existing hardware with more capable equipment. Regardless of the option you select, having the ability to view key performance metrics provides you with a detailed insight concerning the operation of your organization’s server hardware platforms and represents a key tool you can use to facilitate content delivery.
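One way to turn logged counter values into an early warning is to fit a straight line to equally spaced samples and estimate when the trend would reach a chosen limit. The Python sketch below uses an ordinary least-squares fit; this is an illustrative technique chosen for the example, not a feature of Performance Monitor, and the function names, the weekly samples, and the 80 percent threshold are all hypothetical.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for equally spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(samples, threshold):
    """Sampling periods after the last sample until the trend line reaches
    the threshold, or None if utilization is flat or falling."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(samples) - 1))


# Hypothetical weekly averages of processor utilization, in percent.
weekly_utilization = [40, 45, 50, 55, 60]
print(periods_until(weekly_utilization, threshold=80))
```

With these sample figures the trend rises five points per week, so the projected crossing of 80 percent lies four weeks past the last sample, which leaves time to plan an upgrade.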

Figure 6.14 Viewing a bar graph of the values associated with five counters.

Now that we have an appreciation for the use of server operating-system tools, we need to concern ourselves with the flow of data across the Internet. Although Web logs provide a good indication concerning the locations from where data is arriving, these logs do not indicate if browser users are encountering abnormal network delays that would justify the distribution of servers beyond an organization’s primary data center. To obtain an insight into the network, we need to turn to a different set of tools. One of those tools is a network analyzer; other tools, such as the Ping and Traceroute (Tracert) programs, are built into most modern operating systems.

6.2.16 Using a Network Analyzer

An appropriate network analyzer, also commonly referred to as a protocol analyzer, makes it possible to observe the flow of data between your organization’s server and browser users. Most protocol analyzers include a timing chart, which indicates the interaction of packet flow between your organization’s server and individual browser users or groups of such clients by time. By examining the interaction of the requester and server by time, you can determine if there are abnormal delays that would warrant the installation of one or more servers at areas around the globe to facilitate access to your organization’s Web site. Obviously, such information can also be used to evaluate the use of a third-party content delivery networking capability as well as the general locations where the CDN should have a hosting capability.

To illustrate the use of a network analyzer, this author will use a commercial product marketed under the name Observer. Network Instruments of Minneapolis, Minnesota, developed Observer, which supports both wired and wireless Local Area Network (LAN) analysis at data rates from the 10 Mbps of legacy Ethernet to the 10 Gbps of more modern 10 Gigabit Ethernet networks.

Similar to other protocol analyzer products, you use Observer to capture and analyze certain types of data packets. For example, assume that your organization has a LAN connected to the Internet and also operates a Web server that is connected to the LAN. If you have several workstations connected to the LAN, you could create several filters to record traffic routed to the server instead of all traffic routed to your organization’s LAN. To do so, you would set up a filter to record traffic that flows to the Internet Protocol (IP) address of the server. If the server supports several applications in addition to operating as a Web server, you could also add a filter to limit the data capture to inbound packets flowing to port 80, which represents HTTP traffic. If you also wanted to capture secure traffic, you would extend the filter to include the SSL (Secure Sockets Layer) port number, 443, using the protocol analyzer’s AND capability to combine the address and port conditions. By filtering on an IP address and one or more port numbers, you can restrict the packet capture to a specific hardware interface as well as one or more applications operating on the hardware platform.
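The filter logic described above amounts to “destination address equals the server AND destination port is one of the chosen ports.” The Python sketch below expresses that predicate over already-decoded packets. It is not Observer’s filter syntax; the Packet class, the make_capture_filter function, and the documentation-range IP addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


def make_capture_filter(server_ip, ports):
    """Build a predicate: packet is addressed to server_ip AND to one of ports."""
    allowed_ports = frozenset(ports)

    def matches(packet):
        return packet.dst_ip == server_ip and packet.dst_port in allowed_ports

    return matches


# Hypothetical Web server address; 80 is HTTP and 443 is HTTP over SSL/TLS.
web_filter = make_capture_filter("192.0.2.10", ports=(80, 443))

traffic = [
    Packet("198.51.100.7", "192.0.2.10", 80),    # HTTP to the server
    Packet("198.51.100.7", "192.0.2.10", 443),   # secure traffic to the server
    Packet("198.51.100.7", "192.0.2.10", 25),    # mail to the same host
    Packet("198.51.100.7", "192.0.2.99", 80),    # HTTP to another LAN host
]
captured = [p for p in traffic if web_filter(p)]
print(len(captured))
```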

Once you capture relevant packets, you can perform one or more operations on the captured data. Perhaps one of the more interesting tools provided by the use of Observer is its Expert Analysis option, which decodes the data flow between two devices, such as a browser and a Web server.

Figure 6.15 illustrates an example of Observer’s Expert Analysis option. In this example, the screen is shown subdivided, with a time scale in milliseconds (ms) used for the subdivision. The left portion of the screen shows the flow of packets from a Web browser to a Web server, while the right portion of the screen shows the response from the server. In this example, the Web browser was assigned the computer name Micron. That computer is shown accessing the MSN Web server, whose host address is entertainment.msn.com on port 80.

Figure 6.15 Through the use of Network Instruments Observer software-based protocol analyzer, you can examine the delays associated with the flow of packets between devices.

In examining Figure 6.15, you can observe the initial three-way Transmission Control Protocol (TCP) handshake issued by the computer named Micron to the Web server. The initial SYN in the three-way handshake occurs at the top of the time scale, with the server’s initial response occurring slightly after 109 ms on the time scale. The server responds with a SYN ACK, which results in the computer named Micron completing the three-way handshake by transmitting an ACK packet. After completing the three-way handshake, the computer named Micron transmits a data packet further down the time scale, with the server responding at a time slightly after 196 ms on the time scale. For this particular example, the interactions between the client and server are rapid, and no substantial network delays would warrant any adjustment to the geographic placement of a server. However, by using the Network Instruments Observer or a similar product on a periodic basis, you can obtain the capability to examine the data flow between devices and note any abnormal delays that could justify the redistribution of servers to facilitate content delivery.
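The kind of timing analysis read off the figure can also be applied to an exported packet trace. The Python sketch below measures how long each server packet follows the preceding client packet and flags delays above a limit. The Event type, the function names, and the 500 ms limit are hypothetical, and the trace is only loosely modeled on the figure: the 109 ms and 196 ms values come from the text, while the client timestamps of 110 ms and 150 ms are assumed for illustration.

```python
from typing import NamedTuple


class Event(NamedTuple):
    time_ms: float
    sender: str      # "client" or "server"
    label: str


def response_delays(events):
    """Delay of each server packet relative to the client packet before it."""
    delays = []
    last_client_time = None
    for event in events:
        if event.sender == "client":
            last_client_time = event.time_ms
        elif last_client_time is not None:
            delays.append((event.label, event.time_ms - last_client_time))
    return delays


def abnormal_delays(delays, limit_ms):
    """Labels of the responses that took longer than limit_ms."""
    return [label for label, delay in delays if delay > limit_ms]


trace = [
    Event(0, "client", "SYN"),
    Event(109, "server", "SYN ACK"),
    Event(110, "client", "ACK"),          # assumed timestamp
    Event(150, "client", "data"),         # assumed timestamp
    Event(196, "server", "response"),
]
delays = response_delays(trace)
print(delays)
print(abnormal_delays(delays, limit_ms=500))
```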

6.2.17 Other Tools to Consider

In concluding our discussion of traffic analysis, a word is warranted about Ping and Tracert, which can either supplement or replace the use of a network analyzer. If you know the distribution of access originators to your organization’s Web site, you can use either tool to determine the delay from your network to a distant device that will closely mimic the delay to the location where clusters of remote users are geographically located. For example, assume that, through an analysis of Web logs, you note that a large number of page views to your organization’s Web server in Boston occurs from Romania, Bulgaria, and several other Eastern European countries. While the geographic distance between Eastern Europe and Boston is considerable, if browser users access the Internet via service providers that have a fiber-optic link to a major Western European peering point, the transmission delay to the server located in Boston may be minimal and not warrant the distribution of content delivery outside the Boston area.

Because you would have to either dynamically view Web logs or use a network analyzer in real time to determine the IP addresses of remote users in order to run Ping or Tracert, a better method is to select a server located within the general area where access requests are initiated to your organization’s Web server. Then, you could ping the server to determine the round-trip delay to the distant computer, or you could run Tracert to examine the delay on each hop along the path to the distant server. For either situation, the resulting display will indicate the network-associated delay between your organization’s Web server and the general area of access from clusters of browser users. Based on the results obtained from the use of Ping and Tracert and the value of browser user requests to the server, you can then make an informed decision concerning either the distribution of servers to remote sites to enhance content delivery or using a third party to provide a content delivery service.
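Where running Ping from a script is inconvenient, a rough stand-in is to time how long a TCP connection to the distant server takes to open, since the connection cannot complete until the handshake has made a round trip. The Python sketch below does that with the standard socket module. It is not ICMP Ping and will include connection setup overhead on both hosts; the function names are hypothetical, and the host in the usage comment is a placeholder.

```python
import socket
import time


def connect_time_ms(host, port, timeout=3.0):
    """Time, in milliseconds, to open one TCP connection to host:port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0


def average_connect_time_ms(host, port, attempts=4):
    """Average of several connection timings, similar to Ping's summary."""
    samples = [connect_time_ms(host, port) for _ in range(attempts)]
    return sum(samples) / len(samples)


# Example usage (placeholder host; substitute a server in the region of interest):
# print(average_connect_time_ms("www.example.com", 80))
```

Repeating the measurement at different times of day gives a better picture than a single run, because network delays vary with load.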

6.3   Content Delivery Models

In concluding our discussion of content delivery performed by the enterprise, we will examine a series of CDN models. As indicated earlier in this chapter, some models may be better suited to one organization than another. However, since the number of offices, the activity of an organization’s Web site, and the location of browser users accessing the Web site can vary considerably between organizations, our focus will be on Enterprise CDN models, leaving the selection of an appropriate model to the readers of this book. In this section, we will commence our investigation of CDN models with an elementary single-site, single-server model. Using this model as a base, we will proceed to discuss more complex models, examining single-site, multiple-server; multiple-site, single-server per site; and multiple-site, multiple-server per site models. As we discuss each model, we will examine its advantages and disadvantages as well as discuss the model’s operation.

6.3.1 Single-Site, Single-Server Model

The simplest model of content delivery over the Internet is the single-site, single-server model. This model results in an organization configuring a single hardware platform at a single location to provide Web services to both customers and potential customers around the globe, regardless of the location of browser clients.

6.3.1.1 Advantages The key advantage associated with the single-site, single-server model is its cost and efficiency. A single hardware platform reduces both hardware and software costs, including licensing fees for certain products. With respect to hardware cost, not only do you reduce the cost associated with requiring additional servers, but in addition, you eliminate the cost associated with a load balancer or similar hardware product. Concerning software costs, because there is only one server, the organization does not have to purchase multiple operating systems or application programs, nor are multiple licenses for certain products necessary, reducing software costs to a minimum. In addition, the single server requires less support than multiple servers do. For example, running a simple log-analysis program on a single server might take a few hours each week. In comparison, when you have multiple servers at a site, you may have to first rotate and move logs and then run a program against a series of logs. The time involved in rotating and moving logs could exceed the time to run a program against a single log, while the application program you use against multiple logs may require an additional license fee.

6.3.1.2 Disadvantages There are several disadvantages associated with the single-site, single-server content delivery model. The most obvious disadvantage is the failure of the server, which results in the removal of the presence of an organization from the Internet. Another problem is the occasional need for hardware and software modifications, which could temporarily render the site inoperative. Last, but not least, a single site requires browser users who may be located around the globe to have their requests flow through one or more common telecommunications circuits and peering points to the site. If one or more of those circuits or peering points should become inoperative, a large number of potential users could have their access to the Web site blocked. If a circuit failure occurs within the network of the ISP used by the organization hosting the Web site, it’s possible that all access to the site could be blocked even though the server is operational. Thus, the single-site, single-server model results in some key potential operational deficiencies.

Another problem associated with the single-site, single-server model is the fact that all browser user access flows to a common server location. This means that browser access from one location could be relatively rapid, with minimal delay encountered by the user client. In comparison, browser access from other locations could traverse multiple peering points as well as have the data flow routed over relatively low-speed communications circuits or high-speed circuits with lots of traffic and very high occupancy levels. Concerning the latter, routers are designed to drop packets when necessary, which means that retransmission requests will occur. Those retransmissions can result in latency increasing to the point where browser users either access your competitors’ Web sites or simply abandon your organization’s Web site. For either situation, this represents a potential loss of revenue and goodwill.

6.3.1.3 Considering Server Options Although many times in this book we have noted that a single server represents a point of failure that can make your organization’s Web site inoperative, a few words are in order concerning how to increase the level of server availability and why, after doing so, a single server can be as reliable as the use of multiple servers. The key to raising the availability level of a server is redundancy and cutover capability. The more redundancy a server has, the greater its reliability can become, assuming a cutover mechanism exists to replace a failed device by an operational one. For example, many servers can be obtained with dual power supplies where, in the event of the failure of one, the second takes over. Similarly, in the area of disk drives, different RAID (Redundant Array of Independent Disks) levels can be used to add redundancy to disk storage. While it is up to the reader to determine the need for and level of redundancy, by adding applicable levels of redundancy you may be able to enhance the availability level of your organization’s Web server to the point where it provides a similar availability level to that obtained from multiple servers.
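The effect of redundancy on availability can be estimated with the standard series and parallel availability formulas, assuming that components fail independently and that cutover is instantaneous, neither of which holds exactly in practice. In the Python sketch below, the function names and the 99 percent component figure are hypothetical.

```python
def parallel_availability(component_availability, copies):
    """Availability of redundant copies where any one copy is sufficient."""
    return 1.0 - (1.0 - component_availability) ** copies


def series_availability(*availabilities):
    """Availability of a chain in which every element must be working."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


# Hypothetical figures: each power supply and each disk is 99% available.
power = parallel_availability(0.99, copies=2)     # dual power supplies
disks = parallel_availability(0.99, copies=2)     # mirrored disks
print(round(power, 6))
print(round(series_availability(power, disks), 6))
```

Under these assumptions, duplicating a 99 percent component raises that element to 99.99 percent, which illustrates how a well-provisioned single server can approach the availability of multiple servers.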

6.3.1.4 Considering Network Operations If your organization purchases redundancy for your Web server or servers, you probably analyzed the need to have a high level of availability. Unfortunately, the high level of availability can be compromised by using a single Internet access line, even if that line operates as a T3 facility at approximately 45 Mbps. To obtain a high level of Internet availability, you need to install multiple access lines to the Internet. In addition, such lines should be routed to your organization’s location through different Internet Service Provider (ISP) points of presence (POPs) and, if possible, by different ISPs. This author can testify to the value of the latter when, during a rather large rainstorm, a bridge over which a SONET optical fiber was routed eastward from Macon, Georgia, to Atlanta washed away. While one connection to the Internet was terminated until the fiber was restrung weeks later, a second access line to Jacksonville, Florida, that went west stayed operational.

6.3.2 Single-Site, Multiple-Server Model

A second CDN model retains the single-site location but adds one or more clusters of servers to a single-site, single-server model. Referred to as a single-site, multiple-server model, this approach can usually provide more processing capability than a single server. In addition, depending on the way multiple servers are connected to the Internet, this model could provide multiple connections to the Internet, increasing the availability of access to the content presented by the organization.

The ability to effectively use multiple servers at a single site is based upon having the capacity to balance the load on the multiple servers located at the site. Doing so requires either the programming of DNS or the use of a load balancer. While either method can be used to make more effective use of your organization’s Web servers, as previously described in Chapter 5, there are disadvantages associated with each method, which we will shortly discuss.

6.3.2.1 Advantages Using two or more servers enables the failure of one server to be compensated for by the remaining operational devices. In addition, hardware and software upgrades can be performed on one server at a time, leaving the other servers in the cluster able to serve customers without having to be placed offline or having their performance degraded. If multiple connections to the Internet are used, the availability of server content is increased.

6.3.2.2 Disadvantages Using a single site maintains the disadvantages associated with browser users located around the globe accessing a common location. Data traversing many peering points, or the routing of data over relatively low-speed data circuits or heavily used high-speed circuits, can result in significant delays. Such delays can result in browser users abandoning your organization’s Web site or even accessing a competitor, either action resulting in the potential loss of revenue to electronic commerce Web sites. For example, consider a user accessing the Web site of an airline, attempting to purchase two round-trip tickets from New York to Los Angeles. This is not a highly unusual route, and there are many airlines as well as travel-oriented Web sites that are more than happy to provide a response to a browser user’s request. Thus, if browser users encounter difficulties in accessing the Web site of one airline, there is a good possibility that they will move on and access the Web site of another, resulting in a considerable loss of revenue to the first site. While every Web site is different, you need to assess the potential loss associated with an inoperable or inaccessible Web site and act accordingly. For example, you can usually compute a retail organization’s Web-site sales history on an hourly basis. You may also be able to determine the number of outages and the total outage duration over a period of time. Then, you can compute the cost of each outage and use this data as a mechanism to determine the value of adding certain types of redundancy to your organization’s Web site.
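
The outage-cost computation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Outage` record, the function names, the `fraction_lost` parameter (the share of sales assumed to be lost rather than merely delayed), and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    hours: float         # duration of the outage
    hourly_sales: float  # average Web-site sales per hour at that time of day


def outage_cost(outages, fraction_lost=1.0):
    """Estimate revenue lost over a period from a list of recorded outages."""
    return sum(o.hours * o.hourly_sales * fraction_lost for o in outages)


def redundancy_pays_off(outages, redundancy_cost, fraction_lost=1.0):
    """True when the estimated loss exceeds the cost of the added redundancy."""
    return outage_cost(outages, fraction_lost) > redundancy_cost


# Hypothetical history: a 2-hour outage at $1,500 per hour in sales and
# a half-hour outage during a peak period at $4,000 per hour.
history = [Outage(2.0, 1500.0), Outage(0.5, 4000.0)]
print(outage_cost(history))                 # all sales during outages lost
print(redundancy_pays_off(history, 3500.0)) # compare with a redundancy budget
```

The comparison is deliberately simple; goodwill and customers who never return are not captured by the hourly sales figure and would have to be estimated separately.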

Although the old adage that two is better than one applies to multiple servers, you need to determine a method for distributing traffic among multiple servers. This means that your organization will require some type of load-balancing mechanism to distribute the traffic load in an equitable manner among the servers in your organization’s server cluster. Although you can use the Domain Name System (DNS) load-balancing capability without any additional cost, as previously noted in Chapter 5, a DNS server has no capability to check the status of other servers in its tables. For example, using a programmed DNS to balance loads will work rather well as long as all servers are available. If one out of five fails, then 20 percent of service requests would be forwarded by the DNS server to a failed machine, and the client would receive an error message after a timeout period occurs. If the client tries again to reach a server, he or she has an 80 percent chance of being successful; however, the client might simply decide not to try again and abandon the access attempt. In the event that a load balancer is used and the hardware platform fails, this would cause the site to become unreachable, which is why such network appliances typically can be obtained with redundant memory and power supplies.
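
The one-in-five failure case can be reproduced with a short simulation. This is a simplified sketch of a rotation that performs no health checks, not a model of any particular DNS server; the server names and the function name are hypothetical.

```python
from itertools import cycle


def round_robin_failures(servers, failed, requests):
    """Count requests that a plain round-robin rotation sends to failed servers.

    The rotation has no knowledge of server status, which is the
    limitation the text attributes to DNS-based load balancing.
    """
    rotation = cycle(servers)
    return sum(1 for _ in range(requests) if next(rotation) in failed)


servers = ["web1", "web2", "web3", "web4", "web5"]  # hypothetical names
requests = 1000
missed = round_robin_failures(servers, failed={"web3"}, requests=requests)
print(f"{missed} of {requests} requests were sent to a failed server")
```

A load balancer that checks server status would remove the failed machine from the rotation, which is the functional difference between the two methods.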

6.3.3 Multiple-Site, Single-Server per Site Model

To alleviate potential delays associated with browser users accessing a central site, your organization can consider placing servers at one or more office locations within a country, or scattered in different countries, based upon an analysis of client locations. This action will result in a new CDN enterprise model involving multiple sites. When a single server is located at each site, we obtain a multiple-site, single-server per site enterprise model.

The primary reason for a multiple-site model is the need to distribute content closer to existing and potential customers. Assuming that you use one or more of the tools previously mentioned in this chapter to determine that a multiple-site model is more appropriate than a single-site model, you need to consider the number of locations where servers should be installed and the number of servers to be installed at each location. The multiple-site, single-server per site model represents the simplest type of geographically distributed content delivery.

Under the multiple-site, single-server per site model, an organization examines both actual and potential data flow to its primary Web site. By observing where customers and potential customers are geographically clustered, and noting delays associated with Internet access and data flow from those clustered areas, an organization can determine where one or more servers should be installed outside of the main data center. Thus, the multiple-site, single-server per site model represents a way to distribute servers to geographic locations where their installation can enhance browser user access.
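
The analysis described above, grouping observed traffic by area and noting the associated delay, can be sketched as follows. This is an illustrative outline only: the sample format, thresholds, region names, and function name are assumptions, and real measurements would come from server logs or the measurement tools discussed earlier.

```python
from collections import defaultdict
from statistics import mean


def candidate_regions(samples, latency_threshold_ms, min_requests):
    """Return regions with enough traffic and high average delay, worst first.

    `samples` is an iterable of (region, latency_ms) pairs.
    """
    by_region = defaultdict(list)
    for region, latency_ms in samples:
        by_region[region].append(latency_ms)
    candidates = [
        (region, mean(latencies), len(latencies))
        for region, latencies in by_region.items()
        if len(latencies) >= min_requests
        and mean(latencies) > latency_threshold_ms
    ]
    # Sort by average latency so the most delayed region is listed first.
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Hypothetical measurements of round-trip delay to the central site.
samples = [
    ("Europe", 310), ("Europe", 290), ("Europe", 330),
    ("North America", 40), ("North America", 55),
    ("Asia", 420),
]
print(candidate_regions(samples, latency_threshold_ms=200, min_requests=2))
```

In this hypothetical data set only Europe qualifies: Asia shows the highest delay but too few requests to justify a server, and North America is already served quickly.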

6.3.3.1 Advantages The primary advantage associated with the multiple-site, single-server per site model is the fact that it enables content to be placed closer to browser users. This can reduce latency, which should result in a decline in site abandonment, which in turn can result in an increase in customers and customer revenue. A second advantage associated with this model is the fact that servers are now placed at two or more distinct locations. Thus, the failure of a communications circuit, an electrical outage, or another type of impairment may not cause an organization’s entire Web presence to disappear.

6.3.3.2 Disadvantages The comparison of any multiple-site model to a single-site model will result in differences in hardware, software, and support costs. Unless the single-site model employs a large number of clustered servers, the cost of a multiple-site model can exceed the cost of a single-site model.

Another disadvantage associated with the multiple-site model is the way browser users are directed to one of the multiple servers located at geographically dispersed areas. If the direction occurs through the central site, that location becomes a weak link. If that site experiences a power outage or if a circuit linking that site to the Internet fails, then it is possible that the redirection facility will fail. You can overcome this problem by setting up separate domains for each distributed server so that distant user requests flow directly to an applicable server. The key problem associated with this method occurs when content updates are managed from a central site. If that central site should lose its Internet connection, there will be a delay in updating distributed servers. In addition, if the organization maintains its database servers at the central site, which results in the distributed servers having to access the central site, a failure at that location will adversely affect the ability of browser users around the globe to access information or purchase products that rely upon checking the database at the central site. Under this scenario, the solution to the problem is to increase the level of availability of access to the central site.

There are two good ways to do this. First, the central site could install redundant communications links to the Internet so that the failure of one circuit would be compensated for by the ability to use a second circuit. As previously discussed, to provide an even higher level of availability, the central site could use the communications facilities of two different ISPs. A network problem experienced by one ISP would then be compensated for by the ability of data to reach the central site via the network of the second ISP. However, if you decide to employ two different ISPs, it is important to ensure that they provide Internet access over two diverse paths. Earlier in this chapter, this author mentioned a flood that washed away a bridge that included a high-speed fiber-optic circuit routed from Macon to Atlanta. If both ISPs aggregated their service in Macon and routed data to Atlanta via the washed-away fiber, redundancy would fail. Thus, it is extremely important to consider the routes over which supposedly diverse circuits flow.

A second method that can be used to raise the level of availability of the central site is obtained by using two or more servers at that location. The CDN model is then modified into a new category that we will refer to as the multiple-site, multiple-server model.

6.3.4 Multiple-Site, Multiple-Server per Site Model

As its name implies, the multiple-site, multiple-server per site model results in an organization placing two or more servers in at least one location in addition to its central site. The goal behind this model is to provide highly reliable access to content distributed closer to the user. Although this model is both costly and complex to implement, it is used by many multinational organizations that have established e-commerce businesses on two or more continents or have a large number of users located in geographically dispersed locations. In addition, there are various versions of the multiple-site, multiple-server per site model that warrant discussion. While the basic multiple-site, multiple-server per site model may imply that initial data flow is directed to a central site, it’s possible to set up independent domains. For example, browser users located in Europe would access a data center located on that continent, while browser users located on another continent would be directed to a data center located on that continent. At each data center hosting multiple servers, a load-balancing mechanism would distribute the workload over those local multiple servers. In this example, each location functions as an autonomous entity, and there is no need for communications between server locations.

At the opposite end of the spectrum from the autonomous entity operating model is a nonautonomous entity model. In this situation, each location outside the primary data center communicates with the data center to obtain dynamic content or other types of information, such as daily price changes for different products. Between the autonomous and nonautonomous entities are partially autonomous entities, where one or more distributed sites only have a requirement to periodically access a central site. Now that we have an appreciation for the potential variations associated with the multiple-site, multiple-server per site model, let’s conclude our discussion of this model by turning to its advantages and disadvantages.

6.3.4.1 Advantages There are several advantages associated with a multiple-site, multiple-server per site model. The primary advantage is reliability and availability of access. Using multiple servers per site reduces the probability of a site being down. In addition, because there are multiple locations with multiple servers, it is highly unlikely for the presence of the organization on the Internet to go dark. Because access to each location can be further enhanced by multiple connections to the Internet, it becomes possible to create a very reliable and highly available presence on the Web while moving content closer to the ultimate requester.

6.3.4.2 Disadvantages Similar to the multiple-site, single-server per site model, the key disadvantages of the multiple-site, multiple-server per site model are cost and complexity. Because multiple servers will be installed at multiple locations, costs can easily become a major issue. In addition, because a load-balancing mechanism will be required to distribute the load at each location, the complexity of this model is far above the level of complexity of the other models mentioned in this chapter. When you add development and operational costs, this model is both the most expensive to implement and the most expensive to maintain. However, for some e-commerce multinational organizations, the benefits associated with having a direct presence at many locations where Web content can be tailored to the area far outweigh the cost of the effort.

6.3.5 An In-Between Model

One mechanism you should consider for your organization is what this author refers to as an in-between model. Because no two organizations are the same, chances are high that your organization needs something that doesn’t quite fully fit into a single model. Thus, you might end up with some multiple sites, some with single servers, some with redundant servers with a load-balancing mechanism, and other sites with a single server that has a high degree of built-in redundancy.

In concluding this chapter, an observation by this author is warranted. If you travel and access the Internet from several countries, you will probably note that you can access the Web sites of major booksellers, home improvement stores, as well as major appliance and electronics type stores that are native to the country you are traveling in, but which are branches of multinational firms headquartered in another country. When you access the local Web site, you are accessing either a multiple-site, single-server per site, or a multiple-site, multiple-server per site Web model.

7 Web-Hosting Options

Until now, we have subdivided the ability to obtain a presence on the Internet into two basic categories: Do it yourself, or employ the services of a content delivery provider. Although the content delivery provider can be viewed as a form of Web-hosting organization, in actuality the term Web hosting can denote a range of organizations, including content delivery providers. The use of a third party for hosting an organization’s presence on the Internet represents a content delivery option, and we now focus on this topic.

In this chapter, we will turn our attention to third-party Web hosting. We will initially discuss the rationale for using a third-party Web-hosting facility. Once we understand the major reasons for considering this method of establishing a presence on the Internet, we will discuss the different categories of Web hosting available for consideration, the variety of tools provided by third-party vendors to facilitate an organization’s presence on the Internet, and evaluation factors that can separate the suitability of one Web-hosting vendor from another with respect to meeting your organization’s operational requirements.

7.1 Rationale

There are a variety of factors an organization will consider when determining if a third-party Web-hosting arrangement should be used either in place of, or as a supplement to, an in-house Web-hosting arrangement. While cost is normally an important consideration, there are also other factors that can have a heavy weight when an organization compares the development of an in-house system to the use of a Web-hosting facility. Table 7.1 lists some of the more important factors an organization should consider, with each factor having the ability, either alone or in conjunction with other factors, to form the rationale for obtaining the use of a Web-hosting facility.

To illustrate the variety of choices you have concerning obtaining a Web-hosting provider, consider the example shown in Figure 7.1. In this example, this author used the Microsoft Bing Web site located at www.bing.com to search for Web-hosting providers. Note that the referenced figure shows only 10 of 202,000,000 hits. While it is highly unlikely that anyone will have the time or inclination to examine even a fraction of the resulting hits, this vividly illustrates that you have a significant number of choices that you can narrow down by refining your search. For example, if your organization was located in a specific city and wanted to be able to deliver disks without being at the mercy of the Postal Service or an express organization, you could search for Web hosting in that city. Chances are rather high that you will find one or more Web-hosting organizations that meet your criteria. That said, let’s examine the reasons associated with using a Web-hosting facility.

7.1.1 Cost Elements and Total Cost

Today, most Web-hosting services bill clients based upon several usage-related elements. Those elements can include processor requirements, disk space requirements, total data transfer, and bandwidth utilization. In addition, if your Web-hosting service provides an e-commerce site that includes the processing of credit cards, your organization can expect one or more fees per credit card usage in addition to a credit card processing fee.

To ensure a valid comparison between the use of a third-party Web-hosting service and an in-house effort, you need to consider each and every cost associated with these two options. For example, one cost commonly overlooked is electricity, which may be significantly reduced when you use a third party. Instead of your organization paying a direct cost for electricity, as it does when operating an in-house Web site, a third-party Web-hosting facility will normally factor the cost of electricity, building maintenance, and personnel into its hosting fee structure. In addition to the elimination of the cost for electricity by outsourcing Web hosting, you need to consider the additional cost associated with heat removal via a chilled or conventional air-conditioning system. This is especially true in certain locations, such as in Georgia, where the temperature during four to five months of the year hovers above 80 degrees Fahrenheit with high humidity. In this case, the electricity consumed directly by the Web server is only a portion of the electricity cost attributable to operating the server. Because the server generates heat, you have to consider the removal of the extra British Thermal Units (BTUs) that result from the operation of the server.

Table 7.1 Rationale for Using a Web-Hosting Facility

Cost elements and total cost
Performance elements
Server-side languages supported
Web service tools available
Back-end database support
Facility location(s)

Figure 7.1 Using a search engine to locate Web-hosting providers.

Other costs associated with operating an in-house Web server that are often overlooked include training and maintenance. Typically, but not always, the costs associated with using a third-party Web-hosting service will be less than the costs associated with a do-it-yourself approach. However, in spite of economics favoring the use of a Web-hosting service, many medium to large organizations will elect the do-it-yourself approach, as this action provides them with a higher degree of control. This is especially true for the larger e-commerce merchant Web sites, where prices can change very rapidly for certain merchandise and the organization needs to exercise full control over when maintenance and backup operations occur. In addition, when the third-party Web-hosting service operates a shared hosting facility where two or more organizations share the use of a common hardware platform, security considerations need to be examined. Although each organization is provided with a virtual Web server and in effect cannot view or change the operational parameters of the other virtual servers operating on the common hardware platform, many times organizations are very reluctant to use this type of Web service. This is especially true if one or more organizations on the shared hosting platform are competitors. For example, this author highly doubts that Walmart would agree to have its Web servers hosted by a third party that might be hosting a competitor, such as Kmart. Although security might preclude one company from determining in advance price changes posted by another company, there is always the chance that the third-party provider could make a mistake that results in the flow of information that should not be disclosed. Thus, even though cost can be an important rationale for using a Web-hosting facility, it may not be the governing factor.

7.1.2 Performance Elements

The category of performance elements is a broad term that references the various factors that govern the operation of the Web-hosting facility. Such factors can include the operating system and application software provided by the vendor, the type of hardware platform the Web service will operate upon, uptime guarantees for both the hardware platform and communications to and from the Internet, the Internet connection or series of connections supported by the Web-hosting facility, and the type of support the facility provides. In the next few paragraphs in this section, we will take a more detailed look at each of these performance elements.

The operating system and applications supported by a Web-hosting facility govern your organization’s ability to have a particular type of Web site placed into operation. That is, you would not select a vendor that could not provide support of on-line credit card billing if your organization required that capability. Similarly, if your organization is a parts supplier and the Web-hosting facility would need an interface to your organization’s mainframe database, vendors that could not support the required interface would not be considered. Likewise, if your organization previously developed a Web server that operates under the Sun UNIX operating system and you were looking for a hosting site in Australia, you would more than likely give preference to hosting sites in that country that can provide Sun UNIX platforms.

The type of hardware platform provided by the Web-hosting facility in conjunction with the operating system and application programs will govern the capability of the facility to respond to browser user requests. When examining the hardware platform, an organization needs to consider random access memory (RAM), the number of processors and their operating rate and data-path width, disk controllers, and disk drives. In addition, you need to consider the use of a virtual server versus a dedicated server. If the hosting provider operates virtual servers, this means that your organization will be one of many operating on a common hardware platform. While this may not necessarily be bad, the use of a virtual server more than likely will depend upon the policies of your organization and the clients already operating on a virtual server. This author remembers the old adage that “Macy’s is not Gimbels” from a time when the two department stores competed with one another, prior to the rise of the Internet. Even if the latter had survived, it is highly doubtful that the two competitors would ever have their Web sites on a common virtual server.

Concerning RAM, you need to consider the current amount of RAM, whether RAM can be expanded, and, if so, the maximum amount of memory that can be supported. When considering the type of processor used by the hardware platform, it’s important to note whether the platform supports more than one processor and, if so, how many are presently installed. Doing so will provide you with an indication of the upgradeability of the hardware platform. Similarly, investigating the capacity of disk drives, current storage used, and the ability to add additional drives will provide you with an indication of the upgradeability of online storage. Because a Web-hosting facility typically operates numerous hardware platforms, this option may provide more capacity and capability than if your organization acquired a single hardware platform for operation at your facility.

One of the more important performance elements provided by some third-party Web-hosting organizations is a very high uptime guarantee. Because the Web-hosting facility can amortize the cost of backup servers, redundant power, and redundant Internet connections over many users, they can usually afford to spend more on maintaining access to servers under adverse conditions than an individual organization can afford. However, not all uptime guarantees refer to the same component. Some uptime guarantees reference Internet availability, with a high uptime guarantee usually associated with sites that have multiple connections to the Internet. Other uptime guarantees refer to the hardware platform. While a Web-hosting facility will usually provide higher communications and processor uptime than affordable by small- and medium-sized organizations, it is important to examine both. Otherwise, a 99.999% processor uptime level, which sounds really exciting, may not be all that great if Internet availability is at a much lower level. In this situation, the hardware platform is almost never down, while the Internet connection is problematic.
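
The reason both guarantees must be examined can be shown numerically. The sketch below assumes that the hardware platform and the Internet connection fail independently and that the site is reachable only when both are up; the 99.5 percent Internet figure and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def combined_availability(*availabilities):
    """Availability of components in series: every one must be up at once."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def downtime_hours_per_year(availability):
    """Expected hours per year during which the component is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


hardware = 0.99999  # the "five nines" processor uptime mentioned in the text
internet = 0.995    # hypothetical Internet availability guarantee
site = combined_availability(hardware, internet)

print(f"hardware downtime: {downtime_hours_per_year(hardware):.2f} h/year")
print(f"overall downtime:  {downtime_hours_per_year(site):.2f} h/year")
```

Under these assumptions the overall figure is dominated by the weaker guarantee, so the five-nines hardware number by itself says little about how often browser users can actually reach the site.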

Two additional performance elements that need to be considered, and which usually favor the use of a Web-hosting facility, are their Internet bandwidth and their help desk or assistance facility. Because the Web-hosting facility provides a service for many customers, it more than likely provides more bandwidth as well as more Internet connections than a single Web site could normally afford. Thus, the Internet connections provided by a Web-hosting facility, especially if the connections occur via two or more Internet Service Providers, can provide a higher level of availability than most organizations can afford when constructing an in-house Web site. Similarly, a Web-hosting facility that provides a hosting service for one or more sites with heavy international traffic may operate a 24/7 help desk, a luxury for many small- and medium-sized organizations.

To facilitate the reader’s consideration of performance elements, Table 7.2 provides a checklist of those elements. Although the proverbial Vendor A and Vendor B are listed at the head of two columns, you can compare any vendor against another or extend the comparison to additional vendors by simply duplicating the columns to consider more vendors.

Table 7.2 Comparing Performance Elements

PERFORMANCE ELEMENT             VENDOR A          VENDOR B

Type of server
  Dedicated                     _______________   _______________
  Virtual                       _______________   _______________
  Operating system              _______________   _______________
Server hardware platform
  Processor(s)                  _______________   _______________
  RAM                           _______________   _______________
  Disk storage                  _______________   _______________
  RAID level, if required       _______________   _______________
Server software capability
  Credit card processing        _______________   _______________
  Database interface            _______________   _______________
  Applications required         _______________   _______________
  Backup server availability    _______________   _______________
Internet connectivity
  Primary connection            _______________   _______________
  Secondary connection          _______________   _______________
  Diversity routing             _______________   _______________
Help-desk facility
  Operation 24/7?               _______________   _______________
  Toll-free support             _______________   _______________

7.1.3 Server-Side Language Support

For most organizations considering the use of a Web-hosting service, the support of server-side languages may be more inclusive than they could support with their existing personnel structure. However, although most Web-hosting facilities support a core set of server-side languages, they more than likely do not support all such languages. This means that, while a Web-hosting facility may provide support for more server-side languages than your organization could support independently, you still need to examine those languages the facility vendor supports. In doing so, you need to ascertain whether all of the languages your hosted Web site will require are supported, and, if not, whether it’s possible to use an alternative language or whether your organization should consider a different Web-hosting facility whose language support better matches the requirements of your organization.

7.1.4 Web-Service Tools

Another reason for considering the use of a third-party Web-hosting facility concerns the Web-service tools they provide to customers. Those tools can vary considerably between hosting vendors. Some vendors may provide Web-page construction tools, while other vendors may add one or more promotion tools to their literal “bag of tools.” Table 7.3 provides a list of major Web-hosting facility tools commonly offered as well as a mechanism to compare those tools against the requirements of your organization. In addition, space is provided for you to compare the offerings of two vendors against your organization’s requirements. You can also duplicate the table as a mechanism to compare the tool offerings of additional vendors against the requirements of your organization.

7.1.5 The Importance of Images

In examining the list of common Web-hosting tools listed in Table 7.3, a few words need to be mentioned concerning the image formats supported by a hosting site and how they can possibly affect your organization’s monthly cost. Many Web-hosting facilities include either a direct or indirect cost based upon downloads from a Web site in terms of the number of kilobytes or megabytes downloaded per month. While the goal of most organizations is to have an attractive and popular Web site where a high percentage of visitors are converted into paying customers, at the same time another goal is to minimize the cost of operating the Web site. We might think that these goals are mutually exclusive, but they are not. We have the technical ability to minimize costs while increasing the desirability of viewing our Web pages by optimizing the images sent to browser users. The key to minimizing or containing cost is the appropriate use of images and the recognition that an optimized JPEG (Joint Photographic Experts Group) image can be visually indistinguishable from the same image stored and transmitted in other formats (or in JPEG formats that have not been optimized), which can impose a significantly higher burden on bandwidth costs. This key is obtained by having an image-manipulation program that controls the Q factor of a JPEG image.

Table 7.3 Common Web-Hosting Facility Tools

CATEGORY/TOOL             ORGANIZATIONAL
                          REQUIREMENT       VENDOR A          VENDOR B

Site construction tools
  Web templates           _______________   _______________   _______________
  HTML editor             _______________   _______________   _______________
  HTML version            _______________   _______________   _______________
  CSS support             _______________   _______________   _______________
  JavaScript              _______________   _______________   _______________
  ActiveX                 _______________   _______________   _______________
  PHP support             _______________   _______________   _______________
  XML support             _______________   _______________   _______________
  Image formats           _______________   _______________   _______________
  File manager            _______________   _______________   _______________
  Blog builder            _______________   _______________   _______________
  Photo album             _______________   _______________   _______________
  Calendar                _______________   _______________   _______________
  Counters                _______________   _______________   _______________
  URL redirect            _______________   _______________   _______________
  Guestbook               _______________   _______________   _______________
  E-mail forum            _______________   _______________   _______________
  Message forums          _______________   _______________   _______________
  Password protection     _______________   _______________   _______________
  Chat room               _______________   _______________   _______________
  Audio clips             _______________   _______________   _______________
Headline news tools
  Today in history        _______________   _______________   _______________
  News headline feed      _______________   _______________   _______________
  Sports headline feed    _______________   _______________   _______________
Promotional tools
  Daily cartoon           _______________   _______________   _______________
  E-cards                 _______________   _______________   _______________
  Links                   _______________   _______________   _______________
  Site ring               _______________   _______________   _______________
  Site searches           _______________   _______________   _______________

The Q or quality factor normally ranges on a scale from 1 to 100, with 100 representing the highest quality and lowest compression. Unfortunately, some programs have developed alternative scales, such as the popular LeadTools program, which uses a scale ranging from 1 (no compression, a sort of lossless JPEG) to 255 (lowest quality and highest level of compression).
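Because the two scales run in opposite directions, it can help to translate a setting from one to the other before comparing programs. The following is a minimal sketch that assumes a simple linear relationship between the 1-to-100 scale and a 1-to-255 scale; the function name is illustrative, and the actual curve used by any particular program may differ, so the result should be treated only as a starting point for visual comparison.

```python
def q_to_inverted_scale(q: int, low: int = 1, high: int = 255) -> int:
    """Map a 1-100 quality factor (100 = best quality) onto an inverted
    scale where `low` is best quality and `high` is heaviest compression.

    Assumption: the mapping is linear. A real program may use a different
    curve, so this is only a rough equivalent.
    """
    if not 1 <= q <= 100:
        raise ValueError("q must be between 1 and 100")
    # Fraction of the way from best quality (q = 100) to worst (q = 1).
    fraction = (100 - q) / 99
    return round(low + fraction * (high - low))


# A default Q factor of 80 lands about one fifth of the way along the
# inverted scale under the linear assumption.
print(q_to_inverted_scale(80))
```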

To determine the value of compression, assume that your organization has a Web site that displays an image of your corporate headquarters on its home page that was compressed using JPEG at a default Q factor of 80 (used by many programs), resulting in 125,000 bytes of data downloaded each time your organization’s home page is accessed. Let’s assume a moderate level of activity that results in 100,000 hits per month. This level of activity then results in a cumulative download of 12,500,000,000 bytes per month just for the image. Now let’s assume that you use an image tool offered by the Web-hosting vendor that allows you to manipulate images and, by using a lower Q factor, obtain an image that requires 87,000 bytes of data and, to the naked eye, looks exactly like the other image. Assuming the same number of Web home-page hits, this results in 8,700,000,000 bytes downloaded just for the image, or a difference of 3,800,000,000 bytes per month. Because many Web-hosting organizations have a monthly bill that includes a communications fee in terms of kilobytes or megabytes downloaded per month, by simply modifying the image on your organization’s home page, you are able to reduce transmission by 3,800,000,000/1,048,576, or approximately 3,624 Mbytes per month. At a typical cost of 10 cents per Mbyte, the simple use of a program that lets you adjust the Q factor of JPEG images could result in a monthly savings of approximately $362, month after month after month.
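The arithmetic in this example can be checked with a short sketch. The image sizes, hit count, and rate are those used above; the function name is illustrative. Note that the result depends on how a Mbyte is defined: the sketch assumes 1,048,576 bytes, whereas a decimal definition of 1,000,000 bytes would yield 3,800 Mbytes.

```python
BYTES_PER_MBYTE = 1024 * 1024  # assumption: binary megabyte


def monthly_image_savings(original_bytes, optimized_bytes, hits_per_month,
                          cost_per_mbyte, bytes_per_mbyte=BYTES_PER_MBYTE):
    """Return (bytes_saved, mbytes_saved, dollars_saved) per month."""
    bytes_saved = (original_bytes - optimized_bytes) * hits_per_month
    mbytes_saved = bytes_saved / bytes_per_mbyte
    return bytes_saved, mbytes_saved, mbytes_saved * cost_per_mbyte


saved_bytes, saved_mbytes, saved_dollars = monthly_image_savings(
    original_bytes=125_000,   # image at the default Q factor of 80
    optimized_bytes=87_000,   # same image at a lower Q factor
    hits_per_month=100_000,
    cost_per_mbyte=0.10,
)
print(f"{saved_bytes:,} bytes = {saved_mbytes:,.0f} Mbytes = ${saved_dollars:,.2f}")
# prints "3,800,000,000 bytes = 3,624 Mbytes = $362.40"
```

Substituting your own image sizes and your hosting vendor’s rate lets you judge whether optimizing a given image is worth the effort.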

7.1.6 Back-End Database Support

It’s important to note that a back-end database is accessed indirectly by clients of your Web-hosting provider. Thus, the manner by which client data is entered and passed to a database is an important consideration. After all, it’s not a trivial job to rewrite many thousands of lines of code if your organization previously developed its own Web site that included one type of back-end database and now you are faced with rewriting software to use a Web-hosting provider that supports a different back-end database. In addition to directly considering the type of back-end database supported by the Web-hosting provider, it may be important to investigate the scripting language you use and its ability to support the provider’s back-end database. For example, some scripting languages support issuing generic commands that can manipulate a range of back-end databases. In comparison, other scripting languages may be more limited in scope.
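As one illustration of generic database commands, the sketch below uses Python’s standard database interface (DB-API 2.0), in which the same cursor, execute, and commit calls are offered by drivers for many different databases. The table name, column names, and function names are hypothetical, and SQLite is used here only because it ships with Python; it is not suggested as what a hosting provider would offer.

```python
import sqlite3


def record_order(conn, customer, amount):
    """Insert one row using only generic DB-API calls."""
    cur = conn.cursor()
    # The "?" placeholder is sqlite3's parameter style; other drivers may
    # expect "%s" or named parameters, so this string may need adjusting.
    cur.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                (customer, amount))
    conn.commit()


def total_for(conn, customer):
    """Return the sum of all order amounts for one customer (None if no rows)."""
    cur = conn.cursor()
    cur.execute("SELECT SUM(amount) FROM orders WHERE customer = ?",
                (customer,))
    return cur.fetchone()[0]


# Only this connect() call names a specific database product.
conn = sqlite3.connect(":memory:")
conn.cursor().execute("CREATE TABLE orders (customer TEXT, amount REAL)")
record_order(conn, "acme", 19.5)
record_order(conn, "acme", 5.5)
print(total_for(conn, "acme"))  # prints 25.0
```

When the code that touches the database is confined to functions of this kind, moving to a provider with a different back-end database is largely a matter of changing the connection call and, possibly, the placeholder style.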

7.1.7 Facility Location(s)

The last major factor that you can consider in justifying the use of a Web-hosting facility is the location or series of locations where the hosting facility resides. Some Web-hosting organizations have several server farms located on different continents. In comparison, other vendors may be limited to operating a single facility. By carefully examining the location or locations of Web-hosting vendors, it becomes possible for your organization to use those facilities as a mechanism to move content closer to groups of browser users that either currently access or have the potential to access your site. For example, assume that your organization operates a Web site in Los Angeles and you noted that a large number of page hits occur from browser users located in Western Europe. Further assume that through the use of Web logs supplemented by the use of a network analyzer programmed with appropriate filters, you notice that those browser users are experiencing significant transmission delays that appear to result in a high level of server abandonment. In this situation, you might consider the use of a Web-hosting facility located in Western Europe, whose utilization could reduce transmission delays experienced by browser users accessing your organization’s Web site from that general geographic area.

To facilitate traffic from Western Europe flowing to the hosted server, your organization would register a new domain name with a country suffix corresponding to the location of the Web-hosting facility. Then, you could modify your existing Web server located in Los Angeles to redirect browser user requests originating from Western Europe to the hosted server.
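The redirection decision itself can be sketched in a few lines. The example below assumes that the Los Angeles server has already determined a two-letter country code for each request (for instance, from a geolocation lookup of the client address, which is not shown). The list of countries, the host name, and the function name are all placeholders chosen for illustration.

```python
from typing import Optional

# Hypothetical values: the country list is a sample, and the host name
# stands in for the newly registered domain with a country suffix.
WESTERN_EUROPE = {"BE", "DE", "ES", "FR", "GB", "IE", "IT", "NL", "PT"}
HOSTED_SERVER = "http://hosted-server.example"


def redirect_target(country_code: Optional[str], path: str) -> Optional[str]:
    """Return the URL to redirect to, or None to serve the page locally."""
    if country_code and country_code.upper() in WESTERN_EUROPE:
        return HOSTED_SERVER + path
    return None


print(redirect_target("fr", "/index.html"))  # redirected to the hosted server
print(redirect_target("US", "/index.html"))  # None: served from Los Angeles
```

When the function returns a URL, the existing server would answer with an HTTP redirect to that address; when it returns None, the request is handled locally as before.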

Now that we have an appreciation for the key reasons for examining the potential use of a Web-hosting facility, let’s turn our attention to the types of hosting facilities available.

7.2 Types of Web-Hosting Facilities

There are several ways that we can categorize the type of Web-hosting facility. These include their ability to support e-commerce, credit card, and/or PayPal usage; the operating system used (Windows, Unix, Linux); geographic location (United States, Canada, Western Europe, etc.); or the manner by which outside customers use Web-hosting facilities. For the purpose of this section, we will consider the type of Web-hosting facility to be defined by the way outside customers use Web-hosting facilities.

There are three basic types of Web hosting you can consider: dedicated hosting, shared hosting, and colocated hosting.

7.2.1 Dedicated Hosting

A dedicated hosting facility means that a server is dedicated to operating a single organization’s presence on the Internet. This type of Web hosting should be considered if your organization needs to run customized software and applications instead of standard software provided by the hosting organization. A dedicated server is also preferable if there is a need for a high level of security or if your hosted server is expected to receive a high level of traffic that warrants the use of a hardware platform for the exclusive use of your organization. Because only one organization uses the resources of a dedicated server, its cost will be higher than when two or more organizations share the resources of a single server, a situation referred to as a shared server.

7.2.2 Shared Server Hosting

A shared server means that an organization’s Web site operates on a hardware platform that one or more additional Web sites also operate on, in effect sharing the hardware platform. The ability to share the use of a common hardware platform is commonly obtained by a Web server program supporting multiple Web sites. Examples of Web server programs that provide this capability include Apache and Microsoft’s Internet Information Services (IIS), which is included with Windows 2000 Server and Windows Server 2003.

This author will turn to IIS to show how multiple servers can be supported on a common Web platform. Under IIS, you can right-click on the server name (shown with an asterisk in the top of the left-hand side of Figure 7.2) established after the program was installed. This action results in the display of a pop-up menu that includes the option New. Selecting that option allows you to define a new Web site or a new FTP site.

Figure 7.2 Right-clicking on the server name in the Internet Information Services box enables you to create a new server.

This action will result in the display of a Web-site creation wizard that walks you through the process for defining a new Web site. The wizard will ask you to enter a description of the site, its IP address, and the port to be used. Next, you can enter the domain name for the site. Assuming that we simply entered the name “site 1” for the new Web site, it would appear in the IIS box, as shown in Figure 7.3. Thereafter, we could again right-click on a Web-site entry to add another site onto the common hardware platform.
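The reason the wizard asks for an IP address, a port, and a domain name is that together these values let one Web server program tell its sites apart. The following sketch illustrates the idea with a simple lookup table. It is not how IIS is implemented internally, and every address, host name, and site name in it is made up for the example.

```python
from typing import Dict, Optional, Tuple

# Each site is identified by (IP address, port, host name), mirroring the
# values the wizard asks for. All values here are hypothetical.
Binding = Tuple[str, int, str]

SITES: Dict[Binding, str] = {
    ("192.0.2.10", 80, "www.site1.example"): "site 1",
    ("192.0.2.10", 80, "www.site2.example"): "site 2",
    ("192.0.2.10", 8080, "www.site1.example"): "site 1 (test)",
}


def select_site(ip: str, port: int, host_header: str) -> Optional[str]:
    """Return the name of the site that should answer a request, if any."""
    # Host names are case-insensitive, and a Host header may carry ":port".
    host = host_header.split(":")[0].lower()
    return SITES.get((ip, port, host))


# Two sites share one address and port; the requested host name decides.
print(select_site("192.0.2.10", 80, "www.site1.example"))  # site 1
print(select_site("192.0.2.10", 80, "WWW.Site2.example"))  # site 2
```

In this arrangement, adding another site to the shared hardware platform amounts to adding one more entry whose combination of address, port, and host name is not already in use.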

7.2.3 Colocated Hosting

A third type of Web-hosting arrangement is referred to as colocated hosting. Under a colocated-hosting solution, your organization becomes responsible for purchasing a Web server. That server is then located at the third-party facility, which is then responsible for physically housing the server as well as providing power and environmental controls, security, support, and Internet connectivity.

Figure 7.3 Viewing IIS after another Web site was added to the common hardware platform.

The goal behind using a colocated hosting arrangement is that it provides an organization with the ability to obtain required hardware that may not be available from the hosting organization. It also provides an opportunity for an organization that has one or more excess hardware platforms to have them used by the hosting facility instead of having to be billed for additional hardware usage. Of course, such an arrangement can create problems if the hosting organization is not familiar with the hardware. In addition, selecting a customized hardware solution can create backup problems. Thus, the use of a colocated hosting method needs to be carefully analyzed based upon the hardware you select and the hosting organization’s ability to support the hardware.

7.3 Evaluation Factors

In concluding our discussion of Web-hosting options, it’s important to consider the manner by which you evaluate this option against the offerings of different vendors. If you type in the term Web hosting in Google, Yahoo, or another search engine, you will be inundated with responses. Google alone provides over 26 million responses.

When evaluating the suitability of a Web-hosting organization, we need to examine a large number of quantifiable metrics. We also need answers to questions that may be rankable but are not quantifiable. A number of quantifiable metrics were previously noted in this chapter. For example, Table 7.3 includes a list of Web-hosting facility tools you can consider by matching your organization’s requirements against a comprehensive list of tools. Table 7.4 contains a list of Web-hosting evaluation factors that includes a reference to Table 7.3 for comparing hosting-facility tools. Similar to Table 7.3, Table 7.4 provides a list of features you may wish to evaluate. In the second column, you can specify your organization’s requirements for a specific feature. In the third and fourth columns, labeled Vendor A and Vendor B, you can compare the offerings provided by two vendors to the features required by your organization. Of course, you can include additional vendors in your assessment by adding columns for Vendors C, D, etc.

Although most of the entries in Table 7.4 are self-explanatory, a few words of discussion are warranted with respect to service and support. For many organizations, it’s important to ascertain the types of existing customer sites being hosted by a Web-hosting facility. Obtaining this information will provide a valuable insight concerning the ability of the third party to host your organization’s Web site. For example, if you are considering a hosting facility that operates Web sites for clients who require credit card payments, then the hosting facility should be familiar with the process required. However, to ensure that it is, you should ask for references and obtain the opinions of existing customers concerning the performance of that function as well as functions that may be required in the future.

Table 7.4 Web-Hosting Evaluation Features

                              ORGANIZATIONAL
CATEGORY/FEATURE              REQUIREMENT       VENDOR A          VENDOR B

Internet connection
  Bandwidth                   _______________   _______________   _______________
  Type of connection
    T1                        _______________   _______________   _______________
    T3                        _______________   _______________   _______________
    Other                     _______________   _______________   _______________
  Redundancy                  _______________   _______________   _______________
  Uptime guarantee            _______________   _______________   _______________
Type of server
  Dedicated                   _______________   _______________   _______________
  Shared                      _______________   _______________   _______________
  Colocated                   _______________   _______________   _______________
Server operating system
  UNIX                        _______________   _______________   _______________
  Linux                       _______________   _______________   _______________
  Sun OS                      _______________   _______________   _______________
  Windows 2000                _______________   _______________   _______________
  Windows 2003                _______________   _______________   _______________
  Other                       _______________   _______________   _______________
Server capacity
  Processor speed             _______________   _______________   _______________
  Number of processors        _______________   _______________   _______________
  On-line storage             _______________   _______________   _______________
  Tape storage                _______________   _______________   _______________
  Other                       _______________   _______________   _______________
Server software
  Apache                      _______________   _______________   _______________
  Microsoft IIS               _______________   _______________   _______________
  O’Reilly                    _______________   _______________   _______________
  Other                       _______________   _______________   _______________
Server-side software
  ASP                         _______________   _______________   _______________
  C++                         _______________   _______________   _______________
  JScript                     _______________   _______________   _______________
  Perl                        _______________   _______________   _______________
  PHP                         _______________   _______________   _______________
  Microsoft SQL               _______________   _______________   _______________
  VBScript                    _______________   _______________   _______________
  Other                       _______________   _______________   _______________
Server security
  Certificates                _______________   _______________   _______________
  Authentication type         _______________   _______________   _______________
  Firewall                    _______________   _______________   _______________
  Router access lists         _______________   _______________   _______________
  SSL                         _______________   _______________   _______________
  Other                       _______________   _______________   _______________
Server backup
  Content backup              _______________   _______________   _______________
    Upon change               _______________   _______________   _______________
    Daily                     _______________   _______________   _______________
    Weekly                    _______________   _______________   _______________
    Other                     _______________   _______________   _______________
  Redundant power             _______________   _______________   _______________
  Uptime guarantee            _______________   _______________   _______________
  Other                       _______________   _______________   _______________
Web-hosting facility tools (See Table 7.3)
Server statistics
  Events logged               _______________   _______________   _______________
  Other                       _______________   _______________   _______________
Service and support
  Types of customers          _______________   _______________   _______________
  Opinions of customers       _______________   _______________   _______________
  Stability                   _______________   _______________   _______________
  24/7 technical support      _______________   _______________   _______________
  Help desk                   _______________   _______________   _______________
  Other                       _______________   _______________   _______________
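Once a table of this kind has been filled in, the comparison of vendor offerings against your organization’s requirements can be automated. The sketch below is one simple way to do so: it lists, for each vendor, the features that fall short of the requirement. The function name and all of the requirement and vendor values are invented for illustration, and only quantifiable entries (yes/no features and numeric minimums) are handled.

```python
def unmet_requirements(requirements, offering):
    """Return the features a vendor fails to provide, in sorted order."""
    missing = []
    for feature, required in requirements.items():
        offered = offering.get(feature)
        if isinstance(required, bool):
            # Yes/no feature: only a problem if required and not offered.
            ok = (not required) or bool(offered)
        else:
            # Numeric requirement: the vendor must meet or exceed it.
            ok = offered is not None and offered >= required
        if not ok:
            missing.append(feature)
    return sorted(missing)


# Hypothetical entries from the requirement and vendor columns.
requirements = {"Uptime guarantee (%)": 99.9, "SSL": True, "PHP": True,
                "24/7 technical support": True}
vendor_a = {"Uptime guarantee (%)": 99.95, "SSL": True, "PHP": True,
            "24/7 technical support": False}
vendor_b = {"Uptime guarantee (%)": 99.5, "SSL": True, "PHP": True,
            "24/7 technical support": True}

for name, offering in (("Vendor A", vendor_a), ("Vendor B", vendor_b)):
    print(name, "does not meet:", unmet_requirements(requirements, offering))
```

Entries that are rankable but not quantifiable, such as the opinions of existing customers, still require your own judgment and are deliberately left out of the sketch.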
