1 The Web EE 122: Intro to Communication Networks Fall 2010 (MW 4-5:30 in 101 Barker) Scott Shenker...

1 The Web EE 122: Intro to Communication Networks Fall 2010 (MW 4-5:30 in 101 Barker) Scott Shenker TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula http://inst.eecs.berkeley.edu/~ee122/ Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson and other colleagues at Princeton and UC Berkeley


The Web

EE 122: Intro to Communication Networks

Fall 2010 (MW 4-5:30 in 101 Barker)

Scott Shenker

TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula

http://inst.eecs.berkeley.edu/~ee122/

Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson, and other colleagues at Princeton and UC Berkeley


Announcements

• Project #1A and HW #2 due in one week
  – Bspace and web page were out of sync
  – Granting you a five-day extension on project
  – But:
    Get started on part B soon!
    Don’t neglect your homework!

• My office hours are on Wednesday
  – See me after class in classroom
  – Or walk up the hill to Soda when I do
  – Will be meeting with TAs when there are no students during OH
    But come interrupt me if you want to talk to me…


Goals of Today’s Lecture

• History of the web
  – Key players, why successful

• Main ingredients of the Web
  – URIs, HTML, HTTP

• Key properties of HTTP
  – Request-response, stateless, and resource meta-data

• Performance of HTTP
  – Parallel connections, persistent connections, pipelining

• Web infrastructure
  – Clients, proxies, and servers
  – Caching vs. replication


The Web – History (I)

• 1945: Vannevar Bush, Memex:

• "a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility"

Vannevar Bush (1890-1974)

Memex

(See http://www.iath.virginia.edu/elab/hfl0051.html)


The Web – History (II)

• 1967, Ted Nelson, Xanadu:
  – A world-wide publishing network that would allow information to be stored not as separate files but as connected literature
  – Owners of documents would be automatically paid via electronic means for the virtual copying of their documents

• Coined the term “Hypertext”
  – Influenced research community
    Who then missed the web…

Ted Nelson


The Web – History (III)

• Physicist trying to solve real problem
  – Distributed access to data

• World Wide Web (WWW): a distributed database of “pages” linked through HyperText Transfer Protocol (HTTP)
  – First HTTP implementation – 1990
    Tim Berners-Lee at CERN
  – HTTP/0.9 – 1991
    Simple GET command for the Web
  – HTTP/1.0 – 1992 (published as RFC 1945 in 1996)
    Client/Server information, simple caching
  – HTTP/1.1 – 1996 (published as RFC 2068 in January 1997)

Tim Berners-Lee


The World Wide Web Project


Why So Successful?

• What do the web and YouTube have in common?
  – The ability to self-publish

• But the self-publishing mechanisms must be:
  – Technically easy
  – Independent (not requiring intricate coordination)
  – Free

• Being part of a grand idealistic and collaborative endeavor isn’t what people want
  – People aren’t looking for Nirvana (or even Xanadu)
  – They want to make their mark, and find something neat


Moral of the Story

• Timing is everything
  – Internet made vision not only possible, but necessary
  – Can you imagine the web without the Internet?

• Visions are great, but problem solving is better

• The best is the enemy of the good
  – Particularly when it blocks deployment
  – Nelson on HTML:

“HTML is precisely what we were trying to PREVENT— ever-breaking links, links going outward only, quotes you can't follow to their origins, no version management, no rights management.”


Components of Web Infrastructure

• Content
  – Objects

• Clients
  – Send requests / Receive responses

• Servers
  – Receive requests / Send responses
  – Store or generate the responses

• Proxies
  – Placed between clients and servers
    Act as a server for the client, and a client to the server
  – Provide extra functions
    Caching, anonymization, logging, transcoding, filtering access
  – Explicit or transparent (“interception”)


Ingredients of Web Implementation

• HTML

• URIs, URLs, URNs

• HTTP


HTML

• A Web page has several components
  – Base HTML file
  – Referenced objects (e.g., images)

• HyperText Markup Language (HTML)
  – Representation of hypertext documents in ASCII format
  – Web browsers interpret HTML when rendering a page
  – Several functions:
    Format text, reference images, embed hyperlinks (HREF)

• Straightforward to learn
  – Syntax easy to understand
  – Authoring programs can auto-generate HTML
  – Source almost always available


URI: Uniform Resource Identifier

• Uniform Resource Locator (URL)
  – Provides a means to get the resource
  – http://www.ietf.org/rfc/rfc3986.txt

• Uniform Resource Name (URN)
  – Names a resource independent of how to get it
  – urn:ietf:rfc:3986 is a standard URN for RFC 3986


Names, locations, etc.

• Who has used a URN? A URL?
  – Which one solves a real problem?
  – Which one represents an idealistic vision?

• The dominance of URLs over URNs reflects the lack of a proper naming structure for objects
  – Naming was central component of our clean slate design

• What properties should a URN have?
  – Not specify anything about object that can change
  – Cryptographic information about the associated key
  – Solves real problems: security and mobility of content


URL Syntax

protocol://hostname[:port]/directorypath/resource

protocol         http, ftp, https, smtp, rtsp, etc.
hostname         FQDN, IP address
port             Defaults to protocol’s standard port, e.g., http: 80/tcp, https: 443/tcp
directory path   Hierarchical, often reflecting file system
resource         Identifies the desired resource

Can also extend to program executions:
http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b
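
As a rough illustration of this syntax, Python's standard `urllib.parse` module can split a URL into the fields listed above. The example URL, the `describe_url` helper, and the small `DEFAULT_PORTS` table are assumptions made for this sketch, not part of the lecture.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports for a few schemes (illustrative table, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def describe_url(url):
    """Split a URL into protocol, hostname, port, path, and query parameters."""
    parts = urlsplit(url)
    # Fall back to the scheme's standard port when the URL does not name one.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        "port": port,
        "path": parts.path,
        "query": parse_qs(parts.query),  # percent-escapes such as %40 are decoded
    }

info = describe_url("http://www.example.com:8080/dir/page.html?box=%40B&sort=date")
print(info)
```

The query string is what lets a URL "extend to program executions": the path names a program and the decoded parameters become its input.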


HTTP

• HyperText Transfer Protocol (HTTP)
  – Client-server protocol for transferring resources

• Important properties:
  – Request-response protocol
  – Reliance on a global URI namespace
  – Resource metadata
  – Stateless
  – ASCII format

% telnet www.icir.org 80
GET /jdoe/ HTTP/1.0
<blank line, i.e., CRLF>


HTTP and TCP

• What functions does HTTP leave to TCP?

• Would HTTP be harder without layering?


Steps in HTTP Request

• HTTP Client initiates TCP connection to server

• HTTP Client sends HTTP request to server

• HTTP Server responds to request

• HTTP Client receives the request

• TCP connection terminates

How many RTTs for a single request?


Client-to-Server Communication

• HTTP Request Message
  – Request line: method, resource, and protocol version
  – Request headers: provide information or modify request
  – Body: optional data (e.g., to “POST” data to the server)

GET /somedir/page.html HTTP/1.1     (request line)
Host: www.someschool.edu            (header lines)
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
(blank line)                        (carriage return, line feed: indicates end of message; not optional)
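
The layout of the request message can be sketched in a few lines of Python. `build_request` is a hypothetical helper written for this illustration; it only handles requests without a body.

```python
CRLF = "\r\n"

def build_request(method, resource, headers, version="HTTP/1.1"):
    """Return the bytes of a body-less HTTP request message."""
    lines = [f"{method} {resource} {version}"]  # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header lines
    # Every line ends in CRLF; the extra empty line marks the end of the headers.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")

request = build_request(
    "GET",
    "/somedir/page.html",
    {
        "Host": "www.someschool.edu",
        "User-agent": "Mozilla/4.0",
        "Connection": "close",
        "Accept-language": "fr",
    },
)
print(request.decode("ascii"))
```

Sending these bytes over a TCP connection to port 80 is essentially what the telnet session does by hand.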



• Request methods include:
  – GET: Return current value of resource, run program, …
  – HEAD: Return the meta-data associated with a resource
  – POST: Update resource, provide input to a program, …

• Headers include:
  – Useful info for the server
    e.g. desired language


Server-to-Client Communication

• HTTP Response Message
  – Status line: protocol version, status code, status phrase
  – Response headers: provide information
  – Body: optional data

HTTP/1.1 200 OK                        (status line: protocol, status code, status phrase)
Connection: close                      (header lines)
Date: Thu, 06 Aug 2006 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 2006 ...
Content-Length: 6821
Content-Type: text/html
(blank line)
data data data data data ...           (data, e.g., requested HTML file)



• Response code classes
  – Similar to other ASCII app. protocols like SMTP

Code   Class           Example
1xx    Informational   100 Continue
2xx    Success         200 OK
3xx    Redirection     304 Not Modified
4xx    Client error    404 Not Found
5xx    Server error    503 Service Unavailable
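
A minimal sketch of how a client might take a response message apart and classify its status code by the first digit. `parse_response` and the sample bytes are assumptions for illustration; real clients also handle folded headers, chunked bodies, and malformed input.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def parse_response(raw):
    """Split a raw HTTP response into status line fields, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "code": int(code),
        "phrase": phrase,
        "class": STATUS_CLASSES[int(code) // 100],  # class comes from the first digit
        "headers": headers,
        "body": body,
    }

sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Content-Length: 4\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"data"
)
reply = parse_response(sample)
print(reply["code"], reply["class"], reply["headers"]["content-type"])
```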


Web Server: Generating a Response

• Return a file
  – URL matches a file (e.g., /www/index.html)
  – Server returns file as the response
  – Server generates appropriate response header

• Generate response dynamically
  – URL triggers a program on the server
  – Server runs program and sends output to client

• Return meta-data with no body


HTTP Resource Meta-Data

• Meta-data
  – Info about a resource, stored as a separate entity

• Examples:
  – Size of a resource, last modification time, etc.
  – Example: Type of the content
    Data format classification (e.g., Content-Type: text/html)
    Enables browser to automatically launch an appropriate viewer
    From e-mail’s Multipurpose Internet Mail Extensions (MIME)

• Usage example: Conditional GET Request
  – Client requests object “If-modified-since”
  – If unchanged, “HTTP/1.1 304 Not Modified”
  – No body in the server’s response, only a header


HTTP is Stateless

• Stateless protocol
  – Each request-response exchange treated independently
  – Servers not required to retain state

• This is good
  – Improves scalability on the server-side
    Don’t have to retain info across requests
    Can handle higher rate of requests
    Order of requests doesn’t matter

• This is also bad
  – Some applications need persistent state
    Need to uniquely identify user or store temporary info
    e.g., Shopping cart, user preferences/profiles, usage tracking, …


State in a Stateless Protocol: Cookies

• Client-side state maintenance
  – Client stores small(?) state on behalf of server
  – Client sends state in future requests to the server

• Can provide authentication

Client → Server: Request
Server → Client: Response, Set-Cookie: XYZ
Client → Server: Request, Cookie: XYZ
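
The exchange can be sketched with Python's standard `http.cookies` module. The cookie name `session` and both helper functions are assumptions made for this example; a real browser also tracks the domain, path, and expiry of each cookie.

```python
from http.cookies import SimpleCookie

def make_set_cookie(session_id):
    """Server side: ask the client to store a small piece of state."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    return "Set-Cookie: " + cookie["session"].OutputString()

def make_cookie_header(set_cookie_line):
    """Client side: echo the stored state back on the next request."""
    _, _, value = set_cookie_line.partition(": ")
    jar = SimpleCookie()
    jar.load(value)
    pairs = [f"{name}={morsel.value}" for name, morsel in jar.items()]
    return "Cookie: " + "; ".join(pairs)

response_header = make_set_cookie("XYZ")
request_header = make_cookie_header(response_header)
print(response_header)
print(request_header)
```

The server itself stays stateless: it recognizes a returning client only because the client presents the value again on each request.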


State in a Stateless Protocol: HTTP Authentication

• Tool to limit access to server documents

• Basic HTTP Authentication
  – Client can add an Authorization header to GET request
    Base64-encoded concatenation of username, a colon, & password
  – If client doesn’t provide header, server responds with a 401 Unauthorized and a WWW-Authenticate header
    Server does not honor request until valid authorization received
  – Stateless: Must happen on each request

• Is this secure? Is this security?
  – No. Authentication is not security, but provides a piece
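
A small sketch of the Basic scheme, using the well-known sample credentials `Aladdin` / `open sesame`. The helper names are made up for this example. Note that Base64 is an encoding, not encryption: anyone who sees the header can reverse it.

```python
import base64

def basic_auth_header(username, password):
    """Build the Authorization header: Base64 of 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"

def decode_basic_auth(header):
    """Recover the credentials from the header, as any eavesdropper could."""
    token = header.split(" ")[-1]
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic_auth(header))
```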


Security Sneak-Peek: HTTPS

• Transport Layer Security (TLS)
  – Came after Secure Sockets Layer (SSL)
  – Shim between App layer (e.g. HTTP) and Transport layer (e.g. TCP)
  – Provides authentication and communication privacy


Putting It All Together

• Client-Server
  – Request-Response
    HTTP
  – Stateless
    Get state with cookies

• Content
  – URI/URL
  – HTML
  – Meta-data


Web Browser

• Is the client

• Generates HTTP requests
  – User types URL, clicks a hyperlink or bookmark, clicks “reload” or “submit”
  – Automatically downloads embedded images

• Submits the requests (fetches content)
  – Via one or more HTTP connections

• Presents the response
  – Parses HTML and renders the Web page
  – Invokes helper applications (e.g., Acrobat, RealPlayer)

• Maintains cache
  – Stores recently-viewed objects and ensures freshness


Web Browser History

• 1990, WorldWideWeb, Tim Berners-Lee, NeXT computer

• 1993, Mosaic, Marc Andreessen and Eric Bina

• 1994, Netscape

• 1995, Internet Explorer

• ….


Web Server

• Handle client request:
  1. Accept a TCP connection
  2. Read and parse the HTTP request message
  3. Translate the URI to a resource
  4. Determine whether the request is authorized
  5. Generate and transmit the response

• Web site vs. Web server
  – Web site: one or more Web pages and objects united to provide the user an experience of a coherent collection
  – Web server: program that satisfies client requests for Web resources
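
Steps 2 through 5 of the request-handling list can be sketched as a pure function over bytes. Everything here is a simplification assumed for illustration: `handle_request` is a hypothetical name, resources live in a dictionary rather than a file system, only GET is supported, and step 1 (accepting the TCP connection) is left out.

```python
def handle_request(raw, documents, is_authorized=lambda uri: True):
    """Map raw request bytes to raw response bytes."""
    # Step 2: read and parse the request line.
    request_line = raw.decode("ascii", errors="replace").split("\r\n", 1)[0]
    fields = request_line.split(" ")
    if len(fields) != 3:
        return b"HTTP/1.0 400 Bad Request\r\n\r\n"
    method, uri, _version = fields
    if method != "GET":
        return b"HTTP/1.0 501 Not Implemented\r\n\r\n"
    # Step 3: translate the URI to a resource (here, a dictionary lookup).
    body = documents.get(uri)
    if body is None:
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    # Step 4: determine whether the request is authorized.
    if not is_authorized(uri):
        return b"HTTP/1.0 403 Forbidden\r\n\r\n"
    # Step 5: generate the response (status line, one header, blank line, body).
    head = f"HTTP/1.0 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body

documents = {"/index.html": b"<html>hi</html>"}
print(handle_request(b"GET /index.html HTTP/1.0\r\n\r\n", documents))
```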


Some Important Milestones

1989 – 1990
• Tim Berners-Lee & Robert Cailliau propose and coin “WorldWideWeb”
  - HTML, HTTP, URI

1991
• Tim Berners-Lee writes first web browser

1993
• Mosaic 1
  - Supports images

1994
• Netscape 1
  - Multiple connections, cookies, <CENTER>

1996
• CSS introduced
  - Separates content from structure
• Netscape 2 & 3
  - Frames, JavaScript, mouseover
• Internet Explorer

1997
• HTML 4.0
  - Tables, scripting, style sheets, …

1997 – 1999
Boom!
• XML
• Dynamic content: DHTML/W3C DOM
• Commerce

2000 – now
• Web 2.0 (coined 2004)
  - “The network is the platform”
  - Open data, user participation, rich user experience
• Accessibility, mobility, internationalization, voice, media, …


5 Minute Break

Questions Before We Proceed?


HTTP Performance

Most Web pages have multiple objects (“items”)
  – e.g., HTML file and a bunch of embedded images

How do you retrieve those objects?

One item at a time

What transport behavior does this remind you of?


Fetch HTTP Items: Stop & Wait

[Figure: client/server timeline. The client starts fetching the page, then requests item 1 and receives its transfer, requests item 2 and receives its transfer, requests item 3 and receives its transfer, and finally displays the page. ≥2 RTTs per object.]


Improving HTTP Performance: Concurrent Requests & Responses

• Use multiple connections in parallel

• Does not necessarily maintain order of responses

• Is this fair?
  – N parallel connections use bandwidth N times more aggressively than just one
  – What’s a reasonable/fair limit as traffic competes with that of other users?

• Client = Why?

• Server = Why?

• Network = Why?

[Figure: requests R1, R2, R3 and transfers T1, T2, T3 on parallel connections.]


Improving HTTP Performance: Pipelined Requests & Responses

• Batch requests and responses
  – Reduce connection overhead
  – Multiple requests sent in a single batch
  – Small items (common) can also share segments
  – Maintains order of responses
  – Item 1 always arrives before item 2

• How is this different from concurrent requests/responses?

• What else could we do to speed things up?

[Figure: client/server timeline. The client sends Request 1, Request 2, and Request 3 together; the server returns Transfer 1, Transfer 2, and Transfer 3.]


Improving HTTP Performance: Persistent Connections

• Enables multiple transfers per connection
  – Maintain TCP connection across multiple requests
  – Including transfers subsequent to current page
  – Client or server can tear down connection

• Performance advantages:
  – Avoid overhead of connection set-up and tear-down
  – Allow TCP to learn more accurate RTT estimate
  – Allow TCP congestion window to increase
  – i.e., leverage previously discovered bandwidth

• Default in HTTP/1.1
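
A back-of-the-envelope comparison of the fetch strategies from the last few slides, counting only round trips. The model is an assumption made for this sketch: one RTT for the TCP handshake, one RTT per request-response exchange, at least one object to fetch, and no transmission time, server processing, slow start, or teardown cost. The function names are invented for the example.

```python
import math

def stop_and_wait(n, rtt):
    """New TCP connection per object, one object at a time: 2 RTTs each."""
    return n * 2 * rtt

def parallel(n, rtt, k):
    """Up to k non-persistent connections at once, fetched in rounds of k."""
    return math.ceil(n / k) * 2 * rtt

def persistent(n, rtt):
    """One handshake, then one request-response round trip per object."""
    return rtt + n * rtt

def pipelined(n, rtt):
    """One handshake, then all n requests sent back to back on one connection."""
    return 2 * rtt

n_objects, rtt_ms = 10, 100
print("stop-and-wait:", stop_and_wait(n_objects, rtt_ms), "ms")
print("parallel (k=4):", parallel(n_objects, rtt_ms, 4), "ms")
print("persistent:", persistent(n_objects, rtt_ms), "ms")
print("persistent + pipelined:", pipelined(n_objects, rtt_ms), "ms")
```

With large objects or a slow link, transmission time dominates and the gaps between these strategies shrink.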


Improving HTTP Performance: Caching

• Many clients transfer same information
  – Generates redundant server and network load
  – Clients experience unnecessary latency

[Figure: network diagram with server, backbone ISP, ISP-1, ISP-2, and clients.]


Improving HTTP Performance: Caching: How

• Modifier to GET requests:
  – If-modified-since – returns “not modified” if resource not modified since specified time

• Response header:
  – Expires – how long it’s safe to cache the resource
  – No-cache – ignore all caches; always get resource directly from server
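
A hedged sketch of how a cache might use these response headers to decide whether a stored copy can be served without contacting the server. It assumes the headers arrive as a dictionary keyed exactly `Expires` and `Cache-Control`, treats a missing `Expires` as "not fresh" (a conservative choice made for this example), and ignores the many other caching directives real caches honor.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def is_fresh(response_headers, now):
    """Return True if the cached response may be reused at time `now`."""
    if "no-cache" in response_headers.get("Cache-Control", ""):
        return False  # server asked not to serve this from cache unchecked
    expires = response_headers.get("Expires")
    if expires is None:
        return False  # no lifetime given; be conservative in this sketch
    return now < parsedate_to_datetime(expires)

cached = {"Expires": "Wed, 01 Dec 2010 16:00:00 GMT"}
print(is_fresh(cached, datetime(2010, 11, 1, tzinfo=timezone.utc)))
```

When the copy is no longer fresh, the cache can fall back to a conditional GET with If-Modified-Since rather than refetching the whole resource.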


Improving HTTP Performance: Caching: Why

• Motive for placing content closer to client:
  – User gets better response time
  – Content providers get happier users
    Time is money, really!
  – Network gets reduced load

• Why does caching work?
  – Exploits locality of reference

• How well does caching work?
  – Very well, up to a limit
  – Large overlap in content
  – But many unique requests


Improving HTTP Performance: Caching on the Client

Example: Conditional GET Request

• Return resource only if it has changed at the server
  – Save server resources!

• How?
  – Client specifies “if-modified-since” time in request
  – Server compares this against “last modified” time of desired resource
  – Server returns “304 Not Modified” if resource has not changed
  – … or a “200 OK” with the latest version otherwise

Request from client to server:

GET /~ee122/fa07/ HTTP/1.1
Host: inst.eecs.berkeley.edu
User-Agent: Mozilla/4.03
If-Modified-Since: Sun, 27 Aug 2006 22:25:50 GMT
<CRLF>
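
The server's side of this exchange can be sketched as a single comparison. `conditional_get` is a hypothetical helper, and the last-modified times below are invented for the example; the date string is the one from the request above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_get(if_modified_since, last_modified, body):
    """Return (status line, body) for a GET carrying If-Modified-Since."""
    since = parsedate_to_datetime(if_modified_since)
    if last_modified <= since:
        # Unchanged: headers only, no body, so little data crosses the network.
        return "HTTP/1.1 304 Not Modified", b""
    return "HTTP/1.1 200 OK", body

header_value = "Sun, 27 Aug 2006 22:25:50 GMT"
page = b"<html>latest version</html>"

unchanged = conditional_get(header_value, datetime(2006, 8, 1, tzinfo=timezone.utc), page)
changed = conditional_get(header_value, datetime(2006, 9, 1, tzinfo=timezone.utc), page)
print(unchanged)
print(changed)
```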


Improving HTTP Performance: Caching with Reverse Proxies

Cache documents close to server to decrease server load

• Typically done by content providers

• Only works for static content

[Figure: network diagram with clients, ISP-1, ISP-2, backbone ISP, server, and reverse proxies.]


Improving HTTP Performance: Caching with Forward Proxies

Cache documents close to clients to reduce network traffic and decrease latency

• Typically done by ISPs or corporate LANs

[Figure: network diagram with clients, ISP-1, ISP-2, backbone ISP, server, reverse proxies, and forward proxies.]


Improving HTTP Performance: Caching w/ Content Distribution Networks

• Integrate forward and reverse caching functionality
  – One overlay network (usually) administered by one entity
  – e.g., Akamai

• Provide document caching
  – Pull: Direct result of clients’ requests
  – Push: Expectation of high access rate

• Also do some processing
  – Handle dynamic web pages
  – Transcoding


Improving HTTP Performance: Caching with CDNs (cont.)

[Figure: network diagram with clients, forward proxies, ISP-1, ISP-2, backbone ISP, CDN, and server.]


Improving HTTP Performance:

CDN Example – Akamai

• Akamai creates new domain names for each client content provider.
  – e.g., a128.g.akamai.net

• The CDN’s DNS servers are authoritative for the new domains

• The client content provider modifies its content so that embedded URLs reference the new domains.
  – “Akamaize” content
  – e.g.: http://www.cnn.com/image-of-the-day.gif becomes http://a128.g.akamai.net/image-of-the-day.gif
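
The URL rewrite in the example amounts to swapping the hostname while keeping the rest of the URL. The `akamaize` function below is a toy written for illustration, not Akamai's actual tooling, and the default CDN hostname is simply the one from the slide.

```python
from urllib.parse import urlsplit, urlunsplit

def akamaize(url, cdn_host="a128.g.akamai.net"):
    """Point an embedded URL at the CDN's domain, leaving the path untouched."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=cdn_host))

rewritten = akamaize("http://www.cnn.com/image-of-the-day.gif")
print(rewritten)
```

Because the CDN's DNS servers are authoritative for the new name, resolving it lets the CDN choose which of its caches the client will contact.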


Improving HTTP Performance: Caching and Replication

• Caching (pull)
  – Replicate content “on demand” after a request
  – Store the response message locally for future use
  – Challenges:
    May need to verify if the response has changed
    … and some responses are not cacheable

• Replication (push)
  – Planned replication of content in multiple locations
  – Update of resources handled outside of HTTP
  – Can replicate scripts that create dynamic responses


Hosting: Multiple Sites Per Machine

• Multiple Web sites on a single machine
  – Hosting company runs the Web server on behalf of multiple sites (e.g., www.foo.com and www.bar.com)

• Problem: GET /index.html
  – www.foo.com/index.html or www.bar.com/index.html?

• Solutions:
  – Multiple server processes on the same machine
    Have a separate IP address (or port) for each server
  – Include site name in HTTP request
    Single Web server process with a single IP address
    Client includes “Host” header (e.g., Host: www.foo.com)
    Required header with HTTP/1.1
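
A minimal sketch of the second solution: a single server process picks the site from the `Host` header. The `select_site` helper and the document-root paths are hypothetical, and the port-stripping logic ignores IPv6 literals.

```python
def select_site(request_text, sites):
    """Return the document root for the site named in the Host header, or None."""
    for line in request_text.split("\r\n")[1:]:  # skip the request line
        if line == "":
            break  # blank line: end of headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            hostname = value.strip().split(":")[0].lower()  # drop an optional port
            return sites.get(hostname)
    return None

# Hypothetical mapping from site name to document root.
sites = {"www.foo.com": "/var/www/foo", "www.bar.com": "/var/www/bar"}

request_text = "GET /index.html HTTP/1.1\r\nHost: www.bar.com\r\n\r\n"
print(select_site(request_text, sites))
```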


Hosting: Multiple Machines Per Site

• Replicate a popular Web site across multiple machines
  – Helps to handle the load
  – Places content closer to clients
  – Helps when content isn’t cacheable by proxies/CDNs

• Problem: Want to direct client to a particular replica
  – Why?
    Balance load across server replicas
    Pair clients with nearby servers

• Solution #1: Manual selection by clients
  – Each replica has its own site name
  – A Web page lists the replicas (e.g., by name, location)
  – … and asks clients to click on a hyperlink to pick


Hosting: Multiple Machines Per Site

• Solution #2: single IP address, multiple machines
  – Run multiple machines behind a single IP address
  – Ensure all packets from a single TCP connection go to the same replica

[Figure: load balancer at 64.236.16.20 in front of the replicas.]
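
One possible way to keep every packet of a TCP connection on the same replica is to hash the connection's 4-tuple. The slide does not say which mechanism the load balancer uses, so treat this as an assumed approach; `pick_replica` and the private replica addresses are invented for the sketch.

```python
import hashlib

def pick_replica(src_ip, src_port, dst_ip, dst_port, replicas):
    """Deterministically map a TCP connection's 4-tuple to one replica."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode("ascii")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]

# Hypothetical private addresses of the machines behind the public IP.
replicas = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

first_packet = pick_replica("198.51.100.7", 51234, "64.236.16.20", 80, replicas)
later_packet = pick_replica("198.51.100.7", 51234, "64.236.16.20", 80, replicas)
print(first_packet, later_packet)
```

Because the mapping depends only on the 4-tuple, the balancer needs no per-connection table, though adding or removing a replica reshuffles existing connections.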


Hosting: Multiple Machines Per Site

• Solution #3: multiple addresses, multiple machines
  – Same name but different addresses for all of the replicas
  – Configure DNS server to return different addresses

[Figure: replicas reached across the Internet at 64.236.16.20, 173.72.54.131, and 12.1.1.1.]
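
A toy model of a DNS server that returns a different address on each lookup. Round-robin is just one policy assumed here for illustration (a server could instead choose by client location or load); the `RoundRobinDNS` class and the name `www.example.com` are made up, while the addresses are the ones from the figure.

```python
import itertools

class RoundRobinDNS:
    """Answer each query for a name with the next address in its list."""

    def __init__(self, records):
        self._cycles = {
            name: itertools.cycle(addresses) for name, addresses in records.items()
        }

    def resolve(self, name):
        return next(self._cycles[name])

dns = RoundRobinDNS(
    {"www.example.com": ["64.236.16.20", "173.72.54.131", "12.1.1.1"]}
)
for _ in range(4):
    print(dns.resolve("www.example.com"))
```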


Conclusions

• Key ideas underlying the Web
  – Uniform Resource Locator (URL)
  – HyperText Markup Language (HTML)
  – HyperText Transfer Protocol (HTTP)
  – Browser helper applications based on content type

• Performance implications
  – Concurrent connections, pipelining, persistent conns.

• Main Web infrastructure components
  – Clients, servers, proxies, CDNs

• Next lecture: drilling down to the link layer