CS433/533 Computer Networks Lecture 12 CDN 2/16/2012 1.


CS433/533 Computer Networks

Lecture 12

CDN

2/16/2012


Admin

Programming assignment 1 status


Recap: High-Performance Network Servers

Avoid blocking (so that we can reach bottleneck throughput)
o Introduce threads

Limit unlimited thread overhead
o Thread pool, async io

Coordinating data access
o synchronization (lock, synchronized)

Coordinating behavior: avoid busy-wait
o Wait/notify; FSM

Extensibility/robustness
o Language support/Design for interfaces

Recap: Operational Laws

Utilization law: $U = XS$

Forced flow law: $X_i = V_i X$

Bottleneck device: largest $D_i = V_i S_i$

Little’s Law: $Q_i = X_i R_i$

Bottleneck analysis:

$$X(N) \le \min\left\{\frac{1}{D_{\max}}, \frac{N}{D + Z}\right\}$$

$$R(N) \ge \max\left\{D, N D_{\max} - Z\right\}$$
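
As a worked example of the bottleneck bounds, X(N) <= min{1/Dmax, N/(D+Z)} and R(N) >= max{D, N*Dmax - Z}, here is a minimal Python sketch. The three service demands and the think time are made-up numbers for illustration, not values from the lecture.

```python
def throughput_upper_bound(n, demands, think_time):
    """Upper bound on throughput: X(N) <= min(1/D_max, N/(D + Z))."""
    d_total = sum(demands)   # D: total service demand per request
    d_max = max(demands)     # D_max: demand at the bottleneck device
    return min(1.0 / d_max, n / (d_total + think_time))


def response_time_lower_bound(n, demands, think_time):
    """Lower bound on response time: R(N) >= max(D, N*D_max - Z)."""
    d_total = sum(demands)
    d_max = max(demands)
    return max(d_total, n * d_max - think_time)


# Hypothetical demands in seconds (CPU, disk, network) and think time Z.
demands = [0.05, 0.2, 0.1]
think_time = 1.0
for n in (1, 5, 20):
    x = throughput_upper_bound(n, demands, think_time)
    r = response_time_lower_bound(n, demands, think_time)
    print(f"N={n}: X <= {x:.3f} req/s, R >= {r:.3f} s")
```

With few users the N/(D+Z) term limits throughput; with many users the bottleneck device (the largest demand) does.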


Recap: Why Multiple Servers?

Scalability
o beyond single server capability and geolocation of a single server

Redundancy and fault tolerance
o Administration/maintenance (e.g., incremental upgrade)
o Redundancy (e.g., to handle failures)

System/software architecture
o Resources may be naturally distributed at different machines (e.g., run a single copy of a database server due to single license; access to resource from third party)
o Security (e.g., front end, business logic, and database)

Recap: Load Direction: Basic Architecture

Four components

Server state monitoring
• Load (incl. failed or not); what requests it can serve

Path properties between clients and servers
• E.g., bandwidth, delay, loss, network cost

Server selection algorithm
• Algorithm to choose site(s) and server(s)

Server direction mechanism
• Inform/direct a client to chosen server(s)

[Figure: a client reaches Site A and Site B through the Internet; which site should serve it?]
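
One possible way to see how server state, path properties, and the selection algorithm fit together is the small Python sketch below. The `Server` fields, the 0.8 load cutoff, and the delay table are assumptions made for illustration; real systems use richer inputs.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    site: str
    alive: bool     # server state: failed or not
    load: float     # server state: fraction of capacity in use


def select_server(servers, path_delay_ms, max_load=0.8):
    """Choose the live, non-overloaded server with the lowest path delay.

    path_delay_ms maps a site name to the client-to-site delay (path property).
    Ties on delay are broken by the lower load.
    """
    candidates = [s for s in servers if s.alive and s.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (path_delay_ms[s.site], s.load))


# Hypothetical state for two sites, A and B.
servers = [
    Server("a1", "A", True, 0.9),   # overloaded
    Server("a2", "A", True, 0.3),
    Server("b1", "B", True, 0.1),
    Server("b2", "B", False, 0.0),  # failed
]
delay = {"A": 20, "B": 80}
chosen = select_server(servers, delay)
print(chosen.name if chosen else "no server available")
```

The fourth component, the direction mechanism, is whatever then tells the client about `chosen` (for example a DNS answer).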


Recap: Load Direction

[Figure: server state, path properties between servers/clients, and the specific request of a client feed the server selection algorithm; the direction mechanism then notifies the client about the selection.]

Basic Direction Mechanisms

Implicit
o IP anycast
• Same IP address shared by multiple servers and announced at different parts of the Internet. Network directs different clients to different servers (e.g., Limelight)
o Load balancer (smart switch) indirection
o Reverse proxy

Explicit
o Mirror/server listing: client is given a list of candidate DNS names
o DNS name resolution gives a list of server addresses
o A single server IP address may be a virtual IP address for a cluster of physical servers (smart switch)

Direction Mechanisms are Often Combined

[Figure: DNS name1 and DNS name2 resolve to addresses IP1, IP2, ..., IPn; each address leads to a cluster (Cluster1 in US East, Cluster2 in US West, Cluster2 in Europe) where load balancers and a proxy direct requests to servers.]

Example: Netflix


Example: Netflix Manifest File

Client player authenticates and then downloads the manifest file from servers at Amazon Cloud

Example: Wikipedia architecture

http://wikitech.wikimedia.org/images/8/81/Bergsma_-_Wikimedia_architecture_-_2007.pdf

DNS Indirection and Rotation

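[Figure: the DNS server for cnn.com rotates the order of the addresses it returns. One query for the IP address of cnn.com is answered with 157.166.226.25, 157.166.226.26 and the next with 157.166.226.26, 157.166.226.25; a router (157.166.255.18) sits in front of the two servers.]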
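
The rotation in the cnn.com example can be sketched in a few lines of Python. `RotatingDNS` is a hypothetical toy class, not a real DNS server; it only shows how the order of the answer changes from one query to the next.

```python
from collections import deque


class RotatingDNS:
    """Toy name server that rotates its address list after every answer."""

    def __init__(self, records):
        self._records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        addrs = self._records[name]
        answer = list(addrs)   # current order goes to this client
        addrs.rotate(-1)       # next client sees the list shifted by one
        return answer


dns = RotatingDNS({"cnn.com": ["157.166.226.25", "157.166.226.26"]})
first = dns.resolve("cnn.com")
second = dns.resolve("cnn.com")
print(first)
print(second)
```

Clients usually try the first address in the answer, so rotating the list spreads new connections across the servers.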


Example: Amazon Elastic Load Balancing

Use the elb-create-lb command to create an Elastic Load Balancer.

Use the elb-register-instances-with-lb command to register the Amazon EC2 instances that you want to load balance with the Elastic Load Balancer.

Elastic Load Balancing automatically checks the health of your load balancing Amazon EC2 instances. You can optionally customize the health checks by using the elb-configure-healthcheck command.

Traffic to the DNS name provided by the Elastic Load Balancer is automatically distributed across your load balanced, healthy Amazon EC2 instances.

http://aws.amazon.com/documentation/elasticloadbalancing/

Details: Step 1

1. Call CreateLoadBalancer with the following parameters:
AvailabilityZones = us-east-1a
Listeners
• Protocol = HTTP
• InstancePort = 8080
• LoadBalancerPort = 80
LoadBalancerName = MyLoadBalancer

The operation returns the DNS name of your LoadBalancer. You can then map that to any other domain name (such as www.mywebsite.com) using a CNAME or some other technique.

PROMPT> elb-create-lb MyLoadBalancer --headers --listener "lb-port=80,instance-port=8080,protocol=HTTP" --availability-zones us-east-1a

Result:
DNS-NAME  DNS-NAME
DNS-NAME  MyLoadBalancer-2111276808.us-east-1.elb.amazonaws.com

http://docs.amazonwebservices.com/ElasticLoadBalancing/latest/DeveloperGuide/

Details: Step 2

2. Call ConfigureHealthCheck with the following parameters:
LoadBalancerName = MyLoadBalancer
Target = http:8080/ping
• Note: Make sure your instances respond to /ping on port 8080 with an HTTP 200 status code.
Interval = 30
Timeout = 3
HealthyThreshold = 2
UnhealthyThreshold = 2

PROMPT> elb-configure-healthcheck MyLoadBalancer --headers --target "HTTP:8080/ping" --interval 30 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2

Result:
HEALTH-CHECK  TARGET          INTERVAL  TIMEOUT  HEALTHY-THRESHOLD  UNHEALTHY-THRESHOLD
HEALTH-CHECK  HTTP:8080/ping  30        3        2                  2
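
To illustrate what HealthyThreshold = 2 and UnhealthyThreshold = 2 might mean, the Python sketch below treats them as counts of consecutive probe results needed to flip an instance's state. That reading, the `HealthTracker` class, and the assumption that an instance starts out healthy are ours; this is not Amazon's implementation.

```python
class HealthTracker:
    """Flip between healthy and unhealthy after enough consecutive probes."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True   # assumption: instances start out healthy
        self._streak = 0      # consecutive probes that contradict the current state

    def record(self, probe_ok):
        """Record one probe result (True = HTTP 200 in time) and return the state."""
        if probe_ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if probe_ok else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = probe_ok
                self._streak = 0
        return self.healthy


tracker = HealthTracker()
probes = [True, False, True, False, False, True, True]
states = [tracker.record(ok) for ok in probes]
print(states)
```

A single failed probe does not remove the instance; two failures in a row do, and two successes in a row bring it back.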


Details: Step 3

3. Call RegisterInstancesWithLoadBalancer with the following parameters:
LoadBalancerName = MyLoadBalancer
Instances = [ i-4f8cf126, i-0bb7ca62 ]

PROMPT> elb-register-instances-with-lb MyLoadBalancer --headers --instances i-4f8cf126,i-0bb7ca62

Result:
INSTANCE  INSTANCE-ID
INSTANCE  i-4f8cf126
INSTANCE  i-0bb7ca62

Discussion

Advantages and disadvantages of using DNS


Clustering with VIP: Basic Idea

Clients get a single service IP address, called virtual IP address (VIP)

A virtual server (also referred to as load balancer, vserver or smart switch) listens at VIP address and port

A virtual server is bound to a number of physical servers running in a server farm

A client sends a request to the virtual server, which in turn selects a physical server in the server farm and directs this request to the selected physical server

VIP Clustering

[Figure: clients reach a server array through a smart switch that owns the virtual IP addresses (VIPs); the switch works at L4 (TCP) or L7 (HTTP, SSL, etc.).]

Goals
o server load balancing
o failure detection
o access control filtering
o priorities/QoS
o request locality
o transparent caching

What to switch/filter on?
o L3 source IP and/or VIP
o L4 (TCP) ports etc.
o L7 URLs and/or cookies
o L7 SSL session IDs

Big Picture


Load Balancer (LB): Basic Structure

[Figure: a client sends a packet with D=VIP, S=client to the load balancer (LB) at VIP; behind the LB are Server1, Server2, and Server3 with real IP addresses RIP1, RIP2, and RIP3.]

Problem of the basic structure?

Problem

Client to server packet has VIP as destination address, but real servers use RIP
o if LB just forwards the packet from client to a real server, the real server may drop the packet
o Reply from real server to client has real server IP as source -> client will drop the packet

[Figure: real server TCP socket space. One listening socket with address {*:6789, *:*} and completed connection queue C1; C2, plus two established sockets with address {128.36.232.5:6789, 198.69.10.10:1500}, each with its own sendbuf and recvbuf. The incoming packet has D=VIP, S=client.]

Solution 1: Network Address Translation (NAT)

LB does rewriting/translation

Thus, the LB is similar to a typical NAT gateway with an additional scheduling function


Example Virtual Server via NAT


LB/NAT Flow


SLB/NAT Flow: Details

1. When a user accesses a virtual service provided by the server cluster, a request packet destined for the virtual IP address (the IP address to accept requests for virtual service) arrives at the load balancer.

2. The load balancer examines the packet's destination address and port number. If they match a virtual service in the virtual server rule table, a real server is selected from the cluster by a scheduling algorithm and the connection is added to the hash table that records connections. Then, the destination address and the port of the packet are rewritten to those of the selected server, and the packet is forwarded to the server. When an incoming packet belongs to an established connection, the connection can be found in the hash table and the packet is rewritten and forwarded to the right server.

3. The request is processed by one of the physical servers.

4. When response packets come back, the load balancer rewrites the source address and port of the packets to those of the virtual service. When a connection terminates or times out, the connection record is removed from the hash table.

5. A reply is sent back to the user.
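
The five steps above can be sketched as a toy Python model. Packets are plain dicts, the scheduling algorithm is assumed to be round-robin, and `NatLoadBalancer` is a hypothetical name; a real load balancer rewrites IP/TCP headers and recomputes checksums.

```python
import itertools


class NatLoadBalancer:
    """Toy LB/NAT: rewrite destination on the way in, source on the way out."""

    def __init__(self, vip, vport, real_servers):
        self.vip, self.vport = vip, vport
        self._next = itertools.cycle(real_servers)  # round-robin (an assumption)
        self.connections = {}  # (client_ip, client_port) -> (real_ip, real_port)

    def inbound(self, pkt):
        """Steps 1-2: match the virtual service, pick or look up a real server."""
        if (pkt["dst"], pkt["dport"]) != (self.vip, self.vport):
            return None
        key = (pkt["src"], pkt["sport"])
        if key not in self.connections:
            self.connections[key] = next(self._next)
        rip, rport = self.connections[key]
        return {**pkt, "dst": rip, "dport": rport}

    def outbound(self, pkt):
        """Step 4: make the response look like it came from the virtual service."""
        if (pkt["dst"], pkt["dport"]) not in self.connections:
            return None
        return {**pkt, "src": self.vip, "sport": self.vport}

    def close(self, client_ip, client_port):
        """Remove the record when the connection terminates or times out."""
        self.connections.pop((client_ip, client_port), None)


# Hypothetical addresses.
lb = NatLoadBalancer("203.0.113.1", 80, [("10.0.0.2", 8080), ("10.0.0.3", 8080)])
request = {"src": "198.51.100.7", "sport": 1500, "dst": "203.0.113.1", "dport": 80}
forwarded = lb.inbound(request)
reply = {"src": forwarded["dst"], "sport": forwarded["dport"],
         "dst": "198.51.100.7", "dport": 1500}
returned = lb.outbound(reply)
print(forwarded)
print(returned)
```

Note that both directions pass through the same object, which is why the load balancer sits on the critical path in this design.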


LB/NAT Advantages and Disadvantages

Advantages:
o Only one public IP address is needed for the load balancer; real servers can use private IP addresses
o Real servers need no change and are not aware of load balancing

Problems:
o The load balancer must be on the critical path
o The load balancer may become the bottleneck due to load to rewrite request and response packets
• Typically, rewriting responses has a lot more load because there are typically a lot more response packets

LB with Direct Reply

[Figure: the client sends requests to the LB at VIP; the LB passes them to Server1, Server2, or Server3, and the chosen server replies directly to the client.]

Each real server uses VIP as its IP address

LB/DR Architecture

[Figure: the load balancer and the real servers are connected by a single switch.]

Why IP Address Matters?

Each network interface card listens to an assigned MAC address

A router is configured with the range of IP addresses connected to each interface (NIC)

To send to a device with a given IP, the router needs to translate IP to MAC (device) address

The translation is done by the Address Resolution Protocol (ARP)

ARP Protocol

ARP is “plug-and-play”:
o nodes create their ARP tables without intervention from net administrator

A broadcast protocol:
o Router broadcasts query frame, containing queried IP address
• all machines on LAN receive ARP query
o Node with queried IP receives ARP frame, replies its MAC address

ARP in Action

- Router R broadcasts ARP query: who has VIP?

- ARP reply from LB: I have VIP; my MAC is MACLB

- Data packet (D=VIP, S=client) from R to LB: destination MAC = MACLB

LB/DR Problem

ARP and race condition:
• When router R gets a packet with dest. address VIP, it broadcasts an Address Resolution Protocol (ARP) request: who has VIP?
• One of the real servers may reply before load balancer

Solution: configure real servers to not respond to ARP request
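
A small Python sketch of the race: every host that holds VIP and answers ARP for it is a potential responder, so the router may learn the wrong MAC. The host table, the address 10.0.0.100, and the per-address "answers ARP" flag are illustrative assumptions.

```python
VIP = "10.0.0.100"  # hypothetical virtual IP


def arp_responders(hosts, queried_ip):
    """Return the hosts that would reply to an ARP query for queried_ip.

    hosts maps a host name to {ip_address: answers_arp_for_this_address}.
    """
    return [name for name, addrs in hosts.items() if addrs.get(queried_ip, False)]


# Every machine holds VIP and answers ARP for it: the router may pick any reply.
racy = {
    "LB": {VIP: True},
    "Server1": {VIP: True, "10.0.0.11": True},
    "Server2": {VIP: True, "10.0.0.12": True},
}

# Real servers keep VIP on a non-ARPing interface: only the LB answers.
fixed = {
    "LB": {VIP: True},
    "Server1": {VIP: False, "10.0.0.11": True},
    "Server2": {VIP: False, "10.0.0.12": True},
}

print(arp_responders(racy, VIP))
print(arp_responders(fixed, VIP))
```

In the fixed configuration the servers still answer ARP for their own real addresses, so the load balancer can find their MAC addresses.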


LB via Direct Routing

The virtual IP address is shared by real servers and the load balancer.

Each real server has a non-ARPing, loopback alias interface configured with the virtual IP address, and the load balancer has an interface configured with the virtual IP address to accept incoming packets.

The workflow of LB/DR is similar to that of LB/NAT:
o the load balancer directly routes a packet to the selected server
• the load balancer simply changes the MAC address of the data frame to that of the server and retransmits it on the LAN (how to know the real server’s MAC?)
o When the server receives the forwarded packet, the server determines that the packet is for the address on its loopback alias interface, processes the request, and finally returns the result directly to the user

LB/DR Advantages and Disadvantages

Advantages:
o Real servers send response packets to clients directly, avoiding LB as bottleneck

Disadvantages:
o Servers must have non-ARP alias interface
o The load balancer and server must have one of their interfaces in the same LAN segment

Example Implementation of LB

An example open source implementation is Linux virtual server (linux-vs.org)

• Used by
– www.linux.com
– sourceforge.net
– wikipedia.org

• More details on ARP problem: http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.arp_problem.html

o Many commercial LB servers from F5, Cisco, …

For more details, please read chapter 2 of Load Balancing Servers, Firewalls, and Caches

Question to Think About

How do you test if Amazon ELB uses LB/NAT or LB/DR?


Discussion: Problem of the Load Balancer Architecture

[Figure: a client sends a packet with D=VIP, S=client to the LB at VIP, which fronts Server1, Server2, and Server3.]

A major remaining problem is that the LB becomes a single point of failure (SPOF).

Solutions

Redundant load balancers
o E.g., two load balancers

Fully distributed load balancing
o e.g., Microsoft Network Load Balancing (NLB)

Microsoft NLB

No dedicated load balancer

All servers in the cluster receive all packets

All servers within the cluster simultaneously run a mapping algorithm to determine which server should handle the packet. Those servers not required to service the packet simply discard it.

Mapping (ranking) algorithm: computing the “winning” server according to host priorities, multicast or unicast mode, port rules, affinity, load percentage distribution, client IP address, client port number, other internal load information
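
The idea that every server runs the same mapping and all but one discard the packet can be sketched in Python as below. Hashing only the client IP address and port is a deliberate simplification and an assumption of this sketch; it is not Microsoft's actual ranking algorithm, which also uses priorities, port rules, affinity, and load information.

```python
import hashlib


def owner(client_ip, client_port, cluster):
    """Deterministically map a client endpoint to one host in the cluster."""
    key = f"{client_ip}:{client_port}".encode()
    digest = hashlib.sha256(key).digest()
    return cluster[int.from_bytes(digest[:8], "big") % len(cluster)]


def on_packet(me, client_ip, client_port, cluster):
    """Run on every host for every packet; only the owner handles it."""
    return "handle" if owner(client_ip, client_port, cluster) == me else "discard"


cluster = ["host1", "host2", "host3"]   # hypothetical host names
decisions = [on_packet(h, "198.51.100.7", 1500, cluster) for h in cluster]
print(decisions)
```

Because every host computes the same function on the same input, no coordination message is needed per packet, but every host pays the cost of receiving every packet.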


http://technet.microsoft.com/en-us/library/cc739506%28WS.10%29.aspx


Discussion

Compare the design of using Load Balancer vs Microsoft NLB


Forward Cache/Proxy

Web caches/proxy placed at entrance of an ISP

Client sends all http requests to web cache
o if object at web cache, web cache immediately returns object in http response
o else requests object from origin server, then returns http response to client

[Figure: clients exchange http requests and responses with a proxy server, which in turn exchanges http requests and responses with the origin servers.]
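
The hit/miss behavior of a forward cache can be sketched as follows. `ForwardCache` and the fake origin function are hypothetical; the sketch ignores expiry, validation, and cache-control headers that a real HTTP cache must honor.

```python
class ForwardCache:
    """Toy forward cache: serve from the store on a hit, fetch on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: url -> response body
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            self.hits += 1                # object at cache: return immediately
            return self._store[url]
        self.misses += 1                  # otherwise go to the origin server
        body = self._fetch(url)
        self._store[url] = body
        return body


origin_requests = []


def fake_origin(url):
    """Stand-in for an origin server; records every request it receives."""
    origin_requests.append(url)
    return f"<html>content of {url}</html>"


cache = ForwardCache(fake_origin)
for url in ["/a", "/b", "/a", "/a"]:
    cache.get(url)
print(cache.hits, cache.misses, len(origin_requests))
```

Only the misses reach the origin, which is where the bandwidth saving on the access link comes from.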


Forward Web Proxy/Cache

Web caches give good performance because very often
o a single client repeatedly accesses the same document
o a nearby client also accesses the same document

Cache hit ratio increases logarithmically with number of users

[Figure: clients 1 to 6 reach the app. server (C0) through two ISP caches.]

Benefits of Forward Web Caching

Assume: cache is “close” to client (e.g., in same network)

smaller response time: cache “closer” to client

decrease traffic to distant servers
o link out of institutional/local ISP network often bottleneck

[Figure: an institutional network (10 Mbps LAN) with an institutional cache connects over a 1.5 Mbps access link to origin servers on the public Internet.]

What Went Wrong with Forward Web Caches?

Web protocols evolved extensively to accommodate caching, e.g. HTTP 1.1

However, Web caching was developed with a strong ISP perspective, leaving content providers out of the picture
o It is the ISP who places a cache and controls it
o ISPs' only interest in using Web caches is to reduce bandwidth

Outline

Load direction/distribution
o Basic load direction mechanisms
o Path properties (to be covered later)
o Case studies: Content Distribution Networks, Akamai, and YouTube

Content Distribution Networks

Content Distribution Networks (CDNs) provide examples of Internet-scale load distribution for content publishers

CDN Design Perspective
o Performance scalability (high throughput, going beyond single server throughput)
o Geographic scalability (low propagation latency, going to close-by servers)
o Low cost operation

Akamai

Akamai – original and largest commercial CDN operates around 91,000 servers in over 1,000 networks in 70 countries

Akamai (AH kuh my) is Hawaiian for intelligent, clever and informally “cool”. Founded Apr 99, Boston MA by MIT students

Akamai evolution:
o Files/streaming (our focus at this moment)
o Secure pages and whole pages
o Dynamic page assembly at the edge (ESI)
o Distributed applications

Akamai Scalability Bottleneck

See Akamai 2009 investor analysts meeting

Basics of Akamai Architecture

Content publisher (e.g., CNN, NYTimes)
o provides base HTML documents
o runs origin server(s)

Akamai runs
o edge servers for hosting content
• Deep deployment into 1000 networks
o customized DNS redirection servers to select edge servers based on
• closeness to client browser
• server load

Linking to Akamai

Originally, URL Akamaization of embedded content: e.g.,
<IMG SRC=http://www.provider.com/image.gif>
changed to
<IMG SRC=http://a661.g.akamai.net/hash/image.gif>

URL Akamaization is becoming obsolete and supported mostly for legacy reasons
o Currently most content publishers prefer to use DNS CNAME to link to Akamai servers
• a CNAME is an alias

Note that this DNS redirection unit is per customer, not individual files.

Akamai Load Direction Flow

[Figure: the web client's local DNS server (LDNS), the customer DNS servers, a hierarchy of CDN DNS servers, and the web replica servers exchange messages (1) to (6) across the Internet.]

Client requests site

Client gets CNAME entry with domain name in Akamai

Multiple redirections to find nearby edge servers

Client is given 2 nearby web replica servers (fault tolerance)

For more details see “Global hosting system”: FT Leighton, DM Lewin, US Patent 6,108,703, 2000.
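
The CNAME step of this flow can be sketched with a toy record table in Python. All names and addresses below are hypothetical (reserved example domains and documentation addresses); a real resolver also walks the DNS hierarchy and honors TTLs.

```python
# Toy zone data: the customer's name is an alias for a name in the CDN's domain,
# and the CDN's DNS answers that name with two nearby replica servers.
RECORDS = {
    "www.publisher.example": ("CNAME", "publisher.cdn.example"),
    "publisher.cdn.example": ("A", ["192.0.2.10", "192.0.2.11"]),
}


def resolve(name, records, max_hops=8):
    """Follow CNAME aliases until an address record is found."""
    for _ in range(max_hops):
        rtype, value = records[name]
        if rtype == "A":
            return value
        name = value   # CNAME: restart the lookup with the alias target
    raise RuntimeError("CNAME chain too long")


addresses = resolve("www.publisher.example", RECORDS)
print(addresses)
```

Because the alias covers the whole host name, the redirection unit is the customer's site rather than an individual file.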


Exercise: Zoo machine

Check any web page of New York Times and find a page with an image

Find the URL

Use %dig +trace +recurse to see Akamai load direction

Akamai Load Direction


Akamai Load Redirection Framework

[Figure: clients are directed to edge servers.]

If the directed edge server does not have requested content, the edge server goes back to the original server (source).

Load Direction Formulation: Input

Potential related input:
o $p(m, e)$: path properties (from a client site $m$ to an edge server $e$)
• Akamai might use a one-hop detour routing (see akamai-detour.pdf)
o $a^k_m$: request arrival rate from client site $m$ to publisher $k$
o $u_k$: service rate for requests for publisher $k$
o $x_e$: load on edge server $e$
o caching state of a server $e$

Load Direction Formulation

Details of Akamai algorithms are proprietary

So what we discuss is our formulation and the measurements of some researchers.

[Figure: request client sites (e.g., Yale and ATT1, with arrival rates $a^k_{Yale}$ and $a^k_{ATT1}$) are mapped to edge servers.]

Load Direction: Control Parameters

Control interval: $T$

Mapping from client sites to servers
o server pool $S^k_m(t)$: the pool of edge servers that can be assigned to client site $m$ for publisher $k$, at time $t$

[Figure: client sites Yale and ATT1 with arrival rates $a^k_{Yale}$ and $a^k_{ATT1}$.]

Load Direction: Comments

An algorithm (column 12 of Akamai Patent) at a local DNS server
o Compute the load to each publisher k (called serial number)
o Sort the publishers in increasing order of load
o For each publisher, associate a list of random servers generated by a hash function
o Assign the publisher to the first server that does not overload

We can formulate more complex versions
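
One way to read the four steps above is the Python sketch below. It is our own interpretation for illustration, not the patented algorithm: the hash-based server ordering, the single shared `capacity`, and all publisher and server names are assumptions.

```python
import hashlib


def candidate_servers(publisher, servers):
    """Pseudo-random but repeatable server ordering for one publisher."""
    return sorted(
        servers,
        key=lambda s: hashlib.sha256(f"{publisher}/{s}".encode()).hexdigest(),
    )


def assign(publisher_load, servers, capacity):
    """Place publishers, lightest first, on the first server that stays under capacity."""
    used = {s: 0.0 for s in servers}
    assignment = {}
    for pub in sorted(publisher_load, key=publisher_load.get):
        for s in candidate_servers(pub, servers):
            if used[s] + publisher_load[pub] <= capacity:
                used[s] += publisher_load[pub]
                assignment[pub] = s
                break   # publishers that fit nowhere stay unassigned
    return assignment, used


# Hypothetical loads (requests/sec) and edge servers.
loads = {"p1": 5.0, "p2": 3.0, "p3": 4.0}
edge_servers = ["e1", "e2", "e3"]
assignment, used = assign(loads, edge_servers, capacity=6.0)
print(assignment)
print(used)
```

Using a hash to order the servers means every DNS server computes the same candidate list for a publisher without exchanging state.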


Experimental Study of Akamai Load Balancing

Methodology
o 2-month-long measurement
o 140 PlanetLab nodes (clients)
• 50 US and Canada, 35 Europe, 18 Asia, 8 South America, the rest randomly scattered
o Every 20 sec, each client queries an appropriate CNAME for Yahoo, CNN, Fox News, NY Times, etc.

[Figure: a web client queries an Akamai low-level DNS server, which directs it to Akamai web replica 1, 2, 3, ...]

See http://www.aqualab.cs.northwestern.edu/publications/Ajsu06DBA.pdf

Server Pool: to Yahoo

[Figure: web replica IDs over time, day and night, starting 06/1/05 16:16, as seen by Client 1: Berkeley and Client 2: Purdue. Target: a943.x.a.yimg.com (Yahoo).]

Server Pool (to Yahoo)


Server Pool: Multiple Akamai Hosted Sites

[Figure: number of Akamai web replicas seen by each client.]

Load Balancing Dynamics

[Figure: panels for Berkeley, Brazil, and Korea.]

Redirection Effectiveness: Measurement Methodology

[Figure: a PlanetLab node queries the Akamai low-level DNS server and pings the 9 best Akamai replica servers.]

Do redirections reveal network conditions?

Rank = r1+r2-1
o 16 means perfect correlation
o 0 means poor correlation

Brazil is poor

MIT and Amsterdam are excellent

Server Diversity for Yahoo

Majority of PL nodes see between 10 and 50 Akamai edge-servers

Nodes far away from Akamai hot-spots

Good overlay-to-CDN mapping candidates

Akamai Streaming Architecture

A content publisher (e.g., a radio or a TV station) encodes streams and transfers them to entry points

When a user watches a stream from an edge server, the server subscribes to a reflector

Group a set of streams (e.g., some popular, some not) into a bucket called a portset. A set of reflectors will distribute a given portset.

Compare with Web architecture.

Akamai Streaming: Resource Naming

Each unique stream is encoded by a URL called Akamai Resource Locator (ARL)

mms://a1897.l3072828839.c30728.g.lm.akamaistream.net/D/1897/30728/v0001/reflector:28839

[Figure: labels on the ARL above: Windows media player, portset, stream ID, live media service, customer # (NBA).]

Akamai Streaming Load Direction

From ARL to edge server
o Similar to web direction

From edge server to reflector
o if (stream is active) then forward to client
  else if (VoD) then fetch from original server
  else using Akamai DNS to query portset+region code
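
The edge-server decision can be written as a tiny Python function. The function name and the returned strings are ours, chosen only to mirror the three cases on the slide.

```python
def edge_server_action(stream_active, is_vod):
    """Decide what an edge server does when a client asks for a stream."""
    if stream_active:
        # The edge server is already receiving this stream.
        return "forward to client"
    if is_vod:
        # Video on demand: content is stored, not live.
        return "fetch from original server"
    # Live stream not yet active here: locate a reflector.
    return "query Akamai DNS with portset + region code"


for active, vod in [(True, False), (False, True), (False, False)]:
    print(active, vod, "->", edge_server_action(active, vod))
```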


Streaming Redirection Interval

- 40% use 30 sec
- 10% do not have any redirection (default edge server cluster in Boston 72.246.103.0/24 and 72.247.145.0/24)

Overlapping of Servers


Testing Akamai Streaming Load Balancing

(a) Add 7 probing machines to the same edge server
(b) Observe slow down
(c) Notice that Akamai removed the edge server from DNS; probing machines stop

YouTube

02/2005: Founded by Chad Hurley, Steve Chen and Jawed Karim, who were all early employees of PayPal.

10/2005: First round of funding ($11.5 M)

03/2006: 30 M video views/day

07/2006: 100 M video views/day

11/2006: acquired by Google

10/2009: Chad Hurley announced in a blog that YouTube was serving well over 1 B video views/day (avg = 11,574 video views/sec)

http://video.google.com/videoplay?docid=-6304964351441328559#


Pre-Google Team Size

2 Sysadmins
2 Scalability software architects
2 feature developers
2 network engineers
1 DBA
0 chefs

YouTube Design Flow

while (true) {
    identify_and_fix_bottlenecks();
    drink();
    sleep();
    notice_new_bottleneck();
}

YouTube Major Components

Web servers

Video servers

Thumbnail servers

Database servers
o Will cover the social networking/database bottleneck/consistency issues later in the course

YouTube: Web Servers

Components
o Netscaler load balancer; Apache; Python App Servers; Databases

Python
o Web code (CPU) is not bottleneck
• JIT to C to speedup
• C extensions
• Pre-generate HTML responses
o Development speed more important

[Figure: NetScaler in front of the web servers (Apache with a Python app server), which talk to the databases.]

YouTube: Video Server

See “Statistics and Social Network of YouTube Videos”, 2008.

YouTube: Video Popularity

See “Statistics and Social Network of YouTube Videos”, 2008.

YouTube: Video Popularity

See “Statistics and Social Network of YouTube Videos”, 2008.

How to design a system to handle highly skewed distribution?

YouTube: Video Server Architecture

Tiered architecture
o CDN servers (for popular videos)
• Low delay; mostly in-memory operation
o YouTube servers (not popular, 1-20 per day)

[Figure: a request for one of the most popular videos goes to the CDN; other requests go to YouTube Colo 1 through YouTube Colo N.]

YouTube Redirection Architecture


YouTube Video Servers

Each video hosted by a mini-cluster consisting of multiple machines

Video servers use the lighttpd web server for video transmission:

Apache had too much overhead (used in the first few months and then dropped)

Async io: uses epoll to wait on multiple fds

Switched from single process to multiple process configuration to handle more connections
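
To show what "wait on multiple fds" looks like in practice, here is a minimal sketch using Python's standard `selectors` module, whose default selector is backed by epoll on Linux. It is only an illustration of the technique, not lighttpd's code; the socket pairs stand in for client connections.

```python
import selectors
import socket


def echo_ready(sel, timeout=1.0):
    """Wait once on all registered sockets and echo back whatever arrived."""
    handled = 0
    for key, _events in sel.select(timeout):
        data = key.fileobj.recv(4096)
        if data:
            key.fileobj.sendall(data)
            handled += 1
    return handled


sel = selectors.DefaultSelector()   # epoll-based on Linux
pairs = [socket.socketpair() for _ in range(3)]
for server_side, _client_side in pairs:
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)

# Two of the three "clients" send data; one thread serves both without blocking
# on the idle connection.
pairs[0][1].sendall(b"a")
pairs[2][1].sendall(b"c")
count = echo_ready(sel)
echoed = pairs[0][1].recv(16)
print(count, echoed)

sel.close()
for server_side, client_side in pairs:
    server_side.close()
    client_side.close()
```

A single process can watch many connections this way; running several such processes is one way to use multiple cores.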


Thumbnail Servers

Thumbnails are served by a few machines

Problems running thumbnail servers
o A high number of requests/sec as web pages can display 60 thumbnails on page
o Serving a lot of small objects implies
• lots of disk seeks and problems with file system inode and page caches
• may run into per directory file limit
• Solution: storage switched to Google BigTable (we will cover this later)

Thumbnail Server Software Architecture

Design 1: Squid in front of Apache
o Problems
• Squid worked for a while, but as load increased performance eventually decreased: went from 300 requests/second to 20
• under high loads Apache performed badly, changed to lighttpd

Design 2: lighttpd by default (by default lighttpd uses a single thread)
o Problem: often stalled due to I/O

Design 3: switched to multiple processes contending on shared accept
o Problems: high contention overhead/individual caches

Thumbnails Server: lighttpd/aio


Discussion: Problems of Traditional Content Distribution

[Figure: clients 1 to n find the app. server (C0) through DNS and all fetch content from it.]

Objectives of P2P

Share the resources (storage and bandwidth) of individual clients to improve scalability/robustness

Bypass DNS to find clients with resources!
o examples: instant messaging, skype

[Figure: peers connected through the Internet.]

P2P

But P2P is not new

Original Internet was a p2p system:
o The original ARPANET connected UCLA, Stanford Research Institute, UCSB, and Univ. of Utah
o No DNS or routing infrastructure, just connected by phone lines
o Computers also served as routers

P2P is simply an iteration of scalable distributed systems


Backup Slides
