Download - George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos Dimitrios Katsaros

Transcript
Page 1: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 1

March 15, 2004

George PallisAthena Vakali

Konstantinos StamosAntonis Sidiropoulos

Dimitrios KatsarosYannis Manolopoulos

Programming Languages & Software Engineering Lab

Department of InformaticsAristotle Univ. of Thessaloniki, Greece

A Latency-based Object Placement Approach in Content Distribution Networks

http://www.csd.auth.gr/~oswindshttp://www.csd.auth.gr/~oswinds

Page 2: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 2

March 15, 2004

INTRODUCTION

The Problem• Congested lines, obsolete backbones, multimedia content,

increasing user populations Great Internet Traffic JamSolutions• Increasing Bandwidth• Web Caching

– temporary storage of objects closer to the consumer • Web Prefetching

– the process of predicting future requests for Web objects and bringing those objects into the cache in the background, before an explicit request is made for them

• Content Distribution Networks (CDNs) – moving the content to the “edge” of the Internet, closer to the

end-user

Page 3: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 3

March 15, 2004

Content Distribution Network (CDN)

• A CDN (such as Akamai, Mirror Image etc.) is a network of cache servers, called surrogate servers, owned by the same Internet Service Provider (ISP) that delivers content to users on behalf of content providers.

• Surrogate servers are typically shared, delivering content belonging to multiple Web sites.

• The networking functional components of a CDN include user redirection services, distribution services, accounting and billing

Page 4: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 4

March 15, 2004

Content Distribution Network (2)

Page 5: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 5

March 15, 2004

CDN Schemes

• Uncooperative pull-based– the clients' requests are directed to their closest surrogate server– CDNs do not always choose the optimal server from which to

serve the content • Cooperative pull-based

– the surrogate servers are cooperating with each other in case of cache misses

– the surrogate servers find nearby copies of requested objects, and store them in their caches

• Cooperative push-based– the content is prefetched to the surrogate servers– the surrogate servers cooperate in order to reduce the replication

and update cost

Page 6: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 6

March 15, 2004

CDNs Challenging Issues

• Replica/Surrogate server placement problem – where should be located the surrogate servers?

• Content selection problem – which content should be outsourced?

• Content replication problem – which surrogate servers should replicate the

outsourced content?

Page 7: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 7

March 15, 2004

Motivation

• We study the Content Replication Problem– NP-Complete– Existing heuristic methods for optimally replicating the

“outsourced content” in surrogate servers over a CDN• Random

– Naive, unscalable approach

• Popularity– Requires popularity statistics (e.g. users traffic)

• Greedy-single– Requires popularity statistics, huge memory requirements

• Greedy-global– Requires popularity statistics, huge memory requirements

Page 8: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 8

March 15, 2004

Contribution

• We formulate the content replication problem for a cooperative push-based scheme.

• We provide a novel, self-tuning, parameterless strategy for optimally placing outsourced objects in CDN’s surrogate servers, which is based on network latency.

• We develop an analytic simulation environment to test the efficiency of the proposed latency-based scheme. – Using real and synthetically generated test data, we show the

robustness and efficiency of the proposed method which can reap performance benefits better than an analogous heuristic method which has a priori knowledge of the object popularity statistics.

Page 9: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 9

March 15, 2004

Problem Formulation• The content replication problem is to select the

optimal placement x such that it minimizes

• Dik(x) is the “distance” to a replica of object k from surrogate server i under the placement x• the distance reflects the latency (the elapsed time

between when a user issues a request and when it receives the response)

• N is the number of surrogate servers, K is the number of outsourced objects, λi is the request rate for surrogate server i, and pκ is the probability that a client will request the object k.

N

i

K

kikN

jj

ik xDp

xt1 1

1

)()(cos

Page 10: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 10

March 15, 2004

The Lat-cdn Algorithm

• Main idea– place the outsourced objects to

surrogate servers with respect to the total network’s latency, without taking into account the objects’ popularity

– Latency measures the users’ satisfaction and it should be as small as possible

Page 11: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 11

March 15, 2004

The Lat-cdn Algorithm: The Flowchart

For each outsourced object, we find which is the best surrogate server in order to place it (produces

the minimum network latency)

We select from all the pairs of outsoursed object – surrogate server

that have been occurred in the previous step, the one which produces the largest network latency (max Dik(x)), and thus place this object to that

surrogate server

Surrogate servers become

full?

No

Yes

All the “outsourced objects” are stored in the origin server and all the CDN’s surrogate servers

are empty

The final Placement

CDN Infrastructureoutsourced

objects

Page 12: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 12

March 15, 2004

The Lat-cdn Algorithm: The PseudocodeLat-cdn{Input: obj[1…K] //outsourced objectsss[1…N] //surrogate serversOutput: a placement x of outsourced objects to surrogate serverswhile (there is free cache space on surrogate servers){ for (k=1; k<=K; k++)

{ min[obj[k]]=; for (n=1; n<=N; n++) if (free cache size of ss[n] <= size obj[k] && obj[k] does not exist in ss[n]) { place obj[k] to ss[n]; find the cost(obj[k],ss[n]); if (cost(obj[k],ss[n])<min[obj[k]]) //find the minimum cost min[obj[k]]=cost(obj[k],ss[n]); } } for (k=1; k<=K; k++) find the maximum of min[obj[k]]; placement (object y, surrogate server z); //place the object y to surrogate server z which has the maximum

value of minimum costs. } }

Page 13: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 13

March 15, 2004

Simulation Testbed

• We use trace-driven simulations developing an analytic simulation environment:– a system model simulating the CDN infrastructure

– a network topology generator

– a Web site generator, modeling file sizes, linkage, etc.

– a client request stream generator capturing the main characteristics of Web users' behavior

Page 14: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 14

March 15, 2004

System Model

• We have implemented a simulation model for CDNs using the ParaSol library– CDN networking issues are computed dynamically via the

simulation model– Provides an implementation as close as possible to the

working TCP/IP protocol– Uses Bloom filters for memory-efficient CDN simulation

• We consider a CDN infrastructure consisting of 20 surrogate servers– All the surrogate servers have the same storage capacity

Page 15: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 15

March 15, 2004

Network Topology

• Using the GT-ITM internetwork topology generator, we generate a random network topology, called Waxman, with a total of 1008 nodes

• Using BGP routing data collected from a set of 7 geographically-dispersed BGP peers in April 2000, we construct an AS-level Internet topology with a total of 3037 nodes

Page 16: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 16

March 15, 2004

Web Site Generation

• Using the R-MAT tool, we construct Web graphs– The R-MAT produces realistic Web graphs capturing

the essence of each graph in only a few parameters

• We create two graphs with varying number of nodes (objects)– sparse-density graph (4000 nodes)

– moderate-density graph (3000 nodes)

Page 17: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 17

March 15, 2004

Request Streams Generation

• Using a requests’ generator, we generate clients’ transactions

• Given a Web site graph, we generate transactions as sequences of page traversals (random walks) upon the site graph

Page 18: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 18

March 15, 2004

Performance Evaluation

• Examined Methods– Random: Assigns the outsourced objects to CDN’s

surrogate servers randomly subjected to the storage constraints. Both the outsourced object and the surrogate server are selected by uniform probability.

– Popularity: Each surrogate server stores the most popular outsourced objects among its clients. The node sorts the objects in decreasing order of popularity and stores as many outsourced objects in this order as the storage constraint allows.

Page 19: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 19

March 15, 2004

Lat-cdn for Typical Object Sizes

• Average Response Time for Moderate-density Web Graphs (3000 objects)

AS Network Topology - Moderate Graph

0

0,05

0,1

0,15

0,2

0,25

1% 3% 5% 10%

Cache Size

Avg

. R

esp

on

se T

ime

Popularity

Random

Lat-cdn

Waxman Network Topology - Moderate Graph

0

0,05

0,1

0,15

0,2

0,25

0,3

1% 3% 5% 10%

Cache SizeA

vg.

Res

po

nse

Tim

e

Popularity

Random

Lat-cdn

The size of the cache is expressed in terms of the percentage of the total number of bytes of the Web site

As the cache size increases, the average response time also increases since the larger in size caches satisfy more requests

Page 20: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 20

March 15, 2004

Lat-cdn for Typical Object Sizes (2)

• Average Response Time for Sparse-density Web Graphs (4000 objects)

AS Network Topology - Sparse Graph

0

0,05

0,1

0,15

0,2

0,25

0,3

1% 3% 5% 10%

Cache Size

Avg

. R

esp

on

se T

ime

Popularity

Random

Lat-cdn

Waxman Network Topology - Sparse Graph

0

0,05

0,1

0,15

0,2

0,25

0,3

1% 3% 5% 10%

Cache Size

Avg

. R

esp

on

e T

ime

Popularity

Random

Lat-cdn

The difference in performance between Lat-cdn and the other two heuristics is quite significant (ranges from 6% to 25%)

Page 21: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 21

March 15, 2004

Lat-cdn Limitations• Average Response Time for Real Web Site

(Stanford Web site - 281903 Web objects)

AS Network Topology - Stanford Web site

0,06

0,061

0,062

0,063

0,064

0,065

0,1% 0,3% 0,5% 1%

Cache Size

Avg

. R

esp

on

e T

ime

Popularity

Random

Lat-cdn

We use a different scale for the cache sizes (compared with the previous ones) due to the large amount of objects of the Stanford Web site

The response times are too small because the majority of objects of Stanford Web site have very small sizes

Page 22: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 22

March 15, 2004

Conclusion

• We addressed the content replication problem for CDNs

• The proposed heuristic algorithm makes no use of any request statistics in determining in which surrogate servers to place the outsourced objects

• For the future we plan to investigate the content replication problem in CDNs for uncooperative pull-based schemes as well as for cooperative pull-based schemes

Page 23: George Pallis Athena Vakali Konstantinos Stamos Antonis Sidiropoulos  Dimitrios Katsaros

3rd Latin American Web Congress (LA-WEB 2005) 25

March 15, 2004

Thank you Thank you for your for your

attentionattention