Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz...

26
Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, kubitron}@CS.Berkeley.EDU EECS Department UC Berkeley

description

Goal and Challenges Dynamically choose the number and placement of replicas while satisfying clients’ QoS and servers’ capacity constraints Good performance for update dissemination –Delay- Bandwidth consumption Without global network topology knowledge Scalability for millions of objects, clients and servers Provide content distribution to clients with good Quality of Service (QoS) while retaining efficient and balanced resource consumption of the underlying infrastructure

Transcript of Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz...

Page 1: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Dynamic Replica Placement for Scalable Content Delivery

Yan Chen, Randy H. Katz,John D. Kubiatowicz

{yanchen, randy, kubitron}@CS.Berkeley.EDU

EECS DepartmentUC Berkeley

Page 2: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Motivation Scenario

data plane

network plane

datasource

Web contentserverCDN server

client

replica

Page 3: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Goal and Challenges

• Dynamically choose the number and placement of replicas while satisfying clients’ QoS and servers’ capacity constraints

• Good performance for update dissemination– Delay - Bandwidth consumption

• Without global network topology knowledge• Scalability for millions of objects, clients and

servers

Provide content distribution to clients with good Quality of Service (QoS) while retaining efficient and balanced resourceconsumption of the underlying infrastructure

Page 4: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Outline

• Goal and Challenges• Previous Work• Our Solutions: Dissemination Tree• Peer-to-peer Location Service• Replica Placement and Tree Construction• Evaluation• Conclusion and Future Work

Page 5: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Previous Work• Focused on static replica placement

– Clients’ distributions and access patterns known in advance

– Assume global IP network topology• DNS-redirection based CDNs highly inefficient

– Centralized CDN name server cannot record replica locations

• No inter-domain IP multicast• Application-level multicast (ALM) unscalable

– Root maintains states for all children or handle all “join” requests

Page 6: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Solution: Dissemination Tree• Dynamic replica placement use close-to-minimum number of replicas to satisfy QoS and capacity constraints with local network topology

data plane

network plane

datasource

rootserverserverclient

replica

always updateadaptivecoherence

cache

• Adaptive cache coherence for efficient coherence notification

Tapestry mesh

• Use Tapestry location service to improve the scalability & locality

Page 7: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Peer-to-peer Routing and Location Services

• Properties Needed by d-tree– Distributed, scalable location with guaranteed

success– Search with locality

• P2P Routing and Location Services:– CAN, Chord, Pastry, Tapestry, etc.

• Http://www.cs.berkeley.edu/~ravenben/tapestry

Page 8: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Integrated Replica Placement and d-tree Construction

• Dynamic Replica Placement + Application-level Multicast– Naïve approach - Smart approach

• Static Replica Placement + IP Multicast– Modeled as capacitated facility location problem– Design a greedy algorithm with logN approximation– Optimal case for comparison

• Soft-state Tree Maintenance

Page 9: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

parent candidate

data plane

network plane

c

ssurrogate

Tapestry overlay path

Dynamic Replica Placement: naïve

Page 10: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

data plane

network plane

c

ssurrogate

Tapestry overlay path first placement choice

parent candidate

Dynamic Replica Placement: naïve

Page 11: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Dynamic Replica Placement: smart

parent candidates

• Aggressive search

data plane

network plane

c

s parent

siblingserver child

surrogate

Tapestry overlay path

client child

• Greedy load distribution

Page 12: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Dynamic Replica Placement: smart• Aggressive search • Lazy placement

• Greedy load distribution

data plane

network plane

c

s parent

siblingserver child

surrogate

Tapestry overlay path

client child

first placement choice

parent candidates

Page 13: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Evaluation Methodology• Network Topology

– 5000-node network with GT-ITM transit-stub model– 500 d-tree server nodes, 4500 clients join in random order

• Network Simulator– NS-like packet-level priority-queue based event simulator

• Dissemination Tree Server Deployment– Random d-tree– Backbone d-tree (choose backbone routers and subnet gateways

first)• Constraints

– 50 ms latency bound and 200 clients/server load bound

Page 14: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Performance of Dynamic Replica Placement

• Compare Four Approaches– Overlay dynamic naïve placement (dynamic_naïve)– Overlay dynamic smart placement (dynamic_smart)– Static placement on overlay network (overlay_static)– Static placement on IP network (IP_static)

• Metrics– Number of replicas deployed and load distribution– Dissemination multicast performance– Tree construction traffic

Page 15: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Number of Replicas Deployed and Load Distribution

• Overlay_smart uses much less replicas than overlay_naïve and very close to IP_static• Overlay_smart has better load distribution than od_naïve, overlay_static and very close to IP_static

Page 16: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Multicast Performance

• 85% of overlay_smart Relative Delay Penalty (RDP) less than 4

• Bandwidth consumed by overlay_smart is very close to IP_static and much less than overlay_naive

Page 17: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Tree Construction TrafficIncluding “join” requests, “ping” messages, replica

placement and parent/child registration

• Overlay_smart consumes three to four times of traffic than overlay_naïve, and the traffic of overlay_naïve is quite close to IP_static• Far less frequent event than update dissemination

Page 18: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Conclusions and Future Work• Dissemination Tree: dynamic Content Distribution

Network on top of a peer-to-peer location service– Dynamic replica placement satisfy QoS and capacity

constraints and self-organize into an app-level multicast tree– Use Tapestry to improve the scalability and locality

• Simulation Results Show– Close to optimal number of replicas, good load distribution,

low multicast delay and bandwidth penalty at the price of reasonable construction traffic

• Future Work– Evaluate with more diverse topologies and workload– Dynamic replica deletion/migration to adapt to the shift of

users’ interests

Page 19: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Routing in Detail

5712

0880

3210

7510

4510

Neighbor MapFor “5712” (Octal)

Routing Levels1234

xxx15712

xxx0

xxx3xxx4xxx5xxx6xxx7

xx025712xx22xx32xx42xx52xx62xx72

x012x112x212x312x412x512x6125712

07121712271237124712571267127712

5712 0 1 2 3 4 5 6 7

0880 0 1 2 3 4 5 6 7

3210 0 1 2 3 4 5 6 7

4510 0 1 2 3 4 5 6 7

7510 0 1 2 3 4 5 6 7

Example: Octal digits, 212 namespace, 5712 7510

Page 20: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Dynamic Replica Placement• Client c sends “join” request to statistically closest server

s with object o through nearest representative server rs• Naïve placement only checks s before placing new

replicas while smart algorithm in addition checks parent, free siblings and free server children of s– Remaining capacity info piggybacked in soft-state messages

• If unsatisfied, s place new replica on one of the Tapestry path nodes from c to s (path piggybacked in the request)

• Naïve one always chooses the closest qualified node (to c), while smart one puts on the furthest qualified node

Page 21: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

• Without Capacity Constraints: Minimal Set Cover– Most cost-effective method: Greedy algorithm [Grossman &

Wool 1994]• With Capacity Constraints: Variant of Capacitated

Facility Location Problem– C demand locations, S locations to build facilities, the capacity

installed at each location is an integer multiple of u– Facility building cost f_i, service cost c_ij– Objective function: minimize sum of f_i and c_ij– Mapped to our problem: f_i = 1 and c_ij = 0 if location i cover

demand j and infinity otherwise

Static Replica Placement

Page 22: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Static Replica Placement Solutions• Best theoretical one: use Primal-dual schema and

Lagrangian relaxation [Jain & Varizani 1999] – Approximation ratio 4

• Variant of greedy algorithm– Approximation ratio ln|S|– Choose s with the largest value of min(cardinality |

C_s|, remaining capacity RC_s)– If RC_s < |C_s|, choose those least-covered clients to

cover first• With Global IP topology vs. with Tapestry

overlay path topology only

Page 23: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Soft State Tree Maintenance

• Bi-directional Messaging– Heartbeat message downstream & refresh

message upstream• Scalability

– Each member only maintains states for direct children+parent

– “Join” request can be handled by any member• Fault-tolerance through “rejoining”

Page 24: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Performance Under Various Conditions

• With 100, 1000 or 4500 clients• With or without load constraints• Od_smart performs consistently

better than od_naïve and close to IP_s as before

• Load constraint can avoid hot spot

Page 25: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

Performance Under Various Conditions II

• With 2500 random d-tree servers• Merit of scalability: maximal number of server children

is very small compared with total number of servers– Due to the randomized and “search with locality” properties

of Tapestry

Page 26: Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.

data plane

network plane

data sourcereplica

root serverserverclient

Tapestry mesh

c

s parentsibling

server child

surrogate

cache

Tapestry overlay path

client child

first placement choice

parent candidates