Overlay Multicast Mechanism
Student: Jia-Hui Huang
Adviser: Kai-Wei Ke
Date: 2006/5/9
Outline
- Introduction
- Topology-Aware Grouping
- End system multicast
- Simulation
- Summary
Introduction
- IP multicast drawbacks
  - Requires routers to maintain per-group state
  - Reliability, congestion control, and flow control are more difficult
- Overlay multicast
  - Builds an overlay multicast tree on top of the IP layer
  - Unicasts data along the tree links
  - Application-level multicast
Overlay multicast mechanism
- Topology-Aware Grouping (TAG)
- End system multicast (ESM)
  - Narada
Outline
- Introduction
- Topology-Aware Grouping
- End system multicast
- Simulation
- Summary
TAG (1/2)
- Exploits underlying network topology information
- Uses path overlap among members to reduce
  - Delay
  - Link stress
- Each TAG node maintains the IP addresses and paths of its parent and children in a family table (FT)
TAG (2/2)
- Definitions
  - P(A, B): a path from node A to node B
  - The spath of A is P(S, A), where S is the root of the tree
  - len(P): the length of path P, i.e., the number of routers in the path
  - A ≺ B if P(S, A) is a prefix of P(S, B), where S is the root of the tree
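The prefix relation is easy to state concretely. Below is a minimal sketch, assuming paths are represented as Python lists of router identifiers (a hypothetical representation, not the paper's):

```python
def path_len(path):
    """len(P): the number of routers in path P."""
    return len(path)

def precedes(spath_a, spath_b):
    """A ≺ B: P(S, A) is a prefix of P(S, B)."""
    return spath_b[:len(spath_a)] == spath_a

# From the topology-aware definition slide:
# D1 (spath <R1>) precedes D5 (spath <R1, R2, R4>).
assert precedes(["R1"], ["R1", "R2", "R4"])
```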
Complete path matching (1/2)
- Works like longest-prefix matching
- The algorithm considers three mutually exclusive conditions (N: the new member, C: the node being examined)
  - Condition 1: select a child A of C such that A ≺ N; the match continues at A
  - Condition 2: select the children Ai of C such that N ≺ Ai; they are moved under N
  - Condition 3: no child of C satisfies condition 1 or 2
Complete path matching (2/2)
- The algorithm recurses until condition 2 or 3 is met
- Tree management
  - Member join
  - Member leave
  - Fault resilience
    - Parents and children periodically exchange messages
    - Child failure: the parent discards the child from its FT
    - Parent failure: the child rejoins the tree
Outline
- Introduction
- Topology-Aware Grouping
- End system multicast
- Simulation
- Summary
ESM
- Shifts multicast functionality to the end systems
  - Group membership
  - Multicast routing
  - Packet duplication
- Uses a self-organizing and fully distributed algorithm: Narada
- Narada runs in two steps
  - Construct a mesh
  - Construct a per-source spanning tree on the mesh
ESM concept
- Link stress (s_i): the number of identical copies of a packet carried by physical link i
- Distance (d_i): the delay of physical link i
- Resource usage: the sum over all L physical links, Σ_{i=1}^{L} d_i · s_i
- Figure: a complete virtual graph over end systems A, B, C, D and routers; in the example, resource usage is 30 for IP multicast, 57 for IP unicast, and 32 for end system multicast
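As a quick arithmetic check of the resource-usage formula, here is a hedged sketch with made-up per-link numbers (not the slide's topology):

```python
# Resource usage = sum of delay * stress over every physical link.
links = [
    {"delay": 1, "stress": 2},   # an access link carrying two copies
    {"delay": 1, "stress": 1},
    {"delay": 27, "stress": 1},  # a long-haul link traversed once
]
resource_usage = sum(link["delay"] * link["stress"] for link in links)
print(resource_usage)  # 1*2 + 1*1 + 27*1 = 30
```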
Narada design (1/2)
- Objectives of the Narada algorithm
  - Self-organizing
  - Overlay efficiency
  - Self-improving
- Components of the Narada algorithm
  - Group management
  - Mesh performance
  - Data delivery
Narada design (2/2)
- The two steps of the algorithm
  - Group management functions are abstracted out and handled at the mesh
  - Distributed heuristics repair mesh partitions
  - Standard routing algorithms can be leveraged to construct the data delivery trees
- Figure: the mesh is built first; the tree is derived from it
Group management (1/5)
- Membership is managed in a distributed fashion
  - Every member maintains a list of the other members in the group
  - The list must be updated on join, leave, or failure
- Refresh message mechanism
  - Each member periodically generates a refresh message with an increasing sequence number
  - Refresh messages are disseminated along the mesh
Group management (2/5)
- Member i keeps track of the following for every other member k in the group
  - Member k's address
  - S_i^k: the last sequence number that i knows k has issued
  - The time at which i first received S_i^k
- To reduce refresh overhead, each member periodically exchanges its knowledge of group membership with its neighbors
Group management (3/5)
- Three group management operations
  - Member join
  - Member leave and failure
  - Repairing mesh partitions
- Member join process (sketched below)
  - The joining member is assumed to obtain a list of some group members
  - It selects a random member from the list and sends it a join message
  - The join message requests to be added as a neighbor of that member
  - The process repeats until the member successfully joins the group
  - The refresh message mechanism then supplies the rest of the group information
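A minimal sketch of this join loop, assuming hypothetical helpers `bootstrap_list` (the partial member list) and `send_join` (delivers a join request and reports whether it was accepted):

```python
import random

def join_group(bootstrap_list, send_join):
    """Try random known members until one accepts us as a mesh neighbor."""
    candidates = list(bootstrap_list)
    while candidates:
        target = random.choice(candidates)
        candidates.remove(target)
        if send_join(target):   # target adds us as a neighbor
            return target       # joined; refresh messages fill in the rest
    raise RuntimeError("no member accepted the join request")
```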
Group management (4/5)
- Member leave and failure
  - A member must notify its neighbors before leaving
  - The leave information is propagated to the rest of the group members
- Abrupt failure
  - Detected by neighbors when refresh messages stop arriving
  - The failure information is propagated to the other members
- Figure: example of a failure of node C in a mesh of nodes A through G
Group management (5/5)
- Repairing mesh partitions
  - A member failure may partition the mesh
  - Each member maintains a queue of members from which it has not received a refresh message for at least time T_m
  - A scheduling algorithm runs periodically to probe, and possibly delete, members from the head of the queue
Mesh performance (1/3)
- The constructed mesh can be suboptimal because
  - Neighbors are selected randomly at join time
  - A link added during partition repair may not be useful in the long term
  - Underlying network conditions may vary
- A utility mechanism adds or drops links dynamically to improve mesh quality
Mesh performance (2/3)
- The utility function depends on which performance metrics the application specifies
  - Example: latency and bandwidth for a conferencing application
- Addition of links
  - Every member periodically probes some random members that are not its neighbors
  - It evaluates the utility of adding a link to each probed member
  - The link is added if the utility exceeds a given threshold
Mesh performance (3/3)
- Dropping of links
  - Every member periodically computes the cost of its link to every neighbor using the cost algorithm
  - The cost of the link between i and j, in i's perception, is the number of group members for which i uses j as the next hop
  - The member picks its lowest-cost link and drops it if the cost falls below a threshold
Data delivery
- The per-source trees are constructed from the reverse shortest paths between each recipient and the source (a sketch follows)
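A hedged sketch of the reverse-path rule, assuming each member exposes a mesh routing table `next_hop` (hypothetical names, not Narada's data structures):

```python
def tree_children(me, neighbors, next_hop, source):
    """In the tree rooted at source, `me` forwards to exactly those
    neighbors whose shortest path back to the source goes through `me`."""
    return [n for n in neighbors if next_hop[n][source] == me]
```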
Outline
- Introduction
- Topology-Aware Grouping
- End system multicast
- Simulation
- Summary
Simulation (1/2)
- Properties of the simulation topology
  - Power-law: far more low-degree routers than high-degree routers
  - Small-world: the average shortest distance between two randomly chosen nodes is approximately six hops
Simulation (2/2)
- Property of the constructed overlay tree
  - High-degree, high-bandwidth routers are more likely to be traversed by links near the source
- Simulation metrics
  - Number of hops vs. overlay tree level
  - Relative delay penalty (RDP)
  - Longest latency
  - Mean bandwidth
Number of hops vs. overlay tree level
- The number of hops decreases as the host's tree level increases
Relative delay penalty (RDP)
- ESM < MDDBST < TAG
Longest latency
- Latency and RDP for ESM decrease as more hosts join, since lower-latency paths become available
- ESM > TAG > MDDBST
Mean bandwidth
- There is a trade-off between latency and bottleneck bandwidth
- MDDBST > TAG > ESM
Outline
- Introduction
- Topology-Aware Grouping
- End system multicast
- Simulation
- Summary
Summary
- Both the delay and the number of hops between parent and child decrease as the tree level increases
- The mechanisms balance the trade-off between delay and bandwidth
References
- Yang-hua Chu, Sanjay G. Rao, Srinivasan Seshan, and Hui Zhang, "A Case for End System Multicast," IEEE Journal on Selected Areas in Communications, vol. 20, no. 8, Oct. 2002, pp. 1456-1471.
- Sherlia Y. Shi, Jonathan S. Turner, and Marcel Waldvogel, "Dimensioning Server Access Bandwidth and Multicast Routing in Overlay Networks," Proceedings of NOSSDAV 2001.
- Minseok Kwon and Sonia Fahmy, "Topology-Aware Overlay Networks for Group Communication," Proceedings of NOSSDAV 2002, May 2002.
- Minseok Kwon and Sonia Fahmy, "Characterizing Overlay Multicast Networks," Proceedings of the IEEE International Conference on Network Protocols (ICNP), pp. 61.
Outline
- Introduction
- Dimensioning server multicast routing
- Topology-Aware Grouping
- End system multicast
- Simulation
- Summary
Dimensioning server multicast routing (1/2)
- Uses the AMcast network architecture
  - Application servers are deployed in the network
  - A star topology is spawned from each server to its end users
  - End users send/receive exactly one copy of each packet
  - Work is shifted from the source to all servers
- Routing algorithms are designed around two objectives
Dimensioning server multicast routing (2/2)
- Delay optimization
  - Minimum-diameter, degree-bounded spanning tree (MDDBST)
- Load balancing
  - Bounded-diameter, residual-balanced spanning tree (BDRBST)
- The two objectives are orthogonal
MDDBST (1/4)
- Definition
  - Given
    - G = (V, E): an undirected complete graph
    - d_max(v): the degree bound of node v
    - c(e): the cost of edge e
  - Find
    - A spanning tree T of G such that for each v ∈ T the degree of v satisfies d_T(v) ≤ d_max(v), and the diameter dia(T) (the cost of the longest simple path) of T is minimized
MDDBST (2/4)
- δ(u): the cost of the longest path from u to any other node in T
- Algorithm (flowchart, summarized; a code sketch follows)
  1. Start from a single node
  2. Select the node u with the smallest δ(u)
  3. Add node u to tree T
  4. Update δ for the nodes in T
  5. Update δ for the nodes not in T
  6. If not all nodes are in T, go back to step 2
  7. Repeat the construction from each start node until every node has a corresponding tree, then the algorithm is complete
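A hedged sketch of one greedy pass under the slide's definitions; the graph representation (`cost` as a dict of dicts, `d_max` as a dict) is an assumption, not the paper's notation:

```python
def mddbst_pass(nodes, cost, d_max, start):
    """One MDDBST pass: grow T by attaching the node with the smallest
    delta(u), the longest tree-path cost from u, respecting degree bounds."""
    in_tree = [start]
    edges = []
    degree = {v: 0 for v in nodes}
    td = {(start, start): 0.0}    # tree-path costs between tree nodes

    while len(in_tree) < len(nodes):
        best = None               # (delta, node, attachment point)
        for u in nodes:
            if u in in_tree:
                continue
            for w in in_tree:
                if degree[w] >= d_max[w]:
                    continue      # w cannot take another child
                delta = cost[u][w] + max(td[(w, x)] for x in in_tree)
                if best is None or delta < best[0]:
                    best = (delta, u, w)
        _, u, w = best
        edges.append((w, u))
        degree[u] += 1
        degree[w] += 1
        for x in in_tree:         # extend tree-path costs to the new node
            td[(u, x)] = td[(x, u)] = cost[u][w] + td[(w, x)]
        td[(u, u)] = 0.0
        in_tree.append(u)

    return edges, max(td.values())   # tree edges and its diameter

# A full run would repeat this pass from every start node and keep the
# minimum-diameter tree, matching the flowchart's outer loop.
```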
MDDBST (3/4)
- Figure: worked example on a complete graph of five servers A, B, C, D, E with d_max(v) = 3 and the edge costs shown
- Starting from W = {A} with δ(A) = 0, E is added first: W = {A, E}, L = {{E, A}}; then B: W = {A, E, B}, L = {{E, A}, {B, A}}, with the δ values of the remaining nodes updated after each step
MDDBST (4/4)
- The example continues: C is added, giving L = {{E, A}, {B, A}, {C, B}}; then D, giving W = {A, E, B, C, D} and the final tree L = {{E, A}, {B, A}, {C, B}, {D, B}}
BDRBST (1/3)
- Definition
  - Given
    - G = (V, E): an undirected complete graph
    - d_max(v): the degree bound of node v
    - c(e): the cost of edge e
    - B: the cost bound on the diameter
  - Find
    - A spanning tree T of G such that for each v ∈ T the degree of v satisfies d_T(v) ≤ d_max(v), the diameter dia(T) (the cost of the longest simple path) of T is < B, and the residual bandwidth min_{v ∈ T} (d_max(v) − d_T(v)) is maximized
BDRBST (2/3)
- Introduces a balance factor M
- The algorithm is similar to MDDBST; the main difference (sketched below) is
  - Select the set of M nodes with the smallest δ values
  - Among them, pick the node with the largest residual bandwidth (smallest degree) as the parent node
- Special cases
  - M = 1: the algorithm is the same as MDDBST
  - M = number of servers: the algorithm considers only load balancing
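A hedged sketch of just the selection rule, assuming `candidates` is a list of (delta, node, parent) tuples produced by an MDDBST-style inner loop (a hypothetical structure):

```python
def bdrbst_select(candidates, degree, d_max, M):
    """Among the M smallest-delta candidates, prefer the parent with the
    largest residual degree; with M = 1 this reduces to plain MDDBST."""
    best_m = sorted(candidates)[:M]
    return max(best_m, key=lambda c: d_max[c[2]] - degree[c[2]])
```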
BDRBST (3/3)
- Increases system capacity at the cost of increased end-to-end delay
- Small values of M provide good load balance while still meeting the diameter bound
AMcast architecture
- Figure: the AMcast architecture
MDDBST algorithm
- Figure: pseudocode of the MDDBST algorithm (summarized in MDDBST (2/4) above)
Family table (FT)
- Parent: <IP(A), P(S, A)>
- Children: <IP(Bi), P(S, Bi)>, ...
Topology-aware definition
- Figure: source S, routers R1 through R5, and destinations D1 through D5
- The path from S to D5 (the spath of D5): P(S, D5) = <R1, R2, R4>
- len(P(S, D5)) = 3
- D1 ≺ D5
Path match conditions
- Figure: three trees rooted at S, one per condition
  - Condition 1: the spath of child A matches (is a prefix of) N's spath, so the match continues below A
  - Condition 2: N's spath is a prefix of the spaths of children A1, A2, ..., so those children move under N
  - Condition 3: no child's path matches, so N becomes a child of C
Complete path match algorithm
- Figure: pseudocode of the complete path matching algorithm (a sketch follows)
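A hedged sketch assembled from the three conditions above; the `Node` class is a hypothetical stand-in for a TAG family-table entry, not the paper's data structure:

```python
class Node:
    def __init__(self, spath):
        self.spath = spath        # routers on the path from the source
        self.children = []

def is_prefix(p, q):
    return q[:len(p)] == p

def cpm(c, n):
    """Place new member n in the tree, starting the match at node c."""
    # Condition 1: a child a with P(S, a) a proper prefix of P(S, n):
    # recurse into a's subtree.
    for a in c.children:
        if is_prefix(a.spath, n.spath) and a.spath != n.spath:
            return cpm(a, n)
    # Condition 2: children whose spaths extend P(S, n) move under n.
    for a in [a for a in c.children if is_prefix(n.spath, a.spath)]:
        c.children.remove(a)
        n.children.append(a)
    # Conditions 2 and 3 both finish with n attached as a child of c.
    c.children.append(n)
```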
CPM member join
- Figure: the new member sends a join request to the root, and path matching proceeds down the tree through request/reply exchanges between members until the new member is placed
- Figure: join example with source S, routers R1 through R4, and members D1 through D5; after D5 joins, its parent's FT lists both D2 : (R1, R2, R4) and D5 : (R1, R2, R3)
CPM member leave
- The leaving member sends a LEAVE message to its parent
- The parent removes the leaving member's entry and adds entries for the leaving member's children
- Figure: leave example with source S, routers R1 through R4, and members D1 through D5; after the leave, the parent's FT holds D2 : (R1, R2, R4) and D5 : (R1, R2, R3)
Partial path matching process
- Partial path matching also checks a bandwidth threshold (Bwthresh) when placing members
- Figure (join process): with Bwthresh = 100 kbps, D1-D3 offers 300 kbps while D2-D3 offers only 50 kbps, so placement follows the D1-D3 link, which meets the threshold
- Figure (leave process): with Bwthresh = 100 kbps, D1-D3 offers 50 kbps, D2-D3 offers 600 kbps, and D4-D3 offers 80 kbps, so D3 is re-attached via the D2-D3 link, the only one that meets the threshold
Scheduling algorithm
- Figure: pseudocode of the partition-repair scheduling algorithm; a queued member is processed once its silent time exceeds T, and probing proceeds according to a probability (a sketch follows)
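A hedged sketch of one scheduler step, assuming hypothetical helpers: a `queue` of silent members, `last_heard` timestamps, and a `probe` callback; the probing probability is also an assumption:

```python
import random
import time

def repair_step(queue, last_heard, probe, T_m, probe_prob=0.5):
    """Probe the head of the silent-member queue, probabilistically."""
    if not queue:
        return
    member = queue[0]
    if time.time() - last_heard[member] < T_m:
        return                        # not silent for long enough yet
    if random.random() < probe_prob:  # probe according to a probability
        queue.pop(0)
        if not probe(member):         # unreachable: possible partition,
            del last_heard[member]    # delete the member from our view
```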
Utility function
- Figure: pseudocode of the utility function, using latency as the metric (a sketch follows)
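A hedged sketch of a latency-based utility evaluation, modeled on the Narada paper's latency utility; the exact formula and the dictionary-based latency tables are assumptions:

```python
def evaluate_utility(members, lat_i, lat_j, delay_ij):
    """Utility of adding link (i, j): summed relative latency improvement
    over members m whose latency from i would drop by routing through j."""
    utility = 0.0
    for m in members:
        new_lat = delay_ij + lat_j[m]       # latency from i to m via j
        if new_lat < lat_i[m]:
            utility += (lat_i[m] - new_lat) / lat_i[m]
    return utility
```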
Addition of links
- Figure: probe examples among Berk1, Stan1, Stan2, CMU, Gatech1, and Gatech2
  - First probe: delay improves to Stan1 and CMU, but only marginally, so the link is not added
  - Second probe: delay improves to CMU and Gatech1 significantly, so the link is added
Cost algorithm
- Figure: pseudocode of the link-cost computation (a sketch follows)
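A hedged sketch of the cost rule from the mesh performance slides; `next_hop` (i's routing table over the mesh) and the helper names are assumptions:

```python
def link_cost(i, j, members, next_hop):
    """Cost of link (i, j) in i's perception: the number of group members
    for which i uses j as the next hop."""
    return sum(1 for m in members if m != i and next_hop[m] == j)

def candidate_drop(i, neighbors, members, next_hop, threshold):
    """Pick i's lowest-cost link and report whether it should be dropped."""
    j = min(neighbors, key=lambda n: link_cost(i, n, members, next_hop))
    return j, link_cost(i, j, members, next_hop) < threshold
```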
Dropping of links
- Figure: mesh among Berk1, Stan1, Stan2, CMU, Gatech1, and Gatech2; the link between Berk1 and Gatech2 is used by Berk1 only to reach Gatech2 and vice versa, so it is dropped