Peer-to-Peer Systems
27/10 & 10/11 – 2006
INF5071 – Performance in distributed systems
Carsten Griwodz & Pål Halvorsen
Client-Server
[Figure: a backbone network connecting several local distribution networks]
- Traditional distributed computing
- Successful architecture, and it will continue to be so (adding proxy servers)
- Tremendous engineering is necessary to make server farms scalable and robust
Distribution with proxies
- Hierarchical distribution system, e.g., proxy caches that consider popularity
- Popular videos are replicated and kept close to clients; unpopular ones stay close to the root servers
- Popular videos are replicated more frequently
[Figure: hierarchy from root servers via regional and local servers to end-systems; completeness of available content increases toward the root]
Peer-to-Peer (P2P)
[Figure: a backbone network connecting several local distribution networks, with peers exchanging content directly]
- Really an old idea: a distributed system architecture
- No centralized control
- Nodes are symmetric in function
- Typically many nodes, but unreliable and heterogeneous
Overlay networks
[Figure: an overlay network spanning several LANs and backbone networks; overlay nodes and overlay links map onto IP links and IP paths via IP routing]
P2P
Many aspects are similar to proxy caches:
- Nodes act as clients and servers
- Distributed storage brings content closer to clients
- Storage limitation of each node
- Number of copies often related to content popularity
- Necessary to make replication and de-replication decisions
- Redirection
But:
- No distinguished roles
- No generic hierarchical relationship (at most a hierarchy per data item)
- Clients do not know where the content is (may need a discovery protocol)
- All clients may act as roots (origin servers)
- Members of the P2P network come and go
P2P Systems
Peer-to-peer systems raise new considerations for distribution systems.
Considered here:
- Scalability, fairness, load balancing
- Content location
- Failure resilience
- Routing: application-layer routing, content routing, request routing
Not considered here: copyright, privacy, trading
Examples: Napster
Napster
A program for sharing (music) files over the Internet.
Approach taken:
- Central index
- Distributed storage and download
- All downloads are shared
P2P aspects:
- Client nodes also act as file servers
Napster
- The client connects to Napster with a login and password and transmits its current listing of shared files
- Napster registers the username, maps the username to an IP address, and records the song list
[Figure: clients joining the central index]
Napster
- The client sends a song request to the Napster server
- Napster checks the song database and returns matching songs with usernames and IP addresses (plus extra statistics)
[Figure: query to and answer from the central index]
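The central-index interaction above is easy to picture in code. Below is a minimal sketch (the class and method names are hypothetical, not Napster's actual protocol) of joining with a file list and querying by title; the download itself then goes directly to one of the returned peers:

```python
# Minimal sketch of a Napster-style central index (hypothetical API,
# not the actual Napster wire protocol).

class CentralIndex:
    def __init__(self):
        self.peers = {}   # username -> IP address
        self.songs = {}   # song title -> set of usernames sharing it

    def join(self, username, ip, shared_files):
        """A client logs in and transmits its current list of shared files."""
        self.peers[username] = ip
        for title in shared_files:
            self.songs.setdefault(title, set()).add(username)

    def query(self, title):
        """Return (username, IP) pairs for every peer sharing the song."""
        return [(u, self.peers[u]) for u in self.songs.get(title, ())]

index = CentralIndex()
index.join("alice", "198.10.10.1", ["LetItBe", "Yesterday"])
index.join("bob", "198.10.10.2", ["LetItBe"])
print(index.query("LetItBe"))   # the download then goes directly to one of these peers
```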
Napster
- The user selects a song; the download request is sent straight to the peer that has it
- That machine is contacted directly, if available
[Figure: file fetched directly from the serving peer, bypassing the central index]
Napster: Assessment
Scalability, fairness, load balancing:
- Replication to querying nodes: the number of copies increases with popularity
- Large distributed storage
- Files with low popularity may be unavailable
- Network topology is not accounted for at all; latency may be increased
Content location:
- Simple, centralized search/location mechanism
- Can only query by index terms
Failure resilience:
- No dependencies among normal peers
- The index server is a single point of failure
Examples: Gnutella
Gnutella
A program for sharing files over the Internet.
Approach taken:
- Purely P2P, nothing centralized
- Dynamically built overlay network
- Query for content by overlay broadcast
- No index maintenance
P2P aspects:
- Peer-to-peer file sharing and querying
- Entirely decentralized architecture
- Many iterations to fix the poor initial design (lack of scalability)
Gnutella: Joining
- Connect to one known host and send a broadcast ping; it can be any host, learned through word-of-mouth or host caches
- The ping is broadcast through the overlay with a TTL of 7
[Figure: ping propagating outward through TTL 1 to 4]
Gnutella: Joining
- Hosts that are not overwhelmed respond with a routed pong
- Gnutella caches the IP addresses of the replying nodes as neighbors
- In the example, the grey nodes do not respond within a certain amount of time (they are overloaded)
Gnutella: Query
- Query by broadcasting in the overlay
- Send the query to all overlay neighbors
- Overlay neighbors forward the query to all their neighbors
- Up to 7 layers deep (TTL 7)
[Figure: query flooding through the overlay; the TTL is decremented from 7 at each hop]
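A minimal sketch of this TTL-limited flooding, assuming a synchronous traversal over an overlay graph (illustrative names; real Gnutella is asynchronous and suppresses duplicate messages via per-message GUIDs rather than a shared set):

```python
# Sketch of Gnutella-style query flooding with a TTL of 7.

def flood_query(overlay, start, matches, ttl=7):
    """overlay: dict node -> list of neighbor nodes.
    Returns the set of nodes whose shared files match the query."""
    hits = set()
    seen = {start}              # do not forward the same query twice
    frontier = [start]
    while frontier and ttl > 0:
        ttl -= 1                # one overlay layer per TTL step
        next_frontier = []
        for node in frontier:
            for neighbor in overlay[node]:
                if neighbor in seen:
                    continue
                seen.add(neighbor)
                if matches(neighbor):
                    hits.add(neighbor)          # a routed response goes back to the source
                next_frontier.append(neighbor)  # neighbors forward to their neighbors
        frontier = next_frontier
    return hits

overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
print(flood_query(overlay, "A", lambda node: node == "D"))  # {'D'}
```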
Gnutella: Query
- Responses are routed back to the overlay node that was the source of the broadcast query
- The querying client receives several responses
- The user receives a list of files that matched the query with the corresponding IP addresses
[Figure: routed query responses flowing back along the query path]
Gnutella: Transfer
- File transfer uses direct communication between the two peers
- The file transfer protocol is not part of the Gnutella specification
[Figure: download request and requested file exchanged directly]
Gnutella: Assessment
Scalability, fairness, load balancing:
- Replication to querying nodes: the number of copies increases with popularity
- Large distributed storage
- Files with low popularity may be unavailable
- Bad scalability: uses a flooding approach
- Network topology is not accounted for at all; latency may be increased
Content location:
- No limits to query formulation
- Less popular files may be outside the TTL
Failure resilience:
- No single point of failure
- Many known neighbors
- Assumes quite stable relationships
Examples: Freenet
Freenet
A program for sharing files over the Internet, with a focus on anonymity.
Approach taken:
- Purely P2P, nothing centralized
- Dynamically built overlay network
- Query for content by hashed query and best-first search
- Caching of hash values and content
- Content forwarding in the overlay
P2P aspects:
- Peer-to-peer file sharing and querying
- Entirely decentralized architecture
- Anonymity
Freenet: Nodes and Data
Nodes:
- Routing tables contain the IP addresses of other nodes and the hash values they hold (or held)
Data:
- Data is indexed with a hash value: "identifiers" are hashed
- Identifiers may be keywords, author IDs, or the content itself
- The Secure Hash Algorithm (SHA-1) produces a "one-way" 160-bit key
- Content-hash key (CHK) = SHA-1(content)
- Typically stores blocks
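The CHK line above maps directly onto the standard library; a small illustration (Freenet's real key types carry more structure than a bare SHA-1):

```python
# Content-hash key from the slide, CHK = SHA-1(content), computed with
# Python's standard library. Illustration only.
import hashlib

def content_hash_key(content: bytes) -> int:
    return int.from_bytes(hashlib.sha1(content).digest(), "big")  # 160-bit key

print(hex(content_hash_key(b"some block of data")))
```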
Freenet: Storing and Retrieving Data
Storing data: data is moved to a server with arithmetically close keys.
1. The key and data are sent to the local node
2. The key and data are forwarded to the node with the nearest key
3. Repeat step 2 until the maximum number of hops is reached
Retrieving data: best-first search.
1. An identifier is hashed into a key
2. The key is sent to the local node
3. If the data is not in the local store, the request is forwarded to the best neighbor; repeat with the next-best neighbor until the data is found or the request times out
4. If the data is found, or the hop count reaches zero, the data or an error is returned along the chain of nodes (if the data is found, intermediary nodes create entries in their routing tables)
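The retrieval procedure can be sketched as a recursive best-first search. The node layout and the "arithmetically close" distance metric below are illustrative assumptions, not Freenet's exact algorithm:

```python
# Sketch of Freenet-style best-first retrieval: forward to the neighbor
# whose known keys are closest to the wanted key, backtrack to the
# next-best neighbor on failure, and let intermediaries learn the route.

class Node:
    def __init__(self, name, store):
        self.name, self.store = name, store
        self.neighbors, self.routing_table = [], {}

def retrieve(node, key, hops, visited=None):
    if visited is None:
        visited = set()
    visited.add(node.name)
    if key in node.store:                      # data is in the local store
        return node.store[key]
    if hops == 0:                              # hop count reached zero: error
        return None
    # best neighbor first: smallest distance between its keys and the wanted key
    ranked = sorted(node.neighbors,
                    key=lambda nb: min((abs(k - key) for k in nb.store),
                                       default=float("inf")))
    for nb in ranked:
        if nb.name in visited:
            continue
        data = retrieve(nb, key, hops - 1, visited)
        if data is not None:
            node.routing_table[key] = nb.name  # intermediaries cache the route
            return data                        # data returned along the chain
    return None

a, b, c = Node("a", {}), Node("b", {40: "near"}), Node("c", {54: "LetItBe"})
a.neighbors, b.neighbors = [b, c], [c]
print(retrieve(a, 54, hops=10))  # -> "LetItBe"
```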
Freenet: Best-First Search
Heuristics for selecting the direction:
- >RES: returned the most results
- <TIME: shortest satisfaction time
- <HOPS: minimum hops for results
- >MSG: sent us the most messages (all types)
- <QLEN: shortest queue
- <LAT: shortest latency
- >DEG: highest degree
[Figure: a node choosing which neighbor to forward a query to]
Freenet: Routing Algorithm
Freenet: Assessment
Scalability, fairness, load balancing:
- Caching in the overlay network
- Access latency decreases with popularity
- Large distributed storage
- Fast removal of files with low popularity
- A lot of storage is wasted on highly popular files
- Network topology is not accounted for
Content location:
- Search by hash key: limited ways to formulate queries
- Content placement changes to fit the search pattern
- Less popular files may be outside the TTL
Failure resilience:
- No single point of failure
Examples: FastTrack, Morpheus, OpenFT
FastTrack, Morpheus, OpenFT
A peer-to-peer file sharing protocol with three different node types:
- USER: normal nodes
- SEARCH: keep an index of "their" normal nodes and answer search requests
- INDEX: keep an index of search nodes and redistribute search requests
FastTrack, Morpheus, OpenFT
[Figure: three-tier hierarchy of INDEX, SEARCH, and USER nodes]
FastTrack, Morpheus, OpenFT
[Figure: a query (?) from a USER node travels via SEARCH nodes, coordinated through INDEX nodes, to the answering peers (!)]
FastTrack, Morpheus, OpenFT: Assessment
Scalability, fairness, load balancing:
- Large distributed storage
- Avoids broadcasts
- Load is concentrated on the super nodes (index and search)
- Network topology is partially accounted for
- Efficient structure development
Content location:
- Search by hash key: limited ways to formulate queries
- All indexed files are reachable
- Can only query by index terms
Failure resilience:
- No single point of failure, but the overlay networks of index servers (and search servers) reduce resilience
- Relies on very stable relationships: content is registered at search nodes
- Relies on a partially static infrastructure
Examples: BitTorrent
BitTorrent
- Distributed download system
- Content is distributed in segments
- Tracker: one central download server per content; implements an approach to fairness (tit-for-tat) per content
- No approach for finding the tracker
- No content transfer protocol included
BitTorrent
Segment download operation:
- The tracker tells the peer the source and number of the segment to get
- The peer retrieves the content in pull mode
- The peer reports the availability of the new segment to the tracker
[Figure: tracker coordinating segment exchange among peers]
BitTorrent
- Rarest-first strategy (sketched below)
- No second input stream for a peer that has not contributed enough (tit-for-tat)
[Figure: tracker and peers exchanging segments]
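The rarest-first idea can be sketched in a few lines: among the segments a peer is still missing, pick the one held by the fewest neighbors. Real BitTorrent clients also randomize ties and use different strategies at the start and end of a download; this is only the core heuristic:

```python
# Minimal sketch of rarest-first segment selection.
from collections import Counter

def pick_rarest(my_segments, neighbor_segment_sets):
    """Choose the missing segment held by the fewest neighbors."""
    counts = Counter()
    for segs in neighbor_segment_sets:
        counts.update(segs)
    candidates = [(count, seg) for seg, count in counts.items()
                  if seg not in my_segments]
    if not candidates:
        return None
    return min(candidates)[1]     # least-replicated segment first

mine = {0, 1}
neighbors = [{0, 1, 2, 3}, {0, 2, 3}, {1, 3}]
print(pick_rarest(mine, neighbors))  # segment 2 is rarer than segment 3
```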
BitTorrent
- All nodes: at most 2 concurrent streams in and out
- No second input stream for a peer that has not contributed enough
[Figure: continued segment exchange under these constraints]
BitTorrent: Assessment
Scalability, fairness, load balancing:
- Large distributed storage
- Avoids broadcasts
- Transfers content segments rather than complete content
- Does not rely on clients staying online after download completion
- Contributors are allowed to download more
Content location:
- Central server approach
Failure resilience:
- The tracker is a single point of failure
- Content holders can lie
Comparison

Scalability:
- Gnutella: limited by flooding
- FreeNet: uses caching
- BitTorrent: separate overlays per file

Routing information:
- Napster: one central server
- Gnutella, FreeNet: neighbour list
- FastTrack: index server
- BitTorrent: one tracker per file

Lookup cost:
- Napster: O(1)
- Gnutella: O(log(#nodes))
- FreeNet: O(#nodes)
- FastTrack: O(1)
- BitTorrent: O(#blocks)

Physical locality:
- FastTrack: by search server assignment
Comparison

Load balancing:
- Napster, Gnutella: many replicas of popular content
- FreeNet: content placement changes to fit search
- FastTrack: load concentrated on supernodes
- BitTorrent: rarest-first copying

Content location:
- Napster: all files reachable; uses index server; search by index term
- Gnutella: unpopular files may be outside TTL; uses flooding
- FreeNet: all files reachable; search by hash
- FastTrack: search by hash
- BitTorrent: external issue

Failure resilience:
- Napster: index server as single point of failure
- Gnutella, FreeNet: no single point of failure
- FastTrack: overlay network of index servers
- BitTorrent: tracker as single point of failure
Peer-to-Peer Systems
Distributed directories
Examples: Chord
Chord
Approach taken:
- Only concerned with efficient indexing
- Distributed index: a decentralized lookup service
- Inspired by consistent hashing: SHA-1 hash
- Content handling is an entirely external problem: no relation to content, no included replication or caching
P2P aspects:
- Every node must maintain keys
- Adaptive to membership changes
- Client nodes also act as file servers
Lookup Based on Hash Tables
[Figure: a hash function maps a key to a position (hash bucket) in a hash table; the table supports insert(key, data) and lookup(key) -> data]
Distributed Hash Tables (DHTs)
- Nodes are the hash buckets
- A key identifies data uniquely
- The DHT balances keys and data across nodes
[Figure: a distributed application calls insert(key, data) and lookup(key) -> data on the distributed hash table layer, which spans many nodes]
Design goals:
- Define a useful key nearness metric
- Keep the hop count small
- Keep the routing tables the "right" size
- Stay robust despite rapid changes in membership
Chord IDs & Consistent Hashing
- m-bit identifier space for both keys and nodes
- Key identifier = SHA-1(key); node identifier = SHA-1(IP address); both are uniformly distributed
- Identifiers are ordered on a circle modulo 2^m
- A key is mapped to the first node whose ID is equal to or follows the key ID
- Example: key "LetItBe" -> SHA-1 -> ID 54; IP "198.10.10.1" -> SHA-1 -> ID 123
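A minimal sketch of this mapping, using a small m = 7 ring so the numbers stay readable (Chord itself uses SHA-1 directly, i.e., m = 160; the helper names are illustrative):

```python
# Sketch of Chord's consistent hashing: keys and nodes share one identifier
# circle modulo 2^m, and a key belongs to the first node whose id is equal
# to or follows the key id (its "successor").
import hashlib

M = 7                                   # identifier bits; real Chord uses m = 160

def chord_id(name: str) -> int:
    """Map a key or an IP address onto the identifier circle modulo 2^m."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)

def successor(key_id, node_ids):
    """First node whose id is equal to or follows the key id."""
    node_ids = sorted(node_ids)
    for n in node_ids:
        if n >= key_id:
            return n
    return node_ids[0]                  # wrap past 2^m back to the smallest id

nodes = [chord_id(ip) for ip in ("198.10.10.1", "198.10.10.2", "198.10.10.3")]
k = chord_id("LetItBe")
print(f"key {k} is stored at node {successor(k, nodes)}")
```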
Routing: Everyone-Knows-Everyone
- Every node knows of every other node; this requires global information
- Routing tables are large: O(N)
- Example: Hash("LetItBe") = K54; asking "Where is LetItBe?" is answered directly with "N56 has K54"
- Requires O(1) hops
Routing: All Know Their Successor
- Every node only knows its successor in the ring
- Small routing table: 1 entry
- Requires O(N) hops
- Example: Hash("LetItBe") = K54; the query "Where is LetItBe?" is forwarded around the ring until the answer "N56 has K54" travels back
Routing: "Finger Tables"
- Every node knows m other nodes in the ring
- The distance to them increases exponentially
- Finger i points to the successor of n + 2^i
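The finger table and the resulting greedy lookup can be sketched as follows; this is a centralized simulation for illustration (a real Chord node builds and repairs its fingers incrementally via stabilization):

```python
# Sketch of Chord finger tables and the O(log N) greedy lookup.
import hashlib

M = 7                                             # small ring for the example

def chord_id(name):
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)

def successor(key, nodes):
    return min((n for n in sorted(nodes) if n >= key), default=sorted(nodes)[0])

def finger_table(n, nodes):
    # finger i points to the successor of n + 2^i (mod 2^m)
    return [successor((n + 2 ** i) % (2 ** M), nodes) for i in range(M)]

def between(x, a, b):
    """True if x lies on the ring strictly between a and b (wrapping)."""
    return a < x < b if a < b else (x > a or x < b)

def find_successor(n, key, fingers):
    succ = fingers[n][0]                          # finger 0 is the immediate successor
    if key == succ or between(key, n, succ):
        return succ                               # key falls between us and our successor
    for f in reversed(fingers[n]):                # try the farthest finger first
        if between(f, n, key):
            return find_successor(f, key, fingers)  # jump closer to the key
    return succ

nodes = sorted(chord_id(ip) for ip in
               ("198.10.10.1", "198.10.10.2", "198.10.10.3", "198.10.10.4"))
fingers = {n: finger_table(n, nodes) for n in nodes}
key = chord_id("LetItBe")
print(find_successor(nodes[0], key, fingers) == successor(key, nodes))  # True
```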
Joining the Ring
Three-step process:
1. Initialize all fingers of the new node (by asking another node for help)
2. Update the fingers of existing nodes
3. Transfer keys from the successor to the new node
Handling Failures
- Failure of nodes might cause incorrect lookups: in the figure, N80 doesn't know its correct successor, so Lookup(90) fails
- One approach: successor lists
- Each node knows its r immediate successors; after a failure, use the first known live successor
- This increases the routing table size
[Figure: ring with N10, N80, N85, N102, N113, N120; Lookup(90) arriving at N80]
Chord Assessment
Scalability, fairness, load balancing:
- Large distributed index
- Logarithmic search effort
- Network topology is not accounted for
- Routing tables contain log(#nodes) entries
- Quick lookup in large systems, with low variation in lookup costs
Content location:
- Search by hash key: limited ways to formulate queries
- All indexed files are reachable
- log(#nodes) lookup steps
- Not restricted to file location
Failure resilience:
- No single point of failure (though not in the basic approach)
- Successor lists allow the use of neighbors of failed nodes
- Salted hashes allow multiple indexes
- Relies on well-known relationships, but offers fast awareness of disruption and rebuilding
Examples: Pastry
Pastry
Approach taken:
- Only concerned with efficient indexing
- Distributed index: a decentralized lookup service
- Uses DHTs
- Content handling is an entirely external problem: no relation to content, no included replication or caching
P2P aspects:
- Every node must maintain keys
- Adaptive to membership changes
- Leaf nodes are special
- Client nodes also act as file servers
Pastry
DHT approach:
- Each node has a unique 128-bit nodeId, assigned when the node joins and used for routing
- Each message has a key
- NodeIds and keys are in base 2^b, where b is a configuration parameter with typical value 4 (base 16, hexadecimal digits)
- A Pastry node routes a message to the node with the nodeId closest to the key
- The number of routing steps is O(log N)
- Pastry takes network locality into account
Each node maintains:
- A routing table organized into log_{2^b} N rows with 2^b - 1 entries each
- A neighborhood set M: the nodeIds and IP addresses of the M closest nodes, useful to maintain locality properties
- A leaf set L: the set of L nodes with the closest nodeIds
Pastry Routing
[Figure: state of the node with nodeId 10233102 (b = 2, so nodeIds are base 4, 16 bits)]
- Leaf set (SMALLER | LARGER): the nodes numerically closest to the local node, e.g. 10233033, 10233021, 10233001, 10233000 | 10233120, 10233122, 10233230, 10233232
- Routing table: log_{2^b} N rows with 2^b - 1 entries per row; entries in the n-th row share the first n - 1 digits with the current node (common prefix – next digit – rest); entries in the m-th column have m as the n-th digit; entries with no suitable nodeId are left empty
- Neighborhood set: the nodes closest to the local node according to the proximity metric
Pastry Routing
1. Search the leaf set for an exact match
2. Search the routing table for an entry sharing at least one more digit of the prefix with the key
3. Otherwise, forward the message to a node from the leaf set with an equally long shared prefix but a numerically closer nodeId
[Figure: example with b = 2. A message from source 2331 to destination 1221 hops via 1331, 1211, and 1223, matching one more prefix digit per hop. Routing tables along the way: X0: 0-130 | 1-331 | 2-331 | 3-001; X1: 1-0-30 | 1-1-23 | 1-2-11 | 1-3-31; X2: 12-0-1 | 12-1-1 | 12-2-3 | 12-3-3; leaf set L: 1232 | 1221 | 1300 | 1301]
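The three routing rules can be sketched as a next-hop function. The data layout here (a dict keyed by row and next digit, and plain numeric closeness instead of the full leaf-set rules) is an illustrative assumption, not Pastry's actual data structures:

```python
# Sketch of Pastry's next-hop decision for the b = 2 example above.

def shared_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def next_hop(node, key, routing_table, leaf_set):
    if key in leaf_set:                        # 1. exact match in the leaf set
        return key
    p = shared_prefix_len(node, key)
    entry = routing_table.get((p, key[p]))     # 2. one more shared prefix digit
    if entry:
        return entry
    # 3. fall back to a known node with an equally long prefix
    #    but numerically closer to the key
    candidates = [n for n in leaf_set if shared_prefix_len(n, key) >= p]
    if not candidates:
        return node                            # no better route known
    return min(candidates, key=lambda n: abs(int(n, 4) - int(key, 4)))

# First hop of the example route 2331 -> 1331 -> 1211 -> 1223 -> 1221:
table_2331 = {(0, "1"): "1331"}                # row X0, next digit 1
print(next_hop("2331", "1221", table_2331, leaf_set=[]))  # -> 1331
```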
Pastry Assessment
Scalability, fairness, load balancing:
- Distributed index of arbitrary size
- Support for physical locality and locality by hash value
- Stochastically logarithmic search effort
- Network topology is partially accounted for, given an additional metric for physical locality
- Stochastically logarithmic lookup in large systems, with variable lookup costs
Content location:
- Search by hash key: limited ways to formulate queries
- All indexed files are reachable
- Not restricted to file location
Failure resilience:
- No single point of failure
- Several possibilities for backup routes
Examples: Tapestry
Tapestry
Approach taken:
- Only concerned with self-organizing indexing
- Distributed index: a decentralized lookup service
- Uses DHTs
- Content handling is an entirely external problem: no relation to content, no included replication or caching
P2P aspects:
- Every node must maintain keys
- Adaptive to changes in membership and values
Routing and Location
- Namespace (nodes and objects): SHA-1 hash, 160 bits long
- Each object has its own hierarchy rooted at RootID = hash(ObjectID)
Prefix routing [JSAC 2004]:
- The router at the h-th hop shares a prefix of length >= h digits with the destination
- Local tables at each node (neighbor maps) route digit by digit: 4*** -> 42** -> 42A* -> 42AD
- Neighbor links are organized in levels
Routing and Location
Suffix routing [tech report 2001]:
- The router at the h-th hop shares a suffix of length >= h digits with the destination
- Example: 5324 routes to 0629 via 5324 -> 2349 -> 1429 -> 7629 -> 0629
Tapestry routing:
- Caches pointers to all copies; caches are soft state
- UDP heartbeats and TCP timeouts verify route availability
- Each node has 2 backup neighbors; failing primary neighbors are kept for some time (days)
- Multiple root nodes are possible, identified via hash functions; a value is searched in a root if its hash is that of the root
Choosing a root node:
- Choose a random address and route towards it; if no route exists, deterministically choose a surrogate
Routing and Location
Object location:
- The root is responsible for storing the object's location (but not the object itself)
- Publish and search both route incrementally towards the root
- An object is a key/value pair, e.g., filename/file
- Keys are replicated automatically; values are not
- Values may be replicated, or stored in erasure-coded fragments
Tapestry
[Figure: Insert(key K, value V) routes towards the root, which records #K with an address list; intermediate nodes store pointers (#K, ●) to the value's location]
Tapestry
[Figure: a query ?K routes towards the root until a pointer (#K, ●) is found; the result is returned and cached along the way]
Tapestry
[Figure: cached pointers (#K, ●) distributed through the overlay after the lookup]
Tapestry
[Figure: Move(key K, value V) relocates the value to a different node]
Tapestry
[Figure: after the move, stale cached pointers (#K, ●) still reference the old location and stay wrong until they time out]
Tapestry Assessment
Scalability, fairness, load balancing:
- Distributed index(es) of arbitrary size
- Limited physical locality of key access through caching and nodeId selection
- Variable lookup costs
- Independent of content scalability
Content location:
- Search by hash key: limited ways to formulate queries
- All indexed files are reachable
- Not restricted to file location
Failure resilience:
- No single point of failure
- Several possibilities for backup routes
- Caching of key resolutions
- Use of hash values with several salt values
Comparison
Routing information:
- Chord: log(#nodes) routing table size
- Pastry: log(#nodes) x (2^b - 1) routing table size
- Tapestry: at least log(#nodes) routing table size

Lookup cost:
- Chord: log(#nodes)
- Pastry: approximately log(#nodes)
- Tapestry: variable

Physical locality:
- Pastry: by neighbor list
- Tapestry: in mobile Tapestry

Failure resilience:
- Chord: no resilience in the basic version; additional successor lists provide resilience
- Pastry: no single point of failure; several backup routes
- Tapestry: no single point of failure; several backup routes; alternative hierarchies
Applications
Streaming in peer-to-peer networks
Promise
[Figure: peer-to-peer network]
Promise
- Video streaming in peer-to-peer systems
- Video is segmented into many small segments
- Pull operation: pull from several sources at once
- Based on Pastry and CollectCast
CollectCast:
- Adds rate/data assignment
- Evaluates node capabilities and overlay route capabilities
- Uses topology inference: detects shared path segments (using ICMP, similar to traceroute) and tries to avoid them; labels segments with a quality (or "goodness") value
Promise [Hefeeda et al. 03]
[Figure: several active senders and a standby sender streaming to one receiver]
The receiver:
- sends a lookup request using the DHT
- selects some active senders and sends each a control packet
- receives data as long as no errors/changes occur
- if a change or error is detected, new active senders may be selected
Each active sender:
- receives a control packet specifying which data segments, data rate, etc.
- pushes data to the receiver as long as no new control packet is received
Thus, Promise is a multiple-sender to one-receiver P2P media streaming system which (1) accounts for different capabilities, (2) matches senders to achieve the best quality, and (3) dynamically adapts to network fluctuations and peer failure.
SplitStream
SplitStream
- Video streaming in peer-to-peer systems
- Uses layered video and overlay multicast
- Push operation
- Builds disjoint overlay multicast trees
- Based on Pastry
SplitStream [Castro et al. 03]
[Figure: source with the full-quality movie feeding stripe 1 and stripe 2]
- Each movie is split into K stripes, and each stripe is multicast using a separate tree
- Each node joins as many multicast trees as there are stripes (K)
- Nodes may specify the number of stripes they are willing to act as a router for, i.e., according to the amount of resources available
Thus, SplitStream is a multiple-sender to multiple-receiver P2P system which distributes the forwarding load while respecting each node's resource limitations, but some effort is required to build the forest of multicast trees.
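The striping itself is simple round-robin; a minimal sketch follows (building the actual multicast trees via Pastry/Scribe is not shown, and the function names are illustrative):

```python
# Sketch of splitting a stream into K stripes for SplitStream-style
# distribution: packet i goes to stripe i mod K, and each stripe is then
# multicast over its own tree.

def split_into_stripes(packets, k):
    stripes = [[] for _ in range(k)]
    for i, pkt in enumerate(packets):
        stripes[i % k].append(pkt)      # round-robin striping
    return stripes

movie = [f"pkt{i}" for i in range(10)]
for sid, stripe in enumerate(split_into_stripes(movie, k=2)):
    print(f"stripe {sid} (own multicast tree): {stripe}")
```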
Some References
1. M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, "SplitStream: High-bandwidth multicast in a cooperative environment", SOSP'03, Bolton Landing, New York, October 2003
2. Mohamed Hefeeda, Ahsan Habib, Boyan Botev, Dongyan Xu, and Bharat Bhargava, "PROMISE: Peer-to-Peer Media Streaming Using CollectCast", ACM MM'03, Berkeley, CA, November 2003
3. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, "Chord: A scalable peer-to-peer lookup service for Internet applications", ACM SIGCOMM'01
4. Ben Y. Zhao, John Kubiatowicz, and Anthony Joseph, "Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing", UCB Technical Report CSD-01-1141, 2001
5. John Kubiatowicz, "Extracting Guarantees from Chaos", Comm. ACM, 46(2), February 2003
6. Antony Rowstron and Peter Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", Middleware'01, November 2001
Adaptive Multi-Source Streaming in Heterogeneous Peer-to-Peer Networks
Vikash Agarwal, Reza Rejaie
Computer and Information Science Department, University of Oregon
http://mirage.cs.uoregon.edu
January 19, 2005
Introduction
- P2P streaming is becoming increasingly popular
- Participating peers form an overlay to cooperatively stream content among themselves
- The overlay-based approach is the only way to efficiently support multi-party streaming applications without multicast
- Two components: overlay construction and content delivery
- Each peer wants to receive the maximum quality that can be streamed through its access link
- Peers have asymmetric and heterogeneous bandwidth connectivity
- Each peer should therefore receive content from multiple parent peers => multi-source streaming, with a multi-parent overlay structure rather than a tree
Benefits of Multi-Source Streaming
- Higher bandwidth to each peer => higher delivered quality
- Better load balancing among peers
- Less congestion across the network
- More robust to the dynamics of peer participation
- But multi-source streaming introduces new challenges ...
Multi-Source Streaming: Challenges
- Congestion-controlled connections from different parent peers exhibit independent variations in bandwidth (different RTT, bandwidth, loss rate)
- The aggregate bandwidth changes over time, so the streaming mechanism should be quality adaptive
- A static "one layer per sender" approach is inefficient
- There must be a coordination mechanism among senders to efficiently utilize the aggregate bandwidth and to gracefully adapt the delivered quality to bandwidth variations
- This paper presents a receiver-driven coordination mechanism for multi-source streaming called PALS
Previous Studies
- Congestion control was often ignored
- Server/content placement for streaming MD content [Apostolopoulos et al.]
- Resource management for P2P streaming [Cui et al.]
- Multi-sender streaming [Nguyen et al.], but they assumed that the aggregate bandwidth exceeds the stream bandwidth
- RLM is receiver-driven, but it tightly couples coarse quality adaptation with congestion control; PALS only determines how the aggregate bandwidth is used
- P2P content distribution mechanisms cannot accommodate "streaming" applications, e.g., BitTorrent, Bullet
Overall Architecture
Overall architecture for P2P streaming:
- PRO: bandwidth-aware overlay construction, identifying good parents in the overlay
- PALS: multi-source adaptive streaming of content from the selected parents
- Distributed multimedia caching
Decoupling overlay construction from delivery provides a great deal of flexibility. PALS is a generic multi-source streaming protocol for non-interactive applications.
Assumptions & Goals
Assumptions:
- All peers/flows are congestion controlled
- Content is layered encoded
- All layers are CBR with the same consumption rate (*)
- All senders have all layers (relaxed later) (*)
- A limited window of future packets is available at each sender
- Live but non-interactive
(*) not requirements
Goals:
- Fully utilize the aggregate bandwidth to dynamically maximize the delivered quality
- Deliver the maximum number of layers
- Minimize variations in quality
P2P Adaptive Layered Streaming (PALS)
- Receiver: periodically requests an ordered list of packets/segments from each sender
- Sender: simply delivers the requested packets in the given order at the congestion-controlled rate
Benefits of ordering the requested list:
- Gives the receiver the flexibility to closely control the delivered packets
- Graceful degradation in quality when the bandwidth suddenly drops
Periodic requests => stability and less overhead
Basic Framework
[Figure: receiver (Peer 0) with per-sender buffers buf1–buf3 fed by congestion-controlled connections with bandwidths bw1(t), bw2(t), bw3(t) from Peers 1–3, feeding a demux and decoder]
- The receiver passively monitors the EWMA bandwidth from each sender and the EWMA aggregate bandwidth
- It estimates the total number of packets that can be delivered during the next window (K)
- It allocates the K packets among active layers (quality adaptation), controlling bw0(t), bw1(t), ..., and the evolution of the buffer state
- It assigns a subset of the packets to each sender (packet assignment), allocating each sender's bandwidth among active layers
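The receiver-side bandwidth accounting can be sketched as one EWMA per sender, summed into an aggregate that sizes the next request window. The smoothing factor, packet size, and class names below are assumptions for illustration, not values from the paper:

```python
# Sketch of PALS-style receiver bandwidth monitoring: per-sender EWMA,
# aggregated to estimate the packet budget K for the next window.

PKT_SIZE = 1000 * 8          # bits per packet (assumption)
ALPHA = 0.25                 # EWMA smoothing factor (assumption)

class BandwidthMonitor:
    def __init__(self, senders):
        self.ewma = {s: 0.0 for s in senders}

    def report(self, sender, measured_bps):
        """Passively measured throughput from one sender in the last window."""
        self.ewma[sender] = (1 - ALPHA) * self.ewma[sender] + ALPHA * measured_bps

    def window_budget(self, window_sec):
        """Total number of packets deliverable in the next window (K)."""
        aggregate = sum(self.ewma.values())      # EWMA aggregate bandwidth
        return int(aggregate * window_sec / PKT_SIZE)

mon = BandwidthMonitor(["peer1", "peer2", "peer3"])
mon.report("peer1", 400_000)
mon.report("peer2", 250_000)
mon.report("peer3", 150_000)
print(mon.window_budget(window_sec=2.0))   # K packets to allocate among layers
```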
Key Components of PALS
- Sliding window (SW): keeps all senders busy and loosely synchronized with the receiver's playout time
- Quality adaptation (QA): determines the quality of the delivered stream, i.e., the required packets for all layers during one window
- Packet assignment (PA): properly distributes the required packets among the senders
Sliding Window
- Buffering window: the range of timestamps for packets that must be requested in one window
- The window is slid forward in a step-like fashion
- Requested packets per window can come from: (1) the playing window (loss recovery), (2) the buffering window (main group), (3) future windows (buffering)
Sliding Window (cont'd)
- The window size determines the tradeoff between smoothness (or signaling overhead) and responsiveness
- It should be a function of RTT, since RTT specifies the timescale of bandwidth variations: a multiple of the maximum smoothed RTT among senders
- The receiver might receive duplicates by re-requesting a packet that is still in flight; the ratio of duplicates is very low and can be reduced by increasing the window
Coping with Bandwidth Variations
- The sliding window alone is insufficient
- Cope with a sudden drop in bandwidth by overwriting requests at the senders and by ordering the requested packets
- Cope with a sudden increase in bandwidth by requesting extra packets
Quality Adaptation
- Determines the required packets from future windows
- Coarse-grained adaptation: add/drop a layer
- Fine-grained adaptation: controlling bw0(t), bw1(t), ..., and loosely controlling the evolution of the receiver's buffer state/distribution
- What is a proper buffer distribution? The buffer distribution determines what degree of bandwidth variations can be smoothed
[Figure: the same receiver-side buffering architecture as in the Basic Framework]
Buffer Distribution
Impact on delivered quality:
- A conservative buffer distribution achieves long-term smoothing
- An aggressive buffer distribution achieves short-term improvement
- PALS leverages this tradeoff in a balanced fashion
The window size affects buffering:
- The amount of future buffering
- The slope of the buffer distribution
- Multiple opportunities to request a packet (see paper): implicit loss recovery
Packet Assignment
- How to assign an ordered list of selected packets from different layers to individual senders?
- The number of packets assigned to each sender must be proportional to its bandwidth contribution
- More important packets should be delivered first => a weighted round-robin packet assignment strategy (sketched below)
- The strategy is extended to support partially available content at each peer; please see the paper for further details
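A weighted round-robin assignment can be sketched with a per-sender credit counter: for each packet slot, every sender earns credit proportional to its bandwidth share, and the sender with the most credit takes the next (most important) packet. This illustrates the idea, not the paper's exact algorithm:

```python
# Sketch of weighted round-robin packet assignment: packets are listed in
# order of importance (base layer first), and each sender receives a share
# proportional to its bandwidth contribution.

def assign_packets(ordered_packets, sender_bw):
    """sender_bw: dict sender -> bandwidth. Returns dict sender -> packet list."""
    total = sum(sender_bw.values())
    assignment = {s: [] for s in sender_bw}
    credit = {s: 0.0 for s in sender_bw}       # deficit counter per sender
    for pkt in ordered_packets:                # most important packets first
        for s in sender_bw:
            credit[s] += sender_bw[s] / total  # earn a proportional share
        s = max(credit, key=credit.get)        # sender with the most credit
        assignment[s].append(pkt)
        credit[s] -= 1.0                       # spend one packet slot
    return assignment

pkts = [f"L{layer}:{seq}" for seq in range(4) for layer in range(3)]  # 12 packets
print(assign_packets(pkts, {"peer1": 400, "peer2": 250, "peer3": 150}))
```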
Performance Evaluation
- Uses ns simulation to control bandwidth dynamics
- Focused on three key dynamics in P2P systems: bandwidth variations, peer participation, content availability
- Senders with heterogeneous RTT and bandwidth
- Decouples the underlying congestion control mechanism from PALS
- Performance metrics: bandwidth utilization, delivered quality
- Two strawman mechanisms with static layer assignment to each sender: Single Layer per Sender (SLS), where sender i delivers layer i, and Multiple Layers per Sender (MLS), where sender i delivers layers j < i
Necessity of Coordination
- SLS and MLS exhibit high variations in quality: no explicit loss recovery, no coordination, and inter-layer dependency magnifies the problem
- PALS effectively utilizes the aggregate bandwidth and delivers stable quality in all cases
Delay-Window Tradeoff
- The average delivered quality only depends on the aggregate bandwidth, even with heterogeneous senders
- Higher delay => smoother quality
- Duplicates decrease exponentially with window size
- Average per-layer buffering increases linearly with delay
- Increasing the window leads to an even buffer distribution
- See the paper for more results
Conclusion & Future Work
- PALS is a receiver-driven coordination mechanism for streaming from multiple congestion-controlled senders
- Simulation results are very promising
Future work:
- Further simulation to examine further details
- Prototype implementation for real experiments
- Integration with the other components of our architecture for P2P streaming
Partially available content
Effect of segment size and redundancy
Packet Dynamics