Transcript of: Peer-to-Peer Systems
INF5071 – Performance in distributed systems
27/10 & 10/11 – 2006 (109 pages)

Page 1:

Peer-to-Peer Systems

27/10 & 10/11 – 2006

INF5071 – Performance in distributed systems

Page 2:

Client-Server

[Figure: a central server on a backbone network serving several local distribution networks]

Traditional distributed computing
Successful architecture, and will continue to be so (adding proxy servers)
Tremendous engineering necessary to make server farms scalable and robust

Page 3:

Distribution with proxies

Hierarchical distribution system
E.g. proxy caches that consider popularity
Popular videos replicated and kept close to clients
Unpopular ones close to the root servers
Popular videos are replicated more frequently

[Figure: hierarchy of root servers, regional servers, local servers and end-systems; completeness of available content increases towards the root]

Page 4:

Peer-to-Peer (P2P)

[Figure: peers spread over several local distribution networks connected by a backbone network]

Really an old idea: a distributed system architecture
No centralized control
Nodes are symmetric in function
Typically many nodes, but unreliable and heterogeneous

Page 5:

Overlay networks

[Figure: overlay nodes and overlay links layered on top of an IP network of LANs and backbone networks; legend: IP routing, IP link, IP path, overlay node, overlay link]

Page 6:

P2P

Many aspects similar to proxy caches:
Nodes act as clients and servers
Distributed storage
Bring content closer to clients
Storage limitation of each node
Number of copies often related to content popularity
Necessary to make replication and de-replication decisions
Redirection

But:
No distinguished roles
No generic hierarchical relationship (at most a hierarchy per data item)
Clients do not know where the content is (may need a discovery protocol)
All clients may act as roots (origin servers)
Members of the P2P network come and go

Page 7:

P2P Systems

New considerations for distribution systems

Considered here:
Scalability, fairness, load balancing
Content location
Failure resilience
Routing (application layer routing, content routing, request routing)

Not considered here:
Copyright
Privacy
Trading

Page 8:

Examples: Napster

Page 9:

Napster

Program for sharing (music) files over the Internet

Approach taken:
Central index
Distributed storage and download
All downloads are shared

P2P aspects:
Client nodes also act as file servers

Page 10:

Napster

Client connects to Napster with login and password

Transmits current listing of shared files

Napster registers username, maps username to IP address and records song list

[Figure: a new client joins by connecting to the central index]

Page 11:

Napster

[Figure: client sends a query to the central index and receives an answer]

Client sends song request to Napster server

Napster checks song database

Returns matched songs with usernames and IP addresses (plus extra stats)
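To make the join/query flow on these slides concrete, here is a minimal sketch of a Napster-style central index in Python. All names are hypothetical; Napster's real wire protocol is not shown, only the idea of a server that maps songs to the users and IP addresses that share them.

```python
# Minimal sketch of a Napster-style central index (hypothetical names, not the real protocol).

class CentralIndex:
    def __init__(self):
        self.users = {}                      # username -> IP address
        self.songs = {}                      # song title -> set of usernames sharing it

    def join(self, username, ip, shared_files):
        """Register a user and the listing of files it currently shares."""
        self.users[username] = ip
        for title in shared_files:
            self.songs.setdefault(title, set()).add(username)

    def query(self, title):
        """Return (username, ip) pairs for every peer sharing the title."""
        return [(u, self.users[u]) for u in self.songs.get(title, ())]

index = CentralIndex()
index.join("alice", "10.0.0.1", ["let it be", "yesterday"])
index.join("bob", "10.0.0.2", ["let it be"])
print(index.query("let it be"))              # the download itself then goes peer-to-peer
```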

Page 12:

Napster

[Figure: the client fetches the selected file directly from another peer]

User selects a song; the download request is sent directly to the peer that holds it

The peer's machine is contacted and the file is transferred if it is available

Page 13:

Napster: Assessment

Scalability, fairness, load balancing:
Replication to querying nodes
Number of copies increases with popularity
Large distributed storage
Unavailability of files with low popularity
Network topology is not accounted for at all
Latency may be increased

Content location:
Simple, centralized search/location mechanism
Can only query by index terms

Failure resilience:
No dependencies among normal peers
Index server as single point of failure

Page 14:

Examples: Gnutella

Page 15:

Gnutella

Program for sharing files over the Internet

Approach taken:
Purely P2P, nothing centralized
Dynamically built overlay network
Query for content by overlay broadcast
No index maintenance

P2P aspects:
Peer-to-peer file sharing
Peer-to-peer querying
Entirely decentralized architecture

Many iterations to fix the poor initial design (lack of scalability)

Page 16:

Gnutella: Joining

Connect to one known host and send a broadcast ping
Can be any host; hosts are learned through word-of-mouth or host-caches
Use an overlay broadcast ping through the network with a TTL of 7

[Figure: ping propagating outwards through the overlay, rings labelled TTL 1 to TTL 4]

Page 17:

Gnutella: Joining

Hosts that are not overwhelmed respond with a routed pong
Gnutella caches the IP addresses of replying nodes as neighbors
In the example, the grey nodes do not respond within a certain amount of time (they are overloaded)

Page 18:

Gnutella: Query

Query by broadcasting in the overlay

Send query to all overlay neighbors

Overlay neighbors forward query to all their neighbors

Up to 7 layers deep (TTL 7)

[Figure: the query flooding outwards through the overlay, TTL decreasing from 7]
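A simplified model of the TTL-limited overlay broadcast described above, assuming a static neighbor map; real Gnutella suppresses duplicates via message IDs, which the visited set stands in for here.

```python
# Sketch of TTL-limited flooding over an overlay (simplified Gnutella-style query).

def flood_query(overlay, start, match, ttl=7):
    """overlay: dict node -> list of neighbor nodes
       match:   predicate telling whether a node holds the requested file
       Returns the set of nodes that answered the query."""
    seen = {start}                          # duplicate suppression (real Gnutella uses message IDs)
    frontier = [start]
    hits = set()
    while frontier and ttl > 0:
        next_frontier = []
        for node in frontier:
            for neigh in overlay.get(node, []):
                if neigh in seen:
                    continue
                seen.add(neigh)
                if match(neigh):
                    hits.add(neigh)         # the routed response travels back along the query path
                next_frontier.append(neigh)
        frontier = next_frontier
        ttl -= 1
    return hits

overlay = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(flood_query(overlay, "A", lambda n: n == "D", ttl=2))   # {'D'}
```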

Page 19:

Gnutella: Query

Send routed responses to the overlay node that was the source of the broadcast query
Querying client receives several responses
User receives a list of files that matched the query and a corresponding IP address

[Figure: responses routed back along the reverse query paths]

Page 20:

Gnutella: Transfer

File transfer uses direct communication
The file transfer protocol is not part of the Gnutella specification

[Figure: download request sent directly to the peer, which returns the requested file]

Page 21:

Gnutella: Assessment

Scalability, fairness, load balancing:
Replication to querying nodes
Number of copies increases with popularity
Large distributed storage
Unavailability of files with low popularity
Bad scalability: uses a flooding approach
Network topology is not accounted for at all; latency may be increased

Content location:
No limits to query formulation
Less popular files may be outside the TTL

Failure resilience:
No single point of failure
Many known neighbors
Assumes quite stable relationships

Page 22:

Examples: Freenet

Page 23:

Freenet

Program for sharing files over the Internet
Focus on anonymity

Approach taken:
Purely P2P, nothing centralized
Dynamically built overlay network
Query for content by hashed query and best-first search
Caching of hash values and content
Content forwarding in the overlay

P2P aspects:
Peer-to-peer file sharing
Peer-to-peer querying
Entirely decentralized architecture
Anonymity

Page 24:

Freenet: Nodes and Data

Nodes:
Routing tables contain IP addresses of other nodes and the hash values they hold (resp. held)

Data is indexed with a hash value:
"Identifiers" are hashed
Identifiers may be keywords, author IDs, or the content itself
The Secure Hash Algorithm (SHA-1) produces a "one-way" 160-bit key
Content-hash key (CHK) = SHA-1(content)
Typically stores blocks

Page 25:

Freenet: Storing and Retrieving Data

Storing data: data is moved to a server with arithmetically close keys
1. The key and data are sent to the local node
2. The key and data are forwarded to the node with the nearest key
Repeat 2 until the maximum number of hops is reached

Retrieving data: best-first search
1. An identifier is hashed into a key
2. The key is sent to the local node
3. If the data is not in the local store, the request is forwarded to the best neighbor
Repeat 3 with the next best neighbor until the data is found, or the request times out
4. If the data is found, or the hop-count reaches zero, return the data or an error along the chain of nodes (if the data is found, intermediary nodes create entries in their routing tables)

Page 26:

Freenet: Best First Search

Heuristics for selecting the direction:
>RES: returned most results
<TIME: shortest satisfaction time
<HOPS: min hops for results
>MSG: sent us most messages (all types)
<QLEN: shortest queue
<LAT: shortest latency
>DEG: highest degree

[Figure: a node choosing among its neighbors for a query]

Page 27:

Freenet: Routing Algorithm

Page 28:

Freenet: Assessment

Scalability, fairness, load balancing:
Caching in the overlay network
Access latency decreases with popularity
Large distributed storage
Fast removal of files with low popularity
A lot of storage wasted on highly popular files
Network topology is not accounted for

Content location:
Search by hash key: limited ways to formulate queries
Content placement changes to fit the search pattern
Less popular files may be outside the TTL

Failure resilience:
No single point of failure

Page 29:

Examples: FastTrack, Morpheus, OpenFT

Page 30:

FastTrack, Morpheus, OpenFT

Peer-to-peer file sharing protocol

Three different node types:
USER: normal nodes
SEARCH: keep an index of "their" normal nodes; answer search requests
INDEX: keep an index of search nodes; redistribute search requests

Page 31:

FastTrack, Morpheus, OpenFT

[Figure: hierarchy of INDEX, SEARCH and USER nodes]

Page 32:

FastTrack, Morpheus, OpenFT

[Figure: a USER node sends a query (?) up to its SEARCH node, which returns the matches (!!)]

Page 33:

FastTrack, Morpheus, OpenFT: Assessment

Scalability, fairness, load balancing:
Large distributed storage
Avoids broadcasts
Load concentrated on super nodes (index and search)
Network topology is partially accounted for
Efficient structure development

Content location:
Search by hash key: limited ways to formulate queries
All indexed files are reachable
Can only query by index terms

Failure resilience:
No single point of failure, but the overlay networks of index servers (and search servers) reduce resilience
Relies on very stable relationships / content is registered at search nodes
Relies on a partially static infrastructure

Page 34:

Examples: BitTorrent

Page 35:

BitTorrent

Distributed download system

Content is distributed in segments

Tracker:
One central download server per content
Approach to fairness (tit-for-tat) per content

No approach for finding the tracker

No content transfer protocol included

Page 36:

BitTorrent

[Figure: tracker coordinating the peers]

Segment download operation:
The tracker tells the peer the source and number of a segment to get
The peer retrieves the content in pull mode
The peer reports availability of the new segment to the tracker
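A small sketch of the pull-side decision a peer makes in this loop: pick the locally missing segment that is rarest among the known sources (the rarest-first strategy named on the next slide). Data structures and names are hypothetical.

```python
# Sketch of rarest-first segment selection (simplified BitTorrent-style).

from collections import Counter

def pick_next_segment(have, peer_bitfields):
    """have: set of segment indices already downloaded locally
       peer_bitfields: dict peer -> set of segment indices that peer advertises
       Returns (segment, peers_that_have_it) for the rarest missing segment, or None."""
    counts = Counter()
    for segments in peer_bitfields.values():
        for s in segments:
            if s not in have:
                counts[s] += 1
    if not counts:
        return None
    rarest = min(counts, key=counts.get)         # fewest copies among the known peers
    owners = [p for p, segs in peer_bitfields.items() if rarest in segs]
    return rarest, owners

bitfields = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0}}
print(pick_next_segment({0}, bitfields))         # (2, ['p1']) - segment 2 is rarest
```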

Page 37:

BitTorrent

[Figure: tracker and peers exchanging segments]

Rarest-first strategy
No second input stream: not contributed enough

Page 38:

BitTorrent

[Figure: tracker and peers exchanging segments]

All nodes: max 2 concurrent streams in and out
No second input stream: not contributed enough

Page 39:

BitTorrent

[Figure: tracker and peers exchanging segments]

Page 40:

BitTorrent

[Figure: tracker and peers exchanging segments]

Page 41:

BitTorrent

[Figure: tracker and peers exchanging segments]

Page 42:

BitTorrent: Assessment

Scalability, fairness, load balancing:
Large distributed storage
Avoids broadcasts
Transfers content segments rather than complete content
Does not rely on clients staying online after download completion
Contributors are allowed to download more

Content location:
Central server approach

Failure resilience:
Tracker is a single point of failure
Content holders can lie

Page 43:

Comparison

Systems compared: Napster, Gnutella, FreeNet, FastTrack, BitTorrent

Scalability: Gnutella - limited by flooding; FreeNet - uses caching; BitTorrent - separate overlays per file

Routing information: Napster - one central server; Gnutella - neighbour list; FastTrack - index server; BitTorrent - one tracker per file

Lookup cost: Napster - O(1); Gnutella - O(log(#nodes)); FreeNet - O(#nodes); FastTrack - O(1); BitTorrent - O(#blocks)

Physical locality: FastTrack - by search server assignment

Page 44:

Comparison

Systems compared: Napster, Gnutella, FreeNet, FastTrack, BitTorrent

Load balancing: Napster, Gnutella - many replicas of popular content; FreeNet - content placement changes to fit search; FastTrack - load concentrated on supernodes; BitTorrent - rarest-first copying

Content location: Napster - all files reachable, uses index server, search by index term; Gnutella - uses flooding, unpopular files may be outside TTL; FreeNet - all files reachable, search by hash; FastTrack - search by hash; BitTorrent - external issue

Failure resilience: Napster - index server as single point of failure; Gnutella, FreeNet - no single point of failure; FastTrack - overlay network of index servers; BitTorrent - tracker as single point of failure

Page 45:

Peer-to-Peer Systems

Distributed directories

Page 46:

Examples: Chord

Page 47:

Chord

Approach taken:
Only concerned with efficient indexing
Distributed index - decentralized lookup service
Inspired by consistent hashing: SHA-1 hash
Content handling is an external problem entirely: no relation to content, no included replication or caching

P2P aspects:
Every node must maintain keys
Adaptive to membership changes
Client nodes also act as file servers

Page 48:

Lookup Based on Hash Tables

[Figure: a hash function maps a key to a position in a hash table; each position is a hash bucket holding data items; interface: insert(key, data) and lookup(key) -> data]

Page 49:

Distributed Hash Tables (DHTs)

Nodes are the hash buckets
A key identifies data uniquely
The DHT balances keys and data across nodes

[Figure: a distributed application calls insert(key, data) and lookup(key) -> data on the distributed hash table, which spreads keys and data over the participating nodes]

Define a useful key nearness metric
Keep the hop count small
Keep the routing tables the "right size"
Stay robust despite rapid changes in membership

Page 50:

Chord IDs & Consistent Hashing

m-bit identifier space for both keys and nodes
Key identifier = SHA-1(key)
Node identifier = SHA-1(IP address)
Both are uniformly distributed
Identifiers are ordered in a circle modulo 2^m
A key is mapped to the first node whose id is equal to or follows the key id

Example: Key = "LetItBe", SHA-1 -> ID = 54; IP = "198.10.10.1", SHA-1 -> ID = 123
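A toy sketch of this mapping: SHA-1 identifiers reduced modulo 2^m, with the key stored on the first node whose id is equal to or follows the key id (a plain sorted list stands in for the ring).

```python
# Sketch of Chord-style consistent hashing: keys and nodes share one identifier
# space, and a key is stored on the first node whose id is >= the key id (mod 2^m).

import hashlib
from bisect import bisect_left

M = 8                                            # identifier bits (tiny for the example; Chord uses 160)

def ident(text: str) -> int:
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big") % (2 ** M)

def successor(node_ids, key_id):
    """node_ids must be sorted; wrap around the ring if the key is past the last node."""
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]

nodes = sorted(ident(f"10.0.0.{i}") for i in range(1, 6))   # node id = hash of IP address
k = ident("LetItBe")                                         # key id = hash of the key
print(f"key {k} is stored on node {successor(nodes, k)}")
```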

Page 51:

Routing: Everyone-Knows-Everyone

Every node knows of every other node - requires global information
Routing tables are large: N entries
Requires O(1) hops

Hash("LetItBe") = K54
Where is "LetItBe"? "N56 has K54"

Page 52:

Routing: All Know Their Successor

Every node only knows its successor in the ring
Small routing table: 1 entry
Requires O(N) hops

Hash("LetItBe") = K54
Where is "LetItBe"? "N56 has K54"

Page 53:

Routing: "Finger Tables"

Every node knows m other nodes in the ring
Distance increases exponentially
Finger i points to the successor of n + 2^i
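A self-contained toy sketch of finger tables and the resulting greedy lookup: each hop forwards to the farthest finger that does not pass the key, so the remaining distance roughly halves per hop. The ring is modelled as a plain sorted list of node ids.

```python
# Sketch of finger tables and greedy lookup on a Chord ring (identifiers mod 2^m).

from bisect import bisect_left

M = 8                                            # identifier bits (Chord uses 160)
RING = 2 ** M

def successor(node_ids, x):
    i = bisect_left(node_ids, x % RING)
    return node_ids[i % len(node_ids)]

def finger_table(node_ids, n):
    """finger[i] = successor(n + 2^i) for i = 0 .. m-1."""
    return [successor(node_ids, n + 2 ** i) for i in range(M)]

def dist(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % RING

def lookup(node_ids, start, key_id):
    """Forward to the closest preceding finger of the key until the key falls
       between the current node and its immediate successor."""
    fingers = {n: finger_table(node_ids, n) for n in node_ids}
    node, hops = start, 0
    while successor(node_ids, node + 1) != successor(node_ids, key_id):
        best = None
        for f in fingers[node]:
            if 0 < dist(node, f) < dist(node, key_id):
                if best is None or dist(node, f) > dist(node, best):
                    best = f
        node = best if best is not None else successor(node_ids, node + 1)
        hops += 1
    return successor(node_ids, key_id), hops

nodes = sorted([5, 20, 60, 95, 140, 200, 250])
print(finger_table(nodes, 20))                   # fingers of node 20
print(lookup(nodes, 5, 180))                     # (200, 1): node 200 is responsible for key 180
```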

Page 54:

Joining the Ring

Three-step process:
1. Initialize all fingers of the new node (by asking another node for help)
2. Update fingers of existing nodes
3. Transfer keys from the successor to the new node

Page 55:

Handling Failures

Failure of nodes might cause incorrect lookups
N80 doesn't know its correct successor, so the lookup fails

One approach: successor lists
Each node knows its r immediate successors
After a failure, find the first known live successor
Increased routing table size

[Figure: Lookup(90) on a ring with nodes N10, N80, N85, N102, N113, N120]

Page 56:

Chord: Assessment

Scalability, fairness, load balancing:
Large distributed index
Logarithmic search effort
Network topology is not accounted for
Routing tables contain log(#nodes) entries
Quick lookup in large systems, low variation in lookup costs

Content location:
Search by hash key: limited ways to formulate queries
All indexed files are reachable
log(#nodes) lookup steps
Not restricted to file location

Failure resilience:
No single point of failure
No resilience in the basic approach
Successor lists allow use of the neighbors of failed nodes
Salted hashes allow multiple indexes
Relies on well-known relationships, but fast awareness of disruption and rebuilding

Page 57:

Examples: Pastry

Page 58:

Pastry

Approach taken:
Only concerned with efficient indexing
Distributed index - decentralized lookup service
Uses DHTs
Content handling is an external problem entirely: no relation to content, no included replication or caching

P2P aspects:
Every node must maintain keys
Adaptive to membership changes
Leaf nodes are special
Client nodes also act as file servers

Page 59:

Pastry

DHT approach
Each node has a unique 128-bit nodeId
Assigned when the node joins
Used for routing
Each message has a key
NodeIds and keys are in base 2^b
b is a configuration parameter with typical value 4 (base = 16, hexadecimal digits)

A Pastry node routes the message to the node with the nodeId closest to the key
Number of routing steps is O(log N)
Pastry takes network locality into account

Each node maintains:
Routing table, organized into log_(2^b) N rows with 2^b - 1 entries each
Neighborhood set M: nodeIds and IP addresses of the M closest nodes, useful to maintain locality properties
Leaf set L: set of L nodes with the closest nodeIds

Page 60:

Pastry Routing

NodeId 10233102 (b = 2, so nodeIds are base 4, 16 bits)

Leaf set: contains the nodes numerically closest to the local node
SMALLER: 10233033, 10233021, 10233001, 10233000
LARGER: 10233120, 10233122, 10233230, 10233232

Routing table: log_(2^b) N rows with 2^b - 1 entries per row
Entries in the nth row share the first n-1 digits with the current node (common prefix - next digit - rest)
Entries in the mth column have m as their nth digit
Entries with no suitable nodeId are left empty

Neighborhood set: contains the nodes closest to the local node according to the proximity metric

[Figure: the full routing table, leaf set and neighborhood set of node 10233102]

Page 61:

Pastry Routing

1. Search the leaf set for an exact match
2. Search the routing table for an entry with at least one more digit in common with the key's prefix
3. Otherwise forward the message to a node with an equally long common prefix but numerically closer to the key

[Figure: example route from source 1331 towards destination 1221, passing nodes with progressively longer shared prefixes (e.g. via 1211); the routing table rows X0, X1, X2 and leaf set L of the nodes along the way are shown]
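These three steps can be sketched as a next-hop choice over whatever nodes a peer knows; this toy model uses base-4 nodeIds (b = 2) and a flat candidate list in place of the real leaf set, routing table and neighborhood set.

```python
# Sketch of Pastry-style next-hop selection on base-4 digit strings (b = 2).

def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current, key, known):
    """Forward to a node sharing a longer prefix with the key than we do;
       otherwise to one with an equally long prefix but numerically closer."""
    here = shared_prefix_len(current, key)
    better = [n for n in known if shared_prefix_len(n, key) > here]
    if better:
        return max(better, key=lambda n: shared_prefix_len(n, key))
    closer = [n for n in known
              if shared_prefix_len(n, key) == here
              and abs(int(n, 4) - int(key, 4)) < abs(int(current, 4) - int(key, 4))]
    return min(closer, key=lambda n: abs(int(n, 4) - int(key, 4))) if closer else current

# Toy example: routing from node 1331 towards key 1223 over a flat set of known nodes.
known = ["1030", "1123", "1211", "1232", "1221", "1300", "2331", "0130"]
print(next_hop("1331", "1223", known))   # '1221': shares the longest prefix ('122') with the key
```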

Page 62:

Pastry Assessment

Scalability, fairness, load balancing:
Distributed index of arbitrary size
Support for physical locality and locality by hash value
Stochastically logarithmic search effort
Network topology is partially accounted for, given an additional metric for physical locality
Stochastically logarithmic lookup in large systems, variable lookup costs

Content location:
Search by hash key: limited ways to formulate queries
All indexed files are reachable
Not restricted to file location

Failure resilience:
No single point of failure
Several possibilities for backup routes

Page 63:

Examples: Tapestry

Page 64:

Tapestry

Approach taken:
Only concerned with self-organizing indexing
Distributed index - decentralized lookup service
Uses DHTs
Content handling is an external problem entirely: no relation to content, no included replication or caching

P2P aspects:
Every node must maintain keys
Adaptive to changes in membership and to value changes

Page 65:

Routing and Location

Namespace (nodes and objects):
SHA-1 hash, 160 bits long
Each object has its own hierarchy rooted at RootID = hash(ObjectID)

Prefix routing [JSAC 2004]:
The router at the hth hop shares a prefix of length >= h digits with the destination
Local tables at each node (neighbor maps) route digit by digit: 4*** -> 42** -> 42A* -> 42AD
Neighbor links are organized in levels

Page 66:

Routing and Location

Suffix routing [tech report 2001]:
The router at the hth hop shares a suffix of length >= h digits with the destination
Example: 5324 routes to 0629 via 5324 -> 2349 -> 1429 -> 7629 -> 0629

Tapestry routing:
Cache pointers to all copies
Caches are soft-state
UDP heartbeat and TCP timeout to verify route availability
Each node has 2 backup neighbors
Failing primary neighbors are kept for some time (days)
Multiple root nodes possible, identified via hash functions
A value is searched at a root if its hash is that of the root

Choosing a root node:
Choose a random address
Route towards that address
If no route exists, choose a surrogate deterministically

Page 67:

Routing and Location

Object location:
The root is responsible for storing the object's location (but not the object)
Publish and search both route incrementally towards the root

Locates objects:
Object = key/value pair, e.g. filename/file
Automatic replication of keys
No automatic replication of values

Values:
May be replicated
May be stored in erasure-coded fragments

Page 68:

Tapestry

[Figure: Insert(key K, value V) - the value V stays at the publishing node; (#K, location) pointers are installed along the route towards K's root, which keeps the list of addresses for #K]

Page 69:

Tapestry

[Figure: a query ?K routes towards K's root until it hits a node with a (#K, location) pointer, then follows the pointer to the value V; the result is cached along the way]

Page 70:

Tapestry

[Figure: steady state with (#K, location) pointers installed along the paths towards the root]

Page 71:

Tapestry

[Figure: Move(key K, value V) - the value V moves to another node, which publishes new (#K, location) pointers towards the root]

Page 72:

Tapestry

[Figure: after a move, stale (#K, location) pointers to the old location stay wrong until their soft-state timeout expires]

Page 73:

Tapestry: Assessment

Scalability, fairness, load balancing:
Distributed index(es) of arbitrary size
Limited physical locality of key access, by caching and nodeId selection
Variable lookup costs
Independent of content scalability

Content location:
Search by hash key: limited ways to formulate queries
All indexed files are reachable
Not restricted to file location

Failure resilience:
No single point of failure
Several possibilities for backup routes
Caching of key resolutions
Use of hash values with several salt values

Page 74:

Comparison

Page 75:

Comparison

Systems compared: Chord, Pastry, Tapestry

Routing information: Chord - log(#nodes) routing table size; Pastry - log(#nodes) x (2^b - 1) routing table size; Tapestry - at least log(#nodes) routing table size

Lookup cost: Chord - log(#nodes); Pastry - approx. log(#nodes); Tapestry - variable lookup cost

Physical locality: Pastry - by neighbor list; Tapestry - in mobile Tapestry

Failure resilience: Chord - no resilience in the basic version, additional successor lists provide resilience; Pastry - no single point of failure, several backup routes; Tapestry - no single point of failure, several backup routes, alternative hierarchies

Page 76:

Applications

Page 77:

Streaming in Peer-to-peer networks

Page 78:

Peer-to-peer network

Page 79:

Promise

Page 80:

Promise

Video streaming in peer-to-peer systems:
Video segmentation into many small segments
Pull operation
Pull from several sources at once

Based on Pastry and CollectCast

CollectCast:
Adds rate/data assignment
Evaluates node capabilities and overlay route capabilities
Uses topology inference
Detects shared path segments (using ICMP, similar to traceroute)
Tries to avoid shared path segments
Labels segments with quality (or goodness)

Page 81:

Promise [Hefeeda et al. 03]

[Figure: one receiver pulling from several active senders, with additional standby senders]

The receiver:
• sends a lookup request using the DHT
• selects some active senders and sends them a control packet
• receives data as long as no errors/changes occur
• if a change/error is detected, new active senders may be selected

Each active sender:
• receives a control packet specifying which data segments, data rate, etc.
• pushes data to the receiver as long as no new control packet is received

Thus, Promise is a multiple-sender to one-receiver P2P media streaming system which 1) accounts for different capabilities, 2) matches senders to achieve the best quality, and 3) dynamically adapts to network fluctuations and peer failure
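A sketch of the receiver's sender-selection step under simple assumptions: candidates are ranked by a goodness score and activated until their aggregate rate covers the stream rate. CollectCast's real goodness metric combines node and overlay-path measurements; the names below are hypothetical.

```python
# Sketch of Promise-style active sender selection (hypothetical simplification).

def select_active_senders(candidates, stream_rate):
    """candidates: list of (peer, offered_rate, goodness); activate the best peers
       until their aggregate rate covers the stream rate, the rest stay on standby."""
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    active, total = [], 0.0
    for peer, rate, _ in ranked:
        if total >= stream_rate:
            break
        active.append((peer, min(rate, stream_rate - total)))   # rate assigned in the control packet
        total += rate
    standby = [c[0] for c in ranked[len(active):]]
    return active, standby

active, standby = select_active_senders(
    [("p1", 300, 0.9), ("p2", 200, 0.7), ("p3", 400, 0.4)], stream_rate=450)
print(active)    # [('p1', 300), ('p2', 150)]; p3 remains on standby
print(standby)
```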

Page 82:

SplitStream

Page 83:

SplitStream

Video streaming in peer-to-peer systems:
Uses layered video
Uses overlay multicast
Push operation
Builds disjoint overlay multicast trees

Based on Pastry

Page 84:

SplitStream

[Figure: source with the full-quality movie split into stripes (e.g. Stripe 1 and Stripe 2), each sent down its own multicast tree]

[Castro et al. 03]

Each movie is split into K stripes and each stripe is multicast using a separate tree

Each node:
• joins as many multicast trees as there are stripes (K)
• may specify the number of stripes it is willing to act as a router for, i.e., according to the amount of resources available

Thus, SplitStream is a multiple-sender to multiple-receiver P2P system which distributes the forwarding load while respecting each node's resource limitations, but some effort is required to build the forest of multicast trees
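A minimal sketch of the striping idea only (building the actual forest of disjoint trees over Pastry is not shown): block i of the source goes to stripe i mod K, and a receiver that joins all K trees interleaves the stripes back into playout order.

```python
# Sketch of striping a stream across K multicast trees (SplitStream-style data split).

def split_into_stripes(blocks, k):
    """Stripe i receives blocks i, i+k, i+2k, ..."""
    return [blocks[i::k] for i in range(k)]

def merge_stripes(stripes):
    """Interleave the stripes back into the original block order
       (assumes equal-length stripes for simplicity)."""
    merged = []
    for group in zip(*stripes):
        merged.extend(group)
    return merged

blocks = list(range(12))                         # e.g. video blocks in playout order
stripes = split_into_stripes(blocks, k=3)
print(stripes)                                   # [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
print(merge_stripes(stripes) == blocks)          # True; full quality needs all stripes
```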

Page 85:

Some References

1. M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron and A. Singh, "SplitStream: High-bandwidth multicast in a cooperative environment", SOSP'03, Lake Bolton, New York, October 2003
2. Mohamed Hefeeda, Ahsan Habib, Boyan Botev, Dongyan Xu, Bharat Bhargava, "PROMISE: Peer-to-Peer Media Streaming Using CollectCast", ACM MM'03, Berkeley, CA, November 2003
3. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications", ACM SIGCOMM'01
4. Ben Y. Zhao, John Kubiatowicz and Anthony Joseph, "Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing", UCB Technical Report CSD-01-1141, 2001
5. John Kubiatowicz, "Extracting Guarantees from Chaos", Comm. ACM, 46(2), February 2003
6. Antony Rowstron and Peter Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", Middleware'01, November 2001

Page 86:

Adaptive Multi-Source Streaming in Heterogeneous Peer-to-Peer Networks

Vikash Agarwal, Reza Rejaie
Computer and Information Science Department, University of Oregon
http://mirage.cs.uoregon.edu
January 19, 2005

Page 87:

Introduction

P2P streaming is becoming increasingly popular
Participating peers form an overlay to cooperatively stream content among themselves
The overlay-based approach is the only way to efficiently support multi-party streaming apps without multicast

Two components:
Overlay construction
Content delivery

Each peer desires to receive the max. quality that can be streamed through its access link
Peers have asymmetric & heterogeneous BW connectivity
Each peer should receive content from multiple parent peers => multi-source streaming; multi-parent overlay structure rather than a tree

Page 88:

Benefits of Multi-source Streaming

Higher bandwidth to each peer => higher delivered quality
Better load balancing among peers
Less congestion across the network
More robust to the dynamics of peer participation

Multi-source streaming introduces new challenges ...

Page 89:

Multi-source streaming: Challenges

Congestion-controlled connections from different parent peers exhibit independent variations in BW (different RTT, BW, loss rate)
Aggregate bandwidth changes over time
The streaming mechanism should be quality adaptive
A static "one-layer-per-sender" approach is inefficient
There must be a coordination mechanism among senders in order to:
Efficiently utilize aggregate bandwidth
Gracefully adapt delivered quality to BW variations

This paper presents a receiver-driven coordination mechanism for multi-source streaming called PALS

Page 90:

Previous Studies

Congestion control was often ignored
Server/content placement for streaming MD content [Apostolopoulos et al.]
Resource management for P2P streaming [Cue et al.]
Multi-sender streaming [Nguyen et al.], but they assumed the aggregate BW is more than the stream BW
RLM is receiver-driven, but RLM tightly couples coarse quality adaptation with CC; PALS only determines how the aggregate BW is used
P2P content distribution mechanisms cannot accommodate "streaming" apps, e.g. BitTorrent, Bullet

Page 91:

Overall Architecture

Overall architecture for P2P streaming:
PRO: bandwidth-aware overlay construction - identifying good parents in the overlay
PALS: multi-source adaptive streaming - streaming content from selected parents
Distributed multimedia caching

Decoupling overlay construction from delivery provides a great deal of flexibility
PALS is a generic multi-source streaming protocol for non-interactive applications

Page 92:

Assumptions & Goals

Assumptions:
All peers/flows are congestion controlled
Content is layered encoded
All layers are CBR with the same consumption rate*
All senders have all layers (relax this later)*
A limited window of future packets is available at each sender
Live but non-interactive

* Not requirements

Goals:
Fully utilize the aggregate bandwidth to dynamically maximize delivered quality
Deliver the max number of layers
Minimize variations in quality

Page 93:

P2P Adaptive Layered Streaming (PALS)

Receiver: periodically requests an ordered list of packets/segments from each sender
Sender: simply delivers the requested packets in the given order at the congestion-controlled rate

Benefits of ordering the requested list:
Provides flexibility for the receiver to closely control delivered packets
Graceful degradation in quality when bandwidth suddenly drops

Periodic requests => stability & less overhead

Page 94:

Basic Framework

[Figure: PALS receiver with per-layer buffers, demux and decoder, drawing congestion-controlled rates bw_i(t) from sender peers across the Internet; BW0, BW1, BW2 are the senders' access-link bandwidths]

The receiver passively monitors the EWMA BW from each sender
EWMA aggregate BW
Estimate the total number of packets to be delivered during the next window (K)
Allocate the K packets among active layers (quality adaptation):
• controlling bw0(t), bw1(t), ...
• controlling the evolution of the buffer state
Assign a subset of packets to each sender (packet assignment):
• allocating each sender's bandwidth among active layers
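A sketch of the receiver-side bookkeeping listed above: an EWMA estimate per sender, the EWMA aggregate, and the packet budget K for the next window. Parameter names are hypothetical; see the PALS paper for the real ones.

```python
# Sketch of PALS-style receiver bookkeeping (hypothetical names and constants).

def ewma(old, sample, alpha=0.125):
    """Exponentially weighted moving average; first sample initializes the estimate."""
    return sample if old is None else (1 - alpha) * old + alpha * sample

class PalsReceiver:
    def __init__(self, window_sec, pkt_size_bytes):
        self.window = window_sec
        self.pkt_size = pkt_size_bytes
        self.bw = {}                                   # sender -> EWMA bandwidth (bytes/s)

    def observe(self, sender, measured_bw):
        self.bw[sender] = ewma(self.bw.get(sender), measured_bw)

    def packets_next_window(self):
        aggregate = sum(self.bw.values())              # EWMA aggregate bandwidth
        return int(aggregate * self.window / self.pkt_size)   # K

rx = PalsReceiver(window_sec=2.0, pkt_size_bytes=1000)
rx.observe("peer1", 80_000)
rx.observe("peer2", 40_000)
print(rx.packets_next_window())    # 240 packets to allocate among active layers
```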

Page 95:

Key Components of PALS

Sliding Window (SW): to keep all senders busy & loosely synchronized with receiver playout time

Quality adaptation (QA): to determine quality of delivered stream, i.e. required packets for all layers during one window

Packet Assignment (PA): to properly distribute required packets among senders

Page 96:

Sliding Window

Buffering window: range of timestamps for packets that must be requested in one window.

The window slides forward in a step-like fashion
Requested packets per window can be from:
1) the playing window (loss recovery)
2) the buffering window (main group)
3) future windows (buffering)

Page 97:

Sliding Window (cont’d)

Window size determines the tradeoff between smoothness/signaling overhead and responsiveness
Should be a function of RTT, since RTT specifies the timescale of variations in BW
Multiple of the max smoothed RTT among senders

The receiver might receive duplicates
Re-requesting a packet that is in flight!
The ratio of duplicates is very low and can be reduced by increasing the window

Page 98:

Coping with BW variations

The sliding window alone is insufficient
Coping with a sudden drop in BW by:
Overwriting the request at senders
Ordering the requested packets
Coping with a sudden increase in BW by:
Requesting extra packets

Page 99:

Quality Adaptation

Determining required packets from future windows

Coarse-grained adaptation: add/drop a layer
Fine-grained adaptation: controlling bw0(t), bw1(t), ...; loosely controlling the evolution of the receiver buffer state/distribution

What is a proper buffer distribution?
The buffer distribution determines what degree of BW variations can be smoothed

[Figure: the same receiver/senders diagram as on the Basic Framework slide, with per-layer buffers]

Page 100:

Buffer Distribution

Impact on delivered quality:
A conservative buffer distribution achieves long-term smoothing
An aggressive buffer distribution achieves short-term improvement
PALS leverages this tradeoff in a balanced fashion

Window size affects buffering:
Amount of future buffering
Slope of the buffer distribution

Multiple opportunities to request a packet (see paper)
Implicit loss recovery

Page 101:

Packet Assignment

How to assign an ordered list of selected packets from different layers to individual senders?
The number of packets assigned to each sender must be proportional to its BW contribution
More important packets should be delivered first
Weighted round-robin packet assignment strategy
This strategy is extended to support partially available content at each peer; please see the paper for further details
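A sketch of a weighted round-robin assignment under these constraints: the ordered request list (most important packets first) is split among senders in proportion to their bandwidth shares. The extension to partially available content is not shown, and the names are hypothetical.

```python
# Sketch of weighted round-robin packet assignment proportional to sender bandwidth.

def assign_packets(ordered_pkts, sender_bw):
    """ordered_pkts: request list, most important packets first
       sender_bw: dict sender -> bandwidth contribution
       Returns dict sender -> ordered sublist of packets to request from it."""
    total = sum(sender_bw.values())
    share = {s: bw / total for s, bw in sender_bw.items()}
    assignment = {s: [] for s in sender_bw}
    credit = {s: 0.0 for s in sender_bw}
    for pkt in ordered_pkts:                     # walk the list in priority order
        for s in credit:
            credit[s] += share[s]
        best = max(credit, key=credit.get)       # sender with the most accumulated credit
        assignment[best].append(pkt)
        credit[best] -= 1.0
    return assignment

pkts = [f"L{layer}:{seq}" for seq in range(4) for layer in range(3)]   # layer-ordered request list
print(assign_packets(pkts, {"fast": 200, "slow": 100}))                # roughly a 2:1 split
```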

Page 102:

Performance Evaluation

Using ns simulation to control BW dynamics
Focused on three key dynamics in P2P systems: BW variations, peer participation, content availability
Senders with heterogeneous RTT & BW
Decouple the underlying CC mechanism from PALS
Performance metrics: BW utilization, delivered quality
Two strawman mechanisms with static layer assignment to each sender:
Single Layer per Sender (SLS): sender i delivers layer i
Multiple Layers per Sender (MLS): sender i delivers layers j < i

Page 103:

Necessity of Coordination

SLS & MLS exhibit high variations in quality:
No explicit loss recovery
No coordination
Inter-layer dependency magnifies the problem

PALS effectively utilizes the aggregate BW & delivers stable quality in all cases

Page 104:

Delay-Window Tradeoff

Avg. delivered quality only depends on the aggregate BW
Heterogeneous senders
Higher delay => smoother quality
Duplicates decrease exponentially with window size
Avg. per-layer buffering increases linearly with delay
Increasing the window leads to an even buffer distribution

See paper for more results.

Page 105:

Conclusion & Future Work

PALS is a receiver-driven coordination mechanism for streaming from multiple congestion-controlled senders
Simulation results are very promising

Future work:
Further simulation to examine more details
Prototype implementation for real experiments
Integration with the other components of our architecture for P2P streaming

Page 106:

Partially available content

Effect of segment size and redundancy

Page 107:

Page 108:

Page 109:

Packet Dynamics