Fault-tolerant Routing in Peer-to-Peer Systems

James Aspnes, Zoë Diamadi, Gauri Shah. Yale University. PODC 2002.


Transcript of Fault-tolerant Routing in Peer-to-Peer Systems

Page 1: Fault-tolerant Routing in Peer-to-Peer Systems

Fault-tolerant Routing in Peer-to-Peer Systems

James Aspnes, Zoë Diamadi, Gauri Shah

Yale University, PODC 2002

Page 2: Fault-tolerant Routing in Peer-to-Peer Systems

P2P network

•Bunch of peers.
•Store resources identified by keys.
•Peers subject to crash failures.
•Goal: locate resources “efficiently”.

[Figure: peers holding resources, each resource identified by a key.]

Page 3: Fault-tolerant Routing in Peer-to-Peer Systems

Properties of ideal network

•Data availability
•Decentralization
•Fault-tolerance
•Scalability
•Load balancing

•Maintaining network
•Dynamic node addition/deletion
•Self-stabilization

•Efficient searching
•Incorporating geography
•Incorporating locality

Page 4: Fault-tolerant Routing in Peer-to-Peer Systems

Early P2P systems

Napster: central server bottleneck.

Gnutella: inefficient flooding.

[Figure: a query for resource x in Napster and in Gnutella.]

Page 5: Fault-tolerant Routing in Peer-to-Peer Systems

Tapestry [JKZ’01]

Uses Plaxton’s Algorithm: correct one digit at a time to reach target. Pastry [DR’01] is similar.

Node xyz links to *XX, x*X and xy* [* = all digits, X = any digit]

[Figure: example with nodes 427, 327, 768, 368, 135, 360, 365 and 123.]
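The digit-correcting idea can be sketched in a few lines of Python. This is a minimal illustration, not Tapestry's or Pastry's actual protocol: the function name `digit_route` is hypothetical, it assumes every identifier in the namespace is a live node, and it leaves the not-yet-corrected digits unchanged, whereas the link rule on the slide allows them to be any digit.

```python
def digit_route(source, target):
    """Return the hops taken when one digit is corrected per hop, left to right.

    Assumption: every identifier of this length exists as a node, so the
    "next" node with one more matching digit is always available.
    """
    if len(source) != len(target):
        raise ValueError("identifiers must have the same number of digits")
    path = [source]
    current = source
    for i, digit in enumerate(target):
        if current[i] != digit:
            # Follow the link that fixes digit i; later digits stay as they are.
            current = current[:i] + digit + current[i + 1:]
            path.append(current)
    return path


print(digit_route("427", "365"))
```

Each hop fixes one mismatched digit, so a route never needs more hops than the identifier has digits.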

Page 6: Fault-tolerant Routing in Peer-to-Peer Systems

CAN [RFHKS’01]

Partition d-dimensional coordinate space into zones.

Nodes own zones and keys hashed to them.
Greedy routing: forward to neighbor closest to target.

[Figure: unit square with corners (0,0), (1,0), (0,1), (1,1) for d=2, partitioned into numbered zones.]

Page 7: Fault-tolerant Routing in Peer-to-Peer Systems

Chord [SMKKB’01]

Nodes and resources mapped to identifier circle.
Routing table: successor nodes at distances $2^i \bmod 2^m$.

Greedy routing: forward to node in routing table closest to target.

[Figure: identifier circle (n=8) with positions 0 to 7, labeled with successor lists 660, 003 and 036.]
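A small Python sketch may make the Chord-style routing table and greedy step concrete. It is an illustration under stated assumptions, not the Chord implementation: the helper names (`successor`, `fingers`, `route`) and the node set used below are hypothetical, the ring has 2^m positions, and "closest to target" is measured clockwise so that a hop never overshoots the node responsible for the key.

```python
def successor(nodes, ident, size):
    """First live node at or clockwise after position ident."""
    for step in range(size):
        candidate = (ident + step) % size
        if candidate in nodes:
            return candidate
    raise ValueError("no live nodes")


def fingers(node, nodes, m):
    """Routing table: successors of node + 2^i (mod 2^m) for i = 0..m-1."""
    size = 2 ** m
    return [successor(nodes, (node + 2 ** i) % size, size) for i in range(m)]


def route(start, key, nodes, m):
    """Greedy routing from a live node to the node responsible for key."""
    size = 2 ** m
    owner = successor(nodes, key, size)
    path = [start]
    while path[-1] != owner:
        table = fingers(path[-1], nodes, m)
        # Pick the table entry with the smallest clockwise distance to the owner.
        path.append(min(table, key=lambda f: (owner - f) % size))
    return path


nodes = {0, 3, 6, 10, 13}          # hypothetical live nodes on a 16-position ring
print(fingers(0, nodes, 4))        # routing table of node 0
print(route(0, 12, nodes, 4))      # hops from node 0 to the owner of key 12
```

Because the first table entry is always the immediate successor, every hop strictly reduces the clockwise distance to the owner, so the loop terminates.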

Page 8: Fault-tolerant Routing in Peer-to-Peer Systems

Common underlying structure

•Underlying metric space.
•Nodes embedded in metric space.
•Location determined by key.
•Hashing to balance load.
•Greedy routing.
•O(log n) space at each node.
•O(log n) routing time.

Page 9: Fault-tolerant Routing in Peer-to-Peer Systems

Unifying approach

[Figure: keys are hashed to nodes v1 to v4. A virtual route over virtual links in the virtual overlay network corresponds to an actual route over physical links in the physical network.]

Page 10: Fault-tolerant Routing in Peer-to-Peer Systems

Link Distribution

Each node independently selects k long-hop links as per some distribution Δ.

[Figure: nodes on a line; node x has links, chosen as per Δ, to nodes x-d1 and x-d2.]

Page 11: Fault-tolerant Routing in Peer-to-Peer Systems

Abstract model

Simple metric space: 1D line.

Hash(key) = Metric space location.

Short-hop links: immediate neighbors.
Long-hop links: inverse-distance distribution.

$\Pr[\text{edge}(u,v)] = \dfrac{1/d(u,v)}{\sum_{v' \ne u} 1/d(u,v')}$

Greedy routing: forward message to neighbor closest to target in metric space.
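The abstract model is small enough to simulate directly. The sketch below is a hedged illustration rather than the authors' simulator: `build_links` and `greedy_route` are hypothetical names, the metric space is the integer line 0..n-1 with d(u,v) = |u - v|, and the k long-hop links of a node are drawn with replacement, which is a simplifying assumption.

```python
import random


def build_links(n, k, rng):
    """Give every node its short-hop links and k inverse-distance long-hop links."""
    links = {}
    for u in range(n):
        others = [v for v in range(n) if v != u]
        # Pr[edge(u, v)] is proportional to 1/d(u, v); choices() normalizes the weights.
        weights = [1.0 / abs(u - v) for v in others]
        long_hops = rng.choices(others, weights=weights, k=k)
        short_hops = [v for v in (u - 1, u + 1) if 0 <= v < n]
        links[u] = set(short_hops) | set(long_hops)
    return links


def greedy_route(links, source, target):
    """Forward to the neighbor closest to the target until the target is reached."""
    path = [source]
    while path[-1] != target:
        path.append(min(links[path[-1]], key=lambda v: abs(v - target)))
    return path


rng = random.Random(7)
links = build_links(256, 4, rng)
print(greedy_route(links, 3, 200))
```

The short-hop link toward the target always reduces the distance by one, so greedy routing cannot get stuck in this failure-free setting; the long-hop links only shorten the route.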

Page 12: Fault-tolerant Routing in Peer-to-Peer Systems

What do we care about?

•Do we get similar upper bounds on routing time with failures?

•Is it possible to design a link distribution that beats the $O(\log^2 n)$ bound for routing given by the 1/d distribution?

•Can we dynamically construct such a network?

Page 13: Fault-tolerant Routing in Peer-to-Peer Systems

Greedy routing with failures

Analyze message delivery in phases [Kleinberg ’99].

Message at node n in phase i: $2^i \le d(n, t) < 2^{i+1}$

At most (log n + 1) such phases.

[Figure: target t surrounded by regions labeled Phase 0, Phase 1 and Phase 2.]
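As a quick check of the phase definition, a message at distance d from the target is in phase i when 2^i ≤ d < 2^(i+1), that is, i = floor(log2 d). The following Python sketch uses hypothetical helper names (`phase`, `phase_count`) and assumes integer distances on a line of n nodes.

```python
def phase(distance):
    """Phase index i with 2**i <= distance < 2**(i + 1), for integer distance >= 1."""
    if distance < 1:
        raise ValueError("the message has already reached the target")
    return distance.bit_length() - 1


def phase_count(n):
    """Number of distinct phases when distances range over 1..n-1 (n >= 2)."""
    return phase(n - 1) + 1


for d in (1, 2, 3, 4, 1000):
    print(d, phase(d))
print(phase_count(131072))
```

With n = 131072 = 2^17 nodes the largest distance is in phase 16, so there are 17 phases, within the (log n + 1) = 18 bound stated on the slide.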

Page 14: Fault-tolerant Routing in Peer-to-Peer Systems

[1..log n] long-hop links

Suppose each node has k long-hop links.
Average time spent in each phase: $O((\log n)/k)$.

With O(log n) such phases:
Total time: $O((\log^2 n)/k)$.

With failures:

Suppose each node/link fails with probability (1-p).
Average time spent in each phase: $O((\log n)/(pk))$.

Total time: $O((\log^2 n)/(pk))$.
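The effect of failures on routing time can be explored with a small Monte Carlo sketch. Several things here are assumptions made for illustration and are not taken from the slides: the names `sample_long_hop` and `mean_hops`, modeling only link failures (each long-hop link survives independently with probability p), treating short-hop links as never failing so that every search succeeds, and sampling a node's links lazily when the message arrives there, which is valid because greedy routing visits each node at most once.

```python
import itertools
import random


def sample_long_hop(u, n, dists, cum, rng):
    """Draw v != u on the line 0..n-1 with probability proportional to 1/|u - v|."""
    while True:
        d = rng.choices(dists, cum_weights=cum, k=1)[0]
        v = u + rng.choice((-d, d))
        if 0 <= v < n:          # rejection keeps the inverse-distance shape
            return v


def mean_hops(n, k, p, trials=200, seed=0):
    """Average greedy-routing hops when each long-hop link survives with prob. p."""
    rng = random.Random(seed)
    dists = list(range(1, n))
    cum = list(itertools.accumulate(1.0 / d for d in dists))
    total = 0
    for _ in range(trials):
        cur, target = rng.randrange(n), rng.randrange(n)
        while cur != target:
            step = 1 if target > cur else -1
            neighbors = [cur + step]            # short hop, assumed never to fail
            for _ in range(k):
                v = sample_long_hop(cur, n, dists, cum, rng)
                if rng.random() < p:            # link survives
                    neighbors.append(v)
            cur = min(neighbors, key=lambda v: abs(v - target))
            total += 1
    return total / trials


for p in (1.0, 0.5, 0.25):
    print(p, mean_hops(1024, 10, p))
```

If the bound above is a good guide, the printed averages should grow roughly in proportion to 1/p as p shrinks; the sketch makes no claim about constants.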

Page 15: Fault-tolerant Routing in Peer-to-Peer Systems

Simulation results

Failed Searches

[Figure: % of failed searches (y-axis, 0 to 0.8) against probability of node failure (x-axis, 0 to 0.8), with series Backtrack, Failed and Random Reroute. n=131072 nodes, log n=17 links.]

What happens with > log n links?

Page 16: Fault-tolerant Routing in Peer-to-Peer Systems

What do we care about?

•Do we get similar upper bounds on routing time with failures?

•Is it possible to design a link distribution that beats the $O(\log^2 n)$ bound for routing given by the 1/d distribution?
Lower bound on routing time as a function of number of links per node.

•Can we dynamically construct such a network?

Page 17: Fault-tolerant Routing in Peer-to-Peer Systems

Intuition for lower bound [KUW’88]

Time needed for a non-increasing real-valued Markov chain X0, X1, X2, ... to drop to 1 is bounded by:

$T(X_0) \le \int_1^{X_0} \frac{dz}{\mu_z}$

where $\mu_z = E[X_t - X_{t+1} \mid X_t = z]$ is a non-decreasing function of z.
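A numerical sanity check can illustrate the integral bound, namely that the expected time to drop to 1 is at most the integral of 1/μ_z from 1 to X0. The chain below is a hypothetical example chosen for this sketch, not one from the talk: X_{t+1} = U · X_t with U uniform on [0, 1), so the expected drop from state z is μ_z = z/2, which is non-decreasing in z, and the bound evaluates to 2 ln X0. The names `drop_time` and `integral_bound` are likewise assumptions.

```python
import math
import random


def drop_time(x0, rng):
    """Steps until the chain X_{t+1} = U * X_t first reaches a value <= 1."""
    x, steps = x0, 0
    while x > 1:
        x *= rng.random()
        steps += 1
    return steps


def integral_bound(x0, mu, slices=100000):
    """Midpoint-rule estimate of the integral of dz / mu(z) from 1 to x0."""
    width = (x0 - 1) / slices
    return sum(width / mu(1 + (j + 0.5) * width) for j in range(slices))


rng = random.Random(1)
x0 = 1000.0
observed = sum(drop_time(x0, rng) for _ in range(2000)) / 2000
bound = integral_bound(x0, lambda z: z / 2)
print(observed, bound, 2 * math.log(x0))
```

For this chain the observed average should sit below the bound, with a gap that reflects how loose the integral can be when single steps make large jumps.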

Page 18: Fault-tolerant Routing in Peer-to-Peer Systems

Upper bound on time

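[Figure: a trip from SFO to NYC, starting at x and crossing a point z.]

Starting from x, average speed at z = $\mu_x$.

$\mu_z$ gives lower bound on average crossing speed ($\mu$ is non-decreasing, so $\mu_z \le \mu_x$).

$\int \frac{1}{\mu_z}\,dz$ gives upper bound on time.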

Page 19: Fault-tolerant Routing in Peer-to-Peer Systems

Lower bound on time

[Figure: a trip from SFO to NYC with points z and x marked.]

$m_z = \sup_{x \ge z} \mu_x$ gives upper bound on average crossing speed.

This may give too large an estimate, so condition against high bursts of speed.

$\int \frac{1}{m_z}\,dz$ gives lower bound on time.

Page 20: Fault-tolerant Routing in Peer-to-Peer Systems

Tool for lower bound

Non-increasing Markov chain: X0, X1, X2, ..., state space S.

Few long jumps: $\Pr[X_t - X_{t+1} \ge U \mid X_t = x] \le \varepsilon$

Upper bound on speed [no long jumps]:
$\mu_x = E[X_t - X_{t+1} \mid X_t = x,\ X_t - X_{t+1} < U]$
$m_z = \sup \{ \mu_x : x \in S,\ x \in [z, z+U) \}$

Time from x [no long jumps]: $T(x) = \int_1^{x} \frac{1}{m_z}\,dz$

$E[\text{Time to reach } 0] \ge \dfrac{T(X_0)}{\varepsilon\,T(X_0) + (1 - \varepsilon)}$

Page 21: Fault-tolerant Routing in Peer-to-Peer Systems

Applying tool to routing

Cannot bound progress of single node with an arbitrary distribution!

So use an aggregate chain $S_t$ of nodes for collective behavior of nodes in some range.

Track $\ln(|S_t|)$ for recurrence relation.

[Figure: nodes on a line with target 0, showing links d1, d2, d3 leaving the set S_t and the resulting set S_{t+1}.]

Page 22: Fault-tolerant Routing in Peer-to-Peer Systems

Lower bounds

Random graph G. Node x has k independent links on average. x links to (x-1) and (x+1). Expected time to reach 0 from a point chosen uniformly from 1..n:

1-sided routing: $\Omega\!\left(\dfrac{\ln^2 n}{k \ln\ln n}\right)$

2-sided routing*: $\Omega\!\left(\dfrac{\ln^2 n}{k^2 \ln\ln n}\right)$

$\Omega(\ln^2 n)$ is worse than O(ln n) for a tree: cost of assuming symmetry between nodes.

* Probability of choosing links symmetric about 0 and unimodal.

[Figure: 1-sided and 2-sided routing toward s, each marking a link that is ignored.]

Page 23: Fault-tolerant Routing in Peer-to-Peer Systems

What do we care about?

•Do we get similar upper bounds on routing time with failures?

•Is it possible to design a link distribution that beats the $O(\log^2 n)$ bound for routing given by the 1/d distribution?

•Can we dynamically construct such a network?

Page 24: Fault-tolerant Routing in Peer-to-Peer Systems

Heuristic for construction

New node chooses neighbors using inverse-distance distribution. Links to live nodes closest to chosen ones.

Selects older nodes to point to it.

[Figure: a new node joining between older nodes y and x. Legend: new node, older node, absent node, ideal link, adjusted link, initial link, new link.]
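The first part of the heuristic (pick ideal targets by inverse distance, then adjust each link to the closest live node) can be sketched as follows. This is a minimal illustration under assumptions: `nearest_live` and `join` are hypothetical names, positions are integers on a line of `space` slots, ideal targets are drawn with replacement, and the step in which older nodes are selected to point to the new node is not modeled because the slide does not specify how they are chosen.

```python
import bisect
import random


def nearest_live(live_sorted, ideal):
    """Live node closest to the ideal position (live_sorted is non-empty, ascending)."""
    i = bisect.bisect_left(live_sorted, ideal)
    candidates = live_sorted[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda v: abs(v - ideal))


def join(new, live_sorted, space, k, rng):
    """Long-hop links for a new node: ideal targets adjusted to live nodes."""
    positions = [v for v in range(space) if v != new]
    # Ideal link targets follow the inverse-distance distribution around the new node.
    weights = [1.0 / abs(v - new) for v in positions]
    ideal_targets = rng.choices(positions, weights=weights, k=k)
    # An ideal target may be an absent node, so each link is adjusted to a live one.
    return sorted({nearest_live(live_sorted, t) for t in ideal_targets})


live = [2, 10, 20, 40, 63]          # hypothetical live nodes in a 64-slot space
print(join(30, live, 64, 5, random.Random(3)))
```

Several ideal targets can collapse onto the same live node when the network is sparse, so the new node may end up with fewer than k distinct links.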

Page 25: Fault-tolerant Routing in Peer-to-Peer Systems

Open problems

•Design a self-stabilization mechanism. [Aspnes, Shah ’02] submitted to SODA.

•Does lower bound generalize to multidimensional metric spaces?

•Analyze security properties such as anonymity and Byzantine failures.

•Does backtracking give provably good routing bound?