A Survey of Peer-to-Peer Content Distribution Technologies

Stephanos Androutsellis-Theotokis and Diomidis Spinellis, ACM Computing Surveys, December 2004
Presenters: Seung-hwan Baek, Ja-eun Choi



Outline
• Overview of P2P
  – P2P Motivation
  – P2P Characteristics & Benefits
  – P2P Application Types
• P2P Classification
  – Unstructured: Gnutella, Kazaa, Napster
  – Structured: Freenet, Chord, CAN, Tapestry
• Other Aspects
• Conclusions


P2P Motivation

Client/Server Architecture:
• A well-known, powerful, reliable server is the data source
• Clients request data from the server
• A very successful model: WWW (HTTP), FTP, Web services, etc.


P2P Motivation (Cont’d)

Client/Server Limitations:
• Scalability is hard to achieve
• The server is a single point of failure
• Requires administration
• Resources at the network edge go unused

P2P systems try to address these limitations.


P2P Characteristics

P2P Computing:
• P2P computing is the sharing of computer resources and services by direct exchange between systems.
• These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files.
• P2P computing takes advantage of existing computing power, computer storage, and network connectivity, allowing users to leverage their collective power for the benefit of all.


P2P Characteristics (Cont’d)

P2P Characteristics:
• All nodes are both clients and servers
  – Provide and consume data
  – Any node can initiate a connection
• No centralized data source: nodes collaborate directly with each other (not through well-known servers)
• The network is dynamic: nodes enter and leave the network “frequently”


P2P Benefits
• Ease of administration
  – Nodes self-organize adaptively
  – No need to deploy servers to satisfy demand (cf. scalability)
  – Built-in fault tolerance, replication, and load balancing
• Scalability
  – Consumers of resources also donate resources
  – Aggregate resources grow naturally with utilization
• Reliability
  – Geographic distribution
  – No single point of failure


P2P Application Types
• Direct real-time communication: instant messaging
• Combining the processing power of multiple distributed machines to perform complex computations: analysis of SETI data, prime number computation
• Distributed database systems
• Storing and distributing digital content (content distribution): MP3 file sharing


P2P Classification

Architecture Types:
• Unstructured
• Structured
• Loosely structured

Here, “structure” refers to whether the overlay network is created non-deterministically or according to specific rules.


P2P Classification (Cont’d)


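Figure: P2P systems classified by data organization (horizontal axis) and degree of centralization (vertical axis)
• Unstructured, hybrid centralization: Napster, IM
• Unstructured, partial centralization: Kazaa, Gia
• Unstructured, no centralization: Gnutella
• Loosely structured, no centralization: Freenet
• Highly structured, no centralization: Chord, CAN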


Unstructured Architectures
• Placement of content is unrelated to the overlay topology
• A search mechanism is therefore required to locate content
• Appropriate for highly transient node populations

Degrees of centralization:
• Purely decentralized
• Partially centralized
• Hybrid decentralized


Purely Decentralized


• Purely Decentralized
  – No central coordination
  – Users (servents) connect to each other directly
• Gnutella architecture
  – Query: flooding
    • Send messages to all neighbors
  – Response: routed back along the path the query took
• Scalability issues
  – With a TTL: queries reach only peers within a “virtual horizon”
  – Without a TTL: unlimited flooding
• E.g., Gnutella, FreeHaven

[Figure: peers flood a query to their neighbors, replies travel back, and the file is then requested and downloaded directly between peers]
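The flooding search described above can be sketched as a breadth-first traversal. This is a minimal model under stated assumptions: the overlay is a static in-memory dictionary, `flood_query` and its parameters are hypothetical names, and a per-query `parent` map stands in for Gnutella's duplicate-message detection and reverse-path routing of replies.

```python
from collections import deque

def flood_query(graph, files, origin, wanted, ttl):
    """Flood a query from `origin` for at most `ttl` hops (hypothetical model).

    graph: dict mapping each peer to a list of its neighbors
    files: dict mapping each peer to the set of file names it shares
    Returns {peer holding `wanted`: reply path back to `origin`}.
    """
    parent = {origin: None}            # neighbor each peer first heard the query from
    queue = deque([(origin, ttl)])
    hits = {}
    while queue:
        peer, remaining = queue.popleft()
        if peer != origin and wanted in files.get(peer, set()):
            # The reply is routed back hop by hop along the query's path.
            path, node = [], peer
            while node is not None:
                path.append(node)
                node = parent[node]
            hits[peer] = path
        if remaining == 0:
            continue                   # TTL exhausted: edge of the virtual horizon
        for neighbor in graph.get(peer, []):
            if neighbor not in parent:     # drop duplicate copies of the query
                parent[neighbor] = peer
                queue.append((neighbor, remaining - 1))
    return hits
```

On a chain A-B-C-D where only D shares the file, a TTL of 2 finds nothing, while a TTL of 3 returns the reply path D, C, B, A. Raising the TTL widens the horizon at the cost of more messages.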


Partially Centralized


• Partially Centralized
  – Supernodes
    • Index and cache the files of a small subpart of the peer network
    • Peers are automatically elected to become supernodes
  – Advantages
    • Reduced discovery time
    • Normal nodes are only lightly loaded
• E.g., Kazaa, Edutella, Gnutella (later versions)

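[Figure: peers register with a supernode, queries and replies pass between supernodes, and the file is then requested and downloaded directly between peers]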
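A partially centralized overlay can be sketched as supernodes that index the files of their attached peers. The `SuperNode` class, its methods, and the small TTL used when forwarding queries between supernodes are hypothetical; this does not reproduce Kazaa's actual protocol.

```python
from collections import defaultdict

class SuperNode:
    """Hypothetical supernode indexing the files of its attached peers."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)   # file name -> peers sharing it
        self.neighbors = []             # other supernodes

    def register(self, peer, shared_files):
        # A normal peer uploads its file list when it attaches.
        for f in shared_files:
            self.index[f].add(peer)

    def query(self, file_name, ttl=1):
        # Answer from the local index, then ask neighboring supernodes.
        found = set(self.index.get(file_name, set()))
        if ttl > 0:
            for sn in self.neighbors:
                found |= sn.query(file_name, ttl - 1)
        return found
```

Normal peers never see the query traffic; only the supernodes consult their indexes, which is why discovery is faster and normal nodes stay lightly loaded.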


Hybrid Decentralized


• Hybrid Decentralized
  – Central directory server holding
    • User connection information
    • File and metadata information
• Advantages
  – Simple to implement
  – Locates files quickly and efficiently
• Disadvantages
  – Vulnerable to technical failure (the directory server is a single point of failure)
  – Inherently unscalable
• E.g., Napster, Publius

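[Figure: peers register with a central directory server, which answers queries; the file is then requested and downloaded directly between peers]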
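The central directory of a hybrid decentralized system amounts to two lookup tables on one server. The sketch below is hypothetical (class name, methods, and addresses are illustrative, not Napster's protocol); it shows why lookups are simple and fast, and why everything depends on one server.

```python
class DirectoryServer:
    """Hypothetical Napster-style central directory (sketch)."""

    def __init__(self):
        self.users = {}   # user -> connection info, e.g. (host, port)
        self.files = {}   # file name -> set of users sharing it

    def register(self, user, address, shared_files):
        self.users[user] = address
        for f in shared_files:
            self.files.setdefault(f, set()).add(user)

    def unregister(self, user):
        self.users.pop(user, None)
        for holders in self.files.values():
            holders.discard(user)

    def search(self, file_name):
        # A single lookup at the server; the download itself is peer-to-peer.
        return sorted(self.users[u] for u in self.files.get(file_name, ()))
```

If the `DirectoryServer` instance disappears, no peer can locate any file, even though all the content is still stored at the peers.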


Structured Architectures
• Features
  – Tightly controlled mapping between content and its location
  – Scalable solution for exact-match queries
• Examples
  – Freenet
  – Chord
  – CAN
  – Tapestry


Freenet
• Loosely structured system
  – Chain-mode propagation
• Each node has
  – A local data store
  – A dynamic routing table of (node address, file key) pairs
• Each file has a unique binary key


Freenet (Cont’d)
• Messages carry
  – Node ID, timeout, source ID, destination ID
• Message types
  – Data insert: key, data
  – Data request: key
  – Data reply: file
  – Data failed: failure location, reason


Freenet (Cont’d)
• Data insert
  – The node calculates a binary key for the file
  – It sends a data insert message to itself
• On receiving a data insert message
  – If the key is not taken
    • Store the data
    • Forward the message to the node owning the closest key
  – If the key is taken
    • Return the preexisting file
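The insert procedure can be sketched in a simplified model. Assumptions, all hypothetical: keys are SHA-1 hashes of a descriptive string, key closeness is plain numeric distance, and each routing table maps a known key to a neighbor object. Real Freenet differs in protocol details; this only mirrors the steps listed on the slide.

```python
import hashlib

def file_key(description):
    # Assumption: the binary key is the SHA-1 hash of a descriptive string.
    return int.from_bytes(hashlib.sha1(description.encode()).digest(), "big")

class FreenetNode:
    """Hypothetical Freenet-style node: a local data store plus a routing table."""

    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> data
        self.routes = {}   # key -> neighbor node associated with that key

    def closest_neighbor(self, key):
        # Closeness is plain numeric distance between keys (an assumption).
        if not self.routes:
            return None
        return self.routes[min(self.routes, key=lambda k: abs(k - key))]

    def insert(self, key, data, hops_to_live):
        """Return None if the data was stored, or the preexisting file on a collision."""
        if key in self.store:
            return self.store[key]                # key taken: return the existing file
        self.store[key] = data                    # key not taken: store the data locally
        nxt = self.closest_neighbor(key)
        if hops_to_live > 1 and nxt is not None:  # then forward toward the closest key
            existing = nxt.insert(key, data, hops_to_live - 1)
            if existing is not None:
                self.store[key] = existing        # a collision travels back like a reply
                return existing
            self.routes[key] = nxt                # remember where this key was sent
        return None
```

A non-`None` return tells the inserting user that the key is already in use, so the user can pick a different key and retry.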


Freenet (Cont’d)
• Data request
  – Chain-mode propagation
• On receiving a data request
  – If the data is stored locally
    • The search stops and the data is forwarded back
  – If not
    • Forward the request to the node owning the closest key


Freenet (Cont’d)
• Data failed
  – Triggered by a timeout (hops-to-live exhausted)
• On receiving a data failed message
  – Forward the request to the next best node
  – After all neighbors have failed, send a data failed message back to the request's sender


Freenet (Cont’d)
• Data reply
  – Includes the actual data
  – Passed back through the chain
  – The data is cached in all intermediate nodes
• A subsequent request with the same key → served immediately
• A request for a similar key → forwarded to the node that previously provided the data
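The request, failure, and reply steps can be combined into one depth-first search with backtracking. This is a sketch under assumptions: `FreenetNode` and `request` are hypothetical names, closeness is numeric key distance, and a shared `seen` set replaces Freenet's own loop detection. A `None` result plays the role of a data failed message, and a successful reply is cached at every node on the way back.

```python
class FreenetNode:
    """Hypothetical Freenet-style node for the request path (sketch)."""

    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> data (local data store)
        self.routes = {}   # key -> neighbor node (dynamic routing table)

    def request(self, key, hops_to_live, seen=None):
        """Return the data for `key`, or None (the 'data failed' outcome)."""
        seen = set() if seen is None else seen
        seen.add(self.name)
        if key in self.store:
            return self.store[key]                # stored locally: the search stops
        if hops_to_live == 0:
            return None                           # timeout: hops-to-live exhausted
        # Try neighbors in order of key closeness; fall back to the next best.
        for k in sorted(self.routes, key=lambda k: abs(k - key)):
            neighbor = self.routes[k]
            if neighbor.name in seen:
                continue
            data = neighbor.request(key, hops_to_live - 1, seen)
            if data is not None:
                self.store[key] = data            # cache the reply on the way back
                self.routes[key] = neighbor       # remember who provided the data
                return data
        return None                               # every neighbor failed
```

Because each node on the reply path now stores the data and a route entry for its key, later requests for the same or similar keys are answered sooner, which is how nodes come to specialize in similar keys.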


Freenet (Cont’d)
• Indirect files
  – A special class of lightweight files
  – Named according to search keywords
  – Contain pointers to the real file
  – Multiple indirect files may share the same key
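Indirect files can be modeled as a keyword-derived key that maps to one or more pointers to content keys. Everything below (hashing lowercased keywords with SHA-1, the module-level dictionaries, `publish` and `search`) is a hypothetical illustration, not Freenet's actual key format.

```python
import hashlib

indirect = {}   # keyword key -> list of pointers (keys of real files)
content = {}    # content key -> actual data

def keyword_key(phrase):
    # Hypothetical: the indirect file's key is a hash of its search keywords.
    return hashlib.sha1(phrase.lower().encode()).hexdigest()

def publish(data, *phrases):
    ckey = hashlib.sha1(data).hexdigest()     # key of the real file
    content[ckey] = data
    for phrase in phrases:
        # Several indirect files may share one keyword key.
        indirect.setdefault(keyword_key(phrase), []).append(ckey)
    return ckey

def search(phrase):
    # Follow every pointer stored under the keyword key.
    return [content[p] for p in indirect.get(keyword_key(phrase), [])]
```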


Freenet (Cont’d)
• Indirect Files


Freenet (Cont’d)
• Properties
  – Nodes specialize in searching for similar keys
  – Nodes store similar keys
  – Similarity of keys does not reflect similarity of files
  – Routing does not reflect the underlying network topology


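Chord
• Nodes and files are identified by keys
  – m-bit identifiers
  – Computed with a deterministic hash function (SHA-1 in Chord)
• File IDs are mapped onto node IDs
  – Key k is assigned to successor(k), the first node whose ID is equal to or follows k on the identifier circle
  – Nodes store (key, data item) pairs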
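Mapping keys onto nodes with consistent hashing can be sketched as follows. The identifier length `M = 6` and the node IDs in the usage example are illustrative assumptions; truncating SHA-1 to m bits follows the description above.

```python
import hashlib
from bisect import bisect_left

M = 6   # hypothetical identifier length in bits (identifiers 0 .. 63)

def chord_id(name, m=M):
    # SHA-1 of the name, reduced to an m-bit identifier.
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(key_id, node_ids):
    """First node whose ID is equal to or follows key_id on the circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i] if i < len(ring) else ring[0]   # wrap around past 2**m - 1
```

With hypothetical nodes `[1, 8, 14, 21, 32, 38, 42, 48, 51, 56]`, key 24 is stored at node 32, key 14 at node 14, and key 60 wraps around to node 1.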


Chord (Cont’d)
• A Chord identifier circle


Chord (Cont’d)
• Simple key location


Chord (Cont’d)
• Scalable key location


Chord (Cont’d)
• Simple key location
  – Routing information: successor pointer
  – O(n) hops
• Scalable key location
  – Routing information: finger table
  – O(log n) hops
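The difference between the two lookup strategies can be made concrete on a small ring. This sketch assumes a fixed, fully known set of hypothetical node IDs with m = 6, and computes finger tables on demand instead of maintaining them, so it illustrates routing only, not Chord's stabilization protocol.

```python
M = 6                                           # identifier bits: IDs 0 .. 63
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # hypothetical node IDs, sorted

def succ(x):
    """First node whose ID is >= x, wrapping around the identifier circle."""
    return next((n for n in NODES if n >= x), NODES[0])

def between(x, a, b, right_closed=False):
    """Is x on the circular interval (a, b), or (a, b] if right_closed?"""
    if right_closed and x == b:
        return True
    return a < x < b if a < b else (x > a or x < b)

def next_node(n):
    return succ((n + 1) % 2 ** M)               # the node's successor pointer

def lookup_linear(n, key):
    """Simple key location: follow successor pointers one node at a time."""
    hops = 0
    while not between(key, n, next_node(n), right_closed=True):
        n, hops = next_node(n), hops + 1
    return next_node(n), hops

def finger_table(n):
    # finger[i] = successor(n + 2**i) for i = 0 .. M-1
    return [succ((n + 2 ** i) % 2 ** M) for i in range(M)]

def lookup_fingers(n, key):
    """Scalable key location: jump to the closest finger preceding the key."""
    hops = 0
    while not between(key, n, next_node(n), right_closed=True):
        preceding = [f for f in finger_table(n) if between(f, n, key)]
        if not preceding:
            break
        n, hops = preceding[-1], hops + 1
    return next_node(n), hops
```

On this ring, looking up key 54 from node 8 takes seven successor hops with `lookup_linear`, while `lookup_fingers` reaches the responsible node 56 after two finger jumps (via 42 and 51). Each finger jump at least halves the remaining distance, which is where the logarithmic bound comes from.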


Chord (Cont’d)
• Node joining
  – Certain keys previously assigned to its successor are reassigned to it
• Node departing
  – Its keys are reassigned to its successor


Chord (Cont’d)
• Node joining
  – N26 joins the network
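Which keys move when a node joins or leaves follows directly from the successor rule. The ring and keys below are hypothetical: a node joining between 21 and 32 takes over only the keys in the interval from 22 up to its own ID, and a departing node hands all of its keys to its successor.

```python
def successor(key, nodes):
    """Node responsible for key: first node ID >= key, wrapping to the smallest."""
    ring = sorted(nodes)
    return next((n for n in ring if n >= key), ring[0])

def assignment(keys, nodes):
    return {k: successor(k, nodes) for k in keys}

def moved_on_join(keys, nodes, new_node):
    """Keys whose responsible node changes when new_node joins: {key: (old, new)}."""
    before = assignment(keys, nodes)
    after = assignment(keys, nodes + [new_node])
    return {k: (before[k], after[k]) for k in keys if before[k] != after[k]}

def moved_on_leave(keys, nodes, leaving):
    """Keys handed over when `leaving` departs; they all go to its successor."""
    before = assignment(keys, nodes)
    after = assignment(keys, [n for n in nodes if n != leaving])
    return {k: (before[k], after[k]) for k in keys if before[k] != after[k]}
```

With nodes `[1, 8, 14, 21, 32, 38, 42, 48, 51, 56]` and keys `[10, 24, 30, 38, 54]`, N26 joining moves only key 24 (from N32 to N26); key 30 stays at N32.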


CAN (Content Addressable Network)
• Hash table
  – Maps file names to their locations
  – Stores (key K, value V) pairs
  – Each node stores a part of the hash table, called a “zone”


CAN (Cont’d)
• Virtual coordinate space
  – A zone corresponds to a segment of the space
  – Key K is mapped onto a point P by a deterministic function
  – (K, V) is stored at the node responsible for the zone containing P
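Storing and retrieving (K, V) pairs by mapping K onto a point can be sketched in two dimensions. Assumptions: SHA-1 supplies the coordinates (four digest bytes per dimension), the three zones and their owner names are hypothetical, and each node keeps its share of the hash table in a dictionary.

```python
import hashlib

def key_to_point(key, d=2):
    """Deterministically map key K onto a point P in the unit d-dimensional space."""
    digest = hashlib.sha1(key.encode()).digest()
    # Four digest bytes per coordinate (so at most 5 dimensions with SHA-1).
    return tuple(int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2 ** 32
                 for i in range(d))

# Hypothetical zones: owner -> (low corner, high corner), covering the unit square.
zones = {
    "A": ((0.0, 0.0), (0.5, 1.0)),
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.5, 0.5), (1.0, 1.0)),
}
tables = {node: {} for node in zones}   # each node's part of the hash table

def owner(point):
    for node, (lo, hi) in zones.items():
        if all(l <= p < h for p, l, h in zip(point, lo, hi)):
            return node
    raise ValueError("zones do not cover the point")

def put(key, value):
    tables[owner(key_to_point(key))][key] = value

def get(key):
    return tables[owner(key_to_point(key))].get(key)
```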


CAN (Cont’d)
• Virtual coordinate space


CAN (Cont’d)
• Retrieve
  – Map K to P
  – Retrieve the value from the node covering P
• Routing
  – The request is routed to the node covering P
  – Nodes maintain a routing table with the addresses of nodes holding adjoining zones
  – Messages follow the straight-line path through the space toward P
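Greedy forwarding toward P can be sketched on a simplified space. Assumptions: the space is divided into a hypothetical 4 x 4 grid of equal zones and edges do not wrap around, whereas CAN uses a d-dimensional torus whose zones come from repeated splitting. Each node forwards the request to the adjoining zone whose center is closest to P.

```python
import math

K = 4   # hypothetical: the unit square split into a K x K grid of equal zones

def zone_of(point):
    return (min(int(point[0] * K), K - 1), min(int(point[1] * K), K - 1))

def center(zone):
    return ((zone[0] + 0.5) / K, (zone[1] + 0.5) / K)

def neighbors(zone):
    # Adjoining zones: those sharing an edge with this one.
    x, y = zone
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < K and 0 <= b < K]

def route(start_zone, point):
    """Forward greedily toward point P; return the list of zones visited."""
    path, current = [start_zone], start_zone
    target = zone_of(point)
    while current != target:
        current = min(neighbors(current), key=lambda z: math.dist(center(z), point))
        path.append(current)
    return path
```

On this grid every hop strictly reduces the distance to P, so routing from zone (0, 0) to the zone containing (0.9, 0.9) takes six hops, one grid step at a time.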


CAN (Cont’d)
• Routing


CAN (Cont’d)
• Node joining
  – The new node is allocated its own portion of the space by splitting the zone of an existing node
• Node departing
  – The node hands over its hash table entries to one of its neighbors
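A join can be sketched as halving an existing zone and moving the (K, V) pairs that fall in the new half. Simplifications, all assumptions: zones are (low corner, high corner) boxes, the split dimension is passed in explicitly, and entries are keyed directly by their point P.

```python
def split_zone(zone, dim):
    """Split a zone (lo, hi) in half along dimension dim; return (kept, given)."""
    lo, hi = zone
    mid = (lo[dim] + hi[dim]) / 2
    kept_hi = tuple(mid if i == dim else h for i, h in enumerate(hi))
    given_lo = tuple(mid if i == dim else l for i, l in enumerate(lo))
    return (lo, kept_hi), (given_lo, hi)

def contains(zone, point):
    lo, hi = zone
    return all(l <= p < h for p, l, h in zip(point, lo, hi))

def join(zones, table, owner, newcomer, dim):
    """newcomer takes half of owner's zone plus the entries whose points fall in it."""
    kept, given = split_zone(zones[owner], dim)
    zones[owner], zones[newcomer] = kept, given
    moved = {p: v for p, v in table[owner].items() if contains(given, p)}
    table[newcomer] = moved
    for p in moved:
        del table[owner][p]
```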


Tapestry
• Location and routing infrastructure
  – Self-administration
  – Fault tolerance
  – Stability
    • By bypassing failed routes and nodes
• Based on the Plaxton mesh
  – Routing mechanism
  – Location mechanism


Tapestry (Cont’d)
• Routing mechanism
  – Neighbor maps: local routing maps that route messages incrementally, one digit at a time
  – Multiple levels: level l → node IDs matching in l (suffix) digits
  – Multiple entries per level: their number equals the base of the ID
  – Each entry points to the closest such node in the network


Tapestry (Cont’d)
• Neighbor map of the node with ID 67493


Tapestry (Cont’d)
• Routing path from 67493 to 34567
  – xxxx7 → xxx67 → xx567 → x4567 → 34567
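The neighbor maps and the suffix-by-suffix routing path can be reproduced with a small model. Assumptions: IDs are five-digit base-10 strings, the node list is hypothetical, and "closest node in the network" is approximated by position in that list. Tapestry's handling of IDs with no exact match is omitted; the sketch simply raises an error.

```python
DIGITS = "0123456789"   # base-10 IDs, as in the slide's example

def neighbor_map(node, nodes):
    """(level l, digit d) -> closest node whose ID ends in d + node's last l-1 digits.

    'Closest' is approximated by order in `nodes` (hypothetical proximity);
    the node itself counts as closest whenever it matches.
    """
    nmap = {}
    for level in range(1, len(node) + 1):
        suffix = node[len(node) - (level - 1):]
        for d in DIGITS:
            pattern = d + suffix
            if node.endswith(pattern):
                nmap[(level, d)] = node
                continue
            match = next((n for n in nodes if n.endswith(pattern)), None)
            if match is not None:
                nmap[(level, d)] = match
    return nmap

def route(src, dst, nodes):
    """Resolve one more suffix digit of dst at each level."""
    path, current = [src], src
    for level in range(1, len(dst) + 1):
        nxt = neighbor_map(current, nodes).get((level, dst[-level]))
        if nxt is None:
            raise LookupError(f"no node matches suffix {dst[-level:]}")
        if nxt != current:
            path.append(nxt)
            current = nxt
    return path
```

With the hypothetical nodes `["67493", "85247", "20167", "81567", "94567", "34567"]`, `route("67493", "34567", nodes)` visits 85247, 20167, 81567, and 94567 before reaching 34567, following the xxxx7 → xxx67 → xx567 → x4567 → 34567 pattern.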


Tapestry (Cont’d)
• Location mechanism
  – Root node
    • Provides a guaranteed node from which the object can be located
    • Assigned when an object is inserted, by a globally consistent deterministic algorithm
  – When an object is inserted
    • Server node Ns, object O, root node Nr
    • A message is routed from Ns to Nr
    • The mapping (O, Ns) is stored at each node along the routing path


Tapestry (Cont’d)
• Location mechanism
  – Location query
    • Messages destined for O are initially routed toward Nr
    • They stop at the first node containing the (O, Ns) mapping
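Publishing and locating an object can be sketched independently of the routing details. Assumptions: a hypothetical static next-hop table stands in for suffix routing toward the object's root `R`, and the root is fixed rather than computed by Tapestry's deterministic algorithm.

```python
# Hypothetical next hops toward the root node "R" of object "O".
NEXT_HOP = {"S": "X", "X": "Y", "Y": "R", "C": "Z", "Z": "Y"}

pointers = {}   # node -> {object: server}

def path_to_root(node, root="R"):
    path = [node]
    while path[-1] != root:
        path.append(NEXT_HOP[path[-1]])
    return path

def publish(obj, server):
    # Store the (O, Ns) mapping at every node on the route from Ns to Nr.
    for node in path_to_root(server):
        pointers.setdefault(node, {})[obj] = server

def locate(obj, client):
    # Route toward the root; stop at the first node holding a mapping.
    for hops, node in enumerate(path_to_root(client)):
        if obj in pointers.get(node, {}):
            return pointers[node][obj], hops
    return None, None
```

After `publish("O", "S")`, a query from `C` travels C → Z → Y and is redirected to S at Y, two hops before it would have reached the root; this early interception is why the root is only a bottleneck for queries that never cross a publish path.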


Tapestry (Cont’d)
• Advantages of the Plaxton mesh
  – Simple fault handling
    • Route around failures by choosing a node with a similar suffix
  – Scalability
    • The only bottleneck is the root nodes
• Limitations
  – The need for global knowledge
    • For assigning and identifying root nodes
  – The vulnerability of the root nodes


Tapestry (Cont’d)
• Extending the Plaxton mesh design
  – The Plaxton mesh assumes a static node population
  – Tapestry adapts it to a transient node population
• Adaptability
• Fault tolerance
• Optimizations


Tapestry (Cont’d)
• Optimizations
  – Back-pointers for dynamic node insertion
  – A flexible concept of distance between nodes
  – Cached content maintained to recover from failures
  – Multiple roots for each object
  – Adaptation to environment changes


Other Aspects
• Content caching, replication, and migration
• Security
• Provisions for anonymity
• Provisions for deniability
• Incentive mechanisms and accountability
• Resource management capabilities
• Semantic grouping of information


Conclusions
• Study of P2P content distribution systems
  – Properties
  – Design features
• Location and routing algorithms
  – Two categories
    • Unstructured systems
    • Structured systems
• Open research problems remain