Lecture 21 - University of Notre Dame (cpoellab/teaching/cse354/sp21.pdf)


Lecture 21

Peer-to-Peer Model
March 16, 2005

P2P Overview:
  centralized database: Napster
  query flooding: Gnutella
  intelligent query flooding: KaZaA
  swarming: BitTorrent
  unstructured overlay routing: Freenet
  structured overlay routing: Distributed Hash Tables

Napster

Centralized Database:
  Join: on startup, client contacts central server
  Publish: reports list of files to central server
  Search: query the server => return someone that stores the requested file
  Fetch: get the file directly from peer
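The Join/Publish/Search/Fetch cycle can be sketched in a few lines. This is only an illustration of a centralized index, not Napster's actual protocol; the class `CentralIndex`, its method names, and the peer names are hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for the central server: maps file name -> peers storing it."""

    def __init__(self):
        self.files = defaultdict(set)

    def publish(self, peer, filenames):
        # Publish: a client reports its list of files after joining.
        for name in filenames:
            self.files[name].add(peer)

    def search(self, filename):
        # Search: a single lookup on the server, however many peers exist.
        return sorted(self.files.get(filename, ()))

    def leave(self, peer):
        # The server holds all the state, so it must clean up departed peers.
        for peers in self.files.values():
            peers.discard(peer)


index = CentralIndex()
index.publish("servent1", ["foo.mp3", "bar.mp3"])
index.publish("servent3", ["foo.mp3"])
hits = index.search("foo.mp3")  # the fetch then goes directly to one of these peers
```

Note that the file itself never passes through the index; only the lookup is centralized.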

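Centralized Index

Problems with Napster:
  centralized index is highly loaded
  if index fails, the whole system stops

[Figure: Servent 1, Servent 2, and the Centralized Index exchange five messages]
  1. Servent 1 to index: "I have file foo.mp3"
  2. Servent 2 to index: "I am looking for file foo.mp3"
  3. Index to Servent 2: "Get it from servent 1"
  4. Servent 2 to Servent 1: "Please give me foo.mp3"
  5. Servent 1 to Servent 2: "Here it is"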

Napster

Pros:
  simple
  search scope is O(1)
  controllable (pro or con?)
Cons:
  server maintains O(N) state
  server does all processing
  single point of failure

Gnutella

In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella

Soon many other clients: Bearshare, Morpheus, LimeWire, etc.

In 2001, many protocol enhancements including “ultrapeers”


Gnutella

Query Flooding:
  Join: on startup, client contacts a few other nodes; these become its “neighbors”
  Publish: no need
  Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.
  Fetch: get the file directly from peer

Gnutella (v0.4)

  PING: Notify a peer of your existence
  PONG: Reply to a PING request
  QUERY: Find a file in the network
  RESPONSE: Give the location of a file
  PUSHREQUEST: Request a server behind a firewall to push a file out to a client.

Flooding Searches

  We keep the servents, but we remove the centralized index.
  Each servent is connected to a few others.
  File search requests are sent recursively through the network, until the file is found or a distance limit is reached (e.g., 3 hops).

[Figure: flooding search over servents S1, S3, S4, S5, S6, S7, S8, S9; edges are labeled with hop numbers 1 to 3]
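A hop-limited flood can be sketched as a breadth-first traversal. The topology, the file placement, and the function name `flood_search` below are made up for illustration and are not the ones in the figure; the sketch also counts every forwarded copy so the traffic cost is visible.

```python
from collections import deque


def flood_search(graph, files, start, wanted, max_hops=3):
    """Flood a query from `start`; return (hits, messages_sent)."""
    seen = {start}
    queue = deque([(start, 0, None)])      # (servent, hops so far, who sent it)
    hits = []
    messages = 0
    while queue:
        node, hops, parent = queue.popleft()
        if wanted in files.get(node, ()):
            hits.append((node, hops))
        if hops == max_hops:
            continue                        # distance limit reached
        for neighbor in graph[node]:
            if neighbor == parent:
                continue                    # never send back where it came from
            messages += 1                   # every forwarded copy costs a message
            if neighbor not in seen:        # duplicate copies are dropped on arrival
                seen.add(neighbor)
                queue.append((neighbor, hops + 1, node))
    return hits, messages


# Hypothetical overlay: each servent lists its neighbors.
graph = {
    "S1": ["S3", "S4"],
    "S3": ["S1", "S6"],
    "S4": ["S1", "S5", "S6"],
    "S5": ["S4", "S7"],
    "S6": ["S3", "S4", "S8"],
    "S7": ["S5", "S9"],
    "S8": ["S6"],
    "S9": ["S7"],
}
files = {"S8": {"foo.mp3"}, "S9": {"foo.mp3"}}

hits, messages = flood_search(graph, files, "S1", "foo.mp3", max_hops=3)
```

In this made-up overlay the copy on S9 sits four hops from S1, so a 3-hop limit finds only the copy on S8: the distance limit trades completeness for traffic.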

Flooding Searches

Advantages:
  real P2P system (every node has same role)
  no centralized server
  self-organized (connect your servent to a few others)
Drawbacks:
  each search generates lots of traffic
  certain nodes become highly linked/loaded
  distance limits: do not search entire system

Flooding Searches

Possible optimization: superpeers (Kazaa)
  subset of servers with high capacity is dynamically selected to act as local indexes
  normal clients only talk to superpeers
  superpeers talk to each other to resolve queries

Gnutella Descriptor Header

  Descriptor ID uniquely identifies descriptor on the network
  Payload Descriptor is 0x00 = Ping, 0x01 = Pong, 0x40 = Push, 0x80 = Query, 0x81 = Response
  TTL(0) = TTL(i) + Hops(i): the initial TTL equals the remaining TTL plus the hops taken so far
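The header can be packed and parsed with Python's `struct` module. The field sizes and byte order below are an assumption taken from the Gnutella v0.4 specification rather than from these slides (16-byte Descriptor ID, one byte each for Payload Descriptor, TTL and Hops, and a 4-byte little-endian Payload Length); the helper names are hypothetical.

```python
import struct
import uuid

# Assumed v0.4 layout: 16-byte ID, 1-byte type, 1-byte TTL, 1-byte Hops,
# 4-byte little-endian payload length (23 bytes in total).
HEADER = struct.Struct("<16sBBBI")

PAYLOAD_TYPES = {0x00: "Ping", 0x01: "Pong", 0x40: "Push", 0x80: "Query", 0x81: "Response"}


def pack_header(descriptor_id, payload_type, ttl, hops, payload_length):
    return HEADER.pack(descriptor_id, payload_type, ttl, hops, payload_length)


def unpack_header(data):
    descriptor_id, payload_type, ttl, hops, length = HEADER.unpack(data[:HEADER.size])
    return {
        "id": descriptor_id,
        "type": PAYLOAD_TYPES.get(payload_type, "Unknown"),
        "ttl": ttl,
        "hops": hops,
        "payload_length": length,
        "initial_ttl": ttl + hops,   # TTL(0) = TTL(i) + Hops(i)
    }


# A fresh Ping (no payload) and a Query that has already travelled 3 hops.
ping = pack_header(uuid.uuid4().bytes, 0x00, ttl=7, hops=0, payload_length=0)
forwarded = pack_header(uuid.uuid4().bytes, 0x80, ttl=4, hops=3, payload_length=12)
info = unpack_header(forwarded)
```

Because TTL and Hops always sum to the initial TTL, a receiver can recover TTL(0) from any forwarded descriptor.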


Gnutella Ping

  No payload (i.e., payload length = 0).
  Used for probing the network.

Gnutella Pong Descriptor Payload

  Responding to “Ping” descriptors.
  Enough information to establish connection.
  File sharing meta-data.

Gnutella Query Descriptor Payload

  For querying the network for a particular file or files (usually substring of file name).
  Quality of Service parameter (minimum speed).

Gnutella Response Descriptor Payload

  For positive “File Found” replies to a query.
  Result Set field includes file index, size and name.

Gnutella Push Descriptor Payload

  For getting files from firewall-protected servents.
  Request pushing a file from an internal node to an outside servent.

Gnutella: Routing

  Unique IDs, servents memorize IDs to prevent looping.
  Pong descriptors are sent the same route as ping descriptors (if a servent sees a pong but did not see a ping, pong is discarded).
  Same with QueryHit and Query.
  Same with Push and QueryHit.
  Ping & query are forwarded to all neighbors except the one that the message came from.
  Each servent decrements TTL and increments Hops. If TTL is zero, the descriptor is not forwarded along any connection.
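These routing rules can be sketched for the Ping/Pong pair. This is a simplified in-memory model with direct method calls in place of network messages; the class `Servent` and its methods are hypothetical names, and Query/QueryHit would follow the same pattern.

```python
class Servent:
    """Toy servent that applies the routing rules above to Ping and Pong."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.seen = {}   # descriptor ID -> neighbor the Ping arrived from
        self.log = []    # names of servents whose Pong reached us as originator

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def ping(self, desc_id, ttl):
        self.seen[desc_id] = None            # None marks "I originated this"
        for n in self.neighbors:
            n.on_ping(desc_id, ttl, 0, self)

    def on_ping(self, desc_id, ttl, hops, sender):
        if desc_id in self.seen:             # memorized ID: prevents looping
            return
        self.seen[desc_id] = sender
        sender.on_pong(desc_id, self.name, self)   # reply along the reverse path
        ttl, hops = ttl - 1, hops + 1
        if ttl <= 0:
            return                           # TTL is zero: do not forward
        for n in self.neighbors:
            if n is not sender:              # everyone except where it came from
                n.on_ping(desc_id, ttl, hops, self)

    def on_pong(self, desc_id, responder, sender):
        if desc_id not in self.seen:
            return                           # Pong without a matching Ping: discard
        back = self.seen[desc_id]
        if back is None:
            self.log.append(responder)
        else:
            back.on_pong(desc_id, responder, self)


# Chain a - b - c - d; a Ping with TTL 2 reaches b and c but not d.
a, b, c, d = (Servent(n) for n in "abcd")
a.connect(b)
b.connect(c)
c.connect(d)
a.ping("id-1", ttl=2)
d.on_pong("id-1", "x", c)   # d never saw the Ping, so this Pong is dropped
```

Storing the upstream neighbor per descriptor ID is what lets a Pong retrace the Ping's path without any servent knowing the full route.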


Gnutella: Downloads

  After query hits are received, servent can select file.
  Download request is via HTTP GET message (servent replies with HTTP OK followed by file data).
  Data is sent over direct TCP connection (not the Gnutella network), the protocol is HTTP.

Gnutella

Pros:
  Fully de-centralized
  Search cost distributed
Cons:
  Search scope is O(N)
  Search time is O(???)
  Nodes leave often, network unstable

Aside: Search Time

Aside: All Peers Equal?

  [Figure: peers on unequal access links: 56kbps modem, 1.5Mbps DSL, 10Mbps LAN]

Aside: Network Resilience

  [Figure panels: Partial Topology; Random 30% die; Targeted 4% die]

KaZaA

  In 2001, KaZaA created by Dutch company KaZaA BV.
  Single network called FastTrack used by other clients as well: Morpheus, giFT, etc.
  Eventually protocol changed so other clients could no longer talk to it.
  Most popular file sharing network today with >10 million users (number varies).


KaZaA

“Smart” Query Flooding:
  Join: on startup, client contacts a “supernode” ... may at some point become one itself
  Publish: send list of files to supernode
  Search: send query to supernode, supernodes flood query amongst themselves.
  Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers

KaZaA

  [Figure: “Super Nodes”]

KaZaA: File Insert

  Peer 123.2.21.23: "I have X!"
  Publish: insert(X, 123.2.21.23)...

KaZaA: File Search

  Query: "Where is file A?"
  Replies:
    search(A) --> 123.2.0.18
    search(A) --> 123.2.22.50
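The insert and search steps can be sketched as a small supernode index. The FastTrack protocol is proprietary, so this is only a model of the idea; the class `SuperNode`, the `ttl` parameter, and the three-node topology are assumptions, while the addresses reuse those shown on the slides.

```python
class SuperNode:
    """Toy supernode: indexes files for its clients and floods queries to peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}   # file name -> set of client addresses
        self.peers = []   # neighboring supernodes

    def insert(self, filename, address):
        # Publish: a client sends its file list to its supernode.
        self.index.setdefault(filename, set()).add(address)

    def search(self, filename, ttl=2, seen=None):
        # Search: answer locally, then flood among supernodes only.
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()
        seen.add(self.name)
        results = set(self.index.get(filename, ()))
        if ttl > 0:
            for sn in self.peers:
                results |= sn.search(filename, ttl - 1, seen)
        return results


sn1, sn2, sn3 = SuperNode("sn1"), SuperNode("sn2"), SuperNode("sn3")
sn1.peers, sn2.peers, sn3.peers = [sn2], [sn1, sn3], [sn2]

sn1.insert("X", "123.2.21.23")
sn2.insert("A", "123.2.0.18")
sn3.insert("A", "123.2.22.50")

replies = sn1.search("A")   # addresses of clients that hold file A
```

Ordinary clients never see the flood; only the much smaller set of supernodes forwards queries.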

KaZaA: Fetching

More than one node may have the requested file...
How to tell?
  must be able to distinguish identical files
  not necessarily same filename
  same filename not necessarily same file...
Use hash of file
  KaZaA uses UUHash: fast, but not secure
  alternatives: MD5, SHA-1
How to fetch?
  get bytes [0..1000] from A, [1001..2000] from B
  alternative: Erasure Codes
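Both ideas, identifying a file by a hash of its content and splitting the download into byte ranges, can be sketched as follows. The sketch uses SHA-1 from Python's `hashlib` (one of the alternatives named above) and does not reproduce UUHash; the function names and sample data are hypothetical.

```python
import hashlib


def content_id(data):
    # Identify a file by its bytes, not by its name.
    return hashlib.sha1(data).hexdigest()


def split_ranges(size, peers):
    """Assign contiguous, inclusive byte ranges (start, end) to peers."""
    chunk = -(-size // len(peers))   # ceiling division
    plan = []
    for i, peer in enumerate(peers):
        start = i * chunk
        if start >= size:
            break
        plan.append((peer, start, min(start + chunk, size) - 1))
    return plan


# Same content under different names, and different content under one name.
offers = {
    ("A", "song.mp3"): b"identical bytes",
    ("B", "track01.mp3"): b"identical bytes",
    ("C", "song.mp3"): b"something else",
}
ids = {key: content_id(data) for key, data in offers.items()}

plan = split_ranges(2001, ["A", "B"])   # a 2001-byte file held by peers A and B
```

For a 2001-byte file and two peers the plan matches the split above: bytes 0..1000 from A and 1001..2000 from B.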

KaZaA

Pros:
  tries to take into account node heterogeneity:
    bandwidth
    host Computational Resources
    host Availability (?)
  rumored to take into account network locality
Cons:
  mechanisms easy to circumvent
  still no real guarantees on search scope or search time


BitTorrent

  In 2002, B. Cohen debuted BitTorrent
  Key motivation:
    popularity exhibits temporal locality (flash crowds)
    e.g., Slashdot effect, CNN on 9/11, new movie/game release
  Focused on efficient Fetching, not Searching:
    distribute the same file to all peers
    single publisher, multiple downloaders
  Has some “real” publishers:
    Blizzard Entertainment using it to distribute the beta of their new games

BitTorrent

Swarming:
  Join: contact centralized “tracker” server, get a list of peers.
  Publish: run a tracker server.
  Search: out-of-band, e.g., use Google to find a tracker for the file you want.
  Fetch: download chunks of the file from your peers. Upload chunks you have to them.
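A very small simulation shows the join and fetch steps. It is a sketch under strong assumptions (in-memory objects, one chunk per peer per round, random chunk choice) and not the BitTorrent wire protocol; `Tracker`, `Peer`, and their methods are hypothetical names.

```python
import random


class Tracker:
    def __init__(self):
        self.swarm = []

    def join(self, peer):
        others = list(self.swarm)   # Join: hand back the current peer list
        self.swarm.append(peer)
        return others


class Peer:
    def __init__(self, name, num_chunks, chunks=()):
        self.name = name
        self.num_chunks = num_chunks
        self.chunks = set(chunks)
        self.known = []

    def join(self, tracker):
        self.known = tracker.join(self)
        for p in self.known:
            p.known.append(self)    # existing peers learn about the newcomer

    def fetch_one(self, rng):
        # Fetch: take one missing chunk that some known peer can upload.
        missing = set(range(self.num_chunks)) - self.chunks
        offers = [(c, p) for p in self.known for c in p.chunks & missing]
        if not offers:
            return False
        chunk, _ = rng.choice(offers)
        self.chunks.add(chunk)
        return True

    def complete(self):
        return len(self.chunks) == self.num_chunks


rng = random.Random(0)
tracker = Tracker()
seed = Peer("seed", 4, range(4))            # the single publisher holds every chunk
leechers = [Peer("a", 4), Peer("b", 4)]
for p in [seed] + leechers:
    p.join(tracker)

rounds = 0
while not all(p.complete() for p in leechers):
    for p in leechers:
        p.fetch_one(rng)
    rounds += 1
```

Once a downloader holds a chunk it appears in other peers' offers, so load shifts away from the original publisher as the swarm grows.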

BitTorrent: Publish/Join

  [Figure: Tracker]

BitTorrent: Fetch

BitTorrent: Sharing Strategy

  Employ “Tit-for-tat” sharing strategy
    “I’ll share with you if you share with me”
    be optimistic: occasionally let freeloaders download
      otherwise no one would ever start!
      also allows you to discover better peers to download from when they reciprocate
    similar to: Prisoner’s Dilemma
  Approximates Pareto Efficiency
    Game Theory: “No change can make anyone better off without making others worse off”
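The strategy can be sketched as a selection rule: upload to the peers that recently gave you the most, plus one randomly chosen peer as the optimistic pick. The function name, the `slots` parameter, and the byte counts are illustrative assumptions, not values from the BitTorrent specification.

```python
import random


def choose_unchoked(received, slots=3, rng=random):
    """Pick peers to upload to.

    received: dict mapping peer -> bytes that peer sent us recently.
    """
    ranked = sorted(received, key=lambda p: (-received[p], p))
    unchoked = [p for p in ranked[:slots] if received[p] > 0]   # reciprocate
    rest = [p for p in ranked if p not in unchoked]
    if rest:
        unchoked.append(rng.choice(rest))   # optimistic pick, may be a freeloader
    return unchoked


rng = random.Random(1)
received = {"a": 500, "b": 0, "c": 900, "d": 100, "e": 0}
selected = choose_unchoked(received, slots=2, rng=rng)

# At startup nobody has uploaded anything yet; the optimistic pick breaks the deadlock.
cold_start = choose_unchoked({"x": 0, "y": 0}, slots=2, rng=rng)
```

Without the optimistic pick, a swarm in which nobody has uploaded yet would never start, and a newcomer with nothing to offer could never earn a slot.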

BitTorrent

Pros:
  works reasonably well in practice
  gives peers incentive to share resources; avoids freeloaders
Cons:
  Pareto Efficiency is a relatively weak condition
  central tracker server needed to bootstrap swarm (is this really necessary?)