Introduction Widespread unstructured P2P network Currently between 200,000 & 300,000 hosts Currently...


Introduction

Widespread unstructured P2P network
Currently between 200,000 & 300,000 hosts

Ideal as a research test bed
Large scale network demonstrates the need for scalable P2P protocols

A Gnutella client has 4-10 TCP connections to other peers. UDP is used for signaling traffic, and an “ultra-peer” state was created to make use of the benefits of server-based networks.


Introduction (Cont.)
“Ultra-peer” status is self-assigned by powerful peers and provides some extra functionality compared to ordinary nodes.
There exist many freely available Gnutella clients. Some of the most popular are:

LimeWire
BearShare
Morpheus
Shareaza

Shareaza has the fastest-growing number of users. It has a very pleasant GUI and also connects to eDonkey and BitTorrent.


Its Main Features

This protocol underlies much of the current file-sharing activity on the Internet.
It is based on TCP/IP and HTTP!
A file sharing network (FSN) is a set of machines that exchange files using Gnutella.
To connect to a Gnutella network, you need the IP address of a single machine that is already part of the network.
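To make the connection step concrete: in the v0.4 specification a servent opens a TCP connection to the known host and sends the text `GNUTELLA CONNECT/0.4` followed by two newlines, and the other side accepts with `GNUTELLA OK` followed by two newlines. The Python sketch below shows both sides. The host address is supplied by the caller, 6346 is only the conventional default port, and all helper names are hypothetical.

```python
import socket

CONNECT_REQUEST = b"GNUTELLA CONNECT/0.4\n\n"  # v0.4 connect string
CONNECT_OK = b"GNUTELLA OK\n\n"                # v0.4 acceptance reply


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, failing if the peer closes the connection first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection during the handshake")
        buf += chunk
    return buf


def client_handshake(sock: socket.socket) -> bool:
    """Greet an existing servent; True if it accepts the connection."""
    sock.sendall(CONNECT_REQUEST)
    return recv_exact(sock, len(CONNECT_OK)) == CONNECT_OK


def server_handshake(sock: socket.socket) -> bool:
    """Accept an incoming v0.4 connection if the greeting matches."""
    if recv_exact(sock, len(CONNECT_REQUEST)) != CONNECT_REQUEST:
        return False
    sock.sendall(CONNECT_OK)
    return True


def connect_to_network(host: str, port: int = 6346) -> socket.socket:
    """Open a connection to one known member of the network (hypothetical helper)."""
    sock = socket.create_connection((host, port), timeout=10)
    if not client_handshake(sock):
        sock.close()
        raise ConnectionError("handshake rejected")
    return sock
```

Because the handshake functions take an already-connected socket, they can be exercised locally with `socket.socketpair()` without touching the network.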


Gnutella

Peer-to-peer indexing and searching service.

Peer-to-peer, point-to-point file downloading using HTTP.

A Gnutella node needs a server (or a set of servers) to “start up”: gnutellahosts.com provides a service with reliable initial connection points.

But this introduces a new single point of failure!
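Since downloads are plain HTTP, a client only needs to build an ordinary GET request once a QUERYHIT has named a file. A minimal sketch, assuming the `/get/<file index>/<file name>/` request path of the v0.4 specification; URL-encoding the file name and the `User-Agent` value are assumptions of this sketch, and the function name is hypothetical. The `Range` header lets an interrupted download resume at a byte offset.

```python
from urllib.parse import quote


def build_get_request(file_index: int, file_name: str, offset: int = 0) -> bytes:
    """Build a v0.4-style HTTP GET for a file advertised in a QUERYHIT.

    file_index and file_name are taken from the QUERYHIT result; offset
    lets an interrupted download resume where it stopped.
    """
    path = f"/get/{file_index}/{quote(file_name)}/"
    lines = [
        f"GET {path} HTTP/1.0",
        "Connection: Keep-Alive",
        f"Range: bytes={offset}-",
        "User-Agent: Gnutella",  # placeholder agent string
        "",  # the two empty items make join() end the
        "",  # header block with CRLF CRLF
    ]
    return "\r\n".join(lines).encode("ascii")
```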


Gnutella vs. Napster

Like Napster, distributed file storage and transmission

Added the ability to distribute file discovery:
Ask your direct peers who else they know
Query those machines directly


Concepts of Unstructured Services

There are many interesting ideas being explored:
Breaking shared files into many parts, both to increase bandwidth (parallel I/O) and to increase the security of content, as no one site can access files without cooperation from its peers

This type of technology makes censorship very hard.
MojoNation has a load balancing and scheduling algorithm in the form of micropayments that rewards those who contribute most to the community of peers.

Gnutella, which is a family of related products, is usually described as a P2P search engine, as its interface is closer to that of a search engine than to that of a Web file system.


Characteristics

Gnutella is a distributed system for file sharing:

provides means for network discovery

provides means for file searching and sharing

Defines a network at the application level
Employs the concept of peer-to-peer:

all hosts are equal (symmetry)

there is no central point

anonymous search, but IP addresses are revealed when downloading


Connection

Once you establish a connection to the first servent, you announce your presence. The first servent passes that message on to all the servents it is connected to, and so on. These servents all reply with data about themselves:
how many files they are sharing
how many kilobytes the files take up

This already adds up to a lot of traffic!
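The data each servent reports about itself travels in a PONG. In v0.4 the PONG payload is a fixed 14 bytes: listening port (2 bytes), IPv4 address (4 bytes), number of files shared (4 bytes) and number of kilobytes shared (4 bytes). A Python sketch using `struct`, assuming little-endian integers and the IP address in network byte order; the helper names are hypothetical. `network_totals` shows how a newcomer could add up what all the replies announce.

```python
import socket
import struct

# '<' = little-endian integers, no padding; "4s" carries the IPv4 address
# as raw network-order bytes. Port, IP, files, kilobytes: 2 + 4 + 4 + 4 = 14 bytes.
PONG_FORMAT = "<H4sII"
PONG_SIZE = struct.calcsize(PONG_FORMAT)


def pack_pong(ip: str, port: int, files: int, kilobytes: int) -> bytes:
    """Encode what a servent reports about itself in a PONG."""
    return struct.pack(PONG_FORMAT, port, socket.inet_aton(ip), files, kilobytes)


def unpack_pong(payload: bytes) -> dict:
    """Decode a PONG payload back into its four fields."""
    port, raw_ip, files, kilobytes = struct.unpack(PONG_FORMAT, payload[:PONG_SIZE])
    return {"ip": socket.inet_ntoa(raw_ip), "port": port,
            "files": files, "kilobytes": kilobytes}


def network_totals(payloads: list) -> tuple:
    """Sum files and kilobytes over all PONGs received after announcing yourself."""
    pongs = [unpack_pong(p) for p in payloads]
    return sum(p["files"] for p in pongs), sum(p["kilobytes"] for p in pongs)
```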


Gnutella File Sharing model

Users register files with network neighbors
Search across the network to find files to copy
Does not require a centralized broker (as Napster does)

Figure: peers Bob, Carol, Ted and Alice. The query “Where is Final Fantasy 4?” is passed from neighbor to neighbor, the answer “Carol has Final Fantasy 4” travels back, and the file is then copied directly from Carol.


Decentralized File-sharing Model

Peers have the same capability and responsibility

The communication between peers is symmetric

There is no central directory server. The index on the metadata of shared files is stored locally among all peers.
Gnutella
FreeServe
MojoNation

Resource Discovery


Decentralized (Cont.)

Every user acts as a client, a server, or both (servent)

A user connects to the framework and becomes a member of the community, allowing others to connect through him/her

Users speak directly to other users with no intermediate or central authority

No one entity controls the information that passes through the community



Advantages and Disadvantages

Advantages:
Inherent scalability
Avoidance of the “single point of litigation” problem
Fault tolerance

Disadvantages:
Slow information discovery
More query traffic on the network



Unstructured Decentralized Services

There are some 200 Napster clones available to support this area: http://www.ultimateresourcesite.com/napster/main.htm
Currently the most popular is iMesh [http://www.imesh.com], which has some 2 million users and can share any type of file.
Some of the best known file sharing systems are:
MojoNation [http://www.mojonation.net]
Freenet [http://freenet.sourceforge.net/]
Gnutella [http://gnutella.wego.com/]

These three are not server-based like Napster; rather, they support waves of software agents that express resource availability and interest, propagating among an informal, dynamic network of peers.


DFS Variations

| | FTP | NFS | Web | Napster (Shawn Fanning) | Gnutella (Justin Frankel & Tom Pepper, Nullsoft/AOL) | Freenet (Ian Clarke) |
|---|---|---|---|---|---|---|
| Purpose | Remote file sharing | Local file sharing | Remote file sharing (portal) | File-sharing community (portal) | Decentralized file sharing community | Decentralized anonymous file sharing |
| Moderated? | Yes | Yes | Yes | Yes | No | No |
| Access control? | Yes | Yes | No | No | No | No |
| Search | Server-based | Server-based | Server-based | Server-based | P2P | P2P |
| File transfer | Client/server | Client/server | Client/server | P2P | P2P | P2P |
| File transfer protocol | FTP | NFS | HTTP, caching | Proprietary | HTTP | Proprietary, encrypted, caching |

DFS: Distributed File Sharing


P2P File Sharing Benefits

Cost sharing
Resource aggregation
Improved scalability/reliability
Anonymity/privacy
Dynamism


Management/Placement Challenges

Per-node state
Bandwidth usage
Search time
Fault tolerance/resiliency


Gnutella in Detail

Share any type of file (not just music)
Decentralized search, unlike Napster

You ask your neighbors for files of interest
Neighbors ask their neighbors, and so on

A TTL field quenches messages after a number of hops

Users with matching files reply to you

Figure from http://computer.howstuffworks.com/file-sharing.htm


The Gnutella protocol (v0.4)

PING – Notify a peer of your existence

PONG – Reply to a PING request

QUERY – Find a file in the network

RESPONSE (QUERYHIT) – Give the location of a file

PUSHREQUEST – Request a server behind a firewall to push a file out to a client.
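All of these messages share one 23-byte descriptor header in v0.4: a 16-byte descriptor ID, a 1-byte payload type, a TTL byte, a hops byte and a 4-byte payload length. The sketch below assumes the payload type codes 0x00 (PING), 0x01 (PONG), 0x40 (PUSH), 0x80 (QUERY) and 0x81 (QUERYHIT, the RESPONSE above) and a little-endian length; the default TTL of 7 is only an illustrative choice. `forwarded()` applies the relaying rule: decrement TTL, increment hops, and stop once TTL runs out.

```python
import os
import struct
from dataclasses import dataclass
from typing import Optional

# Payload descriptor codes (v0.4).
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# Descriptor ID, payload type, TTL, hops, payload length: 16 + 1 + 1 + 1 + 4 = 23 bytes.
HEADER_FORMAT = "<16sBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass
class Header:
    descriptor_id: bytes
    payload_type: int
    ttl: int
    hops: int
    payload_length: int

    def pack(self) -> bytes:
        return struct.pack(HEADER_FORMAT, self.descriptor_id, self.payload_type,
                           self.ttl, self.hops, self.payload_length)

    @classmethod
    def unpack(cls, data: bytes) -> "Header":
        return cls(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))

    def forwarded(self) -> Optional["Header"]:
        """Header for the next hop (TTL - 1, hops + 1), or None once TTL runs out."""
        if self.ttl <= 1:
            return None
        return Header(self.descriptor_id, self.payload_type,
                      self.ttl - 1, self.hops + 1, self.payload_length)


def new_header(payload_type: int, payload_length: int, ttl: int = 7) -> Header:
    """Start a new message with a random 16-byte descriptor ID and zero hops."""
    return Header(os.urandom(16), payload_type, ttl, 0, payload_length)
```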


Joining the Gnutella Network

The new node connects to a well-known ‘Anchor’ node.

Then it sends a PING message to discover other nodes.

PONG messages are sent in reply from hosts offering new connections with the new node.

Direct connections are then made to the newly discovered nodes.

Figure: a new node joins the Gnutella network through anchor node A; its PING is forwarded through the network and PONG replies come back.
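The joining steps can be simulated on an in-memory overlay. A minimal sketch, assuming the overlay is a dictionary of neighbor sets and modelling each PONG simply as "this node was reached"; in the real protocol the PONGs travel back along the PING path. The function name and the `ttl`/`max_connections` defaults are hypothetical.

```python
from collections import deque


def join(overlay: dict[str, set[str]], new_node: str, anchor: str,
         ttl: int = 3, max_connections: int = 4) -> list[str]:
    """Join an in-memory overlay through a well-known anchor node.

    The PING spreads breadth-first from the anchor for at most `ttl` hops
    from the new node; every node it reaches answers with a PONG, which
    is modelled here by recording that node. The new node then opens
    direct connections to the first `max_connections` responders.
    """
    pongs = []
    seen = {anchor}
    frontier = deque([(anchor, 1)])  # the anchor is one hop from the new node
    while frontier:
        node, hops = frontier.popleft()
        pongs.append(node)  # this node replies with a PONG
        if hops < ttl:
            for neighbor in sorted(overlay[node]):  # sorted for a stable order
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, hops + 1))
    chosen = pongs[:max_connections]
    overlay[new_node] = set(chosen)  # direct connections to discovered nodes
    for peer in chosen:
        overlay[peer].add(new_node)
    return chosen
```

For example, with the chain A-B-D-E plus C attached to A, a join through A with `ttl=3` connects to A, B, C and D, while E lies beyond the PING's reach.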


Properties of the Flooding
Searching by flooding:

If you don’t have the file you want, query 7 of your partners.

If they don’t have it, they contact 7 of their partners, for a maximum hop count of 10.

Requests are flooded, but there is no tree structure.

No looping, but packets may be received twice

Note: play the Gnutella animation at:

http://www.limewire.com/index.jsp/p2p
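The flooding rules above (TTL-limited forwarding, no loops, but possible duplicate receptions) can be checked on a toy overlay. A minimal sketch, assuming a dictionary of neighbor sets and tracking "already seen" per node the way a servent tracks query IDs; the function name and return values are hypothetical. With 7 partners and 10 hops as on the slide, the same loop would simply run on a much larger graph.

```python
from collections import deque


def flood_query(overlay: dict[str, set[str]], source: str, has_file: set[str],
                ttl: int) -> tuple[set[str], int, int]:
    """Flood a query from `source` for at most `ttl` hops.

    Every node forwards the query to all neighbors except the one it came
    from, but only the first time it sees it: later copies are recognised
    (by the query's ID in a real servent) and dropped, so there are no
    loops, yet the same query can still be received more than once.
    Returns (holders of the file that were reached, messages sent,
    duplicate receptions).
    """
    seen = {source}
    hits = set()
    messages = duplicates = 0
    frontier = deque([(source, ttl, None)])  # (node, hops left, previous hop)
    while frontier:
        node, hops_left, came_from = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: do not forward
        for neighbor in sorted(overlay[node]):
            if neighbor == came_from:
                continue
            messages += 1
            if neighbor in seen:
                duplicates += 1  # second copy of the same query: dropped
                continue
            seen.add(neighbor)
            if neighbor in has_file:
                hits.add(neighbor)  # would answer with a QUERYHIT
            frontier.append((neighbor, hops_left - 1, node))
    return hits, messages, duplicates
```

On a triangle S-A-B with C hanging off B, a query from S with TTL 2 finds C but costs 5 messages, 2 of which are duplicates.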


Query flooding

Gnutella

no hierarchy

use a bootstrap node to learn about others

join message

Send query to neighbors

Neighbors forward query to all attached neighbors (floods)

If a queried peer has the object, it sends a message back to the querying peer



More on query flooding

Pros

peers have similar responsibilities: no group leaders

highly decentralized

no peer maintains directory info

Cons

excessive query traffic

query radius: may not find content even when it is present

bootstrap node still required

maintenance of overlay network


About the Flooding

There is nothing that stops a servent from flooding its network region with messages.

Cost of maintaining the network
Cost of searching for a file


Breadth-First Search (BFS)

Figure legend: forwarded query, processed query, source, found result, forwarded response


Pros and Cons
Benefits:

Peers speak directly with no central authority
Nobody owns the Gnutella network and nobody can shut it down
No central point of failure

Limited per-node state
Isolated node failures can be worked around quickly and automatically

Drawbacks:
Free loading
Scalability
Searches are less effective and can be slow
Bandwidth intensive

The Gnutella network is evolving to include “controlled decentralization” (LimeWire, BearShare, ToadNode)



Searching for a File

Figure: QUERY messages spreading hop by hop through the Gnutella network

A node broadcasts its QUERY to all its peers, who in turn broadcast it to their peers.

Nodes route QUERYHITs containing file location details back to the sender along the QUERY path.

To download files, a direct connection is made using the host details in the QUERYHIT messages.
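Routing QUERYHITs "along the QUERY path" only requires each servent to remember which neighbor a QUERY came from. A minimal in-memory sketch, assuming routes are keyed by the query's descriptor ID; the class and method names are hypothetical, and the location string is a placeholder for the host details a real QUERYHIT carries.

```python
from typing import Optional


class Servent:
    """Reverse-path routing of QUERYHITs (in-memory sketch).

    A servent remembers which neighbor each QUERY arrived from, keyed by
    the query's descriptor ID. A QUERYHIT with the same ID is passed back
    to that neighbor only, so replies retrace the QUERY path to the sender.
    """

    def __init__(self, name: str):
        self.name = name
        self.routes: dict[bytes, Optional["Servent"]] = {}  # None means "I asked"
        self.results: list[str] = []

    def originate_query(self, query_id: bytes) -> None:
        self.routes[query_id] = None

    def receive_query(self, query_id: bytes, sender: "Servent") -> bool:
        """Store the reverse route; False means a duplicate that is dropped."""
        if query_id in self.routes:
            return False
        self.routes[query_id] = sender
        return True

    def route_query_hit(self, query_id: bytes, location: str) -> None:
        """Send a QUERYHIT one hop back (used both to answer and to relay)."""
        if query_id not in self.routes:
            return  # never saw this QUERY: drop the QUERYHIT
        previous = self.routes[query_id]
        if previous is None:
            self.results.append(location)  # reached the node that asked
        else:
            previous.route_query_hit(query_id, location)
```

If Alice's query reaches Carol through Bob, Carol's answer goes to Bob and then to Alice; Alice then contacts Carol directly to download.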



The Cooperation Spectrum


Free Riding

File sharing networks rely on users sharing data
Two types of free riding:

Downloading but not sharing any data
Not sharing any interesting data

On Gnutella:
15% of users contribute 94% of content
63% of users never responded to a query (they didn’t have “interesting” data)

Data from E. Adar and B.A. Huberman (2000), “Free Riding on Gnutella”


Example: GNUTELLA


Summary of Gnutella’s Features

Decentralized:
No single point of failure
Not as susceptible to denial of service
Cannot ensure correct results

Flooding queries:
Search is now distributed but still not scalable


Initial Problems and Fixes

Freeloading: WWW sites offering search/retrieval from the Gnutella network without providing file sharing or query routing

Fix: block file-serving to browser-based, non-file-sharing users

Prematurely terminated downloads:
software bugs
long download times over modems
modem users run the Gnutella peer only briefly (a Napster problem also!), or any user becomes overloaded

Fix: the peer can reply “I have it, but I am busy. Try again later”
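One way to express the “busy, try again later” reply over HTTP is status 503 with a `Retry-After` header. The sketch below assumes that encoding (it is an illustration, not necessarily what any particular client sends) and uses a hypothetical slot limit; only the status line and `Retry-After` are shown, other response headers are omitted.

```python
class UploadSlots:
    """Hypothetical upload-slot limiter for a servent's HTTP file server.

    When every slot is in use the servent does not refuse outright: it says
    "I have it, but I am busy", expressed here as HTTP 503 with a
    Retry-After header telling the client when to try again.
    """

    def __init__(self, max_uploads: int, retry_after_s: int = 60):
        self.max_uploads = max_uploads
        self.retry_after_s = retry_after_s
        self.active = 0

    def try_start(self) -> str:
        """Return the response head (status line only, plus Retry-After when busy)."""
        if self.active >= self.max_uploads:
            return ("HTTP/1.0 503 Service Unavailable\r\n"
                    f"Retry-After: {self.retry_after_s}\r\n\r\n")
        self.active += 1
        return "HTTP/1.0 200 OK\r\n\r\n"

    def finish(self) -> None:
        """Free a slot when an upload completes or is aborted."""
        self.active = max(0, self.active - 1)
```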


Initial Problems and Fixes 2
2000: the average size of the reachable network was only 400-800 hosts

Why so small?
modem users: not enough bandwidth to provide search routing capabilities, so they become routing black holes
Fix: create a peer hierarchy based on capabilities

previously: all peers identical, most modem users were black holes

connection preferencing:
favors routing to well-connected peers
favors replies to clients that themselves serve a large number of files: prevents freeloading

The LimeWire gateway functions as a Napster-like central server on behalf of other peers for searching purposes
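A peer hierarchy and connection preferencing can be sketched as two small decisions: should this peer promote itself, and which candidate peers should it keep connections to. The thresholds and the `PeerInfo` fields below are hypothetical, chosen only to illustrate the idea; real clients use their own heuristics.

```python
from dataclasses import dataclass


@dataclass
class PeerInfo:
    address: str
    upstream_kbps: int
    uptime_s: int
    firewalled: bool
    shared_files: int


# Hypothetical thresholds: real clients use their own heuristics.
MIN_UPSTREAM_KBPS = 256
MIN_UPTIME_S = 3600


def is_ultrapeer_capable(p: PeerInfo) -> bool:
    """Self-assessment: only reachable, well-provisioned, long-running peers promote themselves."""
    return (not p.firewalled and p.upstream_kbps >= MIN_UPSTREAM_KBPS
            and p.uptime_s >= MIN_UPTIME_S)


def prefer_connections(candidates: list[PeerInfo], k: int) -> list[str]:
    """Connection preferencing: keep the k best-connected candidates,
    favoring peers that themselves share more files when bandwidth ties."""
    ranked = sorted(candidates,
                    key=lambda p: (is_ultrapeer_capable(p), p.upstream_kbps, p.shared_files),
                    reverse=True)
    return [p.address for p in ranked[:k]]
```

Under these assumptions a modem user never promotes itself and is the last candidate kept, which is exactly the “routing black hole” the fix tries to avoid.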


Gnutella Enhancements

Pings/Pongs can consume up to 50% of bandwidth
Solutions:

Pong limiting
Pong caching
Ping multiplexing

http://www.limewire.com/index.jsp/pingpong
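Pong caching and pong limiting can be illustrated together: answer incoming PINGs from recently seen PONGs, return only a bounded number, and re-broadcast a PING only when the cache has gone stale. A minimal sketch; the expiry time and reply limit are illustrative values rather than protocol constants, and the clock is injectable so the behaviour can be checked without waiting. Ping multiplexing is not shown.

```python
import time


class PongCache:
    """Pong caching with pong limiting (sketch).

    Instead of re-broadcasting every incoming PING, a servent answers from
    PONGs it has seen recently and only sends a fresh PING of its own when
    the cache has gone stale.
    """

    def __init__(self, max_age_s: float = 3.0, max_pongs: int = 10, clock=time.monotonic):
        self.max_age_s = max_age_s
        self.max_pongs = max_pongs  # pong limiting: cap on replies per PING
        self.clock = clock
        self.seen_at = {}  # (ip, port) -> time the PONG was last seen

    def add_pong(self, ip: str, port: int) -> None:
        self.seen_at[(ip, port)] = self.clock()

    def answer_ping(self):
        """Return (cached PONGs to send back, whether a refresh PING is needed)."""
        now = self.clock()
        fresh = [addr for addr, t in self.seen_at.items() if now - t <= self.max_age_s]
        return fresh[: self.max_pongs], not fresh
```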


Gnutella Enhancements 2

Cache query responses
Results
Evolving protocol

Gnutella Developer Forum

UltraPeers
Alternative query routing algorithms


Can Heterogeneity Make Gnutella Scale?

Ideas:
Replace query flooding with multiple random walks
Proactive replication: #replicas proportional to sqrt(request rate)

Result: two orders of magnitude improvement in terms of query time, per-node load and message traffic
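Why square-root replication? Under a simple model that is an assumption here, not something stated on the slide, each search probe visits a uniformly random node, so finding an object with r replicas among n nodes takes about n/r probes. The sketch computes the square-root allocation for a fixed replica budget and the resulting average number of probes per query; the function names are hypothetical and replica counts are allowed to be fractional.

```python
import math


def sqrt_allocation(request_rates: list, total_replicas: float) -> list:
    """Replicas per object proportional to sqrt(request rate), summing to total_replicas."""
    roots = [math.sqrt(q) for q in request_rates]
    scale = total_replicas / sum(roots)
    return [scale * r for r in roots]


def expected_search_size(request_rates: list, replicas: list, n_nodes: int) -> float:
    """Average probes per query if each probe visits a uniformly random node,
    so an object with r replicas needs about n_nodes / r probes."""
    total = sum(request_rates)
    return sum((q / total) * n_nodes / r for q, r in zip(request_rates, replicas))
```

For request rates in the ratio 16:4:1 and a budget of 70 replicas on 1,000 nodes, the square-root rule gives 40, 20 and 10 copies and about 33.3 probes per query, against about 42.9 for a uniform split; in this model a split proportional to request rate also gives about 42.9.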


Can Heterogeneity Make Gnutella Scale? 2

Gnutella assumption: all peers are equal
Not true! There is heterogeneity among P2P peers (dial-up users vs. college users)
Evolve the topology to match node capacities
Use random walks over this topology
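A k-walker random-walk search replaces “copy the query to every neighbor” with “send it to one neighbor per step”. A minimal sketch on an in-memory overlay (neighbor lists); the optional `capacity` weighting, which biases each step toward higher-capacity neighbors, is our illustration of walking “over this topology” and is an assumption, as are the function name and defaults.

```python
import random
from typing import Optional


def random_walk_search(overlay: dict[str, list[str]], source: str, has_file: set[str],
                       walkers: int = 4, max_steps: int = 64,
                       capacity: Optional[dict[str, float]] = None,
                       rng: Optional[random.Random] = None) -> tuple[Optional[str], int]:
    """k-walker random-walk search (sketch).

    Each walker forwards the query to ONE neighbor per step instead of
    copying it to all of them. If `capacity` is given, the next hop is
    chosen with probability proportional to that neighbor's capacity.
    Walkers move in lock-step and all stop as soon as one reaches a
    holder of the file. Returns (node found or None, messages sent).
    """
    if rng is None:
        rng = random.Random()
    if source in has_file:
        return source, 0
    positions = [source] * walkers
    messages = 0
    for _ in range(max_steps):
        for w, node in enumerate(positions):
            neighbors = overlay[node]
            if capacity is None:
                nxt = rng.choice(neighbors)  # unbiased step
            else:
                weights = [capacity[n] for n in neighbors]  # favor high-capacity peers
                nxt = rng.choices(neighbors, weights=weights)[0]
            messages += 1
            positions[w] = nxt
            if nxt in has_file:
                return nxt, messages
    return None, messages
```

The message cost is bounded by `walkers * max_steps`, whereas a flood's cost grows with the fan-out raised to the TTL.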


Can Heterogeneity Make Gnutella Scale? 3

Solution outline:
C_i: capacity of node i; in[j,i]: messages from j->i; out[i,j]: messages from i->j
Init in[i,j] = out[i,j] = 0, OutMax[i,j] = C_i/d_i, where d_i is the number of neighbors of node i
Update according to the messages received/sent
Check if overloaded:
If so, redirect a high-input neighbor to a neighbor with high OutMax (spare capacity). Intuitively, take yourself out of the loop.
If no such node can be found, ask the neighbor to throttle back.

Result: average query length reduces from 70 to 2-9 hops, depending on topology
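The solution outline can be turned into per-node bookkeeping. A simplified sketch under our own reading of the outline: “overloaded” means total inbound messages exceed C_i, and a neighbor's spare capacity is OutMax[i,k] minus out[i,k] as seen from node i. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlowNode:
    """Per-node bookkeeping from the solution outline (simplified sketch)."""
    capacity: float  # C_i: messages node i can handle per interval
    neighbors: list[str]
    inbound: dict[str, int] = field(default_factory=dict)   # in[j,i]
    outbound: dict[str, int] = field(default_factory=dict)  # out[i,j]

    def __post_init__(self) -> None:
        for j in self.neighbors:  # Init: in[i,j] = out[i,j] = 0
            self.inbound[j] = 0
            self.outbound[j] = 0

    def out_max(self) -> float:
        return self.capacity / len(self.neighbors)  # OutMax[i,j] = C_i / d_i

    def record(self, received_from: Optional[str] = None, sent_to: Optional[str] = None) -> None:
        """Update the counters for one message received and/or sent."""
        if received_from is not None:
            self.inbound[received_from] += 1
        if sent_to is not None:
            self.outbound[sent_to] += 1

    def overloaded(self) -> bool:
        return sum(self.inbound.values()) > self.capacity

    def relieve(self) -> tuple[str, str, Optional[str]]:
        """Redirect the heaviest sender to the neighbor with most spare OutMax,
        or ask it to throttle back if no neighbor has spare capacity."""
        heavy = max(self.inbound, key=self.inbound.get)
        spare = {k: self.out_max() - self.outbound[k]
                 for k in self.neighbors if k != heavy}
        best = max(spare, key=spare.get, default=None)
        if best is not None and spare[best] > 0:
            return "redirect", heavy, best  # take yourself out of the loop
        return "throttle", heavy, None
```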


Measurement Results

Who is sharing what?

August 2000

| The top | Share | As percent of whole |
|---|---|---|
| 333 hosts (1%) | 1,142,645 | 37% |
| 1,667 hosts (5%) | 2,182,087 | 70% |
| 3,334 hosts (10%) | 2,692,082 | 87% |
| 5,000 hosts (15%) | 2,928,905 | 94% |
| 6,667 hosts (20%) | 3,037,232 | 98% |
| 8,333 hosts (25%) | 3,082,572 | 99% |


Problems With Gnutella

Protocol scalability
Message broadcast technique imposes limitations on the network size:
packets per message = $\sum_{i=0}^{TTL} \mathit{noPeers}^{i}$
In November 2000 the dial-up bandwidth barrier was reached

Overlay network efficiency
Random selection of peers results in inefficient use of the underlying network
Redundant traffic generated on the Internet
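The packet-count formula can be evaluated directly. A sketch, reading noPeers as the number of peers each node forwards to and the sum as running over hops 0 to TTL (our reading of the slide); it treats the overlay as a tree in which every forward reaches new peers, so it is a rough estimate rather than an exact count.

```python
def packets_per_message(no_peers: int, ttl: int) -> int:
    """Rough packet count for one broadcast: sum of no_peers**i for i = 0..TTL.

    Treats the overlay as a tree in which every peer forwards to no_peers
    new peers; real overlays contain cycles and suppress re-forwarding.
    """
    return sum(no_peers ** i for i in range(ttl + 1))
```

With 4 connections and a TTL of 7 this gives 21,845 packets per message; with the 7 partners and 10 hops from the flooding slide it gives 329,554,457, which illustrates how quickly broadcast traffic can outgrow dial-up links.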


Heterogeneous connection qualities of Gnutella peers

35% have upstream bottleneck bandwidth of at least 100 Kbps

only 8% have at least 10 Mbps bandwidth

22% have bandwidth of 100 Kbps or less


Number of Shared Files


Why Look at Gnutella
Widespread unstructured P2P network

Currently between 200,000 & 300,000 hosts
2006: still heavily in use by about 2 million users
Gnutella clients (among others):

LimeWire
Morpheus
BearShare
OpenCola
Shareaza

Shareaza has the fastest-growing number of users. It has a very pleasant GUI and also connects to eDonkey and BitTorrent.
Ideal as a research test bed

Large scale network demonstrates the need for scalable P2P protocols


LimeWire: Improvement on Gnutella

Creation of a peer hierarchy based on capabilities
previously: all peers identical, most modem users were black holes
connection preferencing:

favors routing to well-connected peers

favors replies to clients that themselves serve a large number of files: prevents freeloading

The LimeWire gateway functions as a Napster-like central server on behalf of other peers for searching purposes


LimeWire

The LimeWire P2P file sharing program connects to the Gnutella P2P network

LimeWire client software is widely recognized for its clean user interface that does not contain adware

Sometimes billed as the “fastest file sharing program”

LimeWire claims to offer relatively good search and download performance

Free LimeWire software downloads are available for Windows, Linux and Macintosh operating systems

LimeWire Pro pay clients also exist


BearShare

The BearShare P2P file sharing program is a popular free software client for the Gnutella P2P network

Both free and paid downloads of BearShare file sharing programs exist


Shareaza

Shareaza is an up-and-coming P2P file sharing program.
This client offers an extremely powerful search engine capable of connecting to multiple popular P2P networks, including eDonkey, BitTorrent and Gnutella.
Shareaza file sharing software includes intelligence for detecting fake and/or corrupted files.
The free Shareaza download also contains no ads or spyware.
As the installed base of Shareaza users grows, expect Shareaza to become an even better P2P file sharing program.


Anonymous?

The person you are getting the file from knows who you are. That’s not anonymous.

Other protocols exist where the owner of the files doesn’t know the requester.

Peer-to-peer anonymity exists


Summary
peer-to-peer networking: applications connect to peer applications
focus: decentralized method of searching for files
each application instance serves to:

store selected files
route queries (file searches) from and to its neighboring peers
respond to queries (serve file) if file stored locally

Gnutella history:
3/14/00: release by AOL, almost immediately withdrawn
too late: 23K users on Gnutella at 8 am this AM
many iterations to fix poor initial design (poor design turned many people off)

What we care about:

How much traffic does one query generate?
How many hosts can it support at once?
What is the latency associated with querying?
Is there a bottleneck?