Transcript of Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement...
- Slide 1
- Characteristics of Current P2P File-Sharing Systems (with a
brief excursion into network measurement tools). Stefan Saroiu, P.
Krishna Gummadi, Steven Gribble. University of Washington
- Slide 2
- Peer-to-Peer Frenzy. Both research and industrial excitement:
CAN, Chord, Past, Tapestry, JXTA, Farsite, Publius, Morpheus,
AudioGalaxy. Basic premise: wide-area, distributed system;
voluntary, ad-hoc, dynamic; home-user peers exchange information
(mostly large files). Many proposals, yet nobody knows the
participating peers' characteristics and behavior.
- Slide 3
- Napster & Gnutella (two architecture diagrams: Napster with
servers at napster.com and peers, Gnutella with peers only).
Legend: P = peer, S = server, Q = query, R = response, D = file
download.
- Slide 4
- Methodology: 2 stages. 1. Periodically crawl Gnutella/Napster
to discover peers and their metadata. 2. Feed output from the crawl
into measurement tools: bottleneck bandwidth (SProbe), latency
(SProbe), peer availability (LF), degree of content sharing (Napster
crawler).
- Slide 5
- Network Bandwidth Scenarios: network measurements; dynamic
server/peer selection; P2P overlay formation or application-level
multicast; placement of content replicas.
- Slide 6
- Network Bandwidth. 1. Throughput: number of bytes transferred
during a fixed interval of time. 2. Available bandwidth: the maximum
attainable throughput of a newly started flow. 3. Bottleneck
bandwidth: maximum throughput ideally obtained across the slowest
link. Hard to measure: throughput, available bandwidth. Easier to
measure: bottleneck bandwidth.
- Slide 7
- One-Packet Model: 1 probing packet. Plot of traversal time
against packet size, where $\text{slope} = 1 / \text{bandwidth}_{\text{bottleneck}}$.
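The one-packet relation (traversal time grows linearly with packet size, with slope equal to the inverse of the bottleneck bandwidth) can be illustrated with a least-squares line fit. This is only a minimal sketch under stated assumptions: the function name `one_packet_bandwidth`, the sample format, and the use of ordinary least squares are illustrative choices, not something taken from the slides or from any particular tool.

```python
def one_packet_bandwidth(samples):
    """Estimate bottleneck bandwidth in bytes/s.

    `samples` is a list of (packet_size_bytes, traversal_time_s) pairs
    (hypothetical input format). The model assumed here is
    traversal_time = latency + packet_size / bandwidth, so the slope of
    the fitted line is 1 / bandwidth.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_size = sum(size for size, _ in samples) / n
    mean_time = sum(time for _, time in samples) / n
    # Ordinary least squares: slope = cov(size, time) / var(size).
    sxx = sum((size - mean_size) ** 2 for size, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct packet sizes")
    sxy = sum((size - mean_size) * (time - mean_time) for size, time in samples)
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("non-positive slope; samples are too noisy to use")
    return 1.0 / slope


# Synthetic samples: 20 ms fixed latency and a 125000 bytes/s (1 Mbps) bottleneck.
synthetic = [(size, 0.02 + size / 125000) for size in (100, 500, 1000, 1500)]
estimate = one_packet_bandwidth(synthetic)
print(f"estimated bottleneck bandwidth: {estimate:.0f} bytes/s")
```

Real measurements are noisy, so a practical tool would likely filter samples (for example, keeping the minimum traversal time per packet size) before fitting.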
- Slide 8
- Packet-Pair Model: the time dispersion between the two packets
is inversely proportional to the bottleneck bandwidth:
$t = \text{size}_{\text{packet}} / \text{bandwidth}_{\text{bottleneck}}$.
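The packet-pair relation (dispersion time = packet size / bottleneck bandwidth) can be turned around to estimate the bandwidth from a measured dispersion. The sketch below assumes the dispersion has already been measured; the function names are hypothetical, and taking the median over several pairs is an illustrative way to damp outliers, not a technique stated in the slides.

```python
from statistics import median


def packet_pair_bandwidth(packet_size_bytes, dispersion_s):
    """Return the bottleneck bandwidth estimate in bits per second."""
    if dispersion_s <= 0:
        raise ValueError("dispersion must be positive")
    return packet_size_bytes * 8 / dispersion_s


def estimate_from_pairs(packet_size_bytes, dispersions_s):
    """Estimate from several packet pairs using the median dispersion.

    The median is an assumption of this sketch: it keeps one pair that was
    delayed by cross traffic from dominating the estimate.
    """
    if not dispersions_s:
        raise ValueError("need at least one dispersion measurement")
    return packet_pair_bandwidth(packet_size_bytes, median(dispersions_s))


# Two 1500-byte packets arriving 12 ms apart suggest roughly a 1 Mbps bottleneck.
print(packet_pair_bandwidth(1500, 0.012))
```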
- Slide 9
- Vital Properties of an Ideal Tool. Accurate. Fast: 1
min/measurement is too slow. Scalable: flooding the network will not
work. Works in uncooperative environments: can't deploy software at
both endpoints.
- Slide 10
- Properties of an Ideal Tool. Active: existing traffic might not
be suitable. TCP/UDP based: ICMP is heavily filtered. Cross-traffic
resilient: should detect and give up in the face of cross traffic.
Works on asymmetric paths. Flexible to bandwidth changes. Controlled
evaluations.
- Slide 11
- Current Tools vs. Desired Properties (table). Tools: pathchar,
pchar, clink, bprobe, pathrate, Nettimer, SProbe. Properties:
Accurate, Fast, Uncooperative Environments (*), Scalable, TCP/UDP,
Active, Cross-traffic (*), Asymmetric, Bandwidth changes, Controlled
Evaluations.
- Slide 12
- SProbe Uses TCP Tricks. From local host to remote host: no
cooperation needed (diagram between Local and Remote, with SYN
packets and RST packets).
- Slide 24
- SProbe Uses TCP Tricks. From remote to local: involuntary
cooperation of the application layer (Web). Diagram between Local
and Remote: HTTP GET request, data packets, ACK (last data packet).
- Slide 25
- SProbe's Accuracy
- Slide 26
- Slide 27
- More SProbe: bottleneck bandwidth, latency, availability (LF).
For availability, send a SYN packet and look at what is received:
SYN/ACK: host active; RST: host inactive, but online; nothing: host
offline.
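The three-way availability rule on this slide (SYN/ACK, RST, or no reply) maps naturally onto a small classifier. The sketch below deliberately leaves out the packet sending itself and only encodes the decision; the names `HostState` and `classify_syn_reply`, and the string labels for replies, are hypothetical.

```python
from enum import Enum
from typing import Optional


class HostState(Enum):
    ACTIVE = "active"
    INACTIVE_BUT_ONLINE = "inactive, but online"
    OFFLINE = "offline"


def classify_syn_reply(reply: Optional[str]) -> HostState:
    """Map the reply to a SYN probe onto a host state.

    `reply` is "SYN/ACK", "RST", or None when nothing came back before a
    timeout (the timeout handling is assumed to happen elsewhere).
    """
    if reply is None:
        return HostState.OFFLINE
    normalized = reply.strip().upper()
    if normalized == "SYN/ACK":
        return HostState.ACTIVE
    if normalized == "RST":
        return HostState.INACTIVE_BUT_ONLINE
    raise ValueError(f"unexpected reply type: {reply!r}")


for observed in ("SYN/ACK", "RST", None):
    print(observed, "->", classify_syn_reply(observed).value)
```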
- Slide 28
- P2P Characteristics: How many peers are server-like? Who are the
free-riders? Do peers tend to lie? How robust is the Gnutella
overlay?
- Slide 30
- Higher Downstream Bandwidths
- Slide 31
- Most Peers have Cable Modem-like Bandwidths
- Slide 32
- Yes, Lots of Cable Modems
- Slide 33
- Closest 20% are 4X closer than furthest 20%
- Slide 34
- Two horizontal bands: East Coast and transoceanic links
- Slide 35
- Availability: periodic probes yield data like: (diagram of
sessions, each with a start and an end)
- Slide 36
- Availability: periodic probes yield data like: (diagram of
sessions, labeled start, end, and 12 hours). Divide the trace into
two periods. Keep segments that start in the 1st period and end in
the 1st or 2nd period. Draw conclusions only on segments no longer
than the 2nd period.
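The segment-selection rule above can be sketched as a filter over observed sessions. A session that starts in the first period and is no longer than the second period is guaranteed to end inside the trace, which is presumably why the rule avoids bias against long sessions. The function name, the tuple format, and the use of `None` for a session still running at the end of the trace are assumptions of this sketch.

```python
def usable_sessions(sessions, trace_start, split, trace_end):
    """Keep sessions that start in the first period, end inside the trace,
    and last no longer than the second period.

    `sessions` is a list of (start, end) pairs in a common time unit;
    `end` is None when the session was still running at `trace_end`.
    The first period is [trace_start, split), the second [split, trace_end].
    """
    if not trace_start < split < trace_end:
        raise ValueError("expected trace_start < split < trace_end")
    max_length = trace_end - split
    kept = []
    for start, end in sessions:
        if end is None or end > trace_end:
            continue  # end not observed within the trace
        if not trace_start <= start < split:
            continue  # must start in the first period
        if end - start > max_length:
            continue  # longer than the second period
        kept.append((start, end))
    return kept


# Times in hours, with two 12-hour periods.
observed = [(1, 3), (5, 20), (13, 14), (10, None), (11, 23)]
print(usable_sessions(observed, trace_start=0, split=12, trace_end=24))
```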
- Slide 37
- Median Session is about one hour (same for both systems)
- Slide 38
- Gnutella/Napster Uptime
- Slide 39
- P2P Characteristics: How many peers are server-like? Who are the
free-riders? Do peers tend to lie? How robust is the Gnutella
overlay?
- Slide 40
- Who Has the Files?
- Slide 41
- Slide 42
- Correlation of Free-Riding with B/W
- Slide 43
- P2P Characteristics: How many peers are server-like? Who are the
free-riders? Do peers tend to lie? How robust is the Gnutella
overlay?
- Slide 44
- It's all about incentive!
- Slide 45
- Lack of Knowledge is Universal
- Slide 46
- P2P Characteristics: How many peers are server-like? Who are the
free-riders? Do peers tend to lie? How robust is the Gnutella
overlay?
- Slide 47
- Power-Law Networks are here to Stay. Barabasi and Albert showed
that networks which grow by continuous addition of new nodes and
exhibit preferential attachment (likelihood of connecting to a node
depends on the node's degree) have a power-law distribution of
vertex degree. Examples: Internet, WWW, Gnutella.
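Growth with preferential attachment can be illustrated by a small generator in which each new node links to existing nodes with probability proportional to their current degree. This is a minimal sketch, not a model of how Gnutella peers actually join: the function names, the parameter `m` (edges per new node), and the fully connected seed graph are assumptions.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, seed=None):
    """Return the edge list of an n-node graph grown by preferential attachment.

    Starts from m + 1 fully connected nodes; every later node attaches to
    m distinct existing nodes chosen with probability proportional to degree.
    """
    if m < 1 or n <= m:
        raise ValueError("need m >= 1 and n > m")
    rng = random.Random(seed)
    edges = []
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = []
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.append((u, v))
            endpoints.extend((u, v))
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


def degree_counts(edges):
    """Return a Counter mapping node -> degree."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


degrees = degree_counts(preferential_attachment_graph(2000, 2, seed=1))
print("five highest degrees:", [d for _, d in degrees.most_common(5)])
print("nodes with the minimum degree of 2:", sum(1 for d in degrees.values() if d == 2))
```

In a run like this, a few early nodes typically end up with far more links than the rest, which is the heavy-tailed shape the slide refers to.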
- Slide 48
- Resilience to Failures. Power-law networks (Cohen et al.) are:
very resilient in the face of random node failures (a giant spanning
cluster still exists); fairly resilient in the face of cascading
failures; very vulnerable in the face of orchestrated attacks
(towards high-degree nodes).
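The contrast between random failures and orchestrated attacks on high-degree nodes can be explored by removing nodes from an edge list and measuring the largest surviving connected component. The helper names below are hypothetical, and the star-shaped toy graph is only an extreme illustration, not measured Gnutella data.

```python
import random
from collections import defaultdict, deque


def largest_component_size(edges, removed=()):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    adjacency = defaultdict(set)
    nodes = set()
    for u, v in edges:
        nodes.update(x for x in (u, v) if x not in removed)
        if u not in removed and v not in removed:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def random_failures(edges, fraction, seed=None):
    """Pick a random `fraction` of nodes (nodes are assumed sortable, e.g. ints)."""
    nodes = sorted({x for edge in edges for x in edge})
    return set(random.Random(seed).sample(nodes, int(len(nodes) * fraction)))


def targeted_failures(edges, fraction):
    """Pick the `fraction` of nodes with the highest degree."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return set(ranked[: int(len(ranked) * fraction)])


# Toy example: one hub connected to 20 leaves.
star = [(0, leaf) for leaf in range(1, 21)]
print("intact:", largest_component_size(star))
print("5% targeted:", largest_component_size(star, targeted_failures(star, 0.05)))
print("20% random:", largest_component_size(star, random_failures(star, 0.20, seed=3)))
```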
- Slide 49
- Gnutella, Fri Feb 16 05:21:52-05:23:22 PST: 1771 hosts. Popular
sites: 212.239.171.174, adams-00-305a.Stanford.EDU, 0.0.0.0
- Slide 50
- 30% random failures, Fri Feb 16 05:21:52-05:23:22 PST: 1771 471
294 hosts
- Slide 51
- 4% orchestrated failures, Fri Feb 16 05:21:52-05:23:22 PST:
1771 - 63 hosts
- Slide 52
- Discussion. Heterogeneity: 3 orders of magnitude of bandwidth
(50Kbps-100Mbps); 6 orders of magnitude of latency (10us-10s); >4
orders of magnitude in availability (1%-99.99%). Peers should not be
treated as equals.
- Slide 53
- Cooperating, Well-Behaved Peers. Incentive: game-theoretic
approaches to enforcing local behavior for global benefit. System
enforcement: peers can measure each other's characteristics
(SProbe) and enforce the reported ones, e.g. a peer that reports
56Kbps should not download content at a higher speed.
- Slide 54
- Feedback to Current Proposals. CAN, Chord, Past: great memory
and lookup algorithms, log(N) time and space, at the price of
maintaining a rigid network structure (hypercubes, butterflies,
Plaxton trees). It is unclear how the network structure is
maintained given the heterogeneity and dynamics of peers.
Conjecture: these networks will have a hard time stabilizing and
will need lots of routine maintenance traffic.
- Slide 55
- Instead: Gnutella. Easy join procedure: this simplicity gave
Gnutella its power-law shape. Easy-to-implement protocol
(broadcast). Lots of maintenance traffic already, although the
protocol has become smarter with its subsequent versions. Searching
is a nightmare.
- Slide 56
- Document Popularity. Follows a Zipf distribution: long-tailed.
Popular documents become more popular with Napster/Gnutella.
Currently, users need to resubmit queries in the hope that someone
will answer. Proposal: a wish-list based system.
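To give a feel for what a long-tailed Zipf popularity distribution implies, the sketch below assigns the document at rank r a request probability proportional to 1 / r^alpha. The exponent `alpha = 1.0`, the catalogue size, and the function name are illustrative assumptions; the slides do not give fitted parameters.

```python
def zipf_probabilities(n_docs, alpha=1.0):
    """Return request probabilities for ranks 1..n_docs under a Zipf law.

    The probability of the document at rank r is proportional to 1 / r**alpha.
    """
    if n_docs < 1:
        raise ValueError("need at least one document")
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_docs + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


probabilities = zipf_probabilities(10_000)
top_share = sum(probabilities[:100])
print(f"share of requests for the 100 most popular of 10000 documents: {top_share:.2f}")
```

Under these assumptions a small set of popular documents attracts a large share of requests, while the many unpopular ones in the tail are requested rarely, which is the case where resubmitting queries is most painful.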
- Slide 57
- Wide-area Network Measurements. Sending a few packets can be
perceived as hostile behavior. Even a few SYN packets are
sufficient to trigger software firewalls: a dialogue box pops up
("possible scan from washington.edu, click OK or Cancel"). Many
confused, angry, threatening e-mails were sent to many people
(security, root, Ed): active Internet measurements are not simple
to perform.
- Slide 58
- Excerpt from e-mail: Thank you for your reply. Unfortunately, I
did not authorise anybody from washington.edu to attempt to crack
into my computer. Attempting to break into computers is a crime in
Australia. Please advise the names and contact details of the
people involved in this "research" so that I can contact the
Australian Federal Police, who will no doubt contact your Federal
Bureau of Investigation to investigate this incident and institute
criminal proceedings against those concerned.
- Slide 59
- Current Work. Quantify and show that current proposals are too
rigid for Napster/Gnutella-like peers' dynamics. Wish-list, delayed
exchange system: a big distributed scheduling problem. SGet: a
downloading tool with automatic server selection; no bandwidth is
wasted.
- Slide 60
- Questions? (Photo: Beautiful Sieg Hall, Pride of UW)