Posted on 24-Dec-2015

Introduction to Peer-to-Peer Networks

What is a P2P network?

• Uses the vast resources of machines at the edge of the Internet to build a network that allows resource sharing without any central authority.

• Client-server vs. peer-to-peer: a peer is both a client and a server, and control is decentralized.

• Much more than a system for sharing pirated music.
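The point that a peer is both a client and a server can be illustrated with a minimal in-process sketch. The `Peer` class and its `serve`/`fetch` methods are hypothetical names chosen for illustration; a real peer would exchange messages over sockets rather than call another object's methods directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer that plays both roles: server (serve) and client (fetch)."""

    name: str
    files: dict[str, bytes] = field(default_factory=dict)

    def serve(self, filename: str) -> bytes | None:
        """Server role: answer another peer's request, or None if we lack the file."""
        return self.files.get(filename)

    def fetch(self, other: Peer, filename: str) -> bool:
        """Client role: download a file from another peer and keep a copy,
        so this peer can later serve it as well."""
        data = other.serve(filename)
        if data is None:
            return False
        self.files[filename] = data
        return True


alice = Peer("Alice", {"song.mp3": b"audio-bytes"})
bob = Peer("Bob")
carol = Peer("Carol")

bob.fetch(alice, "song.mp3")    # Bob acts as Alice's client
carol.fetch(bob, "song.mp3")    # Bob now acts as Carol's server
print(sorted(carol.files))      # ['song.mp3']
```

After Bob downloads the file from Alice, he serves it to Carol with no central server involved.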

Historical Perspective

• The Internet originally emphasized working in the P2P mode rather than the client-server mode.

• SRI, UCLA, UCSB, and the University of Utah had powerful host machines forming a league of equals. ARPANET integrated them in the late 1960s.

• USENET was originally based on UUCP (Unix-to-Unix Copy Protocol), which allowed users on two different Unix machines to exchange messages and files.

Why does P2P need attention?

Overlay network

A P2P network is an overlay network. Each link between peers consists of one or more IP links.

[Figure: an overlay network linking peers Alice, Bob, and Carol]
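To see how one overlay link can span several IP links, the sketch below maps each overlay link between Alice, Bob, and Carol onto a fewest-hop path through an underlying IP network. The routers R1 to R3 and the whole topology are invented for illustration, and breadth-first search stands in for real IP routing.

```python
from __future__ import annotations

from collections import deque

# Hypothetical IP underlay: routers R1-R3 and the hosts of three peers.
IP_LINKS = {
    "Alice": ["R1"],
    "Bob": ["R3"],
    "Carol": ["R3"],
    "R1": ["Alice", "R2"],
    "R2": ["R1", "R3"],
    "R3": ["R2", "Bob", "Carol"],
}

# Overlay links: pairs of peers that know each other directly.
OVERLAY_LINKS = [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol")]


def ip_path(src: str, dst: str) -> list[str]:
    """Breadth-first search for a fewest-hop path through the IP underlay."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in IP_LINKS[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        raise ValueError(f"{dst} is unreachable from {src}")
    # Walk the parent pointers back from dst to src.
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


for a, b in OVERLAY_LINKS:
    path = ip_path(a, b)
    print(f"{a}-{b}: {len(path) - 1} IP links via {' -> '.join(path)}")
```

In this made-up topology the Alice-Bob and Alice-Carol overlay links each cost 4 IP links, while Bob-Carol costs only 2, even though all three look like single hops in the overlay.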

Well-known P2P Systems

• Napster

• Gnutella

• KaZaA

• LimeWire

• eDonkey

• Chord

• Tapestry

• CAN

• Pastry

• BitTorrent

• Kademlia

• Skype

• Various social networks

Some important issues

Search

Storage

Security

Applications

A Distributed Storage Service

[Figure: peers Alice, Bob, Carol, and David providing a shared storage service]

Promises

Consider File Sharing as an Example

– Available 24/7

– Durable despite machine failures

– Information is protected

– Resilient to Denial of Service

Additional Goals

• Massive scalability

• Anonymity

• Deniability

• Resistance to censorship

Challenges

• A P2P network must be self-organizing: join and leave operations must be self-managed.

• The infrastructure is untrusted and the components are unreliable. The number of faulty nodes grows linearly with system size, yet the aggregate behavior has to be trustworthy.
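The linear growth in faulty nodes can be made concrete with a simple model. Assume, for illustration only (this is not a figure from the slides), that each of the N nodes is faulty with probability p. By linearity of expectation, the expected number of faulty nodes F is:

```latex
\mathbb{E}[F] \;=\; \sum_{i=1}^{N} p \;=\; pN,
\qquad\text{e.g.}\quad
p = 0.01,\; N = 10^{6} \;\Rightarrow\; \mathbb{E}[F] = 10^{4}.
```

Under this model, doubling the system size doubles the expected number of faulty nodes, so correctness cannot depend on any individual node behaving well.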

• Tolerance to failures and churn

• Efficient routing even if the structure of the network is unpredictable.

• Dealing with freeriders

• Load balancing

• Security issues

Looking up data

• How do you locate data, files, or objects in a large P2P system built around a dynamic set of nodes, in a scalable manner, without any centralized server or hierarchy?

• Napster's index servers used a central database, which had questionable scalability and poor resilience.

• Compare with how names are looked up in the Internet's DNS.

Napster

Developed by Shawn Fanning in 1999 and shut down after two years for copyright infringement. Its centralized directory servers were a bottleneck.

[Figure: users on the Internet reach a root/redirector in front of several directory servers; the directory servers store indices of songs only]
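The directory servers' role can be sketched as a central index that maps song names to the peers holding them, while the song files themselves stay on users' machines. `DirectoryServer` and the peer names are hypothetical, and the sketch collapses Napster's several directory servers into one.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical Napster-style central index.

    It stores only song name -> peer entries; the song files stay on the peers.
    """

    def __init__(self):
        self.index = defaultdict(set)

    def register(self, peer, songs):
        """A peer announces the songs it is willing to share."""
        for song in songs:
            self.index[song].add(peer)

    def unregister(self, peer):
        """A peer leaves: drop every entry that points at it."""
        for holders in self.index.values():
            holders.discard(peer)

    def lookup(self, song):
        """Return the peers holding `song`; the download then goes peer to peer."""
        return sorted(self.index.get(song, ()))


directory = DirectoryServer()
directory.register("peerA", ["double helix", "song b"])
directory.register("peerB", ["double helix"])
print(directory.lookup("double helix"))  # ['peerA', 'peerB']
directory.unregister("peerA")
print(directory.lookup("double helix"))  # ['peerB']
```

Every registration and every query passes through this one index, which is why it is both a bottleneck and a single point of failure.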

Gnutella

A truly decentralized system. A search such as “Where is Double Helix?” works by flooding the query across a graph of arbitrary topology. This has an obvious scalability problem, and the wasted bandwidth caused serious inefficiencies.

[Figure: Gnutella graph in which a client looking for “double helix” floods its query until it reaches a peer holding “double helix”]
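Flooding can be sketched as a breadth-first traversal with a hop limit (TTL). The graph, the `flood_search` name, and the TTL handling are assumptions for illustration: each node forwards the query to every neighbor except the one it heard it from, and the `seen` set stands in for dropping duplicate queries.

```python
from collections import deque


def flood_search(graph, files, start, target, ttl):
    """Flood a query from `start`, forwarding it at most `ttl` hops.

    graph: node -> list of neighbours (arbitrary topology)
    files: node -> set of file names stored at that node
    Returns (nodes holding `target`, number of query messages sent).
    """
    seen = {start}                    # stands in for duplicate-query suppression
    queue = deque([(start, ttl, None)])
    hits, messages = [], 0
    while queue:
        node, remaining, sender = queue.popleft()
        if target in files.get(node, set()):
            hits.append(node)
        if remaining == 0:
            continue                  # hop limit reached: stop forwarding
        for nbr in graph[node]:
            if nbr == sender:
                continue              # don't send the query straight back
            messages += 1             # every forward costs bandwidth, even duplicates
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, remaining - 1, node))
    return hits, messages


graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
files = {"F": {"double helix"}}

print(flood_search(graph, files, "A", "double helix", ttl=3))  # (['F'], 8)
print(flood_search(graph, files, "A", "double helix", ttl=2))  # ([], 6)
```

Even in this six-node graph, finding one file costs 8 messages, and a TTL one hop too small spends 6 messages and finds nothing. Message count grows with the number of links reached, which is the scalability problem noted above.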

Unstructured vs. Structured

• Unstructured P2P networks allow resources to be placed at any node. The network topology is arbitrary, and growth is spontaneous.

• Structured P2P networks simplify resource location and load balancing by defining a topology and rules for resource placement.

Distributed Hash Table (DHT)

Object-to-machine mapping uses unique keys, where H is a hash function:

H(object name) = key

H(machine name) = key

An object whose name maps to key k is placed on the machine whose name also maps to key k. This simplifies object location.

[Figure: a keyspace from 0 to N-1 with keys a, b, and c marked; a machine name and an object name both hash to b]
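The mapping can be sketched with a hash H into the keyspace 0 to N-1. With only a few machines, an object's key almost never equals a machine's key exactly, so this sketch substitutes the successor rule used by Chord-style DHTs: store the object on the first machine whose key is at or after the object's key, wrapping around at N-1. The names `DHT` and `machine_for`, and the 32-bit keyspace derived from SHA-1, are assumptions rather than details from the slides.

```python
import bisect
import hashlib

M = 32          # hypothetical key size in bits
N = 2 ** M      # keyspace is 0 .. N-1, as in the figure


def H(name: str) -> int:
    """Hash a name into the keyspace: SHA-1 digest reduced mod N."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


class DHT:
    """Hypothetical ring of machines, each placed at H(machine name)."""

    def __init__(self, machine_names):
        self.ring = sorted((H(m), m) for m in machine_names)
        self.keys = [k for k, _ in self.ring]

    def machine_for(self, object_name: str) -> str:
        """Successor rule: first machine whose key is >= H(object_name),
        wrapping around to the lowest key past the end of the keyspace."""
        i = bisect.bisect_left(self.keys, H(object_name))
        return self.ring[i % len(self.ring)][1]


dht = DHT(["Alice", "Bob", "Carol", "David"])
print(dht.machine_for("Double Helix"))
```

When an object's key equals a machine's key, as with key b in the figure, the successor rule picks exactly that machine, so the slides' rule is the exact-match case. A design benefit is that a machine joining or leaving only moves the objects on the arc between it and its predecessor.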

Basic idea