Computer Communications
Peer to Peer networking
Ack: Many of the slides are adaptations of slides by authors in the bibliography section.
P2P
• Has quickly grown in popularity
– numerous sharing applications
– many millions of people worldwide use P2P networks
• But what is P2P in the Internet?
– Searching or location?
– Computers “peering”?
– Taking advantage of resources at the edges of the network
• End‐host resources have increased dramatically
• Broadband connectivity is now common
P2P 2
Lecture outline
• Evolution of p2p networking – seen through file‐sharing applications
• Other applications
P2P Networks: file sharing
• Common Primitives:
– Join: how do I begin participating?
– Publish: how do I advertise my file?
– Search: how do I find a file/service?
– Fetch: how do I retrieve a file/use a service?
First generation in p2p file sharing/lookup
• Centralized Database: single directory
– Napster
• Query Flooding
– Gnutella
• Hierarchical Query Flooding
– KaZaA
• (Further unstructured Overlay Routing
– Freenet, …)
• Structured Overlays
– …
P2P: centralized directory
original “Napster” design (1999, S. Fanning)
1) when a peer connects, it informs the central server of its:
– IP address
– content
2) Alice queries the directory server for “Boulevard of Broken Dreams”
3) Alice requests the file from Bob
[Figure: a centralized directory server and several peers, among them Alice and Bob; arrows are numbered 1 (peers inform the server), 2 (Alice's query) and 3 (Alice's request to Bob)]
Napster: Publish
[Figure: peer 123.2.21.23 announces “I have X, Y, and Z!”; its Publish message makes the server record insert(X,123.2.21.23), ...]
Napster: Search
[Figure: a peer asks “Where is file A?”; the Query goes to the server, the Reply is search(A)-->123.2.0.18, and the peer then Fetches the file directly from 123.2.0.18]
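The publish and search steps of the centralized design can be sketched in a few lines. This is only an illustration, not Napster's actual protocol: the class name DirectoryServer and the method remove_peer are hypothetical, while insert and search mirror the labels on the slides.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical central index: file name -> set of peer addresses."""

    def __init__(self):
        self.index = defaultdict(set)

    def insert(self, filename, peer_addr):
        # Publish: a peer advertises one file it holds.
        self.index[filename].add(peer_addr)

    def remove_peer(self, peer_addr):
        # A departing peer has to be purged from every entry.
        for peers in self.index.values():
            peers.discard(peer_addr)

    def search(self, filename):
        # Search: a single lookup at the server, whatever the number of peers.
        return sorted(self.index.get(filename, ()))


server = DirectoryServer()
for name in ("X", "Y", "Z"):
    server.insert(name, "123.2.21.23")
server.insert("A", "123.2.0.18")
hits = server.search("A")  # the client then fetches directly from these peers
```

The server holds all the state and answers every query, which is exactly why it is both simple and a single point of failure.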
First generation in p2p file sharing/lookup
• Centralized Database
– Napster
• Query Flooding: no directory
– Gnutella
• Hierarchical Query Flooding
– KaZaA
• (Further unstructured Overlay Routing
– Freenet)
• Structured Overlays
– …
Gnutella: Overview
• Query Flooding:
– Join: on startup, client contacts a few other nodes (learned from a bootstrap node); these become its “neighbors”
– Publish: no need
– Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.
– Fetch: get the file directly from peer
Gnutella: Search
[Figure: a peer floods the Query “Where is file A?” through the overlay; two peers answer “I have file A.” and their Replies travel back to the requester]
Gnutella: protocol
• Query message sent over existing TCP connections
• peers forward Query message
• QueryHit sent over reverse path
• File transfer: HTTP
• Scalability: limited scope flooding
[Figure: Query messages spread from peer to peer; QueryHit messages return along the reverse path]
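Limited-scope flooding can be sketched as a breadth-first traversal with a hop budget. This is a simplified model, not the Gnutella wire protocol: the function name flood_query, the ttl parameter and the toy overlay are assumptions, and each peer here forwards to all of its neighbours, including the one the query came from.

```python
from collections import deque


def flood_query(neighbors, files, start, wanted, ttl):
    """Flood a query from `start`; return (peers holding `wanted`, messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        node, hops_left = queue.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)  # a QueryHit would travel back along the reverse path
        if hops_left == 0:
            continue  # scope exhausted: this peer does not forward the query
        for peer in neighbors[node]:
            messages += 1  # one Query message per overlay edge used
            if peer not in seen:  # duplicate queries are dropped, not re-forwarded
                seen.add(peer)
                queue.append((peer, hops_left - 1))
    return hits, messages


overlay = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
shared = {"d": {"A"}, "e": {"A"}}
near_hits, near_msgs = flood_query(overlay, shared, "a", "A", ttl=2)
far_hits, far_msgs = flood_query(overlay, shared, "a", "A", ttl=3)
```

With a scope of 2 hops only peer d is found; raising the scope to 3 also finds e, at the price of more messages. That trade-off is the scalability limit of flooding.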
Napster vs Gnutella: Discussion +, ‐?
Napster
• Pros:
– Simple
– Search scope is O(1)
• Cons:
– Server maintains O(N) state
– Server performance bottleneck
– Single point of failure
Gnutella
• Pros:
– Simple
– Fully decentralized
– Search cost distributed
• Cons:
– Search scope is O(N)
– Search time is O(???)
Gnutella
Interesting concept in practice: overlay network
• Active Gnutella peers and the edges between them form an overlay
• A network on top of another network:
– Edge is not a physical link (what is it then?)
First generation in p2p file sharing/lookup
• Centralized Database
– Napster
• Query Flooding
– Gnutella
• Hierarchical Query Flooding: some directories
– KaZaA
• Further unstructured Overlay Routing
– Freenet
• …
KaZaA: Overview
• “Smart” Query Flooding:
– Join: on startup, client contacts a “supernode” ... may at some point become one itself
– Publish: send list of files to supernode
– Search: send query to supernode, supernodes flood query amongst themselves.
– Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers
KaZaA: File Insert
[Figure: peer 123.2.21.23 says “I have X!” and Publishes to its “Super Node”, which records insert(X,123.2.21.23), ...]
KaZaA: File Search
[Figure: a peer asks “Where is file A?”; the Query is flooded among the “Super Nodes”, and the Replies search(A)-->123.2.0.18 and search(A)-->123.2.22.50 come back]
KaZaA: Discussion
• Pros:
– Tries to balance between search overhead and space needs
– Tries to take into account node heterogeneity:
• Bandwidth
• Host Computational Resources
– Rumored to take into account network locality
• Cons:
– Still no real guarantees on search scope or search time
• P2P architecture used by Skype, Joost (communication, video distribution p2p systems)
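A two-tier lookup of this kind can be sketched as follows. KaZaA's protocol is proprietary, so this is only a model of the idea on the slides: the class SuperNode and its methods are hypothetical names, and the flooding among supernodes is reduced to one hop.

```python
class SuperNode:
    """Hypothetical supernode: indexes the files of its attached ordinary peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}             # file name -> set of peer addresses
        self.other_supernodes = []  # supernodes this one forwards queries to

    def publish(self, peer_addr, filenames):
        # Publish: an ordinary peer sends its file list to its supernode.
        for f in filenames:
            self.index.setdefault(f, set()).add(peer_addr)

    def local_lookup(self, filename):
        return self.index.get(filename, set())

    def search(self, filename):
        # The query is flooded only among supernodes, never to ordinary peers.
        found = set(self.local_lookup(filename))
        for sn in self.other_supernodes:
            found |= sn.local_lookup(filename)
        return sorted(found)


sn1, sn2 = SuperNode("sn1"), SuperNode("sn2")
sn1.other_supernodes.append(sn2)
sn2.other_supernodes.append(sn1)
sn1.publish("123.2.0.18", ["A"])
sn2.publish("123.2.22.50", ["A", "B"])
replies = sn1.search("A")
```

Each supernode plays the role of a small Napster-style directory for its own peers, and only the much smaller set of supernodes takes part in flooding.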
First steps in p2p file sharing/lookup
• Centralized Database
– Napster
• Query Flooding
– Gnutella
• Hierarchical Query Flooding
– KaZaA
• Further unstructured Overlay Routing
– Freenet: some directory, cache‐like, based on recently seen targets; see literature pointers for more
• Structured Overlay Organization and Routing
– Distributed Hash Tables
– Combine database + distributed system expertise
Problem from this perspective
How to find data in a distributed file sharing system?
(Routing to the data)
Lookup is a key problem
[Figure: nodes N1 to N5 connected through the Internet; a publisher stores Key=“LetItBe”, Value=MP3 data at one of them; a client issues Lookup(“LetItBe”) and does not know which node to ask]
Centralized Solution
• O(M) state at server, O(1) at client
• O(1) search communication overhead
• Single point of failure
[Figure: same setting with publisher, client and nodes N1 to N5; a central server (Napster) holds the database (DB) and answers Lookup(“LetItBe”)]
Distributed Solution
• O(1) state per node
• Worst case O(E) messages per lookup
[Figure: same setting with publisher, client and nodes N1 to N5; Lookup(“LetItBe”) is flooded from node to node (Gnutella, etc.)]
Distributed Solution (some more structure? In-between the two?)
• Balance the update/lookup complexity
• Abstraction: a distributed “hash-table” (DHT) data structure:
– put(id, item);
– item = get(id);
• Implementation: nodes form a distributed data structure
– e.g. Ring, Tree, Hypercube, SkipList, Butterfly
• Hash function maps entries to nodes; using the node structure, find the node responsible for item; that one knows where the item is
[Figure: same setting with publisher (Key=“LetItBe”, Value=MP3 data), client issuing Lookup(“LetItBe”) and nodes N1 to N5]
• Hash function maps entries to nodes; using the node structure, find the node responsible for item; that one knows where the item is
• Challenges:
– Keep the hop count small
– Keep the routing tables the “right size”
– Stay robust despite rapid changes in membership
[Figure (source: Wikipedia): a node in the structure says “I do not know DFCD3454 but should ask my right-hand neighbour”]
DHT: Comments/observations?
– think about structure maintenance/benefits
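As a starting point for these observations, here is a minimal sketch of the put/get abstraction on a ring. It is a single-process toy, not any particular DHT: the names RingDHT, ident, responsible and route are hypothetical, SHA-1 and the 16-bit identifier space are arbitrary choices, and each node is assumed to know only its right-hand neighbour.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # small identifier space, chosen only to keep the example readable


def ident(text):
    """Hash a node name or key into the identifier space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class RingDHT:
    def __init__(self, node_names):
        # A set removes the (unlikely) case of two names hashing to the same id.
        self.ids = sorted({ident(n) for n in node_names})
        self.store = {node_id: {} for node_id in self.ids}

    def responsible(self, key):
        # The first node id clockwise from the key's hash owns the key.
        pos = bisect_left(self.ids, ident(key))
        return self.ids[pos % len(self.ids)]

    def route(self, start_id, key):
        # With only a right-hand neighbour, a lookup walks the ring hop by hop.
        target = self.responsible(key)
        i = self.ids.index(start_id)
        hops = 0
        while self.ids[i] != target:
            i = (i + 1) % len(self.ids)
            hops += 1
        return target, hops

    def put(self, key, item):
        self.store[self.responsible(key)][key] = item

    def get(self, key):
        return self.store[self.responsible(key)].get(key)


dht = RingDHT(["N1", "N2", "N3", "N4", "N5"])
dht.put("LetItBe", "MP3 data")
item = dht.get("LetItBe")
owner, hops = dht.route(dht.ids[0], "LetItBe")
```

Walking the ring costs up to N-1 hops with O(1) routing state per node. Structured overlays such as Chord add extra routing entries per node to bring the hop count down to O(log N), which is the "right size" trade-off named in the challenges.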
Next generation in p2p networking
• Swarming
– BitTorrent, Avalanche, …
• …
BitTorrent: Next generation fetching
• In 2002, B. Cohen debuted BitTorrent
• Key Motivation:
– Popularity exhibits temporal locality (Flash Crowds)
• Focused on Efficient Fetching, not Searching:
– Distribute the same file to groups of peers
– Single publisher, multiple downloaders
• Used by publishers to distribute software, other large files
• http://vimeo.com/15228767
BitTorrent: Overview
• Swarming:
– Join: contact centralized “tracker” server, get a list of peers.
– Publish: can run a tracker server.
– Search: Out‐of‐band. E.g., use Google, some DHT, etc. to find a tracker for the file you want. Get list of peers to contact for assembling the file in chunks
– Fetch: Download chunks of the file from your peers. Upload chunks you have to them.
File distribution: BitTorrent
• tracker: tracks peers participating in torrent
• torrent: group of peers exchanging chunks of a file
[Figure: a peer obtains a list of peers from the tracker, then trades chunks with the peers in the torrent]
P2P file distribution
BitTorrent (1)
• file divided into chunks.
• peer joining torrent:
– has no chunks, but will accumulate them over time
– registers with tracker to get list of peers, connects to subset of peers (“neighbors”)
• while downloading, peer uploads chunks to other peers.
• peers may come and go
• once peer has entire file, it may (selfishly) leave or (altruistically) remain
BitTorrent: Tit‐for‐tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners & get the file faster!
BitTorrent: Discussion
– Works reasonably well in practice
– Gives peers an incentive to share resources; tries to avoid freeloaders
Lecture outline
• Evolution of p2p networking – seen through file‐sharing applications
• Other applications
P2P – not only sharing files…
Overlay: a network implemented on top of a network
– E.g. peer‐to‐peer networks, ”backbones” in ad hoc networks, transportation network overlays, electricity grid overlays ...
• Content delivery, software publication
• Streaming media applications
• Distributed computations (volunteer computing)
• Portal systems
• Distributed search engines
• Collaborative platforms
• Communication networks
• Social applications
• Other overlay‐related applications....
– Router overlays, e.g. for protection/mitigation of flooding attacks
P2P Case study: Skype
• inherently P2P: pairs of users communicate.
• proprietary application‐layer protocol (inferred via reverse engineering)
• hierarchical overlay with SNs
• Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC) attached to supernodes (SN), plus the Skype login server]
Peers as relays
• Problem when both Alice and Bob are behind “NATs”.
– NAT prevents an outside peer from initiating a call to an inside peer
• Solution:
– Using Alice’s and Bob’s SNs, a relay is chosen
– Each peer initiates a session with the relay.
– Peers can now communicate through NATs via the relay
Bibliography
Collective sources
• Kurose, Ross: Computer Networking, a top‐down approach, chapter on applications, sections peer‐to‐peer, streaming and multimedia; Addison‐Wesley 2009
• Aberer’s course notes
– http://lsirwww.epfl.ch/courses/dis/2007ws/lecture/week%208%20P2P%20systems‐general.pdf
– http://lsirwww.epfl.ch/courses/dis/2007ws/lecture/week%209%20Structured%20Overlay%20Networks.pdf
Papers for Further Study
• Chord: A Scalable Peer‐to‐peer Lookup Service for Internet Applications, Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. ACM SIGCOMM 2001, San Diego, CA, August 2001, pp. 149‐160.
• Kademlia: A Peer‐to‐peer Information System Based on the XOR Metric. Petar Maymounkov and David Mazières, 1st International Workshop on Peer‐to‐peer Systems, 2002.
• Pastry: Scalable, distributed object location and routing for large‐scale peer‐to‐peer systems, A. Rowstron and P. Druschel, IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), November 2001.
• A Scalable Content‐Addressable Network, S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, SIGCOMM 2001, San Diego, CA, USA, August 2001.
• Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Int’l Workshop on Design Issues in Anonymity and Unobservability. LNCS 2009. Springer Verlag 2001.
• Viceroy: A Scalable and Dynamic Emulation of the Butterfly. D. Malkhi, M. Naor and D. Ratajczak. In Proceedings of the 21st ACM Symposium on Principles of Distributed Computing (PODC '02), August 2002.
• Incentives build Robustness in BitTorrent, Bram Cohen. Workshop on Economics of Peer‐to‐Peer Systems, 2003.
• Do incentives build robustness in BitTorrent? Michael Piatek, Tomas Isdal, Thomas Anderson, Arvind Krishnamurthy and Arun Venkataramani, NSDI 2007.
• Exploiting BitTorrent For Fun (But Not Profit), iptps06.cs.ucsb.edu/talks/Liogkas_BitTorrent.ppt
• J. Mundinger, R. R. Weber and G. Weiss. Optimal Scheduling of Peer‐to‐Peer File Dissemination. Journal of Scheduling, Volume 11, Issue 2, 2008.
• Christos Gkantsidis and Pablo Rodriguez, Network Coding for Large Scale Content Distribution, in IEEE INFOCOM, March 2005 (avalanche swarming)
Extra slides/notes
More examples: a story in progress...
New power grids: be adaptive!
• Bidirectional power and information flow
– Micro‐producers or “prosumers” can share resources
– Distributed energy resources
• Communication + resource‐administration (distributed system) layer
– aka “smart” grid
From ”broadcasting” to ”routing” ‐ and more
• From: Central generation and control
  To: Distributed and central generation and control
• From: Flow by Kirchhoff’s law
  To: Flow control (routing) by power electronics
• From: Power generation according to demand
  To: Fluctuating generation and demand; need for equilibrium/storage
• From: Manual trouble response
  To: Automatic response/islanding, predictive avoidance; advanced monitoring, situational awareness
• From: Security/robustness needs in the power system
  To: New security needs in the information system: operation & administration domains, ”openness” (e.g. misleading sources; trust)
SmartGrid: From ”broadcasting” to ”routing” of power and non‐centralized coordination
New power grids
Natural overlays for microgrids
El‐networks as distributed cyber‐physical systems
[Figure: a cyber system (overlay network of computing + communicating devices) on top of the physical system; edges are el‐links and/or communication links]
An analogy: layering in computing systems
Why add “complexity” to the infrastructure?
Motivation: enable renewables, better use of el‐power
Course/Masterclass: ICT Support for Adaptiveness and Security in the Smart Grid (DAT285B)
• Goals
– students (from computer science and other disciplines) get introduced to advanced interdisciplinary concepts related to the smart grid, thus
– building an understanding of essential notions in the individual disciplines, and
– investigating a domain‐specific problem relevant to the smart grid that needs an understanding beyond the traditional ICT field.
Two instances of DAT285
• LP2 = Autonomous and Cooperative Vehicular Systems
• LP4 = ICT Support for Adaptiveness and Security in the Smart Grid
Environment
• Based on both the present and future design of the smart grid.
– How can techniques from distributed systems be applied to large, heterogeneous systems where a massive amount of data will be collected?
– How can such a system, containing legacy components with no security primitives, be made secure when the communication is added by interconnecting the systems?
• The students will have access to a hands‐on lab, where they can run and test their design and code.
Course Setup
• The course is given on an advanced master’s level, resulting in 7.5 points.
• Group start: Study Period 4, 2012‐13
– Can also define individual, “research internship courses”, 15p or MS thesis, starting earlier
• The course structure
– first part: lectures to introduce the two disciplines (“crash course”).
– second part: seminar‐style where research papers from both disciplines are actively discussed and presented.
– At the end of the course the students are also expected to present their respective project.
Complementary, optional
For overview knowledge on the ”physical layer” of the power grid:
– Sustainable Power Systems xyzabc, study period 2
• Targeted to non‐EE students
From ”broadcasting” to ”routing” ‐ and more
• From: Central generation and control
  To: Distributed and central generation and control
• From: Flow by Kirchhoff’s law
  To: Flow control (routing) by power electronics
• From: Power generation according to demand
  To: Fluctuating generation and demand; need for equilibrium/storage
• From: Manual trouble response
  To: Automatic response/islanding, predictive avoidance; advanced monitoring, situational awareness
• From: Security/robustness needs in the power system
  To: New security needs in the information system: operation & administration domains, ”openness” (e.g. malicious/misleading sources; trust)
SmartGrid: From ”broadcasting” to ”routing” of power and non‐centralized coordination
Besides,
• A range of projects and possibilities of ”internship courses” with the supporting team (faculty and PhD/postdocs)
– M. Almgren, O. Landsiedel, M. Papatriantafilou
– D. Cederman, Z. Fu, G. Georgiadis, V. Tudor
Example MS/research‐internship projects – draft info
up‐to‐date/more detailed info is/becomes available through
http://www.cse.chalmers.se/research/group/dcs/exjobbs.html
Smart Grid Communication Simulation with ZigBee
Contact: O. Landsiedel, M. Almgren, MP, et‐al
• design and implement (with our support) the ZigBee protocol stack on a well‐known network simulator.
• The simulator already provides the underlying wireless communication models such as IEEE 802.15.4.
Smart Meter Software Simulation
• Build software (with our support) which will simulate the protocol stack in a smart meter
• Functions:
– input/output consumption values (internal registers)
– Communication (e.g. M‐BUS, Comm over TCP/IP or ZigBee to be added)
– Low power consumption (maybe battery powered)
– Robustness/modularity
• Requirements:
– Programming in C/Python under Linux
– Possibly for ARM platform (BeagleBoard, RaspberryPI)
Contact: V. Tudor, M. Almgren, MP, et‐al
In‐network data aggregation
• computation of functions over data while collecting it in a hierarchical/mesh network
– e.g. average, variance, %, clustering
– for monitoring, detection of alarming situations, input to resource management, etc
• prototype/experimental part (with our support):
– tossim/tinyOS (+ visualization possibilities), OMNET, or possibly ARM platform (BeagleBoard, RaspberryPI)
• possibilities to connect to the projects on smart meter simulation, zigbee network simulation, adaptive power management
• Possibility to work on an actual microgrid
• Programming in C/Python
Contact: MP, D. Cederman, Z. Fu, et‐al
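To illustrate what computing a function "while collecting" can mean, here is a minimal sketch for average and variance over a tree of meters. It is not taken from the project: the names Partial and aggregate, the toy tree and the readings are assumptions. The idea is that each node forwards a small mergeable summary instead of all raw readings.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Mergeable summary: enough to recover count, mean and variance."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, x):
        return Partial(self.n + 1, self.total + x, self.total_sq + x * x)

    def merge(self, other):
        return Partial(self.n + other.n, self.total + other.total,
                       self.total_sq + other.total_sq)

    def mean(self):
        return self.total / self.n

    def variance(self):
        # Population variance: E[x^2] - (E[x])^2
        return self.total_sq / self.n - self.mean() ** 2


def aggregate(node, children, reading):
    """Combine a node's own reading with the summaries of its subtree."""
    summary = Partial().add(reading[node])
    for child in children.get(node, ()):
        summary = summary.merge(aggregate(child, children, reading))
    return summary


tree = {"root": ["m1", "m2"], "m1": ["m3", "m4"]}
readings = {"root": 2.0, "m1": 4.0, "m2": 4.0, "m3": 4.0, "m4": 6.0}
result = aggregate("root", tree, readings)
```

Every link carries three numbers regardless of subtree size, so the traffic near the root does not grow with the number of meters below it.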
Simulation of the Smart Grid incl. e‐meter and controls communication
• Implementation/simulation of a network/microgrid (with support)
– Matlab + interface to JADE/peersim or similar environments
• Further study possibilities
– Power “routing” among nodes with power controllers
– Balancing/scheduling of tasks in time, to match availability, cut off the peaks
– Adaptive Overlays with preferences
Contact: G. Georgiadis, O. Landsiedel, M. Papatriantafilou
The Distributed & Parallel Computing and Computer Security team
• Erland Jonsson, Philippas Tsigas, Tomas Olovsson, Elad Schiller, Asrin Javaheri, Farnaz Moradi, Laleh Pirzadeh, Pierre Kleberger, Andreas Larsson, Zhang Fu, Daniel Cederman, Giorgos Georgiadis, Dang Nhan Nguyen, Olaf Landsiedel, Bapi Chatterjee, Ioannis Nikolakopoulos, Magnus Almgren, M. Papatriantafilou
• 4 new PhD students, summer 2012
Application domains: energy systems, vehicular systems, communication systems and networks
International masters program on Computer Systems and Networks: among the top 5 at CTH (out of ~50)
Briefly on the team’s research + education area
• Distributed problems over network‐based systems (e.g. overlays, distributed, locality‐based resource management)
• Security, reliability, adaptiveness: survive failures, detect & mitigate attacks, secure self‐organization, …
• Parallel processing: for efficiency, data & computation‐intensive systems, programming new systems (e.g. multicores)
Notes p2p
Efficiency/scalability through peering (collaboration)
File Distribution: Server‐Client vs P2P
Question: How much time to distribute file from one server to N peers?
• File, size F
• $u_s$: server upload bandwidth
• $u_i$: peer i upload bandwidth
• $d_i$: peer i download bandwidth
[Figure: a server with upload link $u_s$ and N peers with links $(u_1, d_1), (u_2, d_2), \dots, (u_N, d_N)$, all attached to a network with abundant bandwidth]
File distribution time: server‐client
• server sequentially sends N copies:
– $NF/u_s$ time
• client i takes $F/d_i$ time to download
Time to distribute F to N clients using the client/server approach:
$d_{cs}$ depends on $\max\{\, NF/u_s,\ F/\min_i d_i \,\}$
• increases linearly in N (for large N)
File distribution time: P2P
• server must send one copy: $F/u_s$ time
• client i takes $F/d_i$ time to download
• NF bits must be downloaded (aggregate)
• aggregated upload rate: $u_s + \sum_i u_i$
$d_{P2P}$ depends on $\max\{\, F/u_s,\ F/\min_i d_i,\ NF/(u_s + \sum_i u_i) \,\}$
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, $u_s = 10u$, $d_{min} \ge u_s$
[Figure: minimum distribution time (y axis, 0 to 3.5) versus N (x axis, 0 to 35), one curve for P2P and one for client-server]
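The two expressions can be evaluated for the parameters of this example. The sketch below treats the max{...} terms as the distribution time itself, which is an idealization; the function names are hypothetical, and the units are chosen so that u = 1 and F/u = 1 hour.

```python
def client_server_time(F, us, d):
    """max{NF/us, F/min(d)}: the server sends every copy itself."""
    N = len(d)
    return max(N * F / us, F / min(d))


def p2p_time(F, us, d, u):
    """max{F/us, F/min(d), NF/(us + sum(u))}: peers also upload."""
    N = len(d)
    return max(F / us, F / min(d), N * F / (us + sum(u)))


# Parameters of the example, in units where u = 1 and F/u = 1 hour.
u_peer, F = 1.0, 1.0
us = 10 * u_peer
for N in (5, 10, 20, 30):
    d = [us] * N      # d_min >= us, so downloads are never the bottleneck
    u = [u_peer] * N
    print(N, client_server_time(F, us, d), round(p2p_time(F, us, d, u), 3))
```

Under these assumptions the client-server time grows linearly (3 hours at N = 30), while the P2P time is N/(10 + N) hours and therefore stays below 1 hour for any N.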
DHT: Overview
• Structured Overlay Routing:
– Join: On startup, contact a “bootstrap” node and integrate yourself into the distributed data structure; get a node id
– Publish: Route publication for file id toward an appropriate node id along the data structure
• Need to think of updates when a node leaves
– Search: Route a query for file id toward a close node id. Data structure guarantees that query will meet the publication.
– Fetch: Two options:
• Publication contains actual file => fetch from where query stops
• Publication says “I have file X” => query tells you 128.2.1.3 has X, use http or similar (i.e. rely on IP routing) to get X from 128.2.1.3
BitTorrent – joining a torrent
Peers divided into:
• seeds: have the entire file
• leechers: still downloading
Steps for a new leecher:
1. obtain the metadata file
2. contact the tracker
3. obtain a peer list (contains seeds & leechers)
4. contact peers from that list for data
[Figure: a new leecher, a website (metadata file), the tracker (join, peer list) and a seed/leecher (data request), with arrows numbered 1 to 4]
BitTorrent – exchanging data
• Verify pieces using hashes
• Download sub-pieces in parallel
• Advertise received pieces to the entire peer list
• Look for the rarest pieces
[Figure: leecher A announces “I have …” to the seed, leecher B and leecher C]
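Piece verification can be sketched with per-piece digests. BitTorrent metadata files carry a SHA-1 digest for each piece; everything else here (the function names, the 4-byte piece size, the sample data) is an assumption made for illustration.

```python
import hashlib

PIECE_SIZE = 4  # tiny on purpose; real torrents use much larger pieces


def piece_hashes(data, piece_size=PIECE_SIZE):
    """Digest list a publisher would place in the metadata file."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def verify_piece(index, piece, expected_hashes):
    """Accept a downloaded piece only if its digest matches the metadata."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]


original = b"peer-to-peer"
hashes = piece_hashes(original)
good = verify_piece(1, original[4:8], hashes)  # piece as published
bad = verify_piece(1, b"XXXX", hashes)         # corrupted or forged piece
```

Because each piece is checked independently, a peer can safely accept data from untrusted peers and only needs to re-download the pieces that fail.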
BitTorrent ‐ unchoking
• Periodically calculate data-receiving rates
• Upload to (unchoke) the fastest downloaders
• Optimistic unchoking
– periodically select a peer at random and upload to it
– continuously look for the fastest partners
[Figure: leecher A, the seed and leechers B, C and D]
BitTorrent (2)
Pulling Chunks
• at any given time, different peers have different subsets of file chunks
• periodically, a peer (Alice) asks each neighbor for the list of chunks that they have.
• Alice sends requests for her missing chunks
– rarest first
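Rarest-first selection can be sketched as a sort by availability. The function name rarest_first and the sample reports are hypothetical, and ties are broken by chunk index and neighbour name only to make the result deterministic; a real client would typically randomize them.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order Alice's missing chunks by how few neighbours hold them.

    neighbor_chunks maps a neighbour name to the set of chunk indices it reports.
    Returns (chunk, a neighbour that has it) pairs, rarest chunk first.
    """
    availability = Counter()
    for chunks in neighbor_chunks.values():
        availability.update(chunks)
    missing = [c for c in availability if c not in my_chunks]
    # Fewest holders first; chunk index breaks ties.
    missing.sort(key=lambda c: (availability[c], c))
    plan = []
    for c in missing:
        holder = min(n for n, chunks in neighbor_chunks.items() if c in chunks)
        plan.append((c, holder))
    return plan


reports = {"bob": {0, 1, 2}, "carol": {1, 2}, "dave": {2, 3}}
plan = rarest_first({2}, reports)  # Alice already holds chunk 2
```

Requesting rare chunks first spreads them through the torrent before their few holders leave.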
Sending Chunks: tit‐for‐tat
• Alice sends chunks to (4) neighbors currently sending her chunks at the highest rate
– re‐evaluate top 4 every 10 secs
• every 30 secs: randomly select another peer, start sending it chunks
– newly chosen peer may join top 4
– “optimistically unchoke”
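One re-evaluation round of this policy can be sketched as follows. The function name choose_unchoked, its parameters and the sample rates are assumptions, and the 10 s and 30 s timers are left out, so both decisions are shown in a single call.

```python
import random


def choose_unchoked(recv_rate, top_n=4, rng=random):
    """Return (regular unchokes, optimistic unchoke) for one re-evaluation round.

    recv_rate maps each neighbour to the rate at which it currently sends to us.
    """
    # Highest rate first; names break ties so the ranking is deterministic.
    ranked = sorted(recv_rate, key=lambda p: (-recv_rate[p], p))
    regular = ranked[:top_n]
    choked = ranked[top_n:]
    # Optimistic unchoke: one random peer outside the top group gets a chance.
    optimistic = rng.choice(choked) if choked else None
    return regular, optimistic


rates = {"bob": 50, "carol": 80, "dave": 10, "erin": 65, "frank": 30, "grace": 0}
regular, optimistic = choose_unchoked(rates, rng=random.Random(0))
```

The random pick is what lets a newcomer with nothing to offer yet (such as grace here) receive its first chunks and start trading.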