Topics in Reliable Distributed Systems 048961 Lecture 2, Fall 2004-2005 Dr. Idit Keidar.

Topics in Reliable Distributed Systems

048961

Lecture 2, Fall 2004-2005

Dr. Idit Keidar

Introduction to Peer-to-Peer (P2P) Lookup Systems

What Does Peer-to-Peer Mean?

Characterization from CFP of IPTPS 2004: – decentralized, – self-organizing – distributed systems, – in which all or most communication is

symmetric.

Typical Characteristics

• Lots of nodes (e.g., millions).• Dynamic: frequent join, leave, failure.• Little or no infrastructure.

– No central server.

• Communication possible between every pair of nodes (cf. the Internet).

• All nodes are “peers” – have same role; don’t have lots of resources.

The Main Challenge

To design and implement a robust and scalable distributed system

composed of inexpensive, individually unreliable

computers in unrelated administrative domains.

“Looking up Data in P2P Systems”, Balakrishnan et al., CACM Feb 2003.

It All Started with Lookup

• Goal: Make billions of objects available to millions of concurrent users– e.g., music files

• Need a mechanism to keep track of them– map files to their locations

• First There was Napster– centralized server/database – pros and cons?

Traditional Scalability Solution

• Hierarchy– tree overlay: organize nodes into a spanning

tree; communicate on links of the tree– structured lookup: know where to forward the

query next– e.g., DNS.

• Pros and cons?

Overlay Networks

• A virtual structure imposed over the physical network (e.g., the Internet)– over the Internet, there is a (IP level) unicast

channel between every pair of hosts– an overlay uses a fixed subset of these– nodes that have the capability to communicate

directly with each other do not use it

• What is this good for?

Symmetric Lookup Algorithms

• All nodes were created equal

• No hierarchy– overlay not a tree

• So how does the search go?– depends….

Searching in Overlay Networks Take I: Gnutella

• Build a decentralized unstructured overlay– each node has several neighbors;– holds several keys in its local database

• When asked to find a key X– check local database if X is known– if yes, return, if not, ask your neighbors

• What is the communication pattern?

How Come It Works?

• Search is fast– what people care about

• People don’t care so much about wasting bandwidth– may change as ISPs start charging for bandwidth

• Scalability is limited – normally no more than ~40,000 peers

• Files are replicated many times so flooding with a small TTL usually finds the file– even if there are multiple connected components

Take II: FastTrack, KaZaA, eDonkey

• Improve scalability by re-introducing a hierarchy– though not a tree– super-peers have more resources, more

neighbors, know more keys– search goes through super-peers

• Pros and Cons?

Distributed Hash Tables (DHTs)

• Nodes store table entries

• Good abstraction for lookup? Why?

• Requirements for an application being able to use DHTs?– data identified with unique keys– nodes can (agree to) store keys for each other

• location of object or actual object.

The DHT Service Interface

lookup( key )

returns the location of the node currently responsible for this key

key usually numeric (in some range)

Using the DHT Interface

• How do you publish a file?

• How do you find a file?

What Does a DHT Implementation Need to Do?

• Map keys to nodes– needs to be dynamic as nodes join and leave– how does this affect the service interface?

• Route a request to the appropriate node– routing on the overlay

Lookup Example

insert(K1,V1)

K V(K1,V1)

lookup(K1)

Mapping Keys to Nodes

• Goal: load balancing– why?

• Typical approach: – give an m-bit identifier to each node and each

key (e.g., using SHA-1 on the key, IP address)– map key to node whose id is “close” to the key

(need distance function). – how is load balancing achieved?

Routing Issues

• Each node must be able to forward each lookup query to a node closer to the destination

• Maintain routing tables adaptively– each node knows some other nodes– must adapt to changes (joins, leaves, failures)– goals?

Handling Join/Leave

• When a node joins it needs to assume responsibility for some keys – ask the application to move these keys to it– how many keys will need to be moved?

• When a nodes fails or leaves, its keys have to be moved to others– what else is needed in order to implement this?

Stoica, Morris, Karger, Kaashoek, and Balakrishnan

Chord Logical Structure

• m-bit ID space (2m IDs), usually m=160.• Think of nodes as organized in a logical ring

according to their IDs.N1

N30N38

N51N56

Assigning Keys to Nodes

• Key k is assigned to first node whose ID equals or follows k – successor(k)

N30N38

N51N56

Moving Keys upon Join/Leave

• When a node joins, it becomes responsible for some keys previously assigned to its successor – local change– assuming load is balanced, how many keys

should move?

• And what happens when a node leaves?

Simple Routing Solutions

• Each node knows only its successor. – routing around the circle

• Each node knows all other nodes– O(1) routing– cost?

Chord Skiplist Routing

• Each node has “fingers” to nodes ½ way around the ID space from it, ¼ the way…

• finger[i] at n contains successor(n+2i-1)• successor is finger[1]

N30N38

N51N56

How many fingers in the finger table?

Chord Data Structures (At Each Node)

• Finger table

• First finger is successor

• Predecessor

Forwarding Queries

• Query for key k is forwarded to finger with highest ID not exceeding k

K54 Lookup( K54 )N0

N30N38

N51N56

How long does it take?

Remote Procedure Call (RPC)

Routing Time

• Node n looks up a key stored at node p• p is in n’s ith interval

p ((n+2i-1)mod 2m, (n+2i)mod 2m]

• n contacts f=finger[i]– RPC closest_preceding_node

• f is at least 2i-1 away from n• p is at most 2i-1 away from f• The distance is halved

Joining Chord

• Goals?• Steps:

– Find your successor– Initialize finger table and predecessor– Notify other nodes that need to change their

finger table and predecessor pointer• O(log2n)

– Learn the keys that you are responsible for; notify others that you assume control over them

Join Algorithm: Take II

• Observation: for correctness, successors suffice – fingers only needed for performance

• Upon join, update successor only• Periodically,

– check that successors and predecessors are consistent

– fix fingers

Failure Handling

• Periodically fixing fingers • List of r successors instead of one successor• Periodically probing predecessors:

The Model?

• Failures can be accurately detected!• Properties hold as long as failure is bounded:

– If we use a successor list of length r = (logN) in a network that is initially stable, and then every node fails with probability 1/2, then with high probability find successor returns the closest living successor to the query key.

– In a network that is initially stable, if every node then fails with probability 1/2, then the expected time to execute find successor is O(logN).

What About Moving Keys?

• Left up to the application

• Solution: keep soft state, refreshed periodically– every refresh operation performs lookup(key)

before storing the key in the right place

• How can we increase reliability for the time between failure and refresh?

Topics in Reliable Distributed Systems 048961 Lecture 2, Fall 2004-2005 Dr. Idit Keidar.

Documents

Transcript of Topics in Reliable Distributed Systems 048961 Lecture 2, Fall 2004-2005 Dr. Idit Keidar.

© Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

Gal Badishi, Idit Keidar, Amir Sasson

November 19 2004 - NC state university Group Communication Specifications Gregory V Chockler, Idit Keidar, Roman Vitenberg Presented by – Jyothish S Varma.

1 A Framework for Highly Available Services Based on Group Communication Alan Fekete Idit Keidar University of Sidney MIT.

Project Presentation Students: Yan Michalevsky Asaf Cidon Supervisors: Alexander Shraer Assoc. Prof. Idit Keidar.

Idit Keidar and Dmitri Perelman Technion 1 SPAA 2009.

Optimistic Virtual Sync hron ygroups.csail.mit.edu/tds/papers/Keidar/ovs-tr.pdf · Virtual Sync hron y Jerem y Sussman y Idit Keidar Keith Marzullo z Abstract Group comm unication

How to Choose a Timing Model? Idit Keidar and Alexander Shraer Technion – Israel Institute of Technology.

Asynchronous Byzantine Agreement with Subquadratic ......[CKS20]: Shir Cohen, Idit Keidar, and Alexander Spiegelman. Not a COINcidence: Sub-quadratic asynchronous Byzantine agreement

1 Venus: Verification for Untrusted Cloud Storage Christian Cachin Idit Keidar, Asaf Cidon, Yan Michalevsky, Dani Shaket IBM Research Zurch Technion, Israel.

CAFÉ: Scalable Task Pool with Adjustable Fairness and Contention Dmitry Basin, Rui Fan, Idit Keidar, Ofer Kiselov, Dmitri Perelman Technion, Israel Institute.

Idit Keidar, Topics in Reliable Distributed Systems, Technion EE, Winter 2004-2005 1 Topics in Reliable Distributed Systems 048961 Winter 2004-2005 Dr.

1 EquiCast Scalable Multicast with Selfish Users Idit Keidar, Roie Melamed, Ariel Orda Technion.

1 Principles of Reliable Distributed Systems Lecture 10: Atomic Shared Memory Objects and Shared Memory Emulations Spring 2007 Prof. Idit Keidar.

1 Principles of Reliable Distributed Systems Lecture 11: Disk Paxos and Quorum Systems Spring 2007 Prof. Idit Keidar.

Transactional Data Structure Librarieswebee.technion.ac.il/people/idish/ftp/Transactional... · 2016. 6. 11. · Idit Keidar Technion and Yahoo Research, Israel idish@ee.technion.ac.il

Architecture and Routing for NoC-based FPGA Israel Cidon* *joint work with Roman Gindin and Idit Keidar.

Local Computations in Large-Scale Networks Idit Keidar Technion.

Scaling Concurrent Log-Structured Data Stores · 2020. 4. 26. · Idit Keidar Technion, Yahoo Labs Haifa, Israel idish@ee.technion.ac.il Abstract Log-structured data stores (LSM-DSs)

1 Principles of Reliable Distributed Systems Lecture 12: Disk Paxos and Quorum Systems Spring 2009 Idit Keidar.