Distributed hash tables: protocols and applications
Jinyang Li
Peer to peer systems
• Symmetric in node functionalities
  – Anybody can join and leave
  – Everybody gives and takes
• Great potential
  – Huge aggregate network capacity, storage, etc.
  – Resilient to single points of failure and attack
• E.g. Napster, Gnutella, BitTorrent, Skype, Joost
Motivations: the lookup problem
• How to locate something given its name?
• Examples:
  – File sharing
  – VoIP
  – Content distribution networks (CDNs)
Strawman #1: a central search service
• The central service poses a single point of failure or attack
• Not scalable: the central service needs serious resources ($$$)
Improved #1: Hierarchical lookup
• Performs hierarchical lookup (like DNS)
• More scalable than a central site
• Root of the hierarchy is still a single point of vulnerability
Strawman #2: Flooding
• Symmetric design: no special servers needed
• Too much overhead traffic
• Limited in scale
• No guarantee of results
Where is item K?
Improved #2: super peers
• E.g. Kazaa, Skype
• Lookup only floods super peers
• Still no guarantees for lookup results
DHT approach: routing
• Structured:
  – put(key, value); get(key)
  – maintain routing state for fast lookup
• Symmetric:
  – all nodes have identical roles
• Many protocols:
  – Chord, Kademlia, Pastry, Tapestry, CAN, Viceroy
DHT ideas
• Scalable routing needs a notion of “closeness”
• Ring structure [Chord]
Different notions of “closeness”
• Tree-like structure, [Pastry, Kademlia, Tapestry]
[Figure: binary ID tree grouping IDs such as 01100100, 01100101, 01100110, 01100111 by shared prefixes]
Distance measured as the length of the longest matching prefix with the lookup key
Different notions of “closeness”
• Cartesian space [CAN]
[Figure: 2-D Cartesian space from (0,0) to (1,1), partitioned into rectangular zones such as (0,0)–(0.5,0.5) and (0.5,0.25)–(0.75,0.5), each owned by a node]
Distance measured as geometric distance
DHT: complete protocol design
1. Name nodes and data items
2. Define data item to node assignment
3. Define per-node routing table contents
4. Handle churn
   • How do new nodes join?
   • How do nodes handle others’ failures?
Chord: Naming
• Each node has a unique flat ID– E.g. SHA-1(IP address)
• Each data item has a unique flat ID (key)– E.g. SHA-1(data content)
Data to node assignment
• Consistent hashing
  – A key is stored at its successor: the first node whose ID is at or clockwise after the key on the ring
  – The predecessor is the node whose ID most closely precedes the key
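Consistent hashing can be illustrated with a runnable sketch. This uses a tiny 8-bit ID space and illustrative function names; Chord itself uses 160-bit SHA-1 IDs:

```python
import hashlib

RING_BITS = 8  # tiny ID space for illustration; Chord uses 160-bit SHA-1 IDs

def ring_id(name: str) -> int:
    # Hash a name (a node's IP address or a data item's content) onto the ring
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** RING_BITS)

def successor(key: int, nodes: list[int]) -> int:
    # The key's successor -- the first node at or clockwise after the key --
    # is responsible for storing it
    for n in sorted(nodes):
        if n >= key:
            return n
    return min(nodes)  # wrap around the ring

nodes = [ring_id(f"10.0.0.{i}") for i in range(1, 6)]
owner = successor(ring_id("some data item"), nodes)
```

When a node joins or leaves, only the keys in the arc between it and its predecessor change hands, which is why consistent hashing handles churn well.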
Key-based lookup (routing)
• Correctness– Each node must know its correct successor
Key-based lookup
• Fast lookup
  – A node has O(log n) fingers exponentially “far” away
  – Node x’s i-th finger points at the successor of x + 2^i
Why is lookup fast?
• Each hop forwards to the finger closest to the key without passing it, at least halving the remaining distance in ID space, so a lookup takes O(log n) hops
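The intuition can be checked with a small simulation of greedy finger routing. A 4-bit ID space and the helper names below are illustrative, not part of the Chord protocol spec:

```python
M = 4              # bits in the ID space (illustrative; Chord uses 160)
RING = 2 ** M

def dist(a: int, b: int) -> int:
    # Clockwise distance from a to b on the ring
    return (b - a) % RING

def succ(key: int, nodes: list[int]) -> int:
    # First node at or clockwise after the key (nodes sorted ascending)
    return next((n for n in nodes if n >= key), nodes[0])

def fingers(x: int, nodes: list[int]) -> list[int]:
    # Node x's i-th finger is the successor of x + 2^i
    return [succ((x + 2 ** i) % RING, nodes) for i in range(M)]

def lookup(key: int, start: int, nodes: list[int]):
    # Greedy routing: stop once the key falls between us and our immediate
    # successor; otherwise forward to the finger closest to the key
    # without passing it, counting hops.
    hops, cur = 0, start
    while dist(cur, key) > dist(cur, succ((cur + 1) % RING, nodes)):
        cands = [f for f in fingers(cur, nodes)
                 if 0 < dist(cur, f) < dist(cur, key)]
        cur = max(cands, key=lambda f: dist(cur, f))
        hops += 1
    return succ(key, nodes), hops
```

Because each hop at least halves the clockwise distance to the key, the hop count never exceeds M, i.e. O(log n) for n nodes.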
Handling churn: join
• New node w joins an existing network
  – Issue a lookup to find its successor x
  – Set its successor: w.succ = x
  – Set its predecessor: w.pred = x.pred
  – Obtain the subset of responsible items from x
  – Notify x to update its predecessor: x.pred = w
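These join steps can be sketched in-memory, with a `Node` class and direct pointer updates standing in for RPCs (all names and the 16-bit ID space are illustrative):

```python
RING = 2 ** 16  # small ID space for illustration

class Node:
    def __init__(self, id: int):
        self.id, self.succ, self.pred, self.items = id, self, self, {}

def in_interval(x: int, a: int, b: int) -> bool:
    # True if x lies in the ring interval (a, b]
    return 0 < (x - a) % RING <= (b - a) % RING

def join(w: "Node", x: "Node"):
    # x is w's successor, found by a lookup for w.id (lookup not shown)
    w.succ = x
    w.pred = x.pred
    # w takes over the keys in (w.pred, w] that x was holding
    w.items = {k: v for k, v in x.items.items()
               if in_interval(k, w.pred.id, w.id)}
    for k in w.items:
        del x.items[k]
    x.pred = w
    w.pred.succ = w  # in Chord this pointer is fixed lazily by stabilization
```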
Handling churn: stabilization
• Each node periodically fixes its state
  – Node w asks its successor x for x.pred
  – If x.pred is “closer” to w than x is, set w.succ = x.pred
Starting with any connected graph, stabilization eventually makes all nodes find correct successors
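One round of this stabilization protocol can be sketched as follows; the `Node` class, direct pointer access, and 16-bit ID space are illustrative stand-ins for Chord's RPC-based protocol:

```python
RING = 2 ** 16  # small ID space for illustration

class Node:
    def __init__(self, id: int):
        self.id, self.succ, self.pred = id, self, self

def between(x: int, a: int, b: int) -> bool:
    # True if x lies strictly inside the ring interval (a, b)
    return 0 < (x - a) % RING < (b - a) % RING

def stabilize(w: "Node"):
    # w asks its successor x for x.pred; if that node sits between them,
    # it is a closer successor, so w adopts it
    z = w.succ.pred
    if between(z.id, w.id, w.succ.id):
        w.succ = z
    # w then notifies its successor, which may adopt w as its predecessor
    x = w.succ
    if x.pred is x or between(w.id, x.pred.id, x.id):
        x.pred = w
```

Running `stabilize` periodically at every node repairs successor pointers after joins, which is the invariant lookup correctness depends on.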
Handling churn: failures
• Nodes leave without notice
  – Each node keeps s (e.g. 16) successors
  – Replicate keys on the s successors
Handling churn: fixing fingers
• Much easier to maintain fingers
  – Any node at distance [2^i, 2^(i+1)) away will do
  – Geographically closer fingers → lower latency
  – Periodically flush dead fingers
Using DHTs in real systems
• Amazon’s Dynamo key-value storage [SOSP’07]
• Serves as persistent state for applications – shopping carts, user preferences
• How does it apply DHT ideas?– Assign key-value pairs to responsible nodes– Automatic recovery from churn
• New challenges?– Manage read-write data consistency
Using DHTs in real systems
• Keyword-based searches for file sharing – eMule, Azureus
• How to apply a DHT?
  – A user has file 1f3d… with the name “jingle bell britney”
  – Insert mappings: SHA-1(jingle) → 1f3d, SHA-1(bell) → 1f3d, SHA-1(britney) → 1f3d
  – How to answer the queries “jingle bell” and “britney”?
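A toy version of this keyword index, with an in-memory dict standing in for the DHT (all names here are illustrative, not eMule's or Azureus's actual API):

```python
import hashlib
from collections import defaultdict

dht = defaultdict(set)  # stand-in for the DHT: keyword hash -> set of file IDs

def kw_key(word: str) -> str:
    return hashlib.sha1(word.lower().encode()).hexdigest()

def publish(file_id: str, name: str):
    # One DHT insert per keyword, each mapping the keyword's hash to the file ID
    for word in name.split():
        dht[kw_key(word)].add(file_id)

def search(query: str) -> set:
    # Fetch each keyword's set of file IDs and intersect them client-side
    sets = [dht[kw_key(w)] for w in query.split()]
    return set.intersection(*sets) if sets else set()

publish("1f3d", "jingle bell britney")
```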
• Challenges?
  – Some keywords are much more popular than others
  – What if the RIAA inserts a billion junk mappings like “britney spear xyz”?
Using DHTs in real systems: CoralCDN
Motivation: alleviating flash crowds
[Figure: many browsers direct requests to cooperating HTTP proxies instead of the origin server]
• Proxies handle most client requests
• Proxies cooperate to fetch content from each other
Getting content with CoralCDN
1. Server selection: which CDN node should I use?
   • Clients use CoralCDN via a modified domain name:
     nytimes.com/file → nytimes.com.nyud.net/file
2. Lookup(URL): which nodes are caching the URL?
3. Content transmission: from which caching nodes should I download the file?
Coral design goals
• Don’t control data placement
  – Nodes cache based on access patterns
• Reduce load at origin server
  – Serve files from cached copies whenever possible
• Low end-to-end latency
  – Lookup and cache download optimize for locality
• Scalable and robust lookups
• Given a URL:
  – Where is the data cached?
  – Map name to locations: URL → {IP1, IP2, IP3, IP4}
  – lookup(URL): get IPs of nearby caching nodes
  – insert(URL, myIP): add me as a node caching the URL
Lookup for cache locations
• insert(URL, myIP, TTL): cache my address for TTL seconds
• Isn’t this exactly what a distributed hash table is for?
[Figure: SHA-1(URL1) names the node storing URL1 = {IP1, IP2, IP3, IP4}; caching proxies call insert(URL1, myIP, TTL)]
A straightforward use of DHT
• Problems
  – No load balance for a single URL
  – All inserts and lookups go to the same node (which cannot be close to everyone)
#1 Solve load balance problem: relax hash table semantics
• DHTs are designed for strict hash-table semantics
  – Insert and replace: URL → IPlast
  – Coral instead inserts and appends: URL → {IP1, IP2, IP3, IP4}
• Each Coral lookup needs only a few values
  – lookup(URL) → {IP2, IP4}
  – Preferably ones close in the network
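A sketch of this relaxed, append-style interface with TTL expiry; the function names mirror the slide, but the single in-memory table and parameters are illustrative:

```python
from collections import defaultdict

index = defaultdict(list)  # URL -> list of (ip, expiry_time)

def insert(url: str, ip: str, ttl: float, now: float):
    # Append rather than replace; each entry expires after ttl seconds
    index[url].append((ip, now + ttl))

def lookup(url: str, now: float, want: int = 2):
    # Return only a few live values, not the whole set; a real Coral node
    # would prefer IPs that are close in the network
    live = [ip for ip, expiry in index[url] if expiry > now]
    return live[:want]
```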
Prevent hotspots in index
• Route convergence: lookups from all leaves converge on the root node (the node with ID closest to the key)
  – O(b) nodes are 1 hop from the root
  – O(b^2) nodes are 2 hops from the root, …
Prevent hotspots in index
• Request load increases exponentially towards the root
[Figure: requests for URL = {IP1, IP2, IP3, IP4} converge over 1, 2, 3 hops from leaf nodes (distant IDs) to the root node (closest ID)]
Rate-limiting requests
• Refuse to cache if the node already has the max # of “fresh” IPs for a URL
  – Locations of popular items get pushed down the tree
[Figure: root stores URL = {IP1, IP2, IP3, IP4}; overflowing entries are pushed down, e.g. URL = {IP3, IP4} and URL = {IP5} at lower levels]
Rate-limiting requests
• Refuse to cache if the node already has the max # of “fresh” IPs for a URL
  – Locations of popular items get pushed down the tree
• Except: nodes leak through at most β inserts / min / URL
  – Bounds the rate of inserts towards the root, yet the root stays fresh (e.g. its set evolves to URL = {IP1, IP2, IP6, IP4})
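The rate-limiting rule above can be sketched as follows: a node stores a value only if it has room, and otherwise forwards it toward the root at most β times per minute per URL. The constants, class names, and root-replacement policy are illustrative, not Coral's exact parameters:

```python
MAX_FRESH = 4   # max "fresh" IPs cached per URL at one node (illustrative)
BETA = 12       # inserts per minute a full node leaks toward the root

class IndexNode:
    def __init__(self):
        self.values = {}   # url -> list of cached IPs
        self.leaked = {}   # url -> (window_start, leaks_this_window)

    def try_insert(self, url: str, ip: str, now: float) -> bool:
        vals = self.values.setdefault(url, [])
        if len(vals) < MAX_FRESH:
            vals.append(ip)        # room here: store and stop
            return True
        start, n = self.leaked.get(url, (now, 0))
        if now - start >= 60:
            start, n = now, 0      # new one-minute window
        if n < BETA:
            self.leaked[url] = (start, n + 1)
            return False           # leak: forward one hop toward the root
        return True                # rate limit hit: absorb the insert

def coral_insert(path: list, url: str, ip: str, now: float):
    # path goes from the leaf (distant ID) toward the root (closest ID)
    for node in path:
        if node.try_insert(url, ip, now):
            return
    # a leaked insert that reaches a full root replaces its oldest entry,
    # keeping the root fresh
    root = path[-1]
    root.values[url] = root.values[url][1:] + [ip]
```

Popular URLs fill the nodes near the leaves first, so most inserts and lookups never reach the root.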
Load balance results
• 494 nodes on PlanetLab
• Aggregate request rate: ~12 million / min
• Rate limit per node (β): 12 / min
• Root has fan-in from only 7 other nodes
[Figure: per-node insert rates observed at 1β, 2β, 3β, and 7β]
#2 Solve lookup locality problem
• Cluster nodes hierarchically based on RTT
• Lookup traverses up the hierarchy
  – Route to the “closest” node at each level
Preserve locality through hierarchy
[Figure: three nested clusters over the ID space 000…–111…, with RTT thresholds of < 20 ms, < 60 ms, and none (global)]
• Minimizes lookup latency
• Prefer values stored by nodes within faster clusters
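A sketch of level selection and hierarchical lookup under these RTT thresholds; the data layout and function names are illustrative:

```python
def cluster_level(rtt_ms: float) -> int:
    # Thresholds from the slide: level 2 (< 20 ms), level 1 (< 60 ms),
    # level 0 (global, no threshold)
    if rtt_ms < 20:
        return 2
    if rtt_ms < 60:
        return 1
    return 0

def hierarchical_lookup(url: str, indexes: dict):
    # Try the fastest (most local) cluster's index first, then fall back
    # to slower, wider clusters
    for level in (2, 1, 0):
        ips = indexes.get(level, {}).get(url)
        if ips:
            return level, ips
    return 0, []
```

Because a hit in the level-2 cluster never leaves the fast neighborhood, most lookups complete within a 20-ms RTT.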
Clustering reduces lookup latency
• Reduces median latency by a factor of 4
Putting it all together: Coral reduces server load
• Local disk caches begin to handle most requests
• Most hits are served within the 20-ms Coral cluster
• Few hits reach the origin
• 400+ nodes provide 32 Mbps, 100x the capacity of the origin