Distributed hash tables: protocols and applications
Jinyang Li
Peer to peer systems
• Symmetric in node functionalities
  – Anybody can join and leave
  – Everybody gives and takes
• Great potential
  – Huge aggregate network capacity, storage, etc.
  – Resilient to single points of failure and attack
• E.g. Napster, Gnutella, BitTorrent, Skype, Joost
Motivations: the lookup problem
• How to locate something given its name?
• Examples:
  – File sharing
  – VoIP
  – Content distribution networks (CDNs)
Strawman #1: a central search service
• The central service poses a single point of failure or attack
• Not scalable: the central service needs serious resources ($$$)
Improved #1: Hierarchical lookup
• Performs hierarchical lookup (like DNS)
• More scalable than a central site
• Root of the hierarchy is still a single point of vulnerability
Strawman #2: Flooding
• Symmetric design: no special servers needed
• Too much overhead traffic
• Limited in scale
• No guarantee of results
Where is item K?
Improved #2: super peers
• E.g. Kazaa, Skype
• Lookup only floods super peers
• Still no guarantees for lookup results
DHT approach: routing
• Structured:
  – put(key, value); get(key)
  – maintain routing state for fast lookup
• Symmetric:
  – all nodes have identical roles
• Many protocols:
  – Chord, Kademlia, Pastry, Tapestry, CAN, Viceroy
DHT ideas
• Scalable routing needs a notion of “closeness”
• Ring structure [Chord]
Different notions of “closeness”
• Tree-like structure, [Pastry, Kademlia, Tapestry]
[Figure: binary ID tree grouping IDs such as 01100100, 01100101, 01100110, 01100111 by shared prefixes]
Distance measured as the length of the longest matching prefix with the lookup key
Different notions of “closeness”
• Cartesian space [CAN]
[Figure: 2-D Cartesian space from (0,0) to (1,1), partitioned into rectangular zones such as (0,0)–(0.5,0.5) and (0.5,0.25)–(0.75,0.5), each owned by a node]
Distance measured as geometric distance
DHT: complete protocol design
1. Name nodes and data items
2. Define data item to node assignment
3. Define per-node routing table contents
4. Handle churn
   • How do new nodes join?
   • How do nodes handle others’ failures?
Chord: Naming
• Each node has a unique flat ID– E.g. SHA-1(IP address)
• Each data item has a unique flat ID (key)– E.g. SHA-1(data content)
Data to node assignment
• Consistent hashing
  – A key is stored at its successor: the first node whose ID is at or clockwise after the key on the ring
  – The predecessor is the node whose ID most closely precedes the key
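Consistent hashing can be illustrated with a runnable sketch. This uses a tiny 8-bit ID space and illustrative function names; Chord itself uses 160-bit SHA-1 IDs:

```python
import hashlib

RING_BITS = 8  # tiny ID space for illustration; Chord uses 160-bit SHA-1 IDs

def ring_id(name: str) -> int:
    # Hash a name (a node's IP address or a data item's content) onto the ring
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** RING_BITS)

def successor(key: int, nodes: list[int]) -> int:
    # The key's successor -- the first node at or clockwise after the key --
    # is responsible for storing it
    for n in sorted(nodes):
        if n >= key:
            return n
    return min(nodes)  # wrap around the ring

nodes = [ring_id(f"10.0.0.{i}") for i in range(1, 6)]
owner = successor(ring_id("some data item"), nodes)
```

When a node joins or leaves, only the keys in the arc between it and its predecessor change hands, which is why consistent hashing handles churn well.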
Key-based lookup (routing)
• Correctness– Each node must know its correct successor
Key-based lookup
• Fast lookup
  – A node has O(log n) fingers exponentially “far” away
  – Node x’s i-th finger points at the successor of x + 2^i
Why is lookup fast?
• Each hop forwards to the finger closest to the key without passing it, at least halving the remaining distance in ID space, so a lookup takes O(log n) hops
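The intuition can be checked with a small simulation of greedy finger routing. A 4-bit ID space and the helper names below are illustrative, not part of the Chord protocol spec:

```python
M = 4              # bits in the ID space (illustrative; Chord uses 160)
RING = 2 ** M

def dist(a: int, b: int) -> int:
    # Clockwise distance from a to b on the ring
    return (b - a) % RING

def succ(key: int, nodes: list[int]) -> int:
    # First node at or clockwise after the key (nodes sorted ascending)
    return next((n for n in nodes if n >= key), nodes[0])

def fingers(x: int, nodes: list[int]) -> list[int]:
    # Node x's i-th finger is the successor of x + 2^i
    return [succ((x + 2 ** i) % RING, nodes) for i in range(M)]

def lookup(key: int, start: int, nodes: list[int]):
    # Greedy routing: stop once the key falls between us and our immediate
    # successor; otherwise forward to the finger closest to the key
    # without passing it, counting hops.
    hops, cur = 0, start
    while dist(cur, key) > dist(cur, succ((cur + 1) % RING, nodes)):
        cands = [f for f in fingers(cur, nodes)
                 if 0 < dist(cur, f) < dist(cur, key)]
        cur = max(cands, key=lambda f: dist(cur, f))
        hops += 1
    return succ(key, nodes), hops
```

Because each hop at least halves the clockwise distance to the key, the hop count never exceeds M, i.e. O(log n) for n nodes.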
Handling churn: join
• New node w joins an existing network
  – Issue a lookup to find its successor x
  – Set its successor: w.succ = x
  – Set its predecessor: w.pred = x.pred
  – Obtain the subset of responsible items from x
  – Notify x to update its predecessor: x.pred = w
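These join steps can be sketched in-memory, with a `Node` class and direct pointer updates standing in for RPCs (all names and the 16-bit ID space are illustrative):

```python
RING = 2 ** 16  # small ID space for illustration

class Node:
    def __init__(self, id: int):
        self.id, self.succ, self.pred, self.items = id, self, self, {}

def in_interval(x: int, a: int, b: int) -> bool:
    # True if x lies in the ring interval (a, b]
    return 0 < (x - a) % RING <= (b - a) % RING

def join(w: "Node", x: "Node"):
    # x is w's successor, found by a lookup for w.id (lookup not shown)
    w.succ = x
    w.pred = x.pred
    # w takes over the keys in (w.pred, w] that x was holding
    w.items = {k: v for k, v in x.items.items()
               if in_interval(k, w.pred.id, w.id)}
    for k in w.items:
        del x.items[k]
    x.pred = w
    w.pred.succ = w  # in Chord this pointer is fixed lazily by stabilization
```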
Handling churn: stabilization
• Each node periodically fixes its state
  – Node w asks its successor x for x.pred
  – If x.pred is “closer” to w than x is, set w.succ = x.pred
Starting with any connected graph, stabilization eventually makes all nodes find correct successors
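One round of this stabilization protocol can be sketched as follows; the `Node` class, direct pointer access, and 16-bit ID space are illustrative stand-ins for Chord's RPC-based protocol:

```python
RING = 2 ** 16  # small ID space for illustration

class Node:
    def __init__(self, id: int):
        self.id, self.succ, self.pred = id, self, self

def between(x: int, a: int, b: int) -> bool:
    # True if x lies strictly inside the ring interval (a, b)
    return 0 < (x - a) % RING < (b - a) % RING

def stabilize(w: "Node"):
    # w asks its successor x for x.pred; if that node sits between them,
    # it is a closer successor, so w adopts it
    z = w.succ.pred
    if between(z.id, w.id, w.succ.id):
        w.succ = z
    # w then notifies its successor, which may adopt w as its predecessor
    x = w.succ
    if x.pred is x or between(w.id, x.pred.id, x.id):
        x.pred = w
```

Running `stabilize` periodically at every node repairs successor pointers after joins, which is the invariant lookup correctness depends on.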
Handling churn: failures
• Nodes leave without notice
  – Each node keeps s (e.g. 16) successors
  – Replicate keys on the s successors
Handling churn: fixing fingers
• Much easier to maintain fingers
  – Any node at distance [2^i, 2^(i+1)) away will do
  – Geographically closer fingers → lower latency
  – Periodically flush dead fingers
Using DHTs in real systems
• Amazon’s Dynamo key-value storage [SOSP’07]
• Serves as persistent state for applications – shopping carts, user preferences
• How does it apply DHT ideas?– Assign key-value pairs to responsible nodes– Automatic recovery from churn
• New challenges?– Manage read-write data consistency
Using DHTs in real systems
• Keyword-based searches for file sharing – eMule, Azureus
• How to apply a DHT?
  – A user has file 1f3d… with the name “jingle bell britney”
  – Insert mappings: SHA-1(jingle) → 1f3d, SHA-1(bell) → 1f3d, SHA-1(britney) → 1f3d
  – How to answer the queries “jingle bell” and “britney”?
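A toy version of this keyword index, with an in-memory dict standing in for the DHT (all names here are illustrative, not eMule's or Azureus's actual API):

```python
import hashlib
from collections import defaultdict

dht = defaultdict(set)  # stand-in for the DHT: keyword hash -> set of file IDs

def kw_key(word: str) -> str:
    return hashlib.sha1(word.lower().encode()).hexdigest()

def publish(file_id: str, name: str):
    # One DHT insert per keyword, each mapping the keyword's hash to the file ID
    for word in name.split():
        dht[kw_key(word)].add(file_id)

def search(query: str) -> set:
    # Fetch each keyword's set of file IDs and intersect them client-side
    sets = [dht[kw_key(w)] for w in query.split()]
    return set.intersection(*sets) if sets else set()

publish("1f3d", "jingle bell britney")
```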
• Challenges?
  – Some keywords are much more popular than others
  – What if the RIAA inserts a billion junk mappings like “britney spear xyz”?
Using DHTs in real systems: CoralCDN
Motivation: alleviating flash crowds
[Figure: many browsers direct requests to cooperating HTTP proxies instead of the origin server]
• Proxies handle most client requests
• Proxies cooperate to fetch content from each other
Getting content with CoralCDN
1. Server selection: which CDN node should I use?
   • Clients use CoralCDN via a modified domain name:
     nytimes.com/file → nytimes.com.nyud.net/file
2. Lookup(URL): which nodes are caching the URL?
3. Content transmission: from which caching nodes should I download the file?
Coral design goals
• Don’t control data placement
  – Nodes cache based on access patterns
• Reduce load at origin server
  – Serve files from cached copies whenever possible
• Low end-to-end latency
  – Lookup and cache download optimize for locality
• Scalable and robust lookups
• Given a URL:
  – Where is the data cached?
  – Map name to locations: URL → {IP1, IP2, IP3, IP4}
  – lookup(URL): get IPs of nearby caching nodes
  – insert(URL, myIP): add me as a node caching the URL
Lookup for cache locations
• insert(URL, myIP, TTL): cache my address for TTL seconds
• Isn’t this exactly what a distributed hash table is for?
[Figure: SHA-1(URL1) names the node storing URL1 = {IP1, IP2, IP3, IP4}; caching proxies call insert(URL1, myIP, TTL)]
A straightforward use of DHT
• Problems
  – No load balance for a single URL
  – All inserts and lookups go to the same node (which cannot be close to everyone)
#1 Solve load balance problem: relax hash table semantics
• DHTs are designed for strict hash-table semantics
  – Insert and replace: URL → IPlast
  – Coral instead inserts and appends: URL → {IP1, IP2, IP3, IP4}
• Each Coral lookup needs only a few values
  – lookup(URL) → {IP2, IP4}
  – Preferably ones close in the network
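A sketch of this relaxed, append-style interface with TTL expiry; the function names mirror the slide, but the single in-memory table and parameters are illustrative:

```python
from collections import defaultdict

index = defaultdict(list)  # URL -> list of (ip, expiry_time)

def insert(url: str, ip: str, ttl: float, now: float):
    # Append rather than replace; each entry expires after ttl seconds
    index[url].append((ip, now + ttl))

def lookup(url: str, now: float, want: int = 2):
    # Return only a few live values, not the whole set; a real Coral node
    # would prefer IPs that are close in the network
    live = [ip for ip, expiry in index[url] if expiry > now]
    return live[:want]
```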
Prevent hotspots in index
• Route convergence: lookups from all leaves converge on the root node (the node with ID closest to the key)
  – O(b) nodes are 1 hop from the root
  – O(b^2) nodes are 2 hops from the root, …
Prevent hotspots in index
• Request load increases exponentially towards the root
[Figure: requests for URL = {IP1, IP2, IP3, IP4} converge over 1, 2, 3 hops from leaf nodes (distant IDs) to the root node (closest ID)]
Rate-limiting requests
• Refuse to cache if the node already has the max # of “fresh” IPs for a URL
  – Locations of popular items get pushed down the tree
[Figure: root stores URL = {IP1, IP2, IP3, IP4}; overflowing entries are pushed down, e.g. URL = {IP3, IP4} and URL = {IP5} at lower levels]
Rate-limiting requests
• Refuse to cache if the node already has the max # of “fresh” IPs for a URL
  – Locations of popular items get pushed down the tree
• Except: nodes leak through at most β inserts / min / URL
  – Bounds the rate of inserts towards the root, yet the root stays fresh (e.g. its set evolves to URL = {IP1, IP2, IP6, IP4})
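The rate-limiting rule above can be sketched as follows: a node stores a value only if it has room, and otherwise forwards it toward the root at most β times per minute per URL. The constants, class names, and root-replacement policy are illustrative, not Coral's exact parameters:

```python
MAX_FRESH = 4   # max "fresh" IPs cached per URL at one node (illustrative)
BETA = 12       # inserts per minute a full node leaks toward the root

class IndexNode:
    def __init__(self):
        self.values = {}   # url -> list of cached IPs
        self.leaked = {}   # url -> (window_start, leaks_this_window)

    def try_insert(self, url: str, ip: str, now: float) -> bool:
        vals = self.values.setdefault(url, [])
        if len(vals) < MAX_FRESH:
            vals.append(ip)        # room here: store and stop
            return True
        start, n = self.leaked.get(url, (now, 0))
        if now - start >= 60:
            start, n = now, 0      # new one-minute window
        if n < BETA:
            self.leaked[url] = (start, n + 1)
            return False           # leak: forward one hop toward the root
        return True                # rate limit hit: absorb the insert

def coral_insert(path: list, url: str, ip: str, now: float):
    # path goes from the leaf (distant ID) toward the root (closest ID)
    for node in path:
        if node.try_insert(url, ip, now):
            return
    # a leaked insert that reaches a full root replaces its oldest entry,
    # keeping the root fresh
    root = path[-1]
    root.values[url] = root.values[url][1:] + [ip]
```

Popular URLs fill the nodes near the leaves first, so most inserts and lookups never reach the root.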
Load balance results
• 494 nodes on PlanetLab
• Aggregate request rate: ~12 million / min
• Rate limit per node (β): 12 / min
• Root has fan-in from only 7 other nodes
[Figure: per-node insert rates observed at 1β, 2β, 3β, and 7β]
#2 Solve lookup locality problem
• Cluster nodes hierarchically based on RTT
• Lookup traverses up the hierarchy
  – Route to the “closest” node at each level
Preserve locality through hierarchy
[Figure: three nested clusters over the ID space 000…–111…, with RTT thresholds of < 20 ms, < 60 ms, and none (global)]
• Minimizes lookup latency
• Prefer values stored by nodes within faster clusters
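A sketch of level selection and hierarchical lookup under these RTT thresholds; the data layout and function names are illustrative:

```python
def cluster_level(rtt_ms: float) -> int:
    # Thresholds from the slide: level 2 (< 20 ms), level 1 (< 60 ms),
    # level 0 (global, no threshold)
    if rtt_ms < 20:
        return 2
    if rtt_ms < 60:
        return 1
    return 0

def hierarchical_lookup(url: str, indexes: dict):
    # Try the fastest (most local) cluster's index first, then fall back
    # to slower, wider clusters
    for level in (2, 1, 0):
        ips = indexes.get(level, {}).get(url)
        if ips:
            return level, ips
    return 0, []
```

Because a hit in the level-2 cluster never leaves the fast neighborhood, most lookups complete within a 20-ms RTT.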
Clustering reduces lookup latency
• Reduces median latency by a factor of 4
Putting it all together: Coral reduces server load
• Local disk caches begin to handle most requests
• Most hits are served within the 20-ms Coral cluster
• Few hits reach the origin
• 400+ nodes provide 32 Mbps, 100x the capacity of the origin