Sharding
Transcript of lecture slides from courses.cs.washington.edu
Scaling Paxos: Shards
We can use Paxos to decide on the order of operations, e.g., to a key-value store
- leader sends each op to all servers
- practical limit on how many ops/second a single group can handle
What if we want to scale to more clients?
Sharding among multiple Paxos groups
- partition key-space among groups
- for single key operations, still linearizable
[Diagram: a replicated, sharded database. Several Paxos groups, each replicating its own state machine.]
Which keys are where?
[Diagram (Lab 4 and other systems): the same sharded database, plus a separate Paxos-replicated shard master.]
Replicated, Sharded Database
Shard master decides
- which Paxos group has which keys
Shards operate independently
How do clients know who has what keys?
- Ask shard master? Becomes the bottleneck!
- Avoid shard master communication if possible
Can clients predict which group has which keys?
Recurring Problem
Client needs to access some resource
Sharded for scalability
How does client find specific server to use?
Central redirection won’t scale!
Another scenario
[Diagram, built up over several slides: a client issues GET index.html; the returned index.html links to logo.jpg, jquery.js, …; the client then issues GET logo.jpg and GET jquery.js against a set of caches (Cache 1, Cache 2, Cache 3); a second client does the same.]
Other Examples
Scalable stateless web front ends (FE)
- cache efficient iff same client goes to same FE
Scalable shopping cart service
Scalable email service
Scalable cache layer (Memcache)
Scalable network path allocation
Scalable network function virtualization (NFV)
…
What’s in common?
Want to assign keys to servers with minimal communication, fast lookup
Requirement 1: clients all have same assignment
Proposal 1
For n nodes, a key k goes to k mod n
Cache 1: “a”, “d”, “ab”    Cache 2: “b”    Cache 3: “c”
Problems with this approach?
- uneven distribution of keys
A Bit of Queueing Theory
Assume Poisson arrivals:
- random, uncorrelated, memoryless
- utilization (U): fraction of time the server is busy (0 to 1)
- service time (S): average time per request
Queueing Theory
[Plot: response time R vs. utilization U; R grows sharply as U approaches 1.]
R = S / (1 - U)
Variance in response time ~ S / (1 - U)^2
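To get a feel for the formula, a short sketch (the 10 ms service time is a made-up number):

```python
# Mean response time under the model above: R = S / (1 - U).
# S: average service time per request; U: utilization in [0, 1).
def response_time(S, U):
    assert 0 <= U < 1, "the server saturates as U approaches 1"
    return S / (1 - U)

S = 0.010  # assume a 10 ms average service time
for U in (0.2, 0.5, 0.8, 0.9, 0.99):
    print(f"U = {U:.2f}  R = {response_time(S, U) * 1000:6.1f} ms")
```

This is why uneven key distribution hurts: a server pushed to U = 0.99 has a mean response time 100x its service time.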
Requirements, revisited
Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed
Proposal 2: Hashing
For n nodes, a key k goes to hash(k) mod n
Hash distributes keys uniformly
With 3 caches: h(“a”) = 1, h(“abc”) = 2, h(“b”) = 3
But, new problem: what if we add a node?
- Redistribute a lot of keys! (on average, all but K/n of the K keys move)
With Cache 4 added: h(“a”) = 3, h(“b”) = 4; only “abc” stays put
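The reshuffling cost is easy to measure; a sketch (the key names are illustrative, and SHA-256 stands in for whatever hash the system uses):

```python
import hashlib

def h(key):
    # Deterministic hash of a string key to an integer.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def assign(keys, n):
    # hash(k) mod n: each key's server depends on the node count n.
    return {k: h(k) % n for k in keys}

keys = [f"key{i}" for i in range(10000)]
before = assign(keys, 3)  # 3 caches
after = assign(keys, 4)   # add a 4th cache
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved")
```

Going from n = 3 to n = 4, a key stays put only when h(k) mod 3 == h(k) mod 4, i.e. about a quarter of the time, so roughly three quarters of the keys move.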
Requirements, revisited
Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed
Requirement 3: add/remove node moves only a few keys
Proposal 3: Consistent Hashing
First, hash the node ids onto the range 0 … 2^32
Keys are hashed too; each key goes to the “next” node
[Diagram, built up over several slides: Cache 1, Cache 2, Cache 3 placed at hash(1), hash(2), hash(3) on the 0 … 2^32 line; hash(“a”) and hash(“b”) fall between node positions, and each key is assigned to the next node after it.]
Proposal 3: Consistent Hashing
[Ring diagram: Cache 1, Cache 2, Cache 3 on a circle, with keys “a” and “b” assigned to their successor nodes.]
What if we add a node?
[Ring diagram: Cache 4 is added between two existing nodes.]
Only “b” has to move!
On average, K/n keys move
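A minimal ring along these lines (a sketch: SHA-256 truncated to 32 bits plays the hash, and the node and key names are illustrative):

```python
import bisect
import hashlib

RING = 2 ** 32

def h(s):
    # Hash a string onto the 0 .. 2^32 ring.
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:4], "big") % RING

class ConsistentHash:
    def __init__(self, nodes):
        # Each node sits at one point on the ring.
        self.points = sorted((h(n), n) for n in nodes)

    def lookup(self, key):
        # The key goes to the first node clockwise from hash(key), wrapping around.
        i = bisect.bisect(self.points, (h(key), ""))
        return self.points[i % len(self.points)][1]

ring = ConsistentHash(["cache1", "cache2", "cache3"])
print(ring.lookup("a"), ring.lookup("b"))
```

Adding a node only claims the ring segment just before its position, so only the keys in that segment move, matching the K/n figure above.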
Load Balance
Assume # keys >> # of servers
- For example, 100K users -> 100 servers
How far off of equal balance is hashing?
- What is typical worst case server?
How far off of equal balance is consistent hashing?
- What is typical worst case server?
Proposal 3: Consistent Hashing, revisited
[Same ring diagram, with Cache 4 added.]
Only “b” has to move!
On average, K/n keys move, but all of them move between just two nodes
Requirements, revisited
Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed
Requirement 3: add/remove node moves only a few keys
Requirement 4: minimize worst case overload
Requirement 5: parcel out work of redistributing keys
Proposal 4: Virtual Nodes
First, hash each node id to multiple locations
[Diagram, built up over several slides: the 0 … 2^32 line with Cache 1 and Cache 2 each appearing at several positions.]
As it turns out, hash functions come in families whose members are independent, so generating several positions per node is easy!
Prop 4: Virtual Nodes
[Ring diagram: Cache 1, Cache 2, Cache 3 each placed at many positions on the circle.]
Keys are more evenly distributed, and migration is evenly spread out.
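Extending the ring idea with virtual nodes (a sketch; the vnode count of 100 and the "#i" naming scheme are illustrative choices):

```python
import bisect
import hashlib
from collections import Counter

RING = 2 ** 32

def h(s):
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:4], "big") % RING

class VirtualNodeRing:
    def __init__(self, nodes, vnodes=100):
        # Hash each node id to many ring positions: "cache1#0", "cache1#1", ...
        self.points = sorted(
            (h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    def lookup(self, key):
        # As before: first virtual node clockwise from hash(key), wrapping.
        i = bisect.bisect(self.points, (h(key), ""))
        return self.points[i % len(self.points)][1]

ring = VirtualNodeRing(["cache1", "cache2", "cache3"])
load = Counter(ring.lookup(f"k{i}") for i in range(30000))
print(load)  # much closer to even than with one position per node
```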
How Many Virtual Nodes?
How many virtual nodes do we need per server?
- to spread worst case load
- to distribute migrating keys
Assume 100000 clients, 100 servers
- 10?
- 100?
- 1000?
- 10000?
Requirements, revisited
Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed
Requirement 3: add/remove node moves only a few keys
Requirement 4: minimize worst case overload
Requirement 5: parcel out work of redistributing keys
Key Popularity
- What if some keys are more popular than others?
- Hashing is no longer load balanced!
- One model for popularity is the Zipf distribution
- Popularity of the kth most popular item ~ 1/k^c, with 1 < c < 2
- Ex: 1, 1/2, 1/3, … 1/100 … 1/1000 … 1/10000
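The example numbers come from the Zipf formula; a quick check with c = 1 (c chosen here just to match the 1, 1/2, 1/3, … example):

```python
# Popularity of the k-th most popular item: proportional to 1/k^c.
def zipf_weight(k, c=1.0):
    return 1.0 / k ** c

print([round(zipf_weight(k), 3) for k in range(1, 6)])

# The head dominates: out of 10000 items, the top 10 attract a
# disproportionate share of all requests.
total = sum(zipf_weight(k) for k in range(1, 10001))
head = sum(zipf_weight(k) for k in range(1, 11)) / total
print(f"top 10 items receive {head:.0%} of requests")
```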
Zipf “Heavy Tail” Distribution
[Plot: popularity vs. rank, showing a heavy tail.]
Zipf Examples
- Web pages
- Movies
- Library books
- Words in text
- Salaries
- City population
- Twitter followers
- …
Whenever popularity is self-reinforcing
Popularity changes dynamically: what is popular right now?
Proposal 5: Table Indirection
Consistent hashing is (mostly) stateless
- Map is a hash function of # servers, # virtual nodes
- Unbalanced with Zipf workloads, dynamic load
Instead, put a small table on each client: O(# vnodes)
- table[hash(key)] -> server
- Same table on every client
- Shard master adjusts table entries to balance load
- Periodically broadcasts the new table
Table Indirection
[Diagram, built up over several slides: the 0 … 2^32 hash range split into buckets, each bucket assigned to Cache 1, 2, or 3; hash(“despacito”) and hash(“paxos”) each index into the table to find their server.]
Split the hash range into buckets and assign each bucket to a server; a busy server gets fewer buckets, and the assignment can change over time.
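A sketch of the client-side table (the bucket count, server names, and particular assignment are all illustrative; in the real design the shard master computes the table and broadcasts updates):

```python
import hashlib

NBUCKETS = 8  # real systems use many more buckets than servers

def bucket(key):
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")
    return h % NBUCKETS

# table[bucket] -> server. A busy server is simply given fewer buckets.
table = ["cache1", "cache2", "cache2", "cache3",
         "cache1", "cache3", "cache2", "cache3"]

def lookup(key):
    return table[bucket(key)]

print(lookup("despacito"), lookup("paxos"))

# Rebalancing is just editing entries and re-broadcasting the table:
table[bucket("despacito")] = "cache3"  # move a hot bucket off its server
```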
Proposal 6: Power of Two Choices
Read-only or stateless workloads:
- allow any task to be handled on one of two servers
- pair picked at random: hash(k), hash’(k)
- (using consistent hashing with virtual nodes)
- periodically collect data about server load
- send new work to less loaded server of the two
- or with likelihood ~ (1 - load)
Power of Two Choices
Why does this work?
- every key is assigned to a different random pair
- suppose k1 happens to map to the same server as a popular key k2
- k1’s alternate is very likely different from k2’s alternate
Generalize: spread very busy keys over more choices
Power of Two Choices
[Diagram, built up over several slides: hash(“despacito”) yields candidates Cache 1 and Cache 2; hash(“paxos”) yields Cache 2 and Cache 3; each request goes to the less loaded server of its pair.]
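A sketch of the two-choices rule (the load numbers are made up, and two differently salted hashes stand in for the independent hash and hash'; a real system would collect load data periodically):

```python
import hashlib

SERVERS = ["cache1", "cache2", "cache3", "cache4"]

def h(key, salt):
    # Salted hashes act as the two independent hash functions.
    data = f"{salt}:{key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def two_choices(key, load):
    a = SERVERS[h(key, "h1") % len(SERVERS)]
    b = SERVERS[h(key, "h2") % len(SERVERS)]
    # Send the request to the less loaded server of the random pair.
    return a if load[a] <= load[b] else b

load = {"cache1": 0.9, "cache2": 0.2, "cache3": 0.5, "cache4": 0.1}
print(two_choices("despacito", load), two_choices("paxos", load))
```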
Requirements, revisited
Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed
Requirement 3: add/remove node moves only a few keys
Requirement 4: minimize worst case overload
Requirement 5: parcel out work of redistributing keys
Requirement 6: balance work even with Zipf demand
Next
“Distributed systems in practice”
- Memcache: scalable caching layer between stateless front ends and storage
- GFS: scalable distributed storage for streaming file workloads
- BigTable: scalable key-value store
- Spanner: cross-data center transactional key-value store
Thursday
Yegge on Service-Oriented Architectures
- Steve Yegge, prolific programmer and blogger
- Moved from Amazon to Google
- Reading is an accidentally-leaked memo about differences between Amazon’s and Google’s system architectures (at that time)
- SOA: separate applications (e.g. Google Search) into many primitive services, run internally as products