Apache ZooKeeper
![Page 1: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/1.jpg)
ZooKeeper
Scott Leberknight
![Page 2: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/2.jpg)
Your mission...
Build a distributed lock service
Only one process may own the lock
Must preserve ordering of requests
Ensure proper lock release
(should you choose to accept it)
...this message will self destruct in 5 seconds
![Page 3: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/3.jpg)
Mission Training
![Page 4: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/4.jpg)
“Distributed Coordination Service”
Distributed, hierarchical filesystem
High availability, fault tolerant
Performant (i.e. it’s fast)
Facilitates loose coupling
![Page 5: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/5.jpg)
Fallacies of Distributed Computing...
http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
#1: The network is reliable.
![Page 6: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/6.jpg)
Partial Failure
Did my message get through?
Did the operation complete?
Or, did it fail???
![Page 7: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/7.jpg)
Follow the Leader
One elected leader
Many followers
Followers may lag leader
Eventual consistency
![Page 8: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/8.jpg)
What problems can it solve?
Group membership
Distributed data structures (locks, queues, barriers, etc.)
Reliable configuration service
Distributed workflow
![Page 9: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/9.jpg)
Training exercise:
Group Membership
![Page 10: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/10.jpg)
Get connected...
![Page 11: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/11.jpg)
```java
public ZooKeeper connect(String hosts, int timeout)
        throws IOException, InterruptedException {
    final CountDownLatch signal = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper(hosts, timeout, new Watcher() {
        @Override
        public void process(WatchedEvent event) {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                signal.countDown();
            }
        }
    });
    signal.await(); // must wait for connected event!
    return zk;
}
```
![Page 12: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/12.jpg)
ZooKeeper Sessions
Tick time
Session timeout
Automatic failover
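The session timeout a client requests is bounded by the server's tick time. A minimal sketch, assuming the server's default limits (minSessionTimeout = 2 × tickTime, maxSessionTimeout = 20 × tickTime; both are configurable). The class and method names are hypothetical, not ZooKeeper API.

```java
/** Hypothetical helper mirroring how a server bounds a requested session timeout. */
final class SessionTimeouts {
    private SessionTimeouts() {}

    /**
     * Clamps a client-requested timeout (ms) into [2 * tickTime, 20 * tickTime],
     * assuming the server's default minSessionTimeout/maxSessionTimeout settings.
     */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }
}
```

With a tickTime of 2000 ms, a request for 1000 ms is raised to 4000 ms and a request for 60000 ms is lowered to 40000 ms.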
![Page 13: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/13.jpg)
znode
Persistent or ephemeral
Can hold data (like a file)
Persistent ones can have children (like a directory)
Optional sequential numbering
![Page 14: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/14.jpg)
```java
public void create(String groupName) throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    String createdPath = zk.create(path, null /*data*/, ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT);
    System.out.println("Created " + createdPath);
}
```
![Page 15: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/15.jpg)
```java
public void join(String groupName, String memberName)
        throws KeeperException, InterruptedException {
    String path = "/" + groupName + "/" + memberName;
    String createdPath = zk.create(path, null /*data*/, ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL);
    System.out.println("Created " + createdPath);
}
```
![Page 16: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/16.jpg)
```java
public void list(String groupName) throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    try {
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
            System.out.println(child);
        }
    } catch (KeeperException.NoNodeException e) {
        System.out.printf("Group %s does not exist\n", groupName);
    }
}
```
![Page 17: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/17.jpg)
```java
public void delete(String groupName) throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    try {
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
            zk.delete(path + "/" + child, -1);
        }
        zk.delete(path, -1); // parent (must be empty, so children are deleted first)
    } catch (KeeperException.NoNodeException e) {
        System.out.printf("%s does not exist\n", groupName);
    }
}
```
A version of -1 deletes unconditionally; pass an explicit version to delete only if the znode is unchanged.
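The version argument turns delete (and setData) into a compare-and-set. The sketch below is a hypothetical in-memory model, not ZooKeeper code: -1 matches any version, otherwise the call fails when the znode's current version differs (real ZooKeeper throws KeeperException.BadVersionException in that case).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of version-checked znode writes (not the ZooKeeper client API). */
final class VersionedStore {
    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    void create(String path, byte[] value) {
        data.put(path, value);
        versions.put(path, 0); // a new znode starts at data version 0
    }

    /** Like setData: succeeds only if expectedVersion is -1 or matches; bumps the version. */
    boolean setData(String path, byte[] value, int expectedVersion) {
        Integer current = versions.get(path);
        if (current == null || (expectedVersion != -1 && expectedVersion != current)) {
            return false; // real ZooKeeper throws NoNodeException / BadVersionException instead
        }
        data.put(path, value);
        versions.put(path, current + 1);
        return true;
    }

    /** Like delete: -1 deletes unconditionally, otherwise the versions must match. */
    boolean delete(String path, int expectedVersion) {
        Integer current = versions.get(path);
        if (current == null || (expectedVersion != -1 && expectedVersion != current)) {
            return false;
        }
        data.remove(path);
        versions.remove(path);
        return true;
    }

    boolean exists(String path) {
        return versions.containsKey(path);
    }
}
```

A client that read version 0 and then tries to delete after someone else's update (version now 1) is rejected, so it cannot silently discard a change it never saw.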
![Page 18: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/18.jpg)
initial group members
node-4 died
![Page 19: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/19.jpg)
Architecture
![Page 20: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/20.jpg)
Data Model
(diagram: example znode tree with nodes customer, account, transaction, region, address)
![Page 21: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/21.jpg)
Hierarchical filesystem
Composed of znodes
Atomic znode access (reads/writes)
Watchers
Security (via authentication, ACLs)
znode data (< 1 MB)
![Page 22: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/22.jpg)
znode - types
Persistent
Ephemeral
Persistent Sequential
Ephemeral Sequential
(ephemeral and ephemeral sequential znodes die when the session expires)
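To make the lifetime rules concrete, here is a hypothetical model (not ZooKeeper's data tree): ephemeral znodes remember the session that created them and vanish when it expires, and they may not have children (real ZooKeeper reports KeeperException.NoChildrenForEphemeralsException). This is the mechanism behind the "node-4 died" group-membership demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of znode lifetimes; not the ZooKeeper server's data tree. */
final class ZnodeLifetimes {
    enum Mode { PERSISTENT, EPHEMERAL }

    private static final class Node {
        final Mode mode;
        final long ownerSession; // only meaningful for ephemeral znodes

        Node(Mode mode, long ownerSession) {
            this.mode = mode;
            this.ownerSession = ownerSession;
        }
    }

    private final Map<String, Node> nodes = new TreeMap<>();

    /** Creates a znode (parent existence is not checked in this simplified model). */
    void create(String path, Mode mode, long sessionId) {
        Node parent = nodes.get(path.substring(0, path.lastIndexOf('/')));
        if (parent != null && parent.mode == Mode.EPHEMERAL) {
            // real ZooKeeper fails with KeeperException.NoChildrenForEphemeralsException
            throw new IllegalStateException("ephemeral znodes cannot have children");
        }
        nodes.put(path, new Node(mode, sessionId));
    }

    /** Session expiry deletes every ephemeral znode that session created. */
    void expireSession(long sessionId) {
        nodes.values().removeIf(n -> n.mode == Mode.EPHEMERAL && n.ownerSession == sessionId);
    }

    List<String> paths() {
        return new ArrayList<>(nodes.keySet());
    }
}
```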
![Page 23: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/23.jpg)
znode - sequential
(diagram: sequence numbers appended to child znode names)
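For a sequential znode, ZooKeeper appends a counter kept by the parent znode to the requested name, formatted as a 10-digit zero-padded number (e.g. lock-0000000007). The hypothetical class below only imitates that naming convention locally; it assumes consecutive numbers, whereas in ZooKeeper other child changes also advance the parent's counter, so gaps are possible.

```java
/** Hypothetical local imitation of sequential znode naming: name + 10-digit zero-padded counter. */
final class SequentialNames {
    private int nextSequence = 0; // in ZooKeeper, the parent znode keeps this counter

    String next(String requestedPath) {
        return requestedPath + String.format("%010d", nextSequence++);
    }

    /** Parses the sequence number back out of the last 10 characters. */
    static int sequenceOf(String name) {
        return Integer.parseInt(name.substring(name.length() - 10));
    }
}
```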
![Page 24: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/24.jpg)
znode - operations

| Operation | Type |
|---|---|
| create | write |
| delete | write |
| exists | read |
| getChildren | read |
| getData | read |
| setData | write |
| getACL | read |
| setACL | write |
| sync | read |
![Page 25: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/25.jpg)
APIs
synchronous
asynchronous
![Page 26: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/26.jpg)
Synchronous
```java
public void list(final String groupName) throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    try {
        List<String> children = zk.getChildren(path, false);
        // process children...
    } catch (KeeperException.NoNodeException e) {
        // handle non-existent path...
    }
}
```
![Page 27: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/27.jpg)
Asynchronous
```java
public void list(String groupName) {
    String path = "/" + groupName;
    zk.getChildren(path, false, new AsyncCallback.ChildrenCallback() {
        @Override
        public void processResult(int rc, String path, Object ctx, List<String> children) {
            // process results when called back later (check rc for errors first)...
        }
    }, null /* optional context object */);
}
```
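A synchronous call can be layered on top of a callback by blocking on a latch until the callback fires, the same pattern the connect method uses for its connected event. The sketch below is self-contained: its ChildrenCallback interface only mimics the shape of AsyncCallback.ChildrenCallback, and the "server" is a hypothetical executor thread, not ZooKeeper.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-ins showing how a callback-style call can be wrapped in a blocking one. */
final class SyncOverAsync {
    /** Shape loosely modeled on AsyncCallback.ChildrenCallback (rc, path, ctx, children). */
    interface ChildrenCallback {
        void processResult(int rc, String path, Object ctx, List<String> children);
    }

    /** Fake async "server": answers on another thread (rc 0 mirrors ZooKeeper's OK code). */
    static void getChildrenAsync(ExecutorService pool, String path, ChildrenCallback cb, Object ctx) {
        pool.execute(() -> cb.processResult(0, path, ctx, List.of("node-1", "node-2")));
    }

    /** Blocks the caller until the callback has fired, then returns its result. */
    static List<String> getChildrenBlocking(ExecutorService pool, String path)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<List<String>> result = new AtomicReference<>();
        getChildrenAsync(pool, path, (rc, p, ctx, children) -> {
            result.set(children);
            done.countDown(); // wake the waiting caller
        }, null);
        done.await();
        return result.get();
    }
}
```

The async form lets one thread keep many requests in flight; the blocking wrapper trades that for simpler calling code.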
![Page 28: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/28.jpg)
znode - Watchers
Set (one-time) watches on read operations
Write operations trigger watches on affected znodes
Re-register watches on events (optional)
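One-time semantics mean a watch is removed as soon as it fires; a second write triggers nothing unless the client reads again with a watch. A hypothetical single-threaded model of that rule (the real WatchedEvent carries the event type and path, not the new data):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical model of ZooKeeper's one-time watch semantics (not the real client API). */
final class OneTimeWatches {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    /** A read that optionally leaves a watch behind, like getData with a watcher. */
    String read(String path, Consumer<String> watcher) {
        if (watcher != null) {
            watches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
        }
        return data.get(path);
    }

    /** A write fires, then discards, every watch registered on the path. */
    void write(String path, String value) {
        data.put(path, value);
        List<Consumer<String>> fired = watches.remove(path); // one-time: removed before delivery
        if (fired != null) {
            fired.forEach(w -> w.accept(path));
        }
    }
}
```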
![Page 29: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/29.jpg)
```java
public interface Watcher {
    void process(WatchedEvent event);
    // other details elided...
}
```
![Page 30: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/30.jpg)
```java
public class ChildZNodeWatcher implements Watcher {
    private Semaphore semaphore = new Semaphore(1);

    public void watchChildren() throws InterruptedException, KeeperException {
        semaphore.acquire();
        while (true) {
            // read children and re-register this object as the (one-time) watcher
            List<String> children = zk.getChildren(lockPath, this);
            display(children);
            semaphore.acquire(); // block until process() sees the next change
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeChildrenChanged) {
            semaphore.release();
        }
    }

    // other details elided...
}
```
![Page 31: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/31.jpg)
Gotcha!
When using watchers...
...updates can be missed! (changes made after a watch fires but before the client re-registers it are never seen by the client)
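A hypothetical single-znode model of the gotcha: the watch fires on the first of two quick writes, the event is still in flight when the second write lands, and the client's re-read sees only the latest value. The intermediate value is never observed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical single-znode model showing why a re-registering watcher can miss updates. */
final class MissedUpdates {
    private String value;
    private boolean watchSet;                                       // one-time watch flag
    private final Queue<String> pendingEvents = new ArrayDeque<>(); // events not yet handled
    final List<String> seenByClient = new ArrayList<>();

    void write(String newValue) {
        value = newValue;
        if (watchSet) {          // the watch fires once and is cleared
            watchSet = false;
            pendingEvents.add("NodeDataChanged");
        }
    }

    /** Client reads the current value and re-registers its watch in the same call. */
    void clientReadAndWatch() {
        seenByClient.add(value);
        watchSet = true;
    }

    /** Client handles one queued event by reading again (which re-sets the watch). */
    boolean clientHandleNextEvent() {
        if (pendingEvents.poll() == null) {
            return false;
        }
        clientReadAndWatch();
        return true;
    }
}
```

Watches are therefore best treated as "something changed, go re-read" signals rather than as a complete change log.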
![Page 32: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/32.jpg)
Data Consistency
![Page 33: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/33.jpg)
“A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do.”
- Ralph Waldo Emerson (Self-Reliance)
![Page 34: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/34.jpg)
Data Consistency in ZK
Sequential updates
Atomicity (all or nothin’)
Consistent client view (across all ZooKeeper servers)
Durability (of writes)
Bounded lag time (eventual consistency)
![Page 35: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/35.jpg)
Ensemble
![Page 36: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/36.jpg)
(diagram: an ensemble of one leader and two followers; each client connects to one server; reads are served by that server, writes are forwarded to the leader and broadcast to the followers)
![Page 37: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/37.jpg)
Clients have session on one server
Writes routed through leader
Reads from server memory
Leader election
Atomic broadcast
“majority rules”
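"Majority rules" is simple arithmetic: leader election and write commits need a strict majority of the ensemble. An illustrative helper (not ZooKeeper code):

```java
/** Quorum arithmetic behind "majority rules" (illustrative helper, not ZooKeeper code). */
final class Quorum {
    private Quorum() {}

    /** Smallest strict majority of an ensemble of n servers. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** How many servers can fail while a majority can still be formed. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }
}
```

This is why ensembles usually have an odd number of servers: going from 3 to 4 servers raises the quorum from 2 to 3 without tolerating any additional failure.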
![Page 38: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/38.jpg)
Sessions
Client-requested timeout period
Automatic keep-alive (heartbeats)
Automatic/transparent failover
![Page 39: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/39.jpg)
Writes
All go through leader
Global ordering
Every update has a unique zxid (ZooKeeper transaction id)
Broadcast to followers
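A zxid is a 64-bit number: the high 32 bits hold the leader's epoch and the low 32 bits a counter within that epoch, so comparing zxids as plain longs gives the global order of updates. Illustrative helpers (hypothetical names):

```java
/** Illustrative helpers for the 64-bit zxid layout: epoch in the high 32 bits, counter in the low 32. */
final class Zxid {
    private Zxid() {}

    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }
}
```

Any update from a newer leader (higher epoch) orders after every update from an older one, whatever the counters are.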
![Page 40: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/40.jpg)
Reads
“Follow-the-leader”
Can lag leader
Eventual consistency
In-memory (fast!)
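Because reads are served from the connected server's memory, a follower that has not yet applied the latest writes returns stale data; the sync operation lets a client catch its server up before reading. A deliberately simplified, hypothetical model (real sync is asynchronous and catches up to the point the leader had reached when the sync was processed):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a follower lagging the leader, and of sync() closing the gap. */
final class LaggingFollower {
    private final List<String> leaderLog = new ArrayList<>(); // committed writes, in zxid order
    private int followerApplied = 0;                          // writes the follower has applied

    void write(String value) {
        leaderLog.add(value); // writes go through the leader
    }

    /** Follower applies at most n more committed writes (simulating replication delay). */
    void replicate(int n) {
        followerApplied = Math.min(leaderLog.size(), followerApplied + n);
    }

    /** A read served from the follower's memory: may be stale. */
    String readFromFollower() {
        return followerApplied == 0 ? null : leaderLog.get(followerApplied - 1);
    }

    /** Like ZooKeeper's sync: the follower catches up with the leader before the next read. */
    void sync() {
        followerApplied = leaderLog.size();
    }
}
```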
![Page 41: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/41.jpg)
Training Complete
PASSED
![Page 42: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/42.jpg)
Final Mission:
Distributed Lock
![Page 43: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/43.jpg)
Objectives
Mutual exclusion between processes
Decouple lock users
![Page 44: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/44.jpg)
Building Blocks
parent lock znode: lock-node
ephemeral, sequential child znodes: child-1, child-2, ..., child-N
![Page 45: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/45.jpg)
sample-lock
![Page 46: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/46.jpg)
sample-lock
process-1
Lock
![Page 47: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/47.jpg)
sample-lock
process-1 process-2
Lock
![Page 48: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/48.jpg)
sample-lock
process-1 process-2 process-3
Lock
![Page 49: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/49.jpg)
sample-lock
process-2 process-3
Lock
![Page 50: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/50.jpg)
sample-lock
process-3
Lock
![Page 51: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/51.jpg)
sample-lock
![Page 52: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/52.jpg)
pseudocode (remember that stuff?)
1. create child ephemeral, sequential znode
2. list children of lock node and set a watch
3. if znode from step 1 has lowest number, then lock is acquired. done.
4. else wait for watch event from step 2, and then go back to step 2 on receipt
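Step 3 reduces to a local check once the children are listed. getChildren returns names in no guaranteed order, so the sketch parses the 10-digit sequence suffix rather than relying on list position. Class and method names are hypothetical; the ZooKeeper calls for steps 1, 2 and 4 are left out.

```java
import java.util.Comparator;
import java.util.List;

/** Step 3 of the lock recipe, sketched locally: does my child znode have the lowest sequence number? */
final class LockCheck {
    private LockCheck() {}

    /** Sequence number = last 10 characters of a sequential znode name. */
    static int sequenceOf(String childName) {
        return Integer.parseInt(childName.substring(childName.length() - 10));
    }

    static boolean holdsLock(List<String> children, String myChild) {
        String lowest = children.stream()
                .min(Comparator.comparingInt(LockCheck::sequenceOf))
                .orElseThrow(() -> new IllegalStateException("no children"));
        return lowest.equals(myChild);
    }
}
```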
![Page 53: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/53.jpg)
This is OK, but there are problems...
(perhaps a main B bus undervolt?)
![Page 54: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/54.jpg)
Problem #1 - Connection Loss
If partial failure on znode creation...
...then how do we know if znode was created?
![Page 55: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/55.jpg)
Solution #1 (Connection Loss)
Embed session id in child znode names
lock-<sessionId>-
Failed-over client checks for child w/ sessionId
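After a connection loss, the same session (same session id) reconnects to another server, so the client can look for a child carrying its own prefix before creating a new one. A minimal sketch with hypothetical names; the trailing "-" in the prefix keeps session 42 from matching session 421.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: after reconnecting, look for a child this session already created. */
final class LockNodeRecovery {
    private LockNodeRecovery() {}

    static String prefixFor(long sessionId) {
        return "lock-" + sessionId + "-"; // ZooKeeper appends the 10-digit sequence after this
    }

    /** Returns the existing child for this session, if the earlier create actually succeeded. */
    static Optional<String> findExisting(List<String> children, long sessionId) {
        String prefix = prefixFor(sessionId);
        return children.stream().filter(c -> c.startsWith(prefix)).findFirst();
    }
}
```

With session ids embedded in the names, children must be ordered by their sequence suffix, not by the full name, since the prefixes differ from client to client.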
![Page 56: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/56.jpg)
Problem #2 - The Herd Effect
![Page 57: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/57.jpg)
Many clients watching lock node.
Notification sent to all watchers on lock release...
...possible large spike in network traffic
![Page 58: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/58.jpg)
Solution #2 (The Herd Effect)
Set watch only on the child znode with the preceding sequence number
e.g. the client that created lock-9 watches only lock-8
Note, lock-9 could watch lock-7 (e.g. if lock-8's client died)
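Choosing what to watch means finding the child with the largest sequence number below your own, which automatically skips gaps left by dead clients. A hypothetical local sketch; a client would then call exists() with a watch on the result and re-run the check if that znode is already gone.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Herd-effect fix, sketched locally: find the child with the next-lower sequence number to watch. */
final class PredecessorWatch {
    private PredecessorWatch() {}

    private static int sequenceOf(String childName) {
        return Integer.parseInt(childName.substring(childName.length() - 10));
    }

    /** Empty result means there is no lower child, i.e. this client holds the lock. */
    static Optional<String> nodeToWatch(List<String> children, String myChild) {
        int mine = sequenceOf(myChild);
        return children.stream()
                .filter(c -> sequenceOf(c) < mine)
                .max(Comparator.comparingInt(PredecessorWatch::sequenceOf));
    }
}
```

Each release now wakes exactly one waiting client instead of the whole herd.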
![Page 59: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/59.jpg)
Implementation
Do it yourself?
...or be lazy and use existing code?
org.apache.zookeeper.recipes.lock.WriteLock
org.apache.zookeeper.recipes.lock.LockListener
![Page 60: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/60.jpg)
Implementation, part deux
WriteLock calls back when lock acquired (async)
Maybe you want a synchronous client model...
Let’s do some decoration...
![Page 61: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/61.jpg)
```java
public class BlockingWriteLock {
    private CountDownLatch signal = new CountDownLatch(1);
    // other instance variable definitions elided...

    public BlockingWriteLock(String name, ZooKeeper zookeeper, String path, List<ACL> acls) {
        this.name = name;
        this.path = path;
        this.writeLock = new WriteLock(zookeeper, path, acls, new SyncLockListener());
    }

    public void lock() throws InterruptedException, KeeperException {
        writeLock.lock();
        signal.await(); // block until SyncLockListener reports the lock acquired
    }

    public void unlock() {
        writeLock.unlock();
    }

    class SyncLockListener implements LockListener { /* next slide */ }
}
```
![Page 62: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/62.jpg)
```java
class SyncLockListener implements LockListener {
    @Override
    public void lockAcquired() {
        signal.countDown();
    }

    @Override
    public void lockReleased() {
    }
}
```
![Page 63: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/63.jpg)
```java
BlockingWriteLock lock =
        new BlockingWriteLock(myName, zooKeeper, path, ZooDefs.Ids.OPEN_ACL_UNSAFE);
try {
    lock.lock();
    // do something while we have the lock
} catch (Exception ex) {
    // handle appropriately...
} finally {
    lock.unlock(); // easy to forget!
}
```
![Page 64: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/64.jpg)
(we can do a little better, right?)
![Page 65: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/65.jpg)
```java
public class DistributedOperationExecutor {
    private final ZooKeeper _zk;

    public DistributedOperationExecutor(ZooKeeper zk) {
        _zk = zk;
    }

    public Object withLock(String name, String lockPath, List<ACL> acls,
                           DistributedOperation op)
            throws InterruptedException, KeeperException {
        BlockingWriteLock writeLock = new BlockingWriteLock(name, _zk, lockPath, acls);
        try {
            writeLock.lock();
            return op.execute();
        } finally {
            writeLock.unlock();
        }
    }
}
```
![Page 66: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/66.jpg)
```java
executor.withLock(myName, path, ZooDefs.Ids.OPEN_ACL_UNSAFE,
        new DistributedOperation() {
            @Override
            public Object execute() {
                // do something while we have the lock
                return whateverTheResultIs;
            }
        });
```
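With Java 8 or later, the anonymous DistributedOperation can shrink to a lambda. The sketch below shows the same execute-around shape with no ZooKeeper dependency: java.util.concurrent.locks.Lock and Supplier are local stand-ins for BlockingWriteLock and DistributedOperation, so it illustrates the pattern rather than the slides' executor.

```java
import java.util.concurrent.locks.Lock;
import java.util.function.Supplier;

/** Execute-around sketch: a local Lock stands in for the slides' BlockingWriteLock. */
final class WithLock {
    private WithLock() {}

    static <T> T withLock(Lock lock, Supplier<T> operation) {
        lock.lock(); // acquire before try, so unlock never runs for a lock we don't hold
        try {
            return operation.get();
        } finally {
            lock.unlock(); // always released, even if the operation throws
        }
    }
}
```

A call then reads `WithLock.withLock(lock, () -> computeSomething())`, and callers cannot forget the unlock.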
![Page 67: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/67.jpg)
Run it!
![Page 68: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/68.jpg)
![Page 69: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/69.jpg)
Mission:
ACCOMPLISHED
![Page 70: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/70.jpg)
Review...
![Page 71: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/71.jpg)
let’s crowdsource (for free beer, of course)
![Page 72: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/72.jpg)
Like a filesystem, except distributed & replicated
Build distributed coordination, data structures, etc.
High-availability, reliability
Writes via leader, in-memory reads (fast)
Automatic session failover, keep-alive
![Page 73: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/73.jpg)
Refs
![Page 74: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/74.jpg)
shop.oreilly.com/product/0636920021773.do
(3rd edition pub date is May 29, 2012)
zookeeper.apache.org/
![Page 75: Apache ZooKeeper](https://reader033.fdocuments.in/reader033/viewer/2022051616/553af3c2550346b9328b4621/html5/thumbnails/75.jpg)
photo attributions
Follow the Leader - http://www.flickr.com/photos/davidspinks/4211977680/
Antique Clock - http://www.flickr.com/photos/cncphotos/3828163139/
Skull & Crossbones - http://www.flickr.com/photos/halfanacre/2841434212/
Ralph Waldo Emerson - http://www.americanpoems.com/poets/emerson
Herd of Sheep - http://www.flickr.com/photos/freefoto/639294974/
(others from iStockPhoto...)
1921 Jazzing Orchestra - http://en.wikipedia.org/wiki/File:Jazzing_orchestra_1921.png
Running on Beach - http://www.flickr.com/photos/kcdale99/3148108922/
Crowd - http://www.flickr.com/photos/laubarnes/5449810523/
Apollo 13 Patch - http://science.ksc.nasa.gov/history/apollo/apollo-13/apollo-13.html